Friday, 18 March 2011

Another creationist doesn't understand science. Who knew?!

Nephilimfree, YouTube creationist extraordinaire and poster boy for the giant headphone lobby, has posted a new video entitled "Overlapping and Embedded Genes". Unfortunately for him, I am a lover of any and all things genetic. Also unfortunately for him, he doesn't have a clue what he's talking about. Fortunately for me, this provides me with an opportunity to tear his video to shreds.

A little background may be necessary. Nephilimfree (who I will henceforth refer to as "Nephy", as I am an avid believer in making infuriatingly inane things tolerable by giving them cute names) made this video in response to a conversation he had with Sofiarune. He does not mention it, but I was also involved, as I am Sofiarune's go-to guy for molecular biology. I guess you can say I have a personal interest in this video, but I'd pick apart Nephy's claims in any event since they are, as usual, stupid on a Goats On Fire level.

First, the video. Be warned, it is rather long, running at just over 14 minutes. If you don't have the time (or the patience) to watch the whole thing, I'll give you the tl;dr (...or is that tl;dw...) version below. Go grab some coffee, because this post is gonna be a long one.

Nephy's argument is basically this: (i) the existence of overlapping genes is an obstacle to evolution because a single mutation will be deleterious to multiple genes and (ii) these gene pairs could not have possibly evolved and are thus evidence that they have been created. As we will see below, both of these points are patently false, and they demonstrate the extent to which Nephy is ignorant of both evolution and genetics.

Nephy starts the video off by mentioning his discussion with Sofiarune, and his insistence that genes overlap and that this is somehow proof that evolution isn't true. He states that "being the evolutionist that she is, she said 'No, no, no. Genes are read linear and they are not embedded'." Nephy is only telling a half truth here. In their original conversation, he presented "overlapping genes" in a very different manner than he does in this video, as the figure below demonstrates:

So Nephy broached the subject with a cryptic sentence about overlapping sequences "causing entropy" to "the code". Understandably, Sofia asked for clarification. In response, he talked about overlapping sequences in shotgun sequencing. This is entirely different. Overlapping sequences in shotgun sequencing and overlapping genes in the genome are not the same thing - one is a bioinformatic technique used to align sequence data in the proper order and the other is the idea that sequences can have multiple open reading frames (ORFs) coding for different gene products. He followed this up with a link to an article [not shown] which further obfuscated what he had meant. It talked about how, in the early days of molecular biology, there was debate over how the genetic code was read; were sequences read in one long linear manner, or were all three possible reading frames of a sequence read at once, resulting in overlapping sequences? It was determined early on that the correct answer was the former - codons were read one at a time in a linear fashion. In this context, Sofia was entirely justified in saying that genes did not overlap. Nephy seemed to have gotten all these concepts confused, but in his video, he presents it as if he gave Sofia a concrete definition of "overlapping genes". Leave it to a creationist to be vague, only to misrepresent your response.

Nevertheless, Nephy now has a concrete definition: overlapping genes are multiple coding ORFs contained within a sequence. Such overlapping genes do exist, primarily in RNA viruses. A study done by Chirico et al.¹ (which I will go into in some detail below) found that 75% of the 2000 or so known species of virus have some extent of gene overlap. This does not mean that it is a widespread phenomenon outside of viruses, as Nephy seems to believe. He points to an article entitled Mammalian Overlapping Genes: The Comparative Perspective² and is seemingly impressed by the numbers. It shows a total of 774 overlapping genes in the human genome, and this might be impressive to Nephy, but it isn't to anyone who actually took the time to read the paper. This 774 is out of the 34,604 annotated gene sequences posted to the NCBI human genome assembly (build 33). That's a whopping 2.2% of genes. So overlapping genes in humans are more the exception than they are the rule. This is the same for pretty much all eukaryotic organisms; overlapping gene sequences do exist, but they are pretty rare. They are mostly seen in viruses (indeed, overlapping sequences were first discovered in the late 1970s by researchers working with the phage φX174). Nephy talks about overlapping genes as if they are as numerous as the grains of sand on the beach, but they are actually quite infrequent.

But Nephy isn't only concerned with the existence of overlapping genes. He claims that overlapping genes are a problem for evolutionists. He thinks this is because a mutation in the overlapping sequence would be deleterious to both genes and evolution could never favour such a situation. This is demonstrably false, and the evolution of overlapping genetic sequences can be, and has been, explained. A paper by Rancurel et al.³ (2009) did just that. According to the authors, there are two characteristics of overlapping genes that alleviate evolutionary constraints on them: (i) overlapping proteins are full of amino acids with a high level of codon degeneracy and (ii) the regions of proteins encoded by overlapping sequences have a tendency towards structural disorder. It is worth going into some detail about these.

(i) To understand what the first point means, you will need to understand what is meant by "codon degeneracy". There are 64 different codons which can be constructed from the four nucleotides, but only 20 main amino acids which comprise any polypeptide. This means that there will be some amino acids that are encoded by multiple codons. Arginine, for example, has 6 different codons - CGU, CGC, CGA, CGG, AGA, and AGG. You'll notice that these codons are pretty similar to each other, and that is pretty handy for the cell. If, for instance, a mutation changes a CGU codon into CGC, CGG or CGA, it will still encode for arginine. The sequence has changed, but the output of the sequence remains the same. The mutation will have no effect on the protein sequence. This is what is meant by codon degeneracy. Amino acids with codon degeneracy will be able to tolerate some degree of mutation. This is important to overlapping genes, because the proteins produced by overlapping genes are high in amino acids that have codon degeneracy. What this means is that mutations to the sequence can be tolerated and won't have a deleterious effect. This is quite the opposite of what Nephy claims. He feels that a mutation will be doubly deleterious to overlapping genes, and he would be right if it weren't for the high frequency of codon degeneracy found in these proteins. In many cases, a mutation will have no effect at all on either protein in the pair.
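To make the degeneracy point concrete, here is a minimal Python sketch. It builds the standard genetic code, lists the six arginine codons, and then checks a tiny made-up overlapping sequence in which one point mutation is silent in both reading frames. The sequence `toy` and the helper names (`codons_for`, `translate`, `point_mutate`) are my own illustrations, not anything taken from Rancurel et al.

```python
from itertools import product

# Standard genetic code in the RNA alphabet. Codons are generated in the
# conventional order UUU, UUC, UUA, UUG, UCU, ... and paired with the
# matching one-letter amino acid codes ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def codons_for(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)

def translate(rna, frame=0):
    """Translate complete codons of `rna`, starting at offset `frame`."""
    return "".join(
        CODON_TABLE[rna[i:i + 3]] for i in range(frame, len(rna) - 2, 3)
    )

def point_mutate(rna, pos, base):
    """Return a copy of `rna` with the base at index `pos` replaced."""
    return rna[:pos] + base + rna[pos + 1:]

# Arginine (R) has six codons: CGU, CGC, CGA, CGG, AGA and AGG.
print(codons_for("R"))

# Changing the third base of CGU never changes the amino acid.
print({"CG" + b: CODON_TABLE["CG" + b] for b in BASES})

# Hypothetical overlapping stretch: frame 0 reads CGC-GAU, frame 2 reads CGA.
toy = "CGCGAU"
mutant = point_mutate(toy, 2, "A")  # C -> A at index 2
for frame in (0, 2):
    print(frame, translate(toy, frame), translate(mutant, frame))
# frame 0: CGC -> CGA, both arginine; frame 2: CGA -> AGA, both arginine
```

This is a deliberately favourable case. Plenty of other mutations in an overlap would change one or both proteins, but it shows that "one mutation, two damaged genes" is not a given.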

(ii) Structural disorder describes the extent to which the 3D structure of a protein is defined. Those proteins with a high degree of order have a strictly defined 3D structure, while those with a high degree of structural disorder do not. They tend to rapidly change from one structural form to another - their structure is not strictly defined. Since it is the amino acid sequence of a polypeptide that largely determines its 3D shape and thus function, those with a high degree of structural disorder can tolerate some degree of mutation. Mutations will not affect the protein's 3D shape much because it doesn't have a definite shape to begin with. As the authors put it, "[d]isordered proteins are generally subject to less structural constraint than ordered ones". It should not be surprising, then, that the regions of proteins which are encoded by overlapping sequences tend to be structurally disordered. Those parts do not have a rigid structure, so mutations will have less of a deleterious effect on these overlapping regions. Again, this is antithetical to Nephy's claim that mutations in overlapping genes are a death sentence for a cell.

As an aside, this point about structural disorder raises an interesting theological question: if overlapping genes were created by God as Nephy believes, then why did God decide to create proteins which have regions of structural disorder? If he created these proteins for a particular reason, then would he not have designed them all to have a definite 3D structure that reflects their function? Did God just get lazy and figure "Eh, I'll just make some of these proteins structurally ambiguous. No one's gonna notice"? From a design perspective, it doesn't make sense, especially when the designer is perfect and all powerful. Why would he even design overlapping genes to begin with? Wouldn't he just stick to the good old "one gene = one gene product" plan? It's much simpler. It's a hallmark of poor design when things are more complicated than they need to be. How can a perfect God have an imperfect design, anyway?

So Rancurel and co. have thoroughly shown Nephy's point on mutation to be bunk. But they also give his claim that overlapping genes could not have evolved a thrashing. In fact, they do this in the very first paragraph of their paper! They state:
"Among several mechanisms, they [overlapping gene pairs] can be created by a process called 'overprinting', in which a DNA sequence originally coding only one protein undergoes a genetic modification leading to the expression of a second reading frame in addition to the first one... The resulting overlap encodes an ancestral 'overprinted' protein region and a protein region created de novo (i.e., not by duplication) called an 'overprinting' or 'novel' region" [See figure below].
In other words, overlapping gene pairs can be explained easily by a natural process. Two genes exist with reading frames shifted relative to one another, with one upstream from the other. The loss of a stop codon in the upstream gene results in this gene being extended into the region of the second gene where it ends at a stop codon. The result, then, is two genes which share a region of sequence. This has an important evolutionary consequence. The novel region is free to take on novel cellular functions. For the reasons stated above, the novel region is under fewer constraints. In fact, the creation of proteins de novo in this manner has likely played a very important part in viral evolution⁴. In many viruses, the novel proteins in overlapping protein pairs are virulence factors or enzymes that allow the virus to escape host defence mechanisms. De novo creation of novel proteins by overlapping genes represents another mechanism by which novel "information" is added to the genome, something creationists love to deny is possible.
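The stop-codon-loss scenario described above can be played out on a toy sequence. The sketch below is purely illustrative: `gene_a`, `gene_b`, the one-base spacer and the helper `read_orf` are invented for the example and do not come from any real viral genome or from the papers cited here.

```python
from itertools import product

# Standard genetic code (RNA alphabet); "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def read_orf(rna, start):
    """Translate from index `start` until the first in-frame stop codon.

    Returns (peptide, end), where `end` is the index just past the stop
    codon, or None if the sequence runs out before a stop is found.
    """
    peptide = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            return "".join(peptide), i + 3
        peptide.append(aa)
    return "".join(peptide), None

# Hypothetical layout: gene A in frame 0, a one-base spacer, then gene B,
# which therefore sits in a different reading frame (it starts at index 13).
gene_a = "AUGGCUCGUUAA"      # Met-Ala-Arg-stop
gene_b = "AUGCGACGUGACUAA"   # Met-Arg-Arg-Asp-stop
before = gene_a + "C" + gene_b
b_start = len(gene_a) + 1

# A single point mutation turns gene A's stop codon UAA into CAA (Gln).
after = before[:9] + "C" + before[10:]

for label, seq in (("before", before), ("after", after)):
    pep_a, end_a = read_orf(seq, 0)
    pep_b, end_b = read_orf(seq, b_start)
    shared = max(0, min(end_a, end_b) - b_start)
    print(label, pep_a, pep_b, "shared bases:", shared)
```

Before the mutation the two toy genes sit side by side with no shared bases. Afterwards gene A reads through into gene B's territory until it meets a stop codon in its own frame, so the two now share a stretch of sequence that is read in two different frames. Gene B's product is untouched, and gene A has gained a brand-new tail.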

But not only can we explain how overlapping protein pairs have evolved, we also have some good ideas about why they would evolve. Chirico et al. give four possible reasons why overlapping gene sequences may have evolved in viruses.

  1. Mutation rates: given the high mutation rates seen in viruses, it makes sense that there is pressure on viruses to keep their genomes short. Longer genomes will accrue more mutations than shorter ones. Overlapping genes are a way for a virus to expand its repertoire of genes without extending its genome.
  2. Capsid size: some viruses have capsids of limited size. Increasing the size of these capsids to accommodate increased genome sizes is quite the undertaking and is likely to have a fitness cost. Thus it would be advantageous for the virus to have genes which overlap, since this takes up less space within the capsid. 
  3. Gene length: larger genomes tend to have less overlap than smaller genomes. This might be because there is more room in larger genomes for the genes. Viruses may have overlapping genes simply because they have small genomes and cannot fit all the genes linearly.
  4. Expression regulation: it is possible that overlapping genes evolved so that the genes in a gene pair can be regulated together.

The authors studied the sequence data from 62 different virus families to see which of the four possibilities best matched the data. They determined that the most likely cause for the evolution of overlapping genes in viruses was the constraint imposed by capsid size. Nonetheless, it is entirely possible that any of the other three may have played a role in certain cases.

The reason I bring this up is that science has at least made an attempt to explain why overlapping genes exist at all. Nephy and his creationist brethren are content to claim that God created overlapping genes but they don't say a word about why he would have done so. The naturalistic explanation offered by evolution not only explains how but also why. The explanatory power of evolution is massive, while creationism offers little more than just-so stories and ad hoc justifications.

But despite all this, Nephy talks for 9 minutes about how overlapping genes are evolutionarily impossible, while saying absurd things like "Geneticists today are shying away from the use of the word 'gene' because it really doesn't describe what we observe in the genome of life. The preferred term is 'sequences' because it is difficult to say what code ends where and starts where in some cases". As someone with a degree in molecular genetics, I was taken by surprise. No one sent me that memo! This statement makes little sense; genes and sequences are not equivalent, and it's actually pretty easy to determine where one sequence starts and one sequence ends. We've even been able to develop computer software that can predict such things in a matter of milliseconds. Given that Nephy has little scientific background and is definitely not part of the geneticist community, I'd like to know just where he got the idea that the term "gene" was passé. It's still very much a useful term and is used constantly in the scientific literature.

He finishes off his video with a five-minute snippet of a talk given by Hubert Yockey. I find it amusing that Nephy introduces this clip by saying "And now a few select moments from a lecture by a geneticist", followed immediately by a screen declaring "Biophysicist Hubert Yockey". Nephy, a biophysicist and a geneticist are not the same thing. Not even close. And Yockey isn't even a biophysicist to begin with; he's a physicist. He did some work on information theory and how it applies to biology, but this does not make him a biophysicist. I know these are big words for you, but do try to understand the distinction between them before you toss them around again. I won't actually go through the claims made by Yockey in the video (to be honest, I didn't even watch that part) because really, a physicist's claims about biology are about as meaningful as a butcher's claims about nuclear physics.

So once again, Nephilimfree has demonstrated that he has no idea what he is talking about. Overlapping gene sequences are NOT evidence of Divine creation - they make no sense if one assumes they were created, and they make perfect sense when interpreted in the light of evolution. We have a perfectly naturalistic explanation of their origin which conforms to experimental observation. Furthermore, mutations within overlapping genes are not always a bad thing, and have likely contributed to the evolution of pathogenicity in many species of virus. His claim that mutations in overlapping genes are always deleterious is patent nonsense. In fact, at one point in the video he claims that "all genetic mutations cause degretory [sic] effects to the genome's code", a claim that literally thousands of examples can disprove. He says that he has told evolutionists this for a long time and they "refuse to listen". There's a simple reason for that, Nephy - it's because you're flat out wrong.

Consider this a challenge, Nephy. Show me how what I have written is incorrect. Show me why the authors I have quoted are wrong - critique their work. Give me evidence taken from peer-reviewed scientific literature that backs up your claims, and not just idle speculation on your part. Give me a thorough refutation, thoroughly sourced. Because until you do, your claims will be little more than verbal diarrhea.

1. Nicola Chirico, Alberto Vianelli and Robert Belshaw. Why genes overlap in viruses. Proceedings of the Royal Society B: Biological Sciences. 2010. 277: 3809-3817

2. Vamsi Veeramachaneni et al. Mammalian overlapping genes: the comparative perspective. Genome Research. 2004. 14: 280-286

3. Corinne Rancurel et al. Overlapping genes produce proteins with unusual sequence properties and offer insight into de novo protein creation. Journal of Virology. 2009. 83(20): 10719-10736

4. F. Li and S. W. Ding. Virus counterdefence: diverse strategies for evading the RNA-silencing immunity. Annual Review of Microbiology. 2006. 60: 503-531


Eugene Gateley said...

How do you explain the USP6/ZNF232 gene overlapping in humans but not in chimps or gorillas? Or how about the CHRNE/MINK not overlapping in humans as it does in other placental mammals? I have already done an extensive search of the literature.

C.W.G.K said...

Hi Eugene,

I spent some time this morning looking into these genes, and this is what I have found:

From what I can tell, the USP6 and ZNF232 genes do not overlap in humans. Both are found in the same region of Chromosome 17, but are separated by about 5,300 basepairs. USP6 spans basepairs 5,031,687 through 5,078,324, while ZNF232 sits further upstream along the chromosome, from 5,009,031 to 5,026,397. They're close, but they don't overlap. Any regions that did overlap would likely be regulatory regions, which would only indicate that perhaps the two genes are regulated together. (Interestingly, there does not seem to be much known about ZNF232, and I could only find two papers that even mentioned it).

The situation is the same in chimps. USP6 is located downstream of ZNF232, and they're both located on Chromosome 17. I couldn't find the precise locations in the chimp genome for these, but they don't seem to overlap either (as you mentioned).

I've made an image indicating the positions of both genes in humans and in chimps to illustrate that there is no overlap in either species. This image was created using NCBI's gene database, accessed this morning.

Concerning CHRNE and MINK: evidence indicates that the genes DO overlap in humans. Work done in 2002 by Kusumi et al. showed that the 3' UTRs of the two genes overlap. The team also discovered that there was no overlap between these genes in mice or in pigs, while there is overlap in whales, cows, and rabbits. However, the extent of the overlap in these groups is different, and the authors speculate that the overlapping of the genes evolved independently in each case.

Nelson said...

While much of the science is beyond my capacity right now, the personality of Evan Phillips is not.
Evan, aka Nephyboy, NephilimFree, Sporty, and finally sugar britches, is beyond any hope of 'redemption.'
His idea that he is the most intelligent and wise person on the face of the earth, and that the rest of the world is not able to understand the meaning of any scientific paper or discipline, is the mark of a narcissist and a true sign of one who could easily snap at the thought of being wrong.
Nephyboy cannot be shown anything; he has to show everyone that only he has a complete understanding of any scientific discipline. He thinks that he is a geneticist, biologist, biophysicist, nuclear physicist, linguist, geologist, paleontologist, and any other discipline that can be imagined. The reason for this is that he spent 2 years researching evilution on the internet, using Google as his resource. He pretends to 'school' any scientist in their respective field. Please note, any attention given to him only feeds his delusions of grandeur and his overly large ego. For future reference, anything you say to him will be determined to be wrong, even though you are the scientist. He is like a bear being fed at a national park: do it once and the bear thinks that he is entitled to be fed again.

C.W.G.K said...

You've hit upon a point that often troubles me, Nelson. You're right in comparing Neph and his ilk to a hungry bear, and therein lies the problem: people like Neph are spreading lies and distorting science to support their irrational beliefs, and this purposeful misinformation needs to be corrected; but at the same time, the very act of addressing his claims gives him reason to continue his nonsense. It's a Catch-22, and I don't know if there's any real way around it.

Eugene Gateley said...

Thank you for looking at this for me. I do appreciate your time on this so far. There are however two overlapping protein coding transcripts in the human genome for these two genes. Those transcripts are ZNF232-001 which is a reverse strand at 17: 5,008,930 - 5,026,397 and USP6-202 which is a forward strand at 17: 5,019,733 - 5,078,285
This shows an overlapping region of 6665 base pairs including 4 exons of USP6 being within the ZNF232-001 transcript in the overlapping region of the two genes in Human.
Here is the region in detail from the Ensembl genome browser:
And here is the same region in Chimp:
With the Chimp homologue for Human USP6 (which I have highlighted in light green) being Q865R0_PANTR.
As you have already mentioned, these two genes do not overlap in Chimp. But, they actually do overlap in Human. Hence my question.

This data is confirmed by the Havana genebuild and UniGene and is also listed on the NCBI MapViewer here:

So, it would appear that these two genes actually do overlap in Human and they do so in opposite directions since one is a forward strand and the other is a reverse strand.

But, they do not overlap in Chimp.
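The coordinates quoted in this thread can be checked with a few lines of interval arithmetic. This is only a sketch: it assumes the positions are 1-based and inclusive at both ends (an assumption about how the browser coordinates should be read), and the function names are illustrative.

```python
def overlap_length(a, b):
    """Bases shared by two inclusive (start, end) intervals; 0 if none."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)

def gap_length(a, b):
    """Bases lying strictly between two intervals; 0 if they touch or overlap."""
    first, second = sorted([a, b])
    return max(0, second[0] - first[1] - 1)

# Gene-level coordinates on human chromosome 17, as quoted earlier in the thread.
znf232_gene = (5_009_031, 5_026_397)
usp6_gene = (5_031_687, 5_078_324)

# Transcript-level coordinates quoted above (ZNF232-001 and USP6-202).
znf232_001 = (5_008_930, 5_026_397)
usp6_202 = (5_019_733, 5_078_285)

print(overlap_length(znf232_gene, usp6_gene))  # 0: no overlap at gene level
print(gap_length(znf232_gene, usp6_gene))      # 5289 bases between them
print(overlap_length(znf232_001, usp6_202))    # 6665 bases shared
```

Both sets of numbers are internally consistent. The gene-level annotations leave a gap of roughly 5.3 kb, while the two transcripts share 6,665 bases, so whether the genes "overlap" depends on which annotation is used.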

C.W.G.K said...

Hi Eugene,

You're right; there does appear to be some overlap between the USP6 and ZNF232 genes in humans which does not occur in other primates. However, the data from Ensembl shows:
i) The region of USP6 that overlaps ZNF232 is contained entirely within Intron 4 of ZNF232
ii) There are 4 predicted transcripts of USP6, 3 of which have no overlap with ZNF232
iii) The overlapped region occurs in the 5' UTR of both genes.

These data, then, indicate that the region of overlap is not translated, and thus not expressed in the final protein of either USP6 or ZNF232. This does not pose a barrier to the evolution of such a system, since any mutation or chromosomal rearrangement that resulted in the overlap of UTRs would effectively be neutral.

As to why the genes overlap in humans and not in other primates, I can only speculate. Perhaps a chromosomal deletion occurred in the human lineage after it diverged from chimps, bringing the two genes into closer proximity. It is interesting to note that research by Paulding et al. in 2003 has shown that USP6 itself is a hominid-specific gene, and arose from the chromosomal fusion of two ancestral genes, USP32 and TBC1D3. The region which contains these genes has also been found in a chromosomal translocation linked to Asperger Syndrome (Tentler et al. 2003), so it would seem that genetic rearrangements are not uncommon in this part of the chromosome.