During a genetics lecture in college, someone asked, “Do humans have the most DNA?” I thought it was an inane question. I thought wrong. Our professor chuckled, “Not by a long shot. Actually, a Japanese flower has a genome that is 50 times larger than ours.”
The whole class, braced for a resounding “yes” that would cement human glory, stared at her in disbelief. How and why would a plant have more DNA than humans, who are far more evolved and complex? Is there any relationship between the amount of DNA an organism has and its complexity? In other words, does size matter?
The short answer is: yes, it does matter. But also no, it doesn’t.
Expectations vs. reality
One would expect a more complex organism to need more DNA. A larger genome (the term for an organism’s entire library of DNA) would seem better equipped to support complex life: the evolution from microscopic life, like bacteria and archaea, to larger and more complex life, like humans, requires more information and therefore more DNA.
This is true.
Eukaryotes, organisms with a nucleus to house their genetic material (DNA), do have larger genomes than prokaryotes (organisms without a nucleus), such as bacteria. E. coli, every microbiologist’s best friend, has only about 4.6 million base pairs. The fly Drosophila melanogaster, on the other hand, has about 140 million.
However, this logic doesn’t hold when one compares the genomes of eukaryotes.
Consider humans and the marbled lungfish, which places second in the contest for the largest genome. A haploid set of human DNA has roughly 3 billion base pairs. The marbled lungfish has about 130 billion.
Yet humans are clearly more complex, with sophisticated physiological systems and modes of communication, movement, and more. Why, then, does a fish have roughly 40 times as much DNA?
In 1971, C.A. Thomas Jr., a geneticist, dubbed this problem the C-value paradox. The C-value is the amount of DNA in a haploid nucleus, such as that of a sperm cell, and is usually measured in picograms. Ordinary body (somatic) cells are diploid: they carry two sets of DNA, one maternal and one paternal, so each chromosome comes in two copies that work together. A somatic human cell therefore holds about 6 billion base pairs of DNA.
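To put the units side by side, the following Python sketch converts between base pairs and picograms and doubles a haploid set into a diploid one. It assumes the commonly cited conversion of roughly 0.978 billion base pairs per picogram of DNA, and rounded haploid sizes of about 3 billion base pairs for humans (half the diploid figure above) and 130 billion for the marbled lungfish; the function and variable names are illustrative.

```python
# Convert genome sizes between base pairs (bp) and picograms (pg), the unit of C-values.
# Assumption: 1 pg of double-stranded DNA corresponds to roughly 0.978 billion bp.
BP_PER_PG = 0.978e9


def bp_to_pg(bp: float) -> float:
    """Mass in picograms of a DNA stretch with `bp` base pairs."""
    return bp / BP_PER_PG


def pg_to_bp(pg: float) -> float:
    """Number of base pairs in `pg` picograms of DNA."""
    return pg * BP_PER_PG


# Rounded haploid (1C) genome sizes in base pairs (assumed values).
HAPLOID_BP = {
    "human": 3.0e9,
    "marbled lungfish": 130e9,
}

for name, bp in HAPLOID_BP.items():
    diploid_bp = 2 * bp  # a somatic (diploid) cell carries one maternal and one paternal set
    print(f"{name}: C-value ~{bp_to_pg(bp):.1f} pg, "
          f"diploid cell ~{diploid_bp / 1e9:.0f} billion bp")

ratio = HAPLOID_BP["marbled lungfish"] / HAPLOID_BP["human"]
print(f"lungfish / human: ~{ratio:.0f}x")
```

With these rounded inputs, the human C-value comes out a little above 3 pg and the lungfish-to-human ratio a little over 40.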
C.A. Thomas, along with other geneticists, noticed that genome sizes vary wildly, even among closely related species. Across frogs and salamanders, both of which are amphibians, genome sizes span a roughly 120-fold range, with salamanders at the top end.
As the Human Genome Project was drawing to a close, the bet amongst scientists was that humans would have somewhere between 30,000 and 100,000 genes. When the preliminary results came out in 2001, they pointed to only about 30,000, and later refinements brought the count down to roughly 20,000 protein-coding genes. Everyone was left surprised.
For comparison, Trichomonas vaginalis, a unicellular parasite responsible for approximately 180 million urogenital tract infections a year, has roughly 60,000 protein-coding genes.
A single-celled parasite had beaten humans.
Why does this paradox exist?
Not all DNA is equal.
Not all of the genome holds information crucial for survival. A genome is made up of protein-coding genes, non-coding regulatory DNA, and non-functional DNA.
Genes are the stars of the genome. They are the segments of DNA that code for parts of the cellular machinery: primarily proteins, but also RNA. Not every gene is “on” at all times; a cell does not make every protein all the time. So that a gene is activated only when the cell needs the resulting protein, certain DNA segments in and around the gene act as switches. These are the non-coding regulatory parts of DNA: promoters, terminators and enhancers.
The last category is non-functional DNA. As the name suggests, these segments of DNA don’t help the organism play the ‘survival of the fittest’ game. Susumu Ohno, a geneticist with an impressive handlebar mustache, called it ‘junk DNA’. In eukaryotes, this supposedly useless DNA takes up a lot of space. By some estimates, half the human genome is junk, while the marbled lungfish may need only about 1/36th of its humongous genome.
What is junk DNA?
Junk DNA, as mentioned before, is DNA that doesn’t hold any information indispensable for the organism’s survival. It is largely a product of mutations. Some mutations are monumental, such as whole-genome duplication, in which the entire genome is doubled.
Others are small mutations of only a few base pairs that accumulate over time, such as insertions and deletions (together called indels). Then there is the classic example of ‘selfish DNA’: transposons, also known as transposable elements (TEs) or jumping genes. TEs are segments of DNA, typically a few hundred to a few thousand nucleotides long, that can copy or move themselves and insert into other parts of the genome. They can drive some of the most dramatic increases in genome size.
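The runaway potential of copy-and-paste transposons can be illustrated with a deliberately simple expected-value model in Python. Every parameter here (the amount of other DNA, the element length, the starting copy number, and the per-generation copying rate) is a hypothetical value chosen for illustration, and the model ignores deletions, silencing, and selection, all of which keep real transposons in check.

```python
def expected_copies(generations: int, initial_copies: float, rate: float) -> float:
    """Expected TE copy number if each copy adds one new copy of itself with
    probability `rate` per generation (copy-and-paste, nothing ever removed)."""
    return initial_copies * (1 + rate) ** generations


def expected_genome_size(generations: int, base_bp: float, te_len_bp: float,
                         initial_copies: float, rate: float) -> float:
    """Non-TE DNA plus the DNA contributed by all TE copies."""
    return base_bp + expected_copies(generations, initial_copies, rate) * te_len_bp


# Hypothetical starting point: 1 billion bp of other DNA and
# 10,000 copies of a 6,000 bp element, each copying itself at a rate of 0.001 per generation.
BASE_BP, TE_LEN_BP, START_COPIES, RATE = 1e9, 6_000, 10_000, 0.001

for gen in (0, 1_000, 2_000, 3_000):
    size = expected_genome_size(gen, BASE_BP, TE_LEN_BP, START_COPIES, RATE)
    print(f"generation {gen:>5}: ~{size / 1e9:.2f} billion bp")
```

Because copies beget copies, growth is exponential in this toy model, which is why unchecked transposons can inflate a genome so quickly.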
Microsatellites are short sequences, typically 1 to 6 base pairs long, repeated many times in a row; minisatellites follow the same pattern with longer units of roughly 10 to 60 base pairs. Because the copies sit back to back, both are called tandem repeats. They are scattered across the DNA of both prokaryotes and eukaryotes. Because they are repetitive, these locations are hotspots for mutations and especially for recombination of DNA.
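Tandem repeats can be spotted by looking for a short unit repeated back to back. The Python sketch below does this with a regular expression and a backreference; the cut-offs (a unit of at most six bases, repeated at least three times) are assumptions for illustration, and real genome annotation relies on dedicated tools that also tolerate imperfect repeats.

```python
import re


def find_microsatellites(seq: str, max_unit: int = 6, min_copies: int = 3):
    """Return (start, unit, copies) for each run in `seq` where a unit of
    1..max_unit bases is repeated at least `min_copies` times back to back.

    The lazy quantifier makes the regex prefer the shortest repeating unit.
    """
    # For the defaults this builds the pattern ([ACGT]{1,6}?)\1{2,}
    pattern = re.compile(rf"([ACGT]{{1,{max_unit}}}?)\1{{{min_copies - 1},}}")
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), unit, copies))
    return hits


print(find_microsatellites("GATTACACACACAGGT"))  # [(4, 'AC', 4)]
```

Here the toy sequence contains one microsatellite: the unit “AC” repeated four times, starting at position 4.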
Cells have several layers of protection against mutations, but some slip through. Mutations that occur in somatic (body) cells stop with the individual organism. Only mutations in the germline, the cells that give rise to gametes, can be passed on to offspring, and a mutation becomes “fixed”, permanently etched into the genome of a population, once it spreads to every member of that population. Many mutations are deleterious and disadvantageous for organisms.
Whether junk DNA has any role to play is debatable. In 2012, the Encyclopedia of DNA Elements (ENCODE) project published papers effectively stating that junk DNA was no longer junk. Elizabeth Pennisi, writing for Science, called it a ‘eulogy for junk DNA’.
The ENCODE project reported that many of these so-called junk sites subtly influence how genes are controlled: switching a gene on or off, or enhancing its activity. They might also help shape the internal structure of the nucleus, called the nucleoskeleton. Such structural DNA might not carry information in the classical sense of genes, but it does affect how that information is accessed in the cell. Research on the nucleoskeleton is an exciting field of cell biology.
In a genome dominated by functional DNA, a random mutation is more likely to strike somewhere it causes harm. The mutational load, the burden of harmful mutations carried by a population, is therefore also higher. For prokaryotes, a high gene density (the number of genes per unit of DNA) is acceptable, as they multiply very fast and in large numbers. Eukaryotes reproduce more slowly and have lower gene density; for them, having “buffer” DNA (aka junk DNA), where a mutation won’t hurt the overall integrity of the genome, is beneficial.
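The core of the buffer argument is a proportion: if mutations strike positions at random, the share that lands in functional DNA roughly equals the fraction of the genome that is functional. A minimal Python simulation, assuming uniformly random point mutations and two hypothetical functional fractions (90% and 10%), shows this.

```python
import random


def share_hitting_functional(genome_len: int, functional_fraction: float,
                             n_mutations: int, seed: int = 0) -> float:
    """Scatter random point mutations uniformly over a genome and return the
    share that land in functional DNA.

    Simplification: functional DNA is placed in one block at the start of the
    genome; with uniform mutations, only its total size matters.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    functional_end = int(genome_len * functional_fraction)
    hits = sum(1 for _ in range(n_mutations)
               if rng.randrange(genome_len) < functional_end)
    return hits / n_mutations


# Hypothetical genomes: one densely packed with genes, one with lots of buffer DNA.
for label, fraction in [("gene-dense", 0.9), ("buffer-rich", 0.1)]:
    share = share_hitting_functional(1_000_000, fraction, 100_000)
    print(f"{label}: {share:.1%} of mutations hit functional DNA")
```

The model only tracks where mutations tend to land, which is the quantity the buffer argument rests on; it says nothing about how many mutations occur in total.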
The recent debate centers on whether this DNA should be called “junk” at all. Scientists like Sean R. Eddy and Ford Doolittle have pointed out the flaws in attributing a function to every little piece of DNA, which raises the question of what is important in the genome and what isn’t. In his paper critiquing ENCODE’s conclusions, Doolittle writes, “In the end, of course, there is no experimentally ascertainable truth of these definitional matters other than the truth that many of the most heated arguments in biology are not about facts at all, but rather about the words that we use to describe what we think the facts might be.
“However, that the debate is in the end about the meaning of words does not mean that there are not crucial differences in our understanding of the evolutionary process hidden beneath the rhetoric.”
So, what determines complexity?
If it isn’t the size of the genome or the number of genes, and we don’t know exactly how useful junk DNA is, then how are some organisms more complex than others? The answer lies in how organisms control and regulate their DNA.
Gene regulation is a complex interplay of many factors that determine which proteins (and RNA) are made, when, and in what amounts. Eukaryotes have more mechanisms than prokaryotes to fine-tune how they make proteins. How much protein is made, when, and in which cell can change the fate of a cell. This is most notably seen during embryonic development, where the same gene can lead to different outcomes depending on where and when its product is made.
Prokaryotic genes are straightforward: a stretch of DNA is read and assembled directly into a protein product. Eukaryotic genes, by contrast, are interrupted by sequences called introns, stretches of DNA within a gene that are not translated into protein. They are like meaningless strings of letters within a sentence, something like this: “This is sdjf;i asheri f;ioskjcf ahf;ah an article about kdsjf lhghkjfhhnf how complex DNA is.” The parts of the sentence that actually make sense are called exons. Eukaryotic cells copy the whole gene, introns included, into RNA (transcription) and then cut the introns out of that RNA copy before it is used to make protein. This cutting and rejoining is called splicing.
The cool thing about splicing is that the exons can be joined in more than one way. This is called alternative splicing. Keeping or skipping different parts of a gene leads to different proteins altogether, so one gene can yield several distinct proteins. Take the sentence above. I can splice it in two ways, and it will mean two different things:
- This is an article about how complex DNA is.
- This is an article about DNA.
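The sentence analogy translates almost directly into code. In this Python sketch the pre-mRNA is a list of segments tagged as exons or introns; how the sentence is cut into segments is an illustrative assumption, and real splicing is guided by sequence signals at exon-intron boundaries rather than by meaning.

```python
# Toy pre-mRNA built from the example sentence: (kind, text) segments.
PRE_MRNA = [
    ("exon", "This is"),
    ("intron", "sdjf;i asheri f;ioskjcf ahf;ah"),
    ("exon", "an article about"),
    ("intron", "kdsjf lhghkjfhhnf"),
    ("exon", "how complex"),
    ("exon", "DNA"),
    ("exon", "is"),
]


def exons(pre_mrna):
    """Splicing step 1: drop every intron, keep the exons in their original order."""
    return [text for kind, text in pre_mrna if kind == "exon"]


def splice(pre_mrna, keep=None):
    """Join exons into a 'mature' sentence.

    keep: optional set of exon indices to include (alternative splicing);
    None means every exon is kept.
    """
    chosen = [text for i, text in enumerate(exons(pre_mrna))
              if keep is None or i in keep]
    return " ".join(chosen) + "."


print(splice(PRE_MRNA))                  # This is an article about how complex DNA is.
print(splice(PRE_MRNA, keep={0, 1, 3}))  # This is an article about DNA.
```

The same “gene” yields two different products depending only on which exons are kept.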
Scientists have found that more complex organisms tend to make more use of alternative splicing, getting more out of each gene. So even though a small nematode (roundworm) has more genes than a fly, the fly is still more complex.
Then there are the non-coding regulatory bits of DNA. These are like switches. When the right protein binds to one, the machinery is activated and the gene gets made into a protein (or is blocked, depending on the protein). How and when these switches get flicked, which often depends on signals relayed into the cell through a highly complex and fascinating process called signal transduction, determines what will happen to a cell.
Epigenetics also controls how genes are “expressed”. Epi- means “on top” or “additional”, so epigenetics can be literally translated as “in addition to genetics”. That is exactly what it is: chemical changes made on top of the DNA sequence, such as methyl groups attached to the DNA itself, that affect the switches (see above) of genes. Epigenetics, however, is a much wider topic than this article can cover. There are exciting discussions happening about whether epigenetic changes can be inherited. A growing body of evidence suggests that some can, contrary to beliefs previously held in the genetics community.
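The switch logic described above, plus an epigenetic override, can be caricatured as a small Python model. The protein names and the rules (any bound activator turns the gene on, any bound repressor wins, and a methylation-like mark silences the gene outright) are simplifying assumptions for illustration, not how any particular gene is regulated.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    """Toy gene whose regulatory region behaves like a switch.

    activators / repressors: hypothetical proteins that bind the switch.
    methylated: stand-in for an epigenetic mark that silences the gene.
    """
    name: str
    activators: set = field(default_factory=set)
    repressors: set = field(default_factory=set)
    methylated: bool = False

    def is_expressed(self, proteins_present: set) -> bool:
        if self.methylated:
            return False  # in this toy model the epigenetic mark overrides everything
        if self.repressors & proteins_present:
            return False  # a bound repressor keeps the switch off
        return bool(self.activators & proteins_present)  # on only if an activator is bound


# The same gene in two different cells, each with a different set of proteins.
gene_x = Gene("geneX", activators={"activatorA"}, repressors={"repressorR"})
print(gene_x.is_expressed({"activatorA"}))                # True
print(gene_x.is_expressed({"activatorA", "repressorR"}))  # False

gene_x.methylated = True  # add the epigenetic mark
print(gene_x.is_expressed({"activatorA"}))                # False
```

Identical DNA gives different outcomes depending on which proteins are present and whether the epigenetic mark is there, which is the point the switch metaphor is making.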
Eukaryotes have larger genomes than prokaryotes (such as bacteria), but more complex eukaryotes do not necessarily have larger genomes than less complex ones. This is called the C-value paradox, where the C-value is the amount of DNA in a haploid nucleus. Part of the explanation is that a share of the DNA has no informational function; this is called junk DNA, though how much of it is truly junk is still debated. Sometimes junk DNA litters an organism’s genome, making it much larger than it needs to be. More complex organisms achieve their complexity by regulating their genes more finely, through alternative splicing, signal transduction and epigenetic mechanisms, amongst many others. As a result, their genes can take on more functions, allowing for more complex expression and form.