A gene is a unit of inheritance. It is a sequence of DNA that can make a protein or RNA, and is flanked by regulatory regions at either end.
As far as four-letter words go, “gene” is harmless, but unlike other four-letter words, it has sparked a great deal of confusion and debate among the global scientific community. Since the term was coined in 1909, the concept of a gene has been remodeled and updated with every new genetic advancement.
A unit of inheritance
Genes began as a vague factor of inheritance, according to Gregor Mendel, the famous Father of Genetics. This mysterious factor was responsible for the differing appearances of Mendel’s pea plants. There was something that passed on a trait from parents to offspring.
Mendel didn’t know what this factor physically was, and neither did botanist Wilhelm Johannsen when he replaced the nebulous term ‘factor’ with ‘gene’ in 1909, after the rediscovery of Mendel’s work. A gene now meant a unit of inheritance. A trait was passed from parents to offspring through genes. Simple!
Well, not quite. As the life sciences increasingly moved to the molecular and atomic levels, we developed a clearer image of a “gene”. In the span of 50 years, the idea of a gene was updated several times as the life sciences matured through the work of biochemists and molecular biologists. The gene went from an indistinct unit of heredity to a particular point on a chromosome to a stretch of DNA that codes for a single protein.
One Gene, One Protein
The “one gene, one enzyme” idea, later changed to “one gene, one protein”, seemed like a scientific jackpot. As a concept, it is simple and elegant, providing a satisfying statement for all life. It made life easier for everyone: scientists, teachers and students. Initially proposed by a British medical doctor, Sir Archibald Garrod, the “one gene, one protein” hypothesis slowly solidified by the 1970s with Crick’s ‘The Central Dogma’ and the description of what a gene looks like. The gene had taken on a physical, molecular definition.
A gene is a piece of DNA that codes for something (a protein). DNA, or deoxyribonucleic acid, is made up of nucleotides, its individual building blocks. There are four different nucleotides, represented by four different letters: A for Adenine, T for Thymine, C for Cytosine, and G for Guanine. Because DNA has two strands, these four “letters” pair up with each other, such that A pairs with T and C pairs with G. DNA is a long sequence of these letters that looks something like this (a made-up example): ATGCGTACCTGA.
Genes are long sequences of these letters flanked by certain ‘regulatory’ sequences. These regulatory sequences tell the cell where a gene starts and where it ends; essentially, they regulate how the genes work.
The cell makes proteins in a two-step process called transcription and translation. In transcription, the cell reads the gene and makes an mRNA copy of the DNA. This mRNA copy then goes to a machine called a ribosome. The ribosome reads the instructions in the mRNA and produces a protein. This is translation.
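The pairing rule can be sketched in a few lines of Python. This is only an illustration: the sequence is invented, and the function name `complement` is a label chosen for this example rather than anything from a real library.

```python
# Pairing rules described in the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a DNA strand."""
    return "".join(PAIRS[base] for base in strand)


# Invented sequence, used only for illustration.
strand = "ATGCGTACCTGA"
partner = complement(strand)

print(strand)   # ATGCGTACCTGA
print(partner)  # TACGCATGGACT
```

Applying `complement` twice returns the original strand, which is one way to see that each strand carries enough information to rebuild the other.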
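The two steps can be mimicked in a short Python sketch. It rests on several simplifying assumptions: the gene is invented, the codon table holds only the five codons this example needs (the real genetic code has 64), and transcription is modeled as copying the coding strand with T swapped for U, which skips the template-strand detail.

```python
# A deliberately tiny slice of the standard genetic code (the full table has 64 codons).
CODON_TABLE = {
    "AUG": "Met",  # also the usual start signal
    "GCU": "Ala",
    "UUC": "Phe",
    "GGA": "Gly",
    "UAA": None,   # stop codon: no amino acid is added
}


def transcribe(coding_dna: str) -> str:
    """Transcription, simplified: the mRNA matches the coding strand with U in place of T."""
    return coding_dna.replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Translation, simplified: read three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return protein


gene = "ATGGCTTTCGGATAA"   # invented gene, for illustration only
mrna = transcribe(gene)    # "AUGGCUUUCGGAUAA"
protein = translate(mrna)  # ["Met", "Ala", "Phe", "Gly"]
print(mrna, protein)
```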
The Split Gene
“All approaches at a higher level are suspect until confirmed at the molecular level,” Francis Crick famously said. The picture of a gene described above is a simple continuous string of letters that gets read into a protein. This holds true for bacteria, but not for more complex life forms, such as hippos, snails or humans. Eukaryotes, organisms whose cells have a nucleus, don’t just have long sentences of information on how to make a protein. It turns out that there is a whole lot of gibberish added in between.
This gibberish came to be called introns, while the meaningful protein-coding parts of the gene came to be known as exons. Eukaryotic cells use molecular protein scissors to cut introns out of the mRNA so that the ribosome only reads the useful protein-making information. This process of snipping out the junk is called splicing.
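Splicing can be pictured as deleting known spans from a string and joining what is left. The sketch below assumes the intron positions are already known and passes them in as hypothetical `(start, end)` pairs. The transcript is invented, and a real cell finds intron boundaries from signals in the sequence rather than from a list of coordinates.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove each (start, end) intron span (end exclusive) and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron itself
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Invented transcript: exons in upper case, introns in lower case for readability.
pre_mrna = "AUGGCU" + "guaaguccag" + "UUCGGA" + "guacuuuaag" + "UAA"
introns = [(6, 16), (22, 32)]  # positions of the two lower-case stretches

mature = splice(pre_mrna, introns)
print(mature)  # AUGGCUUUCGGAUAA
```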
This new molecular definition of a gene showed a sequence of DNA flanked by regulatory sequences that, for eukaryotic cells, included little meaningless interruptions between the actual protein-coding bits. The “one gene, one protein” concept still fits in this structure.
Genes, Genes, Everywhere
Once splicing and split genes were understood, molecular biologists began combing through the genes of different animals. What they found was both exciting (for science itself) and headache-inducing (for students of genetics).
As it turned out, these split genes could be spliced in different combinations to make many different proteins. “Normal” splicing would cut all the introns out, glue the exons back together and voilà, the final mRNA was complete. This is not the case in alternative splicing. Splicing machinery can remove and rearrange exons from one gene to make two or more different proteins. Consider this sentence:
“This ice cream is delicious” is a gene.
With exons and introns, this gene would look like this:
Thikjhghs ickjhdfe crejfhgkhme is delickjhsrfksrjjhioujfghks.
Splicing the gene, or the sentence, in this case, would give us “This ice cream is delicious.”
However, alternative splicing might give us all these sentences:
This is delicious. This is ice cream. Ice cream is delicious. This ice is cream. This delicious cream is ice.
This is one gene, but it can produce many different proteins. This is a more common phenomenon than one might think. In fact, it happens all the time.
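The sentence analogy can be played out in code. This sketch treats each word as an exon and models only exon skipping, a common form of alternative splicing in which some exons are left out and the rest stay in their original order. It therefore reproduces only the order-preserving sentences above, such as “This is delicious”. The function name and the `min_exons` cutoff are arbitrary choices made for this illustration.

```python
from itertools import combinations

# Each word stands in for one exon of the "gene".
exons = ["This", "ice", "cream", "is", "delicious"]


def exon_skipping_variants(exons, min_exons=3):
    """List every transcript made by keeping at least min_exons exons in order."""
    variants = []
    for size in range(min_exons, len(exons) + 1):
        # combinations() preserves the original order of the exons it keeps.
        for kept in combinations(exons, size):
            variants.append(" ".join(kept))
    return variants


variants = exon_skipping_variants(exons)
print(len(variants))  # 16 different "proteins" from a single five-exon gene
for sentence in variants:
    print(sentence)
```

Even with this restricted model, one five-exon gene yields 16 distinct products, which hints at how quickly the possibilities grow for genes with more exons.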
There are also instances where a protein is made from two different mRNA strands from two different genes glued together. This is called trans-splicing. These two different genes are flanked by their own regulatory sequences, but by themselves, they don’t make a whole protein. So, there are half-genes that can make one protein (or more).
Additionally, there are genes within genes, genes lying within introns of other larger genes, and genes that overlap with one another. There are genes that jump from one location in the DNA to another (transposons), and pseudogenes that were once genes but are now useless. This raises the question: where does a gene start and where does it end? A gene can no longer be considered ‘a sequence of DNA that makes a protein flanked by regulatory sequences.’ Genes aren’t always in one place, nor are they even a single specific entity.
Even so, we’re still working with the assumption that a gene goes on to create a protein. In 2005, the FANTOM Consortium reported that although 63% of the genome is transcribed, only 1% to 2% of it goes on to make proteins. Some of the remaining transcripts belong to a world of RNAs that play their own unique parts in cells.
RNAs are single-stranded siblings of DNA. RNAs are popularly thought of as only being involved in making proteins, but they can control far more than that. A study published in 2005 implied that, in the plant Arabidopsis thaliana, RNA is involved in the inheritance of information that wasn’t necessarily in the genome. This has since been recorded in animals, such as mice. Therefore, a gene isn’t always the unit of inheritance. There are also RNAs that act as enzymes, and RNAs that are still perplexing researchers. With this new knowledge, genes are understood to code for both proteins and RNAs.
So, what is a gene?
The HUGO Gene Nomenclature Committee (HGNC) defines a gene as “a DNA segment that contributes to phenotype/function. In the absence of demonstrated function, a gene may be characterized by sequence, transcription or homology”. The Sequence Ontology Consortium defines a gene as a “locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions.”
Both definitions consider the weird gymnastics that a gene does within the genome. For scientists, though, using the word gene can be a land mine of assumptions. A gene is such a wimbly-wombly, catch-all molecular concept that scientists find it necessary to add qualifiers like protein-coding or replace the word with another, such as mRNA transcript or locus.
A hundred and eleven years after the ‘gene’ came into existence and 150 years since Mendel first pondered the unit of inheritance, both our notions of inheritance and its quantifiable unit have changed. In the future, don’t be surprised if there are more upheavals awaiting this mighty four-letter word.