DNA Structure

Both DNA and RNA are built from basic units called nucleotides. Each nucleotide is made up of 3 smaller parts: a nitrogenous base (often just called the base), a five-carbon sugar (ribose in RNA, deoxyribose in DNA) and a phosphate group.

The first part is the base. There are 5 common bases: Adenine, Guanine, Cytosine, Thymine and Uracil. There are a few attributes of these bases for us to cover. The first is that they are nitrogenous bases. Like other biological molecules they are built on carbon, but their rings also contain several nitrogen atoms, and that is what sets them apart chemically. The next thing you will notice is that some of them have 2 rings and the others have a single ring. Adenine and Guanine have 2 rings and we call these the Purines. Cytosine, Thymine and Uracil all have a single ring. We call them the Pyrimidines. There are a total of 5 bases, but only 4 get used in DNA and 4 get used in RNA. Adenine, Guanine and Cytosine are used in both DNA and RNA. The major difference is that Thymine is used in DNA and Uracil replaces it in RNA. If you look at Thymine vs Uracil, they are actually very similar structures. Thymine has an extra methyl group (CH3). This is the main difference between Thymine in DNA and Uracil in RNA.
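The groupings above can be written down as a small lookup. This is only an illustrative sketch; the function name describe_base is made up for this example.

```python
# One-letter codes for the five bases described above.
PURINES = {"A", "G"}            # two-ring bases
PYRIMIDINES = {"C", "T", "U"}   # single-ring bases
DNA_BASES = {"A", "G", "C", "T"}
RNA_BASES = {"A", "G", "C", "U"}


def describe_base(base):
    """Return (ring class, list of nucleic acids that use the base)."""
    if base in PURINES:
        ring_class = "purine"
    elif base in PYRIMIDINES:
        ring_class = "pyrimidine"
    else:
        raise ValueError(f"unknown base: {base!r}")
    used_in = []
    if base in DNA_BASES:
        used_in.append("DNA")
    if base in RNA_BASES:
        used_in.append("RNA")
    return ring_class, used_in


for b in "AGCTU":
    print(b, describe_base(b))
```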

One point I want to make here will be important later when we talk about base editing. Base editing uses an enzyme called a deaminase. This enzyme plucks the amino group off a base. An amino group is an NH2 group. Cytosine differs from Uracil only by its amino group, so when a deaminase removes it, the Cytosine becomes Uracil, which the cell reads as Thymine. That gives a C to T transition. In the same way, removing the amino group from Adenine produces inosine (the base hypoxanthine), which the cell reads as Guanine. That gives an A to G transition. This is a concept I wanted to touch on in the structure section as we cover base editing later.
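As a rough sketch of the outcome of deamination, the mapping below records the immediate chemical product and the base the cell ends up reading. The intermediate products (uracil from cytosine, inosine from adenine) are standard chemistry; the function name base_edit is hypothetical, and real base editors are far more constrained about which positions they can reach.

```python
# base -> (product of deamination, base the cell reads it as)
DEAMINATION = {
    "C": ("U", "T"),  # cytosine -> uracil, read as thymine
    "A": ("I", "G"),  # adenine -> inosine, read as guanine
}


def base_edit(dna, index):
    """Return the DNA sequence as it would read after deaminating one base."""
    base = dna[index]
    if base not in DEAMINATION:
        raise ValueError(f"{base!r} has no amino group edit in this sketch")
    _product, read_as = DEAMINATION[base]
    return dna[:index] + read_as + dna[index + 1:]


print(base_edit("GATC", 1))  # the A at index 1 is edited
print(base_edit("GATC", 3))  # the C at index 3 is edited
```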

The next part of the nucleotide is the ribose sugar. Ribose has 5 carbons in its structure. Four of them form a pentagon-shaped ring together with a lone oxygen at the top center, and the fifth carbon sticks out from the ring, so the carbon chain looks a bit like the Big Dipper. When dealing with a sugar like ribose, we number the carbons using the prime designation, which is written with an apostrophe ('). So the 1 prime carbon (1') is the first carbon on the far right, and it is the one the base attaches to. The 2' carbon is where we tell Ribose from Deoxyribose. Ribose has a hydroxyl (OH) group there. When that OH group is replaced with just a hydrogen, the sugar is Deoxyribose. Deoxyribose is used in DNA because the missing 2' OH group makes the backbone much less prone to breaking, which suits a molecule meant for long term storage. The last part of the nucleotide is the phosphate group.

The phosphate group carries a negative charge. It is attached to the 5' carbon of the sugar. This leaves only the 3' and 2' carbons not used. The phosphate then bonds again to the 3' carbon of the previous nucleotide. This creates a phosphate link, called a phosphodiester bond, from the 5' carbon of one nucleotide to the 3' carbon of the next.

This unit is the basic repeating group of DNA or RNA. Repeating the link from the 5' carbon of one nucleotide to the 3' carbon of the previous one strings the nucleotides together into a long strand. It also gives the strand a direction, which is why we read a DNA sequence from 5' to 3' along the strand.

We know that DNA is made up of 2 strands. The first strand runs 5' to 3' while the other strand runs 3' to 5', carrying the complementary sequence of the first strand. We call this antiparallel. This brings us to the concept of base pairing in the double stranded DNA structure. The key observation is called Chargaff's rule. Chargaff measured the amounts of each base in DNA and found that the amount of Adenine was always equal to the amount of Thymine, while the amount of Guanine was always equal to the amount of Cytosine. This led to the conclusion that Adenine base pairs with Thymine and Guanine base pairs with Cytosine. One purine with one pyrimidine.

Adenine base pairs with Thymine using 2 hydrogen bonds, while Guanine base pairs with Cytosine using 3 hydrogen bonds. Hydrogen bonds are weak bonds that can be opened when enzymes need to copy or read the DNA. The base pairing of the DNA, with one strand running in one direction and the other strand running in the opposite direction, gives us the double helix structure of the DNA. This is the double stranded DNA structure we associate with human DNA.
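The pairing rules, the antiparallel arrangement and Chargaff's rule can all be checked on a short made-up sequence. The sketch below assumes strands are written 5' to 3' as plain strings; the function names are illustrative only.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # hydrogen bonds per base pair


def reverse_complement(strand):
    """Partner strand, also written 5' to 3' (hence the reversal)."""
    return "".join(COMPLEMENT[b] for b in reversed(strand))


def hydrogen_bonds(strand):
    """Total hydrogen bonds holding this strand to its partner."""
    return sum(H_BONDS[b] for b in strand)


def duplex_base_counts(strand):
    """Base counts over both strands of the double helix."""
    return Counter(strand + reverse_complement(strand))


strand = "ATGCGC"
counts = duplex_base_counts(strand)
print(reverse_complement(strand))
print(hydrogen_bonds(strand))
print(counts["A"] == counts["T"], counts["G"] == counts["C"])  # Chargaff's rule
```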

The bases face the inside of the structure, where they pair with each other and stack on top of one another. The bases are relatively hydrophobic, so tucking them inside keeps them away from the watery environment around the DNA. The negatively charged phosphates are on the outside, where they interact well with water and make up the phosphate backbone of the DNA double helix. When it comes to RNA, its structure is single stranded, so it tends to fold up and base pair with matching bases on the same strand, forming Stem and Loop formations.

DNA Packaging

Now that we have covered the basic structure of DNA, with nucleotides, base pairing, antiparallel strands and double stranding, we want to look at the double helix and how DNA is packaged.

As the DNA is strung together into its base paired double stranded structure, it begins to coil around. Each full twist of the DNA is about 10 base pairs long. This twisting of the double stranded DNA creates 2 grooves in the structure called the Minor Groove and the Major Groove. Many DNA binding proteins read the sequence through these grooves, so it is worth knowing the 2 different grooves.

The overall DNA strand carries a negative charge because of its phosphate backbone. We will begin to look at the basic component of DNA packaging called the Nucleosome. This is made up of a core of histone proteins and the DNA that gets wrapped around it. The full nucleosome includes the histone core and all the DNA wrapped around it, including the DNA that links to the next nucleosome in the sequence.

The histone core is made up of 4 basic proteins called H2A, H2B, H3 and H4. One copy of each of these 4 proteins comes together to make a wheel-like structure that the DNA can wrap around. A complete histone core is 2 of these wheel-like structures stacked together, 8 proteins in total, which is why it is called an octamer. The DNA wraps almost twice around each histone octamer (about 147 base pairs), roughly once around each set of histone proteins. The histone octamer has a positive charge that allows it to bind to the negatively charged DNA and create stable binding. The histone is a very important part of DNA packaging and epigenetic regulation of genes. When the DNA is tightly packaged on the histones, it is inactive, because the enzymes that interact with the DNA cannot gain access to it. The histones block the binding of these enzymes. This is how cells can regulate which genes are active (unpackaged) and which ones are inactive (packaged). Only when DNA is unpackaged can it be accessed for transcription.

A large share of the DNA in a cell stays packaged on the histones because it is not needed by that specific cell. On the histone proteins, there are small protein tails. These tails are extremely important in gene regulation and expression. The histone tails control which genes get unpackaged from the histone and which do not. This is done by adding or removing acetyl groups on the histone tails. When acetyl groups are added, they cancel out positive charges on the tails, the grip on the DNA loosens, and the DNA is exposed for transcription. When the acetyl groups are removed, the DNA binds tightly to the histones again, preventing gene activation. There is a set of enzymes responsible for adding and removing the acetyl groups on the histone tails to regulate the packaging and unpacking of the DNA for transcription. Histone Acetyltransferase (HAT) enzymes add the acetyl groups and Histone Deacetylase (HDAC) enzymes remove them.

The nucleosome is made when the DNA is wrapped nearly twice around the 2 sets of histone proteins that make up the histone octamer. The nucleosome also includes the strand of linker DNA that spans to the next nucleosome in the sequence. This turns the double stranded DNA into a structure that looks like beads on a string. This is the first level of packaging, called the 11 nanometer (nm) fiber. The next level of packaging is called the 30 nm level. The 30 nm level of packaging requires the H1 histone protein, which sits on the linker DNA and pulls the nucleosomes together into a tight formation.

The next level of DNA packaging is the 300 nm level. This uses a structural scaffold protein and folds the DNA into loops down the side of the protein. The next level of DNA packaging is the 700 nm level. This takes the 300 nm fiber that is looped down the side of the structural protein and winds it into another coil, compressing it further. The final stage of packaging is the 1400 nm level. This is the fully compressed DNA in the familiar chromosome shape we often associate with DNA. This level of packaging is only reached when the DNA condenses to undergo mitosis.

When DNA is densely packaged it is called heterochromatin. When it is loosely packaged, it is called euchromatin. Euchromatin is transcriptionally active, and the genes a cell is actively using are kept in the euchromatin state. Heterochromatin is more densely packed and not transcriptionally active. Some parts of the chromosome are always densely packed, like the regions around the centromere and the telomeres. These regions contain few or no active genes.


The human genome is over 3 billion bases long. Laid out as a single piece, that would be an unbelievably long strand of DNA. The genome is made more manageable by breaking it up into 23 different chromosomes. Each chromosome contains a specific part of the overall genetic information. The first chromosomes are the largest and contain the most genes. As we go through the numbered chromosomes, they generally get smaller. The first 22 are called the autosomes. The two copies of each autosome carry the same set of genes. The last pair is the sex chromosomes, which determines the sex of the child. Each person gets 23 chromosomes from their mom and another 23 chromosomes from their dad, for 23 pairs and 46 chromosomes in total. If the 23rd pair is two X chromosomes, the child is female. If it is an X and a Y chromosome, the child will be male. This is because the Y chromosome carries the genes that determine male characteristics.
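To put a number on how long that strand would be, here is a back-of-the-envelope calculation. It takes 3 billion base pairs and about 10 base pairs per helical turn from the text, and assumes the usual textbook spacing of about 0.34 nm between stacked base pairs, a value that does not appear in these notes.

```python
BASE_PAIRS = 3_000_000_000   # one copy of the genome, from the text
RISE_NM_PER_BP = 0.34        # assumption: spacing between base pairs in nm
BP_PER_TURN = 10             # approximate turns of the helix, from the text

length_m = BASE_PAIRS * RISE_NM_PER_BP * 1e-9   # nanometers -> meters
helical_turns = BASE_PAIRS // BP_PER_TURN
chromosome_sets_per_cell = 2                    # one set from each parent

print(f"one copy of the genome: about {length_m:.2f} m")
print(f"helical turns: about {helical_turns:,}")
print(f"DNA in one diploid cell: about {length_m * chromosome_sets_per_cell:.2f} m")
```

Under these assumptions one copy of the genome is roughly a meter of DNA, which is why the packaging levels described earlier are needed to fit it into a nucleus.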

The term ploidy is used in genetics to represent how many copies of each chromosome a cell or species has. If a cell has only 1 copy of each chromosome, like our gametes (sperm and egg cells), that is haploid, often shown as (N). If a cell has 2 copies of each chromosome, like our somatic cells, it's called diploid (2N). Some species have more copies. The cultivated strawberry is octoploid with 8 copies of each chromosome. Understanding the chromosomes, their structure and their copy number is important in human disease. Chromosomal defects can occur in the shape or even the number of chromosomes. When cells end up with the wrong number of a specific chromosome, we call that aneuploidy. The typical human cell gets 2 copies of each chromosome. The term Monosomy means the cell only gets a single copy of a specific chromosome. Turner Syndrome is an example of a monosomy disorder where the person has only 1 X chromosome and no second sex chromosome. Trisomy occurs when a person gets 3 copies of a specific chromosome. There are a few disorders that stem from a trisomy of a specific chromosome. The most commonly known is Trisomy 21, which leads to Down Syndrome. The number of chromosomes can also change in specific cells as a result of the genomic instability seen in cancer.
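The copy-number terms above map directly onto a simple check. This sketch assumes a diploid cell where 2 copies is normal; the function names and the dictionary layout are made up for illustration.

```python
def classify_copies(copies):
    """Name the copy-number state of one chromosome in a diploid cell."""
    if copies == 2:
        return "normal"
    if copies == 1:
        return "monosomy"
    if copies == 3:
        return "trisomy"
    return "other aneuploidy"


def find_aneuploidies(copy_counts):
    """Return only the chromosomes whose copy number is not 2."""
    return {
        chromosome: classify_copies(copies)
        for chromosome, copies in copy_counts.items()
        if copies != 2
    }


# Hypothetical karyotype with an extra chromosome 21 (Trisomy 21).
print(find_aneuploidies({"1": 2, "2": 2, "21": 3}))
```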

Cancer can also lead to rearrangements of the chromosomes. These are called translocations. A translocation occurs when a part of 1 or more chromosomes is cut off and placed on another chromosome. These can be reciprocal or nonreciprocal translocations. One of the most famous translocations in cancer is called the Philadelphia Chromosome. It is a reciprocal translocation where chromosomes 9 and 22 swap a section of their DNA.

The next concept for chromosomes is the location of their Centromere. The centromere is a key part of the chromosome that binds the proteins used to segregate the chromosomes during cell division. There are 4 types of centromeres and they are named by their location in the chromosome. If the centromere is right in the center, it is called metacentric. If it is slightly off center, it is called submetacentric. If the centromere is closer to the end of the chromosome, then it is acrocentric. If it is at the very end of the chromosome, then it is telocentric. There are no natural telocentric human chromosomes.

You sometimes hear that a gene is located on the short or long arm of the chromosome. The centromere divides every chromosome into two arms. The short arm (the p arm) is the shorter side and the long arm (the q arm) is the longer side. The difference is most obvious on an acrocentric chromosome.

The last part of the chromosome is the Telomere. This is the cap on the ends of each chromosome. It plays a key role in cell division. The telomere is there to protect the ends of the DNA strands. Each time a cell copies its DNA for mitosis, the telomere gets a little shorter. This limits how many times a cell can divide. The limit is called the Hayflick limit and averages 40 to 60 divisions. Once a cell hits this limit it goes into senescence, in which it will no longer divide. This plays a role in many age related disorders. Stem cells express the gene for an enzyme called telomerase. The telomerase enzyme extends the telomeres, which lets stem cells keep dividing far beyond the normal limit. This leads to one of the hallmarks of cancer. Cancer cells turn on the gene for the telomerase enzyme, allowing them to become immortal.
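The idea of a division limit can be sketched as a simple countdown. The starting length, the loss per division and the critical length below are illustrative assumptions chosen only so the result lands in the 40 to 60 range quoted above; they are not measurements from these notes.

```python
def divisions_until_senescence(telomere_bp, loss_per_division_bp, critical_bp,
                               telomerase=False):
    """Count divisions before the telomere reaches the critical length.

    Returns None when telomerase is active, because in this simplified
    model the telomere is fully restored after every division.
    """
    if telomerase:
        return None
    divisions = 0
    while telomere_bp > critical_bp:
        telomere_bp -= loss_per_division_bp
        divisions += 1
    return divisions


# Assumed numbers: 10,000 bp telomere, 100 bp lost per division,
# senescence once the telomere is down to 5,000 bp.
print(divisions_until_senescence(10_000, 100, 5_000))
print(divisions_until_senescence(10_000, 100, 5_000, telomerase=True))
```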

DNA Synthesis

There are about 3 billion nucleotides in the human genome across 23 pairs of Chromosomes. It would take a very long time to copy all of the DNA with just one set of replication enzymes. This means the DNA is synthesized from many points of replication at once. The site where replication starts is called the origin of replication. The human genome has tens of thousands of these start points, spaced roughly tens of thousands of nucleotides apart. They are often at sites rich with A and T pairs, as these have fewer hydrogen bonds, which makes it easier for the replication enzymes to open up the DNA and get started.
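A quick estimate shows why many origins are needed. The fork speed of 50 nucleotides per second and the count of 30,000 origins are assumptions picked for illustration, and the model unrealistically assumes evenly spaced origins that all fire at once, with two forks leaving each origin in opposite directions.

```python
GENOME_NT = 3_000_000_000     # from the text
FORK_SPEED_NT_PER_S = 50      # assumption for illustration


def replication_hours(genome_nt, origins, fork_speed_nt_per_s):
    """Hours to copy the genome if all origins fire together."""
    forks = 2 * origins       # each origin sends out two replication forks
    seconds = genome_nt / (forks * fork_speed_nt_per_s)
    return seconds / 3600


single = replication_hours(GENOME_NT, 1, FORK_SPEED_NT_PER_S)
many = replication_hours(GENOME_NT, 30_000, FORK_SPEED_NT_PER_S)
print(f"1 origin: about {single:,.0f} hours ({single / 24:.0f} days)")
print(f"30,000 origins: about {many * 60:.0f} minutes")
```

With these numbers a single origin would take the better part of a year, while tens of thousands of origins bring the ideal time down to minutes. Real S phase takes hours because origins fire at different times.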

The first enzyme is Helicase, which opens the DNA at the origin of replication. It unzips the DNA along the hydrogen bonds that hold the two strands together. The spot at which the Helicase unzips the DNA is called the replication fork. Since the two strands are complementary, they will want to pair back up and snap closed. There is a set of proteins called Single Strand Binding (SSB) proteins that bind to each strand of the DNA and keep the strands apart. The DNA polymerase that is responsible for copying the DNA cannot start a new strand from nothing. It can only add nucleotides onto the 3' end of an existing strand, but in this case the DNA is separated and each strand is alone. An enzyme called primase comes in and lays down a short RNA primer at the starting spot so the DNA polymerase has something to extend.

Both strands are copied at the same time. Each original strand acts as a template while a new strand is synthesized against it. This leaves each new copy of the DNA with 1 original strand paired with 1 new strand. This is called the Semiconservative model of DNA replication.
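The semiconservative model can be illustrated by tagging each strand with the generation in which it was made. In this sketch a strand is a (sequence, generation) pair written 5' to 3', and the helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(sequence):
    """New strand built against a template, written 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(sequence))


def replicate(duplex, generation):
    """Split a duplex and pair each old strand with a freshly made one."""
    top, bottom = duplex
    new_bottom = (complement_strand(top[0]), generation)
    new_top = (complement_strand(bottom[0]), generation)
    return [(top, new_bottom), (new_top, bottom)]


parent = (("ATGC", 0), ("GCAT", 0))   # both strands are generation 0
for daughter in replicate(parent, generation=1):
    print(daughter)                   # one old strand, one new strand each
```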

The DNA polymerase has to read the template going from the 3' to the 5' direction because it builds new DNA in the 5' to 3' direction. This makes one strand of the DNA easy to copy. The polymerase just starts at the 3' end of that template and keeps copying as the fork opens, always moving forward. The new strand that is made continuously, following the flow of the replication fork, is called the Leading Strand. The DNA polymerase just moves along this template, copying it into a new strand as it goes.

The other template strand runs in the opposite direction. That means the DNA polymerase has to work backward, away from the fork, as that template runs in the 5' to 3' direction. It does this by starting from a new primer close to the fork and copying back toward the section it made before. These sections of DNA, which it copies by jumping ahead and working backwards, are called Okazaki fragments. They are about 100 to 200 nucleotides long in human cells and 1,000 to 2,000 in bacteria. The new strand built this way is called the Lagging Strand. Initially, the fragments are not connected together. There is another enzyme that comes along and connects these Okazaki fragments together. This enzyme is called DNA ligase.
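The jump-ahead-and-work-backward pattern can be sketched on a toy template. The lagging-strand template is written 5' to 3', which is the order in which the fork exposes it; each exposed chunk is copied backward into one fragment, and the ligase step joins the fragments. The fragment size of 3 is only for readability, and primer removal is left out.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def copy_backward(template_chunk):
    """Copy one exposed chunk; the result is written 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(template_chunk))


def okazaki_fragments(lagging_template, fragment_size):
    """Fragments in the order they are made as the fork opens."""
    return [
        copy_backward(lagging_template[i:i + fragment_size])
        for i in range(0, len(lagging_template), fragment_size)
    ]


def ligate(fragments):
    """Join fragments; later fragments sit at the 5' side of the new strand."""
    return "".join(reversed(fragments))


template = "ATGCCGTTA"
fragments = okazaki_fragments(template, 3)
print(fragments)
print(ligate(fragments))   # equals the full complementary strand, 5' to 3'
```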

Because the DNA is wound together into a double helix, the unwinding of the DNA causes tension on the DNA. If that tension isn't released, it will build until it stops the helicase. There is another enzyme, called topoisomerase, that binds to the double stranded DNA ahead of the helicase. As the tension on the DNA gets too high, the topoisomerase breaks the backbone of the DNA, lets the tension unwind, and then joins the DNA back together. The topoisomerase is a critical enzyme for DNA synthesis. There are cancer drugs that target this topoisomerase enzyme to prevent DNA synthesis in rapidly replicating cells. This causes the DNA to break and the cells to undergo cell death.

The process of DNA synthesis occurs during the cell cycle in the S phase of the cycle. This is the only time the DNA gets copied before the cell undergoes mitosis or meiosis.


The DNA of the cell serves as a master blueprint. Nearly every cell of our body has a complete copy of this DNA blueprint. Depending on the cell and its functions, it will use specific parts of the blueprint to produce the proteins and enzymes needed by that cell. The rule of thumb is that each gene contains all the DNA needed to create one fully functional protein or enzyme. The Central Dogma of molecular biology states that DNA is used to make RNA, which is used to make proteins.

The gene itself has a few key regions that are important to be familiar with. The very first nucleotide to be copied is the start of the actual gene. The gene includes all the nucleotides that make up the actual messenger RNA. Within it, the stretch that codes for the protein, running from the start codon to the stop codon, is called the Open Reading Frame (ORF). The next important part of the DNA is the promoter. The promoter covers roughly the 50 to 100 nucleotides before the gene's starting nucleotide. The most basic part of the promoter is called the TATA box. It is a sequence of Thymine and Adenine nucleotides located about 25 to 35 nucleotides before the start. There is a big complex of proteins, called the initiation complex, that must bind to the promoter region before the RNA polymerase can. It starts with the TATA binding protein (TBP) binding to the TATA box, and the other transcription factors then build up the complex that recruits RNA polymerase.

The RNA Polymerase II enzyme binds to the Initiation Complex and begins copying the gene, starting with the first nucleotide and proceeding until it reaches the termination point. Then the RNA Pol II falls off and the primary transcript is finished.

The next important part of a gene is the regulatory regions. These can be located thousands of bases away from the promoter. They still work because the DNA folds into a loop that brings the regulatory region into contact with the promoter. The regulatory regions are called enhancers and silencers, and they bind transcription factors called activators and repressors. A repressor blocks the RNA polymerase from binding or getting started. An activator bound to an enhancer does the opposite. It makes contact with the machinery at the promoter and increases the activity of the RNA polymerase, leading to increased gene expression. Once the transcription factors have bound to the promoter region, the RNA polymerase can bind to them and begin copying the gene as long as no repressor is blocking it. An activator can raise the rate further.

The RNA polymerase opens up the DNA as it binds to the initiation complex. It doesn't need a primer like the DNA polymerase did. It begins copying the DNA using one strand as a template and creates an RNA strand in the 5' to 3' direction. The RNA polymerase follows the same rule as the DNA polymerase. It reads the DNA from the 3' to the 5' direction so it can create the RNA in the 5' to 3' direction.
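The reading rule can be shown with a few lines of code. The template is given in the 3' to 5' direction, the way the polymerase reads it, so the RNA comes out 5' to 3'. The sequence is made up for the example.

```python
# Template DNA base -> RNA base paired with it (Uracil pairs with Adenine).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Build RNA 5' to 3' while reading the template strand 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


template = "TACGGT"            # written 3' to 5'
print(transcribe(template))    # RNA written 5' to 3'
```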

The initial strand of RNA is called the primary transcript. It will undergo three modifications before it can leave the nucleus of the cell. One of them is splicing by the spliceosome complex. The vast majority of the gene is made up of non coding sections called introns. The actual coding parts of the gene are called the exons. The spliceosome follows splicing marks on the primary RNA, removes all the introns and pastes the exons back together. This also gives cells the ability to use a gene differently through alternative splicing. One cell might use Exons 1, 2, 4, and 5 while another might use Exons 1, 3, 4 and 5. That is the concept of alternative splicing. There are about 22,000 genes, but we know there are over 100,000 proteins in the human body. Much of this variation comes from alternative splicing. Different cells use different splicing to create different variants of a protein.
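The exon example above can be sketched as picking exons from one gene and joining them in order. The exon sequences here are invented placeholders, and the function name splice is only for illustration.

```python
# Hypothetical exon sequences for a single gene, keyed by exon number.
EXONS = {1: "AUGGCU", 2: "GAAUUC", 3: "CCGGAU", 4: "AAGCUG", 5: "UGGUAA"}


def splice(exons, keep):
    """Join the chosen exons; introns are simply never included."""
    if list(keep) != sorted(keep):
        raise ValueError("exons must stay in their original order")
    return "".join(exons[number] for number in keep)


variant_a = splice(EXONS, [1, 2, 4, 5])
variant_b = splice(EXONS, [1, 3, 4, 5])
print(variant_a)
print(variant_b)
print(variant_a != variant_b)   # same gene, two different mRNAs
```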

Enzymes also add a Guanine cap, a modified guanine nucleotide, to the 5' end of the RNA. This happens early, while the RNA is still being transcribed. The cap helps the RNA exit the nucleus and assists in loading the RNA into the Ribosome. The last process is the addition of the poly A tail. This adds up to about 250 Adenine nucleotides to the 3' end of the RNA. The tail protects the RNA from being broken down and helps with export and translation. It is the stop codon, not the tail, that tells the Ribosome where the protein ends.

The final RNA product is now called a messenger RNA (mRNA). It is built to take the DNA information and deliver it to the Ribosome for production of a protein. The concept of a fully processed mRNA is important. In Virology, you will see many RNA based viruses. You might think, since it is RNA, it can go right into the cell and start making proteins. That isn't always true. Many of them are not properly formatted into the mRNA format necessary to load into the Ribosome.

When a strand of RNA carries the same sequence as the mRNA, so that a Ribosome can read it directly, we call that the positive or sense strand. The complementary strand, which cannot be read directly, is called the negative or antisense strand. The same naming is used for DNA: the sense strand matches the mRNA (with T in place of U) and the antisense strand is the template that gets copied. The messenger RNA that is created by transcription is a positive or sense strand. Many viruses have negative strand or antisense genomes. They have to be copied into the complementary positive strand before they can be used by the Ribosomes.
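At the sequence level, turning a negative strand into a readable positive strand means building the complementary strand. A minimal sketch with a made-up six-base genome, both strands written 5' to 3':

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_positive_sense(negative_rna):
    """Complementary strand of a negative-sense RNA, written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[b] for b in reversed(negative_rna))


negative = "UGGCAU"
positive = to_positive_sense(negative)
print(positive)
print(positive.startswith("AUG"))   # the start codon only appears in the copy
```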


When the messenger RNA (mRNA) exits the nucleus, it will be loaded into an organelle called the ribosome. These little factories take the RNA blueprint and use it to build a protein from amino acids. This is where capping of the 5' end of the mRNA becomes important. That Guanine cap is designed to help the mRNA load into the Ribosome for translation.

Here we introduce the concept of the Codon. The DNA is made up of 4 bases: Adenine, Guanine, Cytosine and Thymine. They have to encode 20 different amino acids. To do so, the DNA uses codons. A codon is a combination of 3 nucleotides. One base per amino acid would only give 4 options and two bases would give 4*4 = 16, which is still fewer than 20. With 3 bases, the math is 4*4*4 = 64 possible combinations. There are only 20 amino acids. This leads to many of the combinations coding the same amino acid. This level of redundancy in the code allows for some variation without bad effects.
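The counting argument can be checked directly. The function names below are illustrative.

```python
def codon_combinations(codon_length, bases=4):
    """Number of distinct codons of a given length."""
    return bases ** codon_length


def min_codon_length(amino_acids=20, bases=4):
    """Shortest codon length that can encode every amino acid."""
    length = 1
    while codon_combinations(length, bases) < amino_acids:
        length += 1
    return length


for n in (1, 2, 3):
    print(n, codon_combinations(n))
print(min_codon_length())
```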

The start codon is almost always AUG (Adenine, Uracil and Guanine) and codes for the amino acid Methionine. Each group of 3 nucleotides after it is one codon and encodes one amino acid until the stop codon is reached. There are actually 3 different stop codons: UAA, UAG and UGA. The Ribosome takes the RNA, begins at the start codon with Methionine and continues to read each and every codon until it reaches the stop codon. It uses transfer RNAs (tRNAs) to build a chain of amino acids from the mRNA.

The Transfer RNA (tRNA) has an anticodon on one end and the amino acid on the other end. The tRNA matches the mRNA codon with its complementary anticodon. When there is a match, its amino acid is added to the chain of amino acids being built. The Ribosome continues to progress along the mRNA template, matching the anticodons of the tRNAs to the codons of the mRNA. It adds amino acids to the chain until it reaches the stop codon.
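The reading process can be sketched with the standard genetic code. The table below is built from the conventional U, C, A, G ordering with one-letter amino acid codes and * for stop. The lookup stands in for the tRNA matching step, and the mRNA is a made-up example.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read from the first AUG, codon by codon, until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCUUAAGG"))   # M = Methionine, A = Alanine
```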

Mutations in the DNA can cause some very serious damage to the final product, which is the protein. Since 3 nucleotides come together to make one codon and produce one amino acid, you can get some dramatic changes from even a single nucleotide mutation. These changes in the DNA can be called Point Mutations or Single Nucleotide Polymorphisms (SNP). In everyday use, the difference in which term we use comes down to the result. We typically use SNP for a common variant that is usually benign. Point mutation tends to be used when the change in the DNA causes disease. Point mutations come in a few kinds. The first is the kind that does not cause any change. Some changes just end up coding the same amino acid. These mutations that lead to no change are called Silent Mutations.

When a single point mutation causes the Ribosome to code a different amino acid, we call it a Missense mutation. These are the mutations that often lead to disease. Sickle Cell disease is one example, where a single nucleotide change swaps a hydrophilic glutamic acid for a hydrophobic valine in hemoglobin. Two different amino acids can have dramatically different behaviors, for example when one is hydrophobic while the other is hydrophilic. That simple change can change the entire shape of the protein. In proteins, shape determines function. When you change the shape of the protein due to the change of a single amino acid, you can end up with a protein that functions very differently.

The next type of point mutation is the nonsense mutation. This is where a mutation changes the codon from encoding an amino acid to a stop codon. This terminates the production of the protein early. A nonsense mutation makes a truncated version of the protein. In some cases, the shorter proteins are still functional or partially functional. In many cases, they completely lose their function. Inserting an early stop codon into a gene sequence has actually become a tool in gene editing, where it acts as a way to switch the gene off.

The last mutation is called the frameshift mutation. That is when a single nucleotide gets inserted or deleted (Indel). That causes every codon after that point in the gene to get shifted. None of the following codons will be right when they all get shifted by one extra or one less nucleotide. These mutations tend to happen from errors in DNA synthesis where the polymerase slips or skips a base. It is also a big concern when DNA has a Double Stranded Break. The repair machinery can add or remove bases during the repair.
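The mutation types above can be told apart by comparing what a codon encodes before and after the change. This sketch reuses the standard genetic code and classifies a single-codon substitution as silent, missense or nonsense. It also flags indels by whether their length is a multiple of 3, since an insertion or deletion of exactly 3 bases adds or removes a codon without shifting the frame. The function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a change from one codon to another."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == "*":
        return "other"       # changes to a stop codon are not covered here
    if alt_aa == ref_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


def classify_indel(bases_changed):
    """Insertions or deletions shift the frame unless a multiple of 3."""
    return "frameshift" if bases_changed % 3 else "in-frame"


print(classify_substitution("GAA", "GAG"))  # glutamic acid either way
print(classify_substitution("GAG", "GUG"))  # the sickle cell change
print(classify_substitution("UAU", "UAA"))  # tyrosine becomes a stop
print(classify_indel(1))
```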

Protein Folding

The ribosome creates the initial peptide of amino acids. It starts out as just a single long strand of amino acids. Then the peptide undergoes folding. There are various proteins and enzymes that assist in the protein folding process. There are 4 levels of protein structure. The Primary structure is just the starting peptide. Then there are the Secondary structures like alpha helices and beta pleated sheets. Then comes the Tertiary structure, which is the final 3D protein. Some proteins combine together to form what is called a Quaternary structure. These tend to be multi protein structures like Hemoglobin and Antibodies. The folding of a protein is driven by the many hydrogen bonds, electrostatic attractions and hydrophobic interactions among the amino acids. Some amino acids carry charges and will fold toward amino acids with the opposite charge. Others have hydrophobic properties, which make them turn into the center of the protein, while hydrophilic amino acids turn outward toward the watery environment.

The first structure is the starting Primary Structure. This primary structure is just the sequence of amino acids and their place in the overall chain. This often just looks like a string of beads. The Secondary structures form into alpha helices and beta pleated sheets. The alpha helix is just a coil shape and these play a huge role in many protein structures. The part of a protein that spans across the membrane of a cell will be an alpha helix in many cases. The alpha helix has about 3.6 amino acids per turn. It is held together by hydrogen bonds between amino acids a few positions apart along the chain. This isn't the only helix, as there is also a pi helix with about 4.4 amino acids per turn. The alpha helix is used over and over in proteins. It is not universal, though. The binding regions of antibodies and T cell receptors are built from beta sheets and loops rather than alpha helices. The alpha helix is still a critical structure to understand in biology.

The beta pleated sheet looks like a radiator, where the strands run back and forth side by side to make a sheet like structure. Hydrogen bonds between neighboring strands firm up the overall structure. These are key structures in many proteins.

The Tertiary Structure is the final 3 dimensional shape of the protein. You will hear many of them called globular proteins because they fold into a compact, rounded shape. The final structure of the protein determines its function. When it comes to proteins, shape determines function. This is a very important concept, as a single change in a single base can code a different amino acid. That can completely change the shape of the protein and ultimately its function. A fully folded protein will often have both alpha helices and beta pleated sheets.

The Quaternary structure is when multiple proteins come together to form a larger structure like in Immunoglobulins or Hemoglobin. In these structures several proteins are coming together to build a larger protein structure that functions as one. Typically, the individual proteins in a quaternary structure are called chains. When we talk about an antibody, we talk about the heavy chain and light chain. They are separate proteins that are part of a larger protein structure.


Most cells spend their lives in the G0 phase. This means they are fully mature and go about their normal functions. They do not replicate often or at all, although some cell types, like red blood cells and neutrophils, are produced constantly from dividing precursor cells. Different cells of the body have different turnover rates. Cells like most nerve cells and heart muscle cells rarely or never go through mitosis. Others only enter mitosis after injury or damage. Some cells turn over very often, like cells of the GI tract. The process of cell growth is highly controlled and regulated, as Cancer is the process of uncontrolled cell growth. When a cell gets the right growth signals, it enters the Cell Cycle. This leads to mitosis, which is the regulated splitting of 1 cell into 2.

When a cell gets the signal to replicate, it goes from G0 phase into G1 phase. This is called the growth or gap phase. In the G1 phase, the cell builds up resources and grows so that it is ready to split in two. It duplicates its organelles in preparation for division. At the end of the G1 phase, there is a checkpoint for the cell. It is called the G1 to S phase checkpoint. This is where the DNA is inspected to ensure it's in good condition before it gets copied. If it passes this checkpoint, it moves into S phase.

The S phase stands for Synthesis. This is where the DNA gets synthesized so that every chromosome is copied. Once the cell has 2 copies of every chromosome, called sister chromatids, it will advance into the G2 phase of growth.

In the G2 phase the cell will check all the copied DNA and verify there are no errors so that it will be ready to divide. The entire cycle up to this point, from G1 to S phase to the end of G2, is called "Interphase". At the end of the G2 phase there is one last checkpoint called the G2 to M phase checkpoint. This ensures everything is ready to begin the process of dividing up all the DNA and organelles and splitting the cell. Once the cell passes this checkpoint, it advances into the final M phase which stands for Mitosis.

The process of Mitosis is 5 steps with Prophase, Prometaphase, Metaphase, Anaphase and Telophase. The first step in mitosis is called Prophase. In this phase, the centrosomes move to opposite sides of the cell and begin to build protein strands made of tubulin, called microtubules. The cell will then enter Prometaphase, where the membrane around the nucleus breaks down. Then the tubulin strands from each centrosome will bind to the centromere of each chromosome. During Metaphase, the tubulin strands are bound to each and every chromosome and pull them tight. Since both centrosomes are pulling on every chromosome, this puts tension on them and lines them up at the center of the cell. This lineup is called the Metaphase Plate. In Anaphase the enzyme separase will cut the proteins that hold the sister chromatids together, allowing the tension from the centrosomes to pull apart each chromosome and take 1 copy to each side. In Telophase, a new nucleus will form around each set of DNA. Then the cell will undergo cytokinesis, which is the actual splitting of the cell into two, completing the cycle of mitosis.


Epigenetics means on top of genetics. The study of Genetics is all about DNA. Genetics is about how DNA is structured and packaged. Epigenetics is about how Genes are expressed and regulated. We are born with our Genetics, but our Epigenetics is acquired by experiences and environmental exposure. I don't want to get too deep into epigenetics, but I think there are a few concepts that are important to understand.

The first one is the process of gene methylation and gene silencing. Each gene has a promoter in front of the gene where the transcription factors bind and activate transcription of that gene. The promoter of the gene will often have Cytosine and Guanine rich regions called CpG islands. That stands for Cytosine, Phosphate, and Guanine. These CpG rich regions can become methylated. Methylation of these CpG regions of the promoter can come from environmental factors like UV exposure, chemicals, radiation, smoking and so many other things. This exposure can cause the methylation of the promoter and eventual silencing of the gene. The silencing of a gene plays a big role in understanding tumorigenesis in cancer. The loss of a tumor suppressor gene doesn't always come from a mutation of the DNA which renders it ineffective. It will often come from gene silencing by epigenetic forces.

The second concept of epigenetics is the Acetylation and Deacetylation of the Histones which package the DNA. I mentioned these enzymes before in DNA packaging. We learned in the packaging section that the DNA gets wrapped about twice around each histone. While the DNA is packaged like this, it is transcriptionally inactive. The proteins and enzymes that do transcription cannot access packaged genes. For a gene to be transcribed, it has to be exposed to the transcription machinery. This is controlled by acetylation or deacetylation of the histones. When the acetyl groups are added the DNA opens up. The DNA normally has a slight negative charge. The histone has a slight positive charge. They like to electrostatically bond to each other. Adding an acetyl group to the tails of the histone neutralizes some of that positive charge, which loosens the bond and allows the DNA to be unwound. Removing the acetyl group restores the charge and the DNA packs tightly again. This is an important concept to understand as a gene needs to be exposed to be active. Some areas of the DNA are always inactive and densely packed, like around the centromere and the telomeres. Very few genes are encoded in these regions.

One cell might have a gene active as it uses it all the time while another cell will keep that gene packaged as it never uses that gene. Each cell only uses genes specific to that cell's role and functions. Some oncology drugs will target the acetylation or deacetylation of histones to suppress the transcription of genes in cancer. Some other oncology drugs will target the demethylation of the DNA to attempt to remove the suppression of the tumor suppressor gene.

Transmission Genetics

Transmission Genetics is all about the study of how genetic information is passed from one generation to the next. It is the study of how DNA gets copied and placed in Gametes for the purpose of reproduction. It is the study of the patterns of inheritance of traits from parents to children. With patterns of inheritance, we study how a genetic disease can be passed from one generation to the next. Transmission genetics comes with a bunch of terms that we will have to go over.

Our genetic information is encoded in linear segments of DNA called chromosomes. Each chromosome contains a specific part of the overall genetic information. We get 23 chromosomes from each of our parents for a total of 46. You get a chromosome 1 from your mother and a chromosome 1 from your father. They encode the exact same genes, but not exactly the same genetic information, because each copy can carry different versions of those genes. These two matching chromosomes are called a Homologous pair. The first 22 chromosome pairs are called the autosomes. They are found in every person regardless of sex and encode all the same genes. The last pair (23rd pair) are the sex chromosomes. This determines the sex of the offspring. If both of the 23rd chromosomes are X chromosomes, the child is female. If it is 1 X and 1 Y chromosome for the 23rd pair, the child is male. The Y chromosome determines male as it encodes the genes that drive male characteristics. Males get 1 X and a Y chromosome. When they end up with a defective gene on the X chromosome, they have no spare to help them because the X and Y chromosomes encode different genes. This leads to some of the X linked genetic disorders we hear about every day. Since females have 2 X chromosomes, one of them will be inactivated in each cell. It happens early in embryonic development, and they end up with about 50% of their cells using 1 X chromosome and the other 50% using the other X chromosome. This is called X chromosome inactivation. This is done to keep the gene dosing correct since males have only 1 X chromosome and females get 2. One of them gets inactivated in every cell to keep the level of proteins produced equal in both males and females. The effects of X inactivation can be noticed by the remnant of the inactivated X chromosome which is called a Barr body. They can be noticed in the neutrophils of the immune system.

Each chromosome has a centromere, which is the pinched region that joins the two arms of that chromosome. Not all centromeres are at the exact center of the chromosome. The position tends to vary with each chromosome. These centromeres play a key role in cell division. This is where the sister chromatids will be linked together after DNA synthesis. It is also where the tubulin strands from the centrosomes bind to pull apart the sister chromatids during mitosis and meiosis.


The term locus is the location of a specific gene on the chromosome. This is like the gene's address on that Chromosome. There are some common gene loci that we refer to all the time. Let us look at a few so you get an idea of how genes are located on chromosomes. One such example is the T cell receptor alpha constant locus, or the TRAC locus. This is referred to when inserting a CAR (chimeric antigen receptor) into the exact location of the T cell receptor. The gene for the alpha chain of the T cell receptor is located on chromosome 14. We know right where to look for it. Some other important locations of genes are: MHC is located on chromosome 6 and the Heavy Chain of antibodies and T cell alpha chain are both on chromosome 14. The light chains of antibodies are located on chromosome 2 and 22 for the kappa and lambda light chains respectively. The beta chain of the T cell receptor is located on chromosome 7. As you can see the locus of a gene is its location on a specific chromosome. You know where to look for it.

An allele is one of the different versions of a specific gene found across a population. Each locus will encode the exact same gene like eye color or hair color, but there can be many different alleles of that gene. You can have blue eyes, brown eyes, and even green eyes. These come from different alleles. This leads us to two other definitions: genotype and phenotype.

The genotype is what alleles you actually have at the location of a specific gene. You might have gotten the brown hair allele from mom and the blond hair allele from dad. That makes your genotype brown/blond. The term phenotype refers to the actual physical expression of that gene. If you had the brown hair allele from mom and the blond hair allele from dad, what color hair you have would be your phenotype. That means if you have brown hair that is your phenotype. So in this example the genotype is Brown/Blond, but the phenotype ends up being brown. Why does this happen? This brings us to the concept of dominance. Before I jump into Dominance, I have one other important concept here: homozygous and heterozygous genes. When both copies of a gene are the exact same allele, it is called homozygous. When they are different alleles, it is called heterozygous. Homo meaning the same and Hetero meaning different.

The study of dominance in genes is one of the hallmarks of transmission genetics. It explains how some genes are expressed over others. The principle is that a dominant gene gets expressed over a recessive gene. There are several forms of dominance. The first is complete dominance. That means if you get an allele for blue eyes and an allele for brown eyes, the dominant gene would be expressed as the phenotype. Let us assume brown eyes are dominant over blue so the phenotype would be brown eyes. The blue eye gene is recessive and can be passed to offspring, but it won't be expressed over the brown eyes. The second is codominance. This is best expressed with the MHC. You get 3 MHC class I antigens from mom and 3 MHC class I antigens from dad. Each and every cell will express all six of these antigens. No one antigen will be expressed more or less than the others. Then comes incomplete dominance. This is where the phenotype is a blend of the two genes. This concept is best shown from examples in flowers. If you have a red flower and a white flower and decide to crossbreed them, your result would be a pink flower in a heterozygous offspring.

The next concept we must cover is that of Epistasis. Epistasis is the concept of one gene regulating the expression of another. The first gene may create pigment for hair color while the other determines how pigment for hair color gets expressed in the hair. Let us say Gene #1 is either Black or Brown for the allele possibilities. You can either get Black or Brown hair. Black is dominant over Brown so if you get one of each allele, you will have Black hair. Now the epistatic gene regulates how much of that pigment gets expressed into the hair. You can have Gene #2 which expresses the pigment as On or Off as the allele. Even if you have a Black or Brown pigment gene, if you get an Off gene for expression, you end up being blond. You can produce the pigment, but the epistatic gene prevents it from being expressed.
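The two gene hair color example above can be sketched in a few lines of Python. This is only an illustration of the idea, not a model of real hair genetics. The function name `hair_color` is made up for this sketch, and it assumes that "On" is dominant over "Off", which the example above does not state, so pigment is blocked only when both copies of Gene #2 are "Off".

```python
def hair_color(pigment_alleles, expression_alleles):
    """Return the hair phenotype for the two gene example.

    pigment_alleles: the 2 alleles of Gene #1, each "Black" or "Brown".
    expression_alleles: the 2 alleles of Gene #2, each "On" or "Off".
    Assumption for this sketch: "On" is dominant over "Off".
    """
    # The epistatic gene is checked first. If no "On" allele is present,
    # the pigment is never expressed, whatever Gene #1 says.
    if "On" not in expression_alleles:
        return "Blond"
    # Black is dominant over Brown, so one Black allele is enough.
    return "Black" if "Black" in pigment_alleles else "Brown"


print(hair_color(("Black", "Brown"), ("On", "Off")))   # Black
print(hair_color(("Brown", "Brown"), ("On", "On")))    # Brown
print(hair_color(("Black", "Black"), ("Off", "Off")))  # Blond
```

Notice that the last person still carries 2 Black pigment alleles and can pass them on. The phenotype is blond only because the second gene masks the first.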

The last concept of dominance will be penetrance. This is how often a gene is actually expressed among the people in a population who carry it. This often applies to genes that lead to disease. The basic concept is not every person who gets the gene for a disease will get the disease. Some diseases have 100% penetrance. If you get that gene, you definitely will get the disease. Other diseases will have only a percentage of people that get the gene develop the disease. The best example here is polydactyly. This is a gene that runs in families where they can get more than 5 fingers or toes. It only has a penetrance of 43%. That means only 43% of people who get this gene are born with an extra digit.
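The arithmetic behind penetrance is simple enough to show as a small Python sketch. The function name `expected_affected` and the group of 200 carriers are made up for illustration. Only the 43% figure comes from the text above.

```python
def expected_affected(carriers, penetrance_percent):
    """Expected number of carriers who actually show the trait."""
    return carriers * penetrance_percent / 100


# Hypothetical group of 200 people who all carry the gene.
print(expected_affected(200, 43))   # 86.0  (43% penetrance)
print(expected_affected(200, 100))  # 200.0 (100% penetrance)
```

This is an average across a population. It does not tell you which individual carriers will be affected.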


We will start with a grandparent generation and see how those chromosomes get passed to parents then to children. This will help demonstrate the way chromosomes get passed from generation to generation. Inheritance patterns follow how chromosomes are passed from parents to children. This is a cool concept if you are trying to figure out if you got blue eyes from grandma or a dimple in your chin from grandpa, but it becomes very important when studying genetic disorders.

Each of your parents got 46 chromosomes with 23 coming from grandma and 23 coming from grandpa. When your mom or dad makes a gamete, those chromosomes are randomly sorted into it. Each gamete is a random collection of both grandma's and grandpa's chromosomes. A gamete from your mom could end up with Grandma's Chromosome 1, 2, and 5 while getting Grandpa's chromosome 3, 4 and 6. The odds of either copy of a chromosome being deposited into any gamete are 50/50, and each pair is sorted independently of the others. This is called the law of independent assortment. A person has 2 of each chromosome and each gamete has the same odds of getting either copy. It is only important that the gamete gets exactly 1 copy of each chromosome. Whether it is mom's chromosome 1 or dad's chromosome 1 does not matter as long as one of each of these chromosomes ends up in a single gamete.
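To get a feel for how much shuffling a 50/50 choice on every chromosome produces, here is a minimal Python sketch. The names `PAIRS` and `make_gamete` are made up for this illustration, and the sketch ignores crossing over, so it only counts whole chromosome combinations.

```python
import random

PAIRS = 23  # homologous chromosome pairs in a human cell


def make_gamete(rng):
    """Pick one copy from each pair with 50/50 odds.

    Each entry says whether the gamete received the copy this person
    originally got from their mother or from their father.
    """
    return [rng.choice(("maternal", "paternal")) for _ in range(PAIRS)]


# Two choices for each of the 23 pairs gives 2 to the power of 23.
combinations = 2 ** PAIRS
print(combinations)  # 8388608

# One random gamete. A seed is used only so the run is repeatable.
gamete = make_gamete(random.Random(0))
print(len(gamete))  # 23
```

So a single person can make over 8 million different chromosome combinations in their gametes before crossing over is even considered.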

This random assortment is to promote variation in our species which helps with evolution and survival. This makes each of your parents 50% grandma and 50% grandpa since they get 1 of each chromosome from each of them. Then those chromosomes from your mom and dad will then get randomly assorted again and you will end up with 50% of your mom's and 50% of your dad's chromosomes. You could end up with those blue eyes from grandma and those dimples from grandpa.

This study of inheritance was developed by Gregor Mendel and is often called Mendelian Genetics instead of Transmission Genetics. It tracks how these possible combinations of genes from the parents get passed to the children. Mendel did all his genetic research using pea plants. He laid all the groundwork and rules for the passing of genetic information from one generation to the next.

The monohybrid cross is a tool that allows us to predict the distribution of the gene possibilities. When we look at a specific gene, we will label all of the possible alleles. For a monohybrid cross, we make a box with 4 squares in it. On the left side we put both of mom's genes. Along the top, we place both of dad's genes. Then we match the left to the top and place the 2 genes into that square. This gives us the 4 possible outcomes of those genes. We use a letter to represent a gene type like B would be black dominant hair and b would be brown recessive hair. If mom has genes for Black/Black, she is homozygous dominant. Every block would get at least 1 capital B from mom. That means every one of her children will have black dominant hair as their phenotype and all will be at least heterozygous for black hair. If dad has 2 recessive b for brown hair genes, then we place a small b in every square. Quickly, we can see all their children will end up heterozygous with B/b genotype and black hair for phenotype.

Let us do another example with a monohybrid cross. Let us say mom is heterozygous for B for dominant Brown eyes and b for recessive blue eyes. Dad is heterozygous for B for dominant Brown eyes and b for recessive blue eyes. This means mom will have B and b on the left side of our 4 squares, and dad will have B and b across the top. The first square will be a match for BB from mom and dad. The second square will match B from mom with little b from dad. The third square will be little b from mom and big B from dad. The last square will be both little b's from both parents. That means 1/4 of the children will be homozygous dominant for B/B. There will be 2/4 or half with heterozygous B/b. The final 1/4 of children will have both b/b for homozygous recessive blue eyes. For phenotypes, 3 out of 4 children have a dominant gene and will express the dominant brown eyes phenotype.
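The 4 square box can also be filled in with a short Python sketch. This is just one way to write it. The function names `punnett` and `dominant_fraction` are made up for this illustration, and each parent is written as a 2 letter string such as "Bb".

```python
from collections import Counter
from itertools import product


def punnett(mom, dad):
    """Count the genotypes in the 4 squares of a monohybrid cross.

    mom, dad: 2 letter genotypes such as "Bb".
    """
    # Match each of mom's alleles (left side) with each of dad's (top).
    # Sorting puts the capital letter first, so "bB" is written as "Bb".
    squares = ["".join(sorted(m + d)) for m, d in product(mom, dad)]
    return Counter(squares)


def dominant_fraction(counts, dominant="B"):
    """Fraction of squares showing the dominant phenotype."""
    total = sum(counts.values())
    shown = sum(n for genotype, n in counts.items() if dominant in genotype)
    return shown / total


first = punnett("BB", "bb")   # homozygous dominant mom, recessive dad
second = punnett("Bb", "Bb")  # both parents heterozygous

print(dict(first))                # {'Bb': 4}
print(dict(second))               # {'BB': 1, 'Bb': 2, 'bb': 1}
print(dominant_fraction(second))  # 0.75
```

The counts are odds for each child, not a guarantee. A couple with 4 children will not necessarily get exactly 1 B/B, 2 B/b and 1 b/b.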

Gene Linkage

The rules of Mendelian Genetics treat every gene as if it were on a separate chromosome. Every allele has a 50% chance of making it into any gamete, independent of the alleles of other genes. The truth is genes are on chromosomes and many genes can be on the same chromosome. This brings up the concept of gene linkage. One such example is the genes for Major Histocompatibility Complex (MHC) on Chromosome 6. These genes are linked and you inherit the whole group of them from the same chromosome. You get 1 chromosome 6 from mom and 1 from dad. For each of these genes, you will have the same alleles that your parent carried on that chromosome. A group of linked alleles that is inherited together is called a haplotype. This leads us to the concept of crossing over. This is a process during meiosis. This is where the homologous chromosomes line up and synapse.

During the process of synapsis, you can get the crossing over of parts of the chromosome between the chromatids of the homologous chromosomes. This is where parts of mom's chromosome 1 might exchange segments of DNA with dad's chromosome 1. The rate and locations of crossing over are different for each and every chromosome. Some chromosomes or segments of chromosomes will allow a lot of exchange of DNA while others will rarely have crossing over. Because of this process, the distance between 2 genes plays a role in gene linkage. The farther apart 2 genes are on a chromosome, the more likely a crossover will fall between them and separate them. The distance between 2 genes is measured in centimorgans (cM). The process of crossing over is called recombination, and how often it happens between two genes is used to measure the distance between them.

By measuring the rate of recombination occurring between 2 genes, we can determine the distance between those 2 genes in centimorgans. One centimorgan equals a 1% chance of recombination happening between the 2 genes. There is a whole chapter in transmission genetics on recombination with single crossing over, double crossing over and triple crossing over of genes. When the distance between 2 genes reaches 50 cM or more, those 2 genes are treated as if they were on 2 different chromosomes. This is because a 50 cM distance equates to a 50% chance of recombination happening between those 2 genes. That 50% puts them equal to the 50% rule for independent assortment. This process of recombination and crossing over of DNA between homologous chromosomes is a normal part of Meiosis. It does not normally occur during mitosis.
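The conversion from a recombination rate to a distance can be sketched in Python. The function names and the offspring counts below are hypothetical and only illustrate the arithmetic. The simple conversion of 1% recombination to 1 cM is an approximation that works best for genes that are fairly close together.

```python
def map_distance_cm(recombinant_offspring, total_offspring):
    """Estimate the distance between 2 genes in centimorgans.

    Uses the simple rule that a 1% recombination rate equals 1 cM.
    """
    return 100 * recombinant_offspring / total_offspring


def behaves_as_unlinked(distance_cm):
    """At 50 cM or more, the genes sort like genes on separate chromosomes."""
    return distance_cm >= 50


# Hypothetical cross: 120 of 1000 offspring show a recombined combination.
distance = map_distance_cm(120, 1000)
print(distance)                       # 12.0
print(behaves_as_unlinked(distance))  # False
print(behaves_as_unlinked(50))        # True
```

A result of 12 cM means the 2 genes are linked and will be inherited together most of the time, but not always.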


The pedigree is a diagram that is used to track a trait across a family tree. It might be cool to track the trait of red hair in your family, but the pedigree is critical for tracking a genetic disease through a family to determine the risk of inheritance. There are some basics to drawing a pedigree. The lines connect lineages with the males being the squares and the females being the circles. The lines connect 2 mates and show a tree of all their children. The roman numerals on the left mark each of these generations. Some pedigrees will use a slash / to cross out deceased people in the pedigree.

Anyone who carries the trait gets a small colored dot in the center of their symbol. They carry the gene for the disease, but they are not affected. Those that have the actual disease get fully colored in. The lines link the children from each generation and their mates. The children for each set of parents are listed from left to right in the order of birth.

By looking at the patterns of inheritance on a pedigree, we can quickly see how the trait is passed. If the trait affects both males and females about equally, it is likely an autosomal trait. If it affects mostly males or mostly females, then it is likely a sex linked disease. If the disease affects every generation, it is likely a dominant trait. An autosomal dominant trait will affect about 50% of the children, both males and females, of an affected parent who is heterozygous. Recessive traits tend to skip generations and can often skip several generations as many parents carry the trait, but do not have the full disease. The pedigree is a great tool for genetics to track a trait across generations to get valuable insight into how it is passed from one generation to the next.


Meiosis is a 2 stage process for the formation of gametes which are used to pass on the genetic information for reproduction. Before the first division, called Meiosis I, every chromosome is duplicated into a pair of sister chromatids. This copying of all 46 chromosomes by DNA synthesis is the same as in mitosis. You end up with 46 chromosomes that each consist of 2 sister chromatids. What happens when the cells actually split is very different. The first major difference between mitosis and meiosis is how the chromosomes line up. In mitosis they all line up in a single row with the tubulin binding to each and every copy. In meiosis the homologous chromosomes pair up. Mom's and dad's chromosome 1 will pair together and synapse. This allows for the process of crossing over. This is where the homologous chromosomes will swap some of their DNA segments. This means parts of the chromatids from mom's #1 chromosome and parts of the chromatids of dad's #1 chromosome will swap parts of their DNA. This is called crossing over or recombination. The term recombination is used whenever DNA recombines in new ways. After the crossing over in Meiosis I, the tubulin from each side will reach out and bind one of the chromosomes in each pair. That means one new cell will get mom's #1 chromosome with both of its sister chromatids and the other cell will get dad's #1 chromosome with both of its sister chromatids. This is often called the reduction phase as the new cells go from Diploid with 2 copies of each chromosome to Haploid which is just 1 copy of each chromosome. You can get a new cell with mom's #1, #2 and #5 chromosomes and dad's #3, #4 and #6 chromosomes.

After the cell splits into 2 new cells, each new cell will have 1 chromosome from each pair, and each of those chromosomes is still made up of 2 sister chromatids. Some of the chromosomes will be from mom and some of them from dad. Then the process of Meiosis II will begin to break up the chromatids. The DNA does not get copied again in meiosis II. It only goes through the process of splitting the DNA and creating 2 new cells. This will go like Mitosis where the tubulin will bind to each side of the sister chromatids for each chromosome. Then the separase enzyme will cut apart the sister chromatids, and then a copy of each chromosome will move to each new cell.

This process starts out by replicating the chromosomes into chromatids, then it sorts them into new cells, and finally it splits the chromatids. This leaves each of the final gametes with exactly half the genetic information in the form of just 1 copy of each chromosome. Some of them will be from mom and some of them will be from dad. This random shuffling of the genetic information gives variation to the species.

* I am not a doctor. This is not designed to be Medical Advice. Please refer to your doctor for Medical Decisions