CRISPR

Intro to CRISPR

What does CRISPR stand for? It stands for Clustered Regularly Interspaced Short Palindromic Repeats. It was discovered by scientists studying bacteria. Bacteria get infected by viruses called bacteriophages, which attach to the bacteria and inject their viral DNA. Scientists noticed that the bacterial genome had a region of repeating sequences: a short unique sequence of DNA, then a repeat, followed by another short unique sequence and another repeat. They named this area of the bacterial genome the CRISPR region. It turned out to be part of the bacteria's immune system.
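The repeat and spacer layout described above can be sketched in a few lines of Python. This is only a toy illustration: the repeat and the spacer sequences are short stand-ins picked for the example, and the function names `build_array` and `extract_spacers` are made up for this sketch.

```python
# Toy sequences for illustration only; real repeats and spacers are longer.
REPEAT = "GTTTTAGAGCTA"


def build_array(spacers, repeat=REPEAT):
    """Lay out a CRISPR region as repeat-spacer-repeat-spacer-...-repeat."""
    return repeat + "".join(spacer + repeat for spacer in spacers)


def extract_spacers(array, repeat=REPEAT):
    """Recover the unique sequences by splitting the region on the repeat."""
    return [piece for piece in array.split(repeat) if piece]


spacers = ["ACGTACGTAC", "TTGACCATGA", "CCGGATATCG"]
array = build_array(spacers)
print(array)
print(extract_spacers(array))
```

Splitting on the repeat is how the eye reads the region too: the repeats are the fixed punctuation and whatever sits between them is a record of a past infection.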

It works through CRISPR enzymes like CAS1 and CAS2, which chop up the invading viral DNA. A small sequence of the viral DNA, called the protospacer, is then inserted into the CRISPR region of the bacterial genome, where it is stored as a spacer between two repeats. What happens next is very awesome. The bacteria transcribes the entire CRISPR region into one long RNA called the pre-crRNA. This gets chopped up so that each piece carries one spacer matching a virus' protospacer along with part of a repeat. Each of these pieces is a CRISPR RNA (crRNA). A second RNA, the trans-activating CRISPR RNA (tracrRNA), is made from a separate spot near the CRISPR region and is complementary to the repeat. The tracrRNA gets loaded into an enzyme called CAS9, which stands for CRISPR Associated Protein 9. The CAS9 will recognize the tracrRNA and bind to it. The repeat part of the crRNA then binds to the tracrRNA to make a complete guide RNA. The two RNA segments are held together by complementary base pairing through hydrogen bonds. Once the CAS9 is loaded with the complete guide RNA, it will go and find any DNA that has the matching protospacer for the loaded guide. The guide carries an RNA sequence that is complementary to one strand of that protospacer in the viral genome.
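To make the complementary base pairing concrete, here is a minimal Python sketch. It assumes the simplest possible picture: the guide is an RNA copy of the stored sequence (U in place of T), and it pairs with the opposite strand of the viral DNA. The sequence and the helper names are invented for the example.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """The opposite DNA strand, read in its own 5' to 3' direction."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def spacer_to_guide(spacer_dna):
    """RNA copy of the stored sequence: same letters, with U in place of T."""
    return spacer_dna.replace("T", "U")


def pairs_with(guide_rna, dna_strand):
    """True if every guide base has its hydrogen-bonding partner on the DNA strand."""
    partner = "".join(RNA_TO_DNA_PARTNER[base] for base in reversed(guide_rna))
    return partner == dna_strand


spacer = "GATTACAGATTACAGATTAC"          # toy 20 base sequence
guide = spacer_to_guide(spacer)
target_strand = reverse_complement(spacer)
print(guide)
print(pairs_with(guide, target_strand))  # the guide pairs with the opposite strand
print(pairs_with(guide, spacer))         # it does not pair with its own copy
```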

The actual guide sequence is about 20 nucleotides long. Once it finds the matching viral DNA, the CAS9 enzyme will cut that DNA. This destroys the viral DNA. I know what you are thinking. If the DNA sequence copied into the CRISPR region of the bacteria matches the viral DNA, how does the CAS9 not cut up the bacteria's own DNA too? The answer is the PAM sequence, which stands for Protospacer Adjacent Motif. This is a small segment of nucleotides in the target DNA, right next to the protospacer, that the CAS9 enzyme must recognize. For CAS9 it is N-G-G. The N stands for any nucleotide while the G stands for Guanine. This basically means the guide RNA has to match the protospacer, and the CAS9 enzyme also has to find the 2 Guanines of the PAM next to the protospacer. If both don't exist, the CAS9 will not cut. As you can guess, the copy stored in the CRISPR region of the bacterial genome is not followed by the PAM sequence, so the CAS9 is selective for the viral DNA and leaves the bacteria's DNA alone.
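The two-part check (guide match plus PAM) can be written as a short Python sketch. Everything here is a toy: the sequences are invented, `find_cas9_sites` is a made-up helper, and it only scans one strand with exact matching, which is a big simplification of what the enzyme does.

```python
def find_cas9_sites(dna, protospacer):
    """Return start positions where the protospacer is immediately followed by N-G-G."""
    sites = []
    start = dna.find(protospacer)
    while start != -1:
        pam_start = start + len(protospacer)
        pam = dna[pam_start:pam_start + 3]
        # N can be anything, so only the last two bases of the PAM are checked.
        if len(pam) == 3 and pam[1:] == "GG":
            sites.append(start)
        start = dna.find(protospacer, start + 1)
    return sites


protospacer = "GATTACAGATTACAGATTAC"                 # toy 20 base target
viral_dna = "CCA" + protospacer + "TGG" + "ACT"      # protospacer followed by T-G-G
crispr_region = "GTTTTAGAGCTA" + protospacer + "GTTTTAGAGCTA"  # followed by a repeat

print(find_cas9_sites(viral_dna, protospacer))       # one site: cut
print(find_cas9_sites(crispr_region, protospacer))   # no sites: the bacteria is spared
```

The same 20 bases appear in both sequences. Only the bases right after them decide whether a cut happens.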

Types of CRISPR

The CRISPR-CAS9 system was shown to work as a programmable DNA cutting tool in 2012 by two scientists by the names of Jennifer Doudna and Emmanuelle Charpentier. While studying the CRISPR immune system of the bacteria Streptococcus pyogenes, they described a system that could be adapted for genome editing. As covered in the intro, CRISPR stands for Clustered Regularly Interspaced Short Palindromic Repeats, and this system is the immune system for the bacteria.

The actual CRISPR enzymes that are adapted for human genome editing are called CRISPR Associated (CAS) Enzymes. The first one discovered was CAS9. These enzymes contain nucleases which are domains that allow for DNA cutting. The CAS9 has 2 of these nuclease domains to cut both strands of DNA. The CAS9 enzyme is guided by a strand of RNA that matches to the desired sequence in the DNA. This becomes a search function for finding and cutting the DNA at a specific location. Since the discovery of the first CAS9 nuclease, scientists and companies have been out there searching the microbe world for new CRISPR systems and nuclease enzymes. We are going to look at them all.

CAS enzymes are broken down into two classes: Class 1 and Class 2. These two classes are further broken down into Types and Subtypes. This gets pretty complicated so I will give you the general differences. The Class 1 CAS enzymes are made up of multiple units called subunits. They have to work together to function. One such example is CAS3, which works with CAS8 and CAS11 to target and shred DNA. Class 2 CAS enzymes are one large protein that contains everything in one. This includes many of the current CAS enzymes like CAS9, CASX, CAS12, CAS13 and MAD7. All of these come from different microbes but work similarly.

Most of them contain the RuvC nuclease domain to cut the DNA. The CAS9 is the only one I know that also contains the HNH nuclease domain as a second nuclease. CAS13 is the exception, as it cuts RNA using HEPN domains rather than RuvC. CAS enzymes are further broken down by what their nuclease cuts. Some of them cut RNA, like CAS13, and they are being built for RNA editing. Most are designed for DNA editing like CAS9, CAS12, MAD7 and CASX.

The major advance in CAS enzymes was the discovery of other CAS enzymes in other microbes. There has been an explosion of new CAS enzymes that we can use to develop editing tools. CAS9 was originally discovered in Streptococcus pyogenes. Not long after, Cpf1 (now known as CAS12) was discovered in 2015. This quickly grew in popularity. Then came the discovery of CASX by Berkeley, which was tested in E. coli and is now used by Scribe Therapeutics. MAD7 was discovered in a bacteria in Madagascar which gave it the name MAD7. It is now used by Inscripta, and they license it to many companies. This works very much like CAS12 in its function. Both CAS14 and CAS-Phi were discovered by Mammoth Biosciences and are being developed by them. These are much smaller versions of the CAS enzymes. I call them the CAS mini. Both of them are about half the size of CAS9. CAS13 is the first RNA editor being developed by the Zhang lab. It was discovered from a bacteria called Leptotrichia wadei. The last one is CAS11 which is part of the CAS3 system. It is being developed for applications in gene knockout and possibly RNA editing.

CAS9

The CAS9 system comes with 2 components. The first is the guide RNA and the second is the CAS9 enzyme itself. The CAS9 is made up of a few key domains, including two nucleases: the HNH and RuvC domains. Each nuclease cuts one strand, and together they make a double stranded break at the same position on both strands of the DNA. Some CAS9 enzymes have one of these 2 nucleases mutated to be inactive. These are called Nickases as they will only cut 1 strand of DNA. The next key domain of the CAS9 is the one that recognizes the PAM sequence in the DNA. Each CAS enzyme recognizes a different PAM. The CAS9 will recognize the sequence N-G-G. This stands for Anything-Guanine-Guanine.

The CAS9 nucleases will cut about 3 to 4 bases away from its PAM sequence. This is important as this short distance makes it difficult for CAS9 to do multiple edits. Once an edit occurs, the site for the guide and PAM matching will be disrupted by the edit.
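Here is a small Python sketch of why a cut so close to the PAM is self-limiting. It assumes a blunt cut exactly 3 bases before the PAM (the low end of the range above) and models the repair as a one base deletion, which is just one of many possible outcomes. The sequences and function names are made up for the example.

```python
GUIDE_LENGTH = 20
CUT_OFFSET_FROM_PAM = 3   # assumption: cut 3 bases before the PAM


def cut_index(protospacer_start):
    """Index in the DNA string where both strands are cut."""
    pam_start = protospacer_start + GUIDE_LENGTH
    return pam_start - CUT_OFFSET_FROM_PAM


def cut(dna, protospacer_start):
    """Split the DNA into the two pieces left by the double stranded break."""
    index = cut_index(protospacer_start)
    return dna[:index], dna[index:]


protospacer = "GATTACAGATTACAGATTAC"
dna = "CCA" + protospacer + "TGG" + "ACT"

left, right = cut(dna, protospacer_start=3)
repaired = left[:-1] + right          # toy repair that loses one base at the cut

print(left, right)
print(protospacer in dna)             # the guide matches before the edit
print(protospacer in repaired)        # and no longer matches after it
```

Because the cut falls inside the 20 bases the guide reads, almost any sloppy repair rewrites the very sequence the guide needs, so the same guide cannot come back for a second edit.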

The last part of the CAS9 system is the guide RNA (gRNA). If you recall, the wild type guide was made up of 2 segments, the crRNA and the tracrRNA, which come together to form the complete guide RNA. This would be too complex for use in gene editing, so they join the 2 segments into one molecule called a single guide RNA (sgRNA). They do this with a little linker loop that connects the end of the crRNA to the start of the tracrRNA.

The sgRNA is loaded into the delivery vector as RNA, while the CAS9 is often loaded as a messenger RNA. Some of the newer RNP packaging will include the CAS9 as a full enzyme. When it is used as an mRNA, the coding sequence is about 4,100 bases in length and encodes an enzyme about 1,368 amino acids long. This is a very big package for most traditional vectors. Most companies have moved toward LNP technology for delivery. The CAS9 will load the guide RNA when inside the cell. It enters the nucleus where it will bind to the DNA. The CAS9 opens the DNA and checks the 20 base guide against the DNA until it finds its match. Once the guide and PAM have a match, the nucleases will cut the DNA.
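The link between the enzyme size and the message size is just the genetic code: 3 bases per amino acid. A quick Python check, counting only the coding sequence (a real mRNA also carries untranslated regions and a tail, which are not counted here; `coding_bases` is a name I made up):

```python
BASES_PER_CODON = 3


def coding_bases(amino_acids, include_stop=True):
    """Length of the coding sequence for a protein of the given size."""
    codons = amino_acids + (1 if include_stop else 0)
    return codons * BASES_PER_CODON


print(coding_bases(1368, include_stop=False))  # 4104, the "about 4,100 bases" for CAS9
print(coding_bases(1368))                      # 4107 once the stop codon is counted
```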

This system can be used for Disruptions, Deletions or Insertions of genes. The disruption is done by causing Double Stranded Breaks at the target site to cause mutations that prevent reading of that gene. This effectively knocks out the gene. This works really well in-vivo as we have some very strong clinical data so far. The second use is gene deletion. This can be created using 2 CAS9 enzymes along with 2 guides to attempt to cut and remove an entire section of the DNA. I don't think this has ever been attempted in-vivo, and I don't think I would want it to be. The last use is gene insertion using homology directed repair using a template strand that gets included with the CAS9 and guide RNA at the time of delivery. The CAS9 does the DNA cutting and the template strand guides the repair.

The biggest concern with the CAS9 system is the Double Stranded Breaks. When DNA is cut like this, it can repair in unexpected ways. You can end up with insertions or deletions (Indels) of bases. The other big concern is chromosomal rearrangements when used for multiple edits at once. Typically they have to do an edit and run the cell through the cell cycle to ensure the first edit was good before they can do the next.

CAS12

The CAS12 system comes with 2 components. The first is the guide RNA and the second is the CAS12 enzyme itself. The CAS12 has a single cutting nuclease, the RuvC domain. When the CAS12 enzyme cuts, it makes a staggered break in the DNA, with about 5 nucleotides between the cut sites on the two strands. This is much preferred for DNA repair. The next key domain of the CAS12 is the one that recognizes the PAM sequence in the DNA. Each CAS enzyme recognizes a different PAM. The CAS12 will recognize the sequence T-T-T-V. This stands for Thymine-Thymine-Thymine-Anything that is not another T. The CAS12 nuclease will cut about 18 to 23 bases away from its PAM sequence, which is much further away than CAS9. This is important as this longer distance makes it good for CAS12 to do multiple edits. The CAS12 enzyme can sit at the PAM and make multiple edits without disrupting its own activity.
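Letters like N and V come from the standard IUPAC nucleotide codes, where V means A, C or G. A minimal Python sketch of matching a PAM pattern written this way (the helper names and the test sequence are invented, and only one strand is scanned):

```python
# Subset of the IUPAC nucleotide codes needed for the PAMs in this article.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT",   # anything
    "V": "ACG",    # anything that is not T
}


def matches_pam(dna, pattern):
    """True if each base is allowed by the pattern letter at the same position."""
    return len(dna) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(dna, pattern)
    )


def find_pam_sites(dna, pattern):
    """Start positions of every window in the DNA that fits the PAM pattern."""
    size = len(pattern)
    return [i for i in range(len(dna) - size + 1) if matches_pam(dna[i:i + size], pattern)]


print(find_pam_sites("GGTTTACCTTTTGA", "TTTV"))
print(matches_pam("TTTT", "TTTV"))   # a fourth T is not accepted
print(matches_pam("AGG", "NGG"))     # the CAS9 pattern works the same way
```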

The last part of the CAS12 system is the guide RNA. The CAS12 enzyme does not require the tracrRNA. It only uses the crRNA, which makes the guide RNA much smaller. The crRNA used by the CAS12 enzyme will only be about 42 nucleotides long. The guide RNA is loaded into the delivery vector as RNA, while the CAS12 is often loaded as a messenger RNA. Some of the newer RNP packaging will include the CAS12 as a full enzyme. When it is used as an mRNA, the coding sequence is about 3,800 bases in length and encodes an enzyme about 1,270 amino acids long. This is still a very big package for most traditional vectors. Most companies have moved toward LNP technology for delivery.

The CAS12 will load the guide RNA when inside the cell. It enters the nucleus where it will bind to the DNA. The CAS12 opens the DNA and checks the 20 base guide against the DNA until it finds its match. Once the guide and PAM have a match, the nuclease will cut both DNA strands. Because CAS12 makes a staggered cut of about 5 bases, the DNA is a bit easier to repair without as many insertions and deletions (Indels).
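A staggered cut can be pictured as two cut positions, one per strand, with the bases in between left as a single stranded overhang. The Python sketch below is my own reading of the numbers above: it assumes one strand is cut 18 bases past the PAM and the other 23 bases past it. The sequence and the function name are invented.

```python
def staggered_cut(top_strand, pam_end, near_offset=18, far_offset=23):
    """Return both cut positions and the overhang left between them.

    near_offset and far_offset are distances past the end of the PAM and are
    assumptions for illustration.
    """
    near_cut = pam_end + near_offset
    far_cut = pam_end + far_offset
    if far_cut > len(top_strand):
        raise ValueError("sequence too short for both cut positions")
    overhang = top_strand[near_cut:far_cut]
    return near_cut, far_cut, overhang


target = "TTTA" + "ACGTACGTACGTACGTACGTACGTACGTAC"   # toy PAM plus 30 toy bases
near_cut, far_cut, overhang = staggered_cut(target, pam_end=4)
print(near_cut, far_cut, overhang, len(overhang))
```

Setting both offsets to the same number would give a blunt cut with an empty overhang, which is the CAS9 situation.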

This system can be used for Disruptions, Deletions or Insertions of genes. The disruption is done by causing Double Stranded Breaks at the target site to cause mutations that prevent reading of that gene. The second use is gene deletion. This can be created using 2 CAS12 enzymes along with 2 guides to attempt to cut and remove an entire section of the DNA. The last use is gene insertion using homology directed repair using a template strand that gets included with the CAS12 and guide RNA at the time of delivery. The CAS12 does the DNA cutting and the template strand guides the repair. One other use that CAS12 has found is in diagnostics. Companies use the CAS12 and guide as a search function to find DNA sequences. Once the target is found, the CAS12 also cuts a reporter molecule which gives a readout.

The biggest concern with the CAS12 system is the Double Stranded Breaks. When DNA is cut like this, it can repair in unexpected ways. You can end up with insertions or deletions (Indels) of bases. The other big concern is chromosomal rearrangements when used for multiple edits at once. Typically they have to do an edit and run the cell through the cell cycle to ensure the first edit was good before they can do the next.

MAD7

The MAD7 enzyme is a variant of the CAS12 which was found in bacteria from the island of Madagascar. It shares a large amount of homology with the original CAS12 from which it most likely derived. The MAD7 system comes with 2 main components: the guide RNA and the nuclease. The MAD7 has only 1 nuclease, the RuvC, just like the CAS12 enzyme. It even makes the staggered double stranded cut like CAS12.

The next key domain of the MAD7 is the one that recognizes the PAM sequence in the DNA. Each CAS enzyme recognizes a different PAM. The MAD7 will recognize the sequence T-T-T-V. This stands for Thymine-Thymine-Thymine-Anything that is not another T. It also has the ability to recognize C-T-T-V which stands for Cytosine-Thymine-Thymine-Anything that is not another T. The early data from Inscripta shows both these PAM sequences work equally well. The MAD7 nuclease will cut about 20 to 50 bases away from its PAM sequence. This is important as this longer distance makes it good for MAD7 to do multiple edits. The MAD7 enzyme can sit at the PAM and make multiple edits without disrupting its own activity.

The last part of the MAD7 system is the guide RNA. The MAD7 enzyme does not require the tracrRNA. It only uses the crRNA, which makes the guide RNA much smaller. This is the same as for the CAS12 enzyme. The MAD7 is just a few amino acids smaller than the CAS12 enzyme at about 1,263 amino acids. This is about 3,790 bases for a messenger RNA transcript.
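Since MAD7 accepts two PAMs that differ only in the first letter, they collapse into one pattern: C or T, then T, T, then anything but T. As a rough sketch, that can be written as a regular expression in Python. The test sequence is made up, only one strand is scanned, and `find_mad7_pams` is a name invented for the example.

```python
import re

# [CT] covers T-T-T-V and C-T-T-V in one pattern; [ACG] is "anything that is not T".
# The lookahead lets overlapping sites be reported as well.
MAD7_PAM = re.compile(r"(?=([CT]TT[ACG]))")


def find_mad7_pams(dna):
    """Return (position, PAM) pairs for every MAD7 style PAM in the sequence."""
    return [(match.start(), match.group(1)) for match in MAD7_PAM.finditer(dna)]


print(find_mad7_pams("ACTTGTTTA"))
```

Accepting a second PAM simply means more places in a genome where the enzyme is able to land.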

The MAD7 system is owned and licensed by Inscripta. They let anyone use the technology for free for research, and they license it for a small royalty for commercial use. I know a few companies that license it from them. It works pretty much like CAS12 with some minor differences.

CASX

The CASX enzyme was discovered by Berkeley like many of the CAS enzymes we covered in the past. They do a ton of research into this space. What I know about CASX comes from papers I read from them and Scribe Therapeutics, which is deploying this CASX enzyme.

This CRISPR enzyme has some attributes of the CAS9 and some of the CAS12. It uses a single guide RNA (sgRNA). This includes the crRNA and a tracrRNA combined into a single guide. This part is like CAS9. The CASX enzyme has the RuvC domain which is its nuclease. This makes a staggered double stranded break like all the other single RuvC nucleases. This part is like CAS12. The other domain of the CASX is the one that recognizes its PAM. It looks for the sequence T-T-C-N which means Thymine-Thymine-Cytosine-Anything. This PAM sequence is different from the other editors. The CASX will cut 1 strand about 12-14 nucleotides from the PAM and the other strand about 22-25 nucleotides from the PAM. This leaves a slightly bigger potential staggered cut around 10 bases long. The size of the CASX is thought to be under 1,000 amino acids in length. This makes it smaller than other CAS enzymes. I have seen some data for it used in cells showing it could do editing.
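The "around 10 bases" figure falls out of the two cut ranges. A quick worked check in Python, treating each range as a (minimum, maximum) distance from the PAM (`overhang_range` is a name I made up):

```python
def overhang_range(near_cut, far_cut):
    """Shortest and longest stagger possible given two (min, max) cut distances."""
    shortest = far_cut[0] - near_cut[1]
    longest = far_cut[1] - near_cut[0]
    return shortest, longest


print(overhang_range((12, 14), (22, 25)))  # (8, 13), so roughly 10 in the middle
```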

So far they have not developed any clinical programs with this editing technology, but I suspect it won't be too long before we start to hear more about CASX or some form of base or writer system built off CASX.

CAS13

This was developed by the Zhang Lab. Most of my information on this CAS13 enzyme comes from the papers published by the lab. This system looks and operates much like CAS9, but it works on single stranded RNA. It uses a single stranded guide RNA. The data I have seen shows the guide is typically in the 30 to 50 base range. It includes an RNase for cutting the RNA. The cutting can also be switched off, as cutting RNA is not always the best option. I didn't see any mention of a PAM site or any PAM sequence. I am not sure if it was just not disclosed or if it doesn't use any. This is being developed with several functions in mind. The first and obvious use is to target and destroy mRNA using this system to knock down unwanted proteins. This would compete with RNAi. I don't think that is what they want to do as RNAi works really well. It would also compete with gene knockout from editors.

The first good use is to develop a version of base editing from the CAS13 enzyme so it can be used to alter pathogenic messenger RNAs before they can be translated by the ribosome. This allows for correction without permanent gene editing. This uses ADAR to flip an A in the RNA to an I which gets read as a G. This system is called RNA Editing for Programmable A to I Replacement (REPAIR). They are developing another system called RNA Editing for Specific C to U Exchange (RESCUE). This takes the ADAR and modifies it for C to U editing.

The other big use is to use the CAS13 as a diagnostic. This can be used as a test for viral RNA detection. The CAS13 and guide work like a search function. When the correct sequence is found, the RNase will cut. They include a reporter molecule which also gets cut by the RNase and gives off fluorescence as a readout. This is being deployed by Sherlock Biosciences as part of their SHERLOCK system. This stands for Specific High-sensitivity Enzymatic Reporter unLOCKing (SHERLOCK). This could be a very big use for the CAS13. So far, it is the only working RNA editor I have seen data on.
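The search-then-report logic of this kind of test can be mocked up in Python. This is a cartoon, not a model of the real chemistry: it assumes the result is all or nothing, uses an exact text match in place of guide binding, and the sequences, the reporter count and the name `run_assay` are all invented.

```python
def run_assay(sample_rna, guide_target, reporters=100):
    """Toy readout: the reporters only get cut if the guide finds its target."""
    target_found = guide_target in sample_rna
    cleaved = reporters if target_found else 0
    return {
        "target_found": target_found,
        "cleaved_reporters": cleaved,
        "signal": cleaved > 0,   # fluorescence shows up only after reporters are cut
    }


guide_target = "AUGGCUAGCUUAGGCA"                     # toy viral RNA signature
infected_sample = "GGCAUC" + guide_target + "UUAGC"
clean_sample = "GGCAUCUUAGCAAGGCUUAACGGAU"

print(run_assay(infected_sample, guide_target))
print(run_assay(clean_sample, guide_target))
```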

CAS14 and Phi

Both of these CAS enzymes are being developed by Mammoth Biosciences. I will focus mainly on CAS14 as I have some published data on this enzyme. I can also add Phi later if I find some good papers on it. I call these the CAS mini enzymes. Their main claim to fame is they are about half the size of the original CAS enzymes. CAS9 and CAS12 were about 1,300 amino acids in length. The CAS14 is just 500 to 700 amino acids long. That is about half the size of the first generation CAS enzymes or less.

The CAS14 enzyme uses the full single guide RNA which consists of the crRNA and a tracrRNA. The CAS14 enzyme works on single stranded DNA. It has a RuvC nuclease domain that cuts single stranded DNA. The one thing about CAS14 is that it seems to have no PAM site. After looking at the testing, it seems to work independent of any PAM sequence. I am sure that I will need more clarification later. CAS14 is being deployed as a diagnostic tool. The CAS14 and guide RNA work like a search function. This will find the desired location in the DNA. The RuvC nuclease will cut the strand at that location. They include a reporter molecule that will also get cut by the RuvC. The reporter has fluorescence which will light up to signal that the sequence has been found. This is used as a diagnostic tool for infectious disease and other applications.

CAS-Phi was discovered in bacteriophages. They use it as a competitive system to attack and cut up the DNA of other bacteriophages that might be competing for the same bacteria. This has been adopted as a potential DNA editing tool. The CAS-Phi is much smaller than the original CAS9 and CAS12 enzymes, but bigger than CAS14. There is no mention of a PAM sequence for CAS-Phi. It seems like it might have a PAM, but it hasn't been fully clarified yet. It makes a staggered cut about 8-12 bases apart, like other CAS enzymes. That is about all I could find so far on CAS14 and CAS-Phi. I am sure we will get more data from these programs as they develop.

Base Editing

Base Editing takes the CAS9 system to the next level for targeted DNA editing. It takes a deaminase enzyme and tethers it to the CAS9 system. This CAS9 is a Nickase which has one of the nucleases deactivated. This actually creates 2 different new systems: Adenine Base Editors (ABE) and Cytosine Base Editors (CBE). Both of these work by removing the amino group from either the Adenine base or the Cytosine base. Removing the amino group from these Adenine or Cytosine bases facilitates their transition to Guanine or Thymine. The base editing technology can only do transitions. It cannot do a Transversion. It can't take a 2 ringed base and turn it into a 1 ringed base or vice versa. This ability to change a single base at a time requires no Double Stranded Breaks which is the biggest benefit of base editing. Double stranded breaks can cause mutations in the DNA with Indels or trigger p53 and apoptosis which leads to lower editing efficiency.
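The transition versus transversion rule is easy to state in code. In the Python sketch below, the ring classes are standard chemistry (A and G are two ringed purines, C and T are one ringed pyrimidines). The `base_editor_for` helper is my own simplification: it also counts T to C and G to A as reachable, on the assumption that the editor is aimed at the opposite strand.

```python
PURINES = frozenset("AG")       # two ringed bases
PYRIMIDINES = frozenset("CT")   # one ringed bases
BASES = PURINES | PYRIMIDINES


def substitution_type(old, new):
    """Classify a single base change as a transition or a transversion."""
    if old not in BASES or new not in BASES or old == new:
        raise ValueError("expected two different bases from A, C, G, T")
    same_ring_class = {old, new} <= PURINES or {old, new} <= PYRIMIDINES
    return "transition" if same_ring_class else "transversion"


def base_editor_for(old, new):
    """Which editor could make the change, or None if it is a transversion."""
    if (old, new) in {("A", "G"), ("T", "C")}:   # T to C by editing the A on the other strand
        return "ABE"
    if (old, new) in {("C", "T"), ("G", "A")}:   # G to A by editing the C on the other strand
        return "CBE"
    return None


for change in [("A", "G"), ("C", "T"), ("A", "T"), ("G", "C")]:
    print(change, substitution_type(*change), base_editor_for(*change))
```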

This system starts by delivering the CAS9 which has the Deaminase tethered to it. This will load the guide RNA. The CAS9 along with the guide RNA act like a search and find function to find the right location in the DNA to do base editing. Then the Deaminase works within a 4 to 5 base window where it can edit. The Deaminase will make the base modification from an A to a G or a C to a T. Afterward, there will be a mismatched base pair where the edit was made. The nickase will cut the unedited strand which will trigger DNA repair. This allows the mismatch to be corrected using the edited strand as a template.

One of the issues with base editing is the enzyme Uracil DNA Glycosylase (UNG). This enzyme is designed to find and fix deamination events in the DNA that result in Cytosine being turned into Uracil. This would undo the effects of the CBE editor. To fix this issue the CBE editor includes an inhibitor of UNG. This is called Uracil Glycosylase Inhibitor (UGI). This blocks the function of the UNG for a while to allow base editing to occur. The Adenine Base Editing actually turns the base into an Inosine which gets read as a Guanine. This is why the ABE base editor doesn't need the UGI to block the repair machinery of the cell.

The biggest risk to Base Editing is what we call bystander edits. Since the window for the deaminase is 4 to 5 bases wide, what happens if more than 1 A or C shows up in the window? Which one would the deaminase change? The answer is that it is random. There is no way to control which base would get changed if more than 1 shows up in the window. Shifting the window can sometimes fix this issue. Other times, the solution is to test the bystander edits to ensure that change would be benign. The technology of base editing was developed by Dr. Liu's lab.
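Checking a target for possible bystanders comes down to counting editable bases inside the window. A minimal Python sketch follows. The window of positions 4 through 8 (counting from 1 at the start of the 20 base target) is an assumption for illustration, since the exact window depends on the editor, and the sequence and function names are made up.

```python
def editable_positions(protospacer, target_base, window=(4, 8)):
    """1-based positions inside the window that hold the base the deaminase acts on."""
    first, last = window
    return [
        position
        for position in range(first, last + 1)
        if protospacer[position - 1] == target_base
    ]


def has_bystander_risk(protospacer, target_base, window=(4, 8)):
    """More than one editable base in the window means an unwanted edit is possible."""
    return len(editable_positions(protospacer, target_base, window)) > 1


protospacer = "GCTACAGTTCGATCGATCGA"   # toy target, window covers A-C-A-G-T

print(editable_positions(protospacer, "A"))   # two A's in the window for an ABE
print(has_bystander_risk(protospacer, "A"))
print(editable_positions(protospacer, "C"))   # a single C in the window for a CBE
print(has_bystander_risk(protospacer, "C"))
print(has_bystander_risk(protospacer, "A", window=(5, 9)))   # shifting the window helps here
```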

It is currently being developed by Beam Therapeutics and Verve Therapeutics. This technology offers a very safe alternative to Double Stranded breaks which could address a large portion of the thousands of diseases which stem from a single base mutation.

Prime Editing

Prime Editing is a 2nd generation editor as it takes the CAS9 system and tethers to it a Reverse Transcriptase that can take an RNA template and copy it into the DNA. This concept has been used by Retroviruses for millions of years. To make this work the guide RNA gets bigger. The original single guide RNA (sgRNA) gets an additional segment which serves 2 functions. First, it has a sequence that binds to the DNA after it is cut to stabilize it. Then it has a section which acts as a template for the Reverse Transcriptase. This new bigger guide RNA is called the pegRNA. This allows the Reverse Transcriptase to copy the RNA template into the DNA by adding new bases to the 3' end.
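The layout of the pegRNA can be pictured as four parts joined end to end: the usual guide, the usual scaffold, then the template for the Reverse Transcriptase and the section that binds the cut DNA. The Python sketch below uses that order. All four sequences are short placeholders, not real designs, and the class and field names are my own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PegRNA:
    """Toy model of a pegRNA as four parts, written 5' to 3'."""
    spacer: str               # the guide that finds the target
    scaffold: str             # the part the CAS9 holds on to
    rt_template: str          # template the Reverse Transcriptase copies
    primer_binding_site: str  # binds the cut DNA strand to stabilize it

    def sequence(self):
        return self.spacer + self.scaffold + self.rt_template + self.primer_binding_site

    def extension_length(self):
        """Bases added on top of an ordinary sgRNA."""
        return len(self.rt_template) + len(self.primer_binding_site)


peg = PegRNA(
    spacer="GAUUACAGAUUACAGAUUAC",       # placeholder
    scaffold="GUUUUAGAGCUA",             # placeholder, far shorter than a real scaffold
    rt_template="CCAUGGAUCCAUG",         # placeholder
    primer_binding_site="GUAAUCUGUAAUC", # placeholder
)
print(peg.sequence())
print(len(peg.sequence()), peg.extension_length())
```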

In the single Prime Editing, a second guide RNA can be included that directs the nickase to cut the other strand after the flap is created, which encourages the cell to repair the DNA using the newly written flap. The twin Prime Editing uses 2 CAS9 systems working on both strands of the DNA. Each of these will create a single stranded break and create a new flap. The space between them can be just a dozen or so bases. The Reverse Transcriptase only has strong efficiency at around 10 to 30 bases from the PAM site. This system works well to allow editing of both strands of DNA while not technically creating a double stranded break.

There are some concerns with the Twin Prime system. The first is the Reverse Transcriptase makes mistakes. It reads the template from the guide RNA, but it can insert the wrong base. This occurs about 1 in 100,000 bases. The RT does not have proofreading functions like a DNA polymerase. 1 in 100,000 doesn't sound like a lot of mistakes, but when editing millions of cells in a human, it can be hundreds of errors. This system also suppresses the natural DNA mismatch repair systems (MMR). I have seen people be very dismissive of the risk of introduction of mutations into the genes. I think it is very concerning that they would expose cells to potentially thousands of mutations without defining the risk.
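To see how a small error rate adds up, here is the arithmetic as a short Python sketch. Only the 1 in 100,000 rate comes from the text above. The cell count and the template length are round numbers I picked for illustration, and the calculation assumes every cell gets exactly one edit and that errors are independent.

```python
def expected_rt_errors(cells_edited, template_bases, bases_per_error=100_000):
    """Expected number of wrong bases written across all edited cells."""
    total_bases_written = cells_edited * template_bases
    return total_bases_written / bases_per_error


# Assumed for illustration: 1 million edited cells, 30 base template.
print(expected_rt_errors(cells_edited=1_000_000, template_bases=30))
```

With those assumptions the result lands in the hundreds, and it scales directly with both the number of cells and the length of the template.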

The other concern with Prime Editing is it can still cause Insertions and Deletions (Indels) at the site of the flap repair. This might be lower than trying to edit with a Double Stranded break, but it is still a risk. The Twin Prime System has a lot of potential to change the way we can edit DNA without double stranded breaks, but it introduces new risks that have not yet been quantified. The Prime Editing Technology was developed by Dr. Liu's lab and is being deployed by Prime Medicine.

PASTE Editing

PASTE stands for Programmable Addition via Site-specific Targeting Elements. This is being developed by MIT and there is not a ton of information on it at this point. I will go over the basics. This starts with the CAS9 system and adds to it a tethered Reverse Transcriptase and an Integrase. An integrase is an enzyme that homes to specific sequences in the DNA where it will insert its genetic payload. These are called landing sites. The concept is the CAS9 and Reverse Transcriptase work just like Prime Editing. It uses a pegRNA to find the site in the DNA for desired editing. The Reverse Transcriptase writes a landing site into the DNA at that location. Then the integrase will use that landing site to insert the new gene or genetic information. This is a very complex system which I am sure will need extensive testing to work out all the bugs.

The one major drawback of PASTE is it can only insert. It cannot change or edit the DNA beyond the creation of the landing site. This means the location of the target has to be very specific to ensure it does not cause any problems. You would not want to insert this just anywhere as it does not replace any DNA. This is just a site-targeted insertion of a genetic payload. They are currently working on several integrases with this technology. I saw mention of Bxb1 and TP901 in the papers. There are a lot of different integrases and they all recognize different landing site sequences. I am sure they will be able to develop several of them for this purpose in PASTE.
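The two step idea, and the fact that nothing gets replaced, can be shown with a toy Python sketch. Everything here is invented for illustration: the genome, landing site and payload are short placeholder strings, the helper names are made up, and placing the payload directly after the landing site is a simplification of what a real integrase does.

```python
def write_landing_site(genome, position, landing_site):
    """Step 1: the CAS9 and Reverse Transcriptase write a landing site at the target."""
    return genome[:position] + landing_site + genome[position:]


def integrate_payload(genome, landing_site, payload):
    """Step 2: the integrase finds the landing site and inserts the payload there."""
    index = genome.find(landing_site)
    if index == -1:
        raise ValueError("no landing site present, so the integrase has nowhere to insert")
    insert_at = index + len(landing_site)
    return genome[:insert_at] + payload + genome[insert_at:]


genome = "AAACCCGGGTTT"          # placeholder
landing_site = "ATGCATGCAT"      # placeholder
payload = "TTAATTAA"             # placeholder

with_site = write_landing_site(genome, position=6, landing_site=landing_site)
edited = integrate_payload(with_site, landing_site, payload)

print(with_site)
print(edited)
# Every original base is still there; the DNA only got longer.
print(edited.startswith(genome[:6]), edited.endswith(genome[6:]))
```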

* I am not a doctor. This is not designed to be Medical Advice. Please refer to your doctor for Medical Decisions