5.1.2 Nucleic acid amplification

In the absence of mature, powerful single-molecule sequencing technology, the biggest problem in single-cell research is amplifying the substrate (nucleic acid). Amplification errors distort the final sequencing results and can prevent us from recovering the true target sequence. The problem is especially acute for DNA sequencing, because a single cell contains only one or two copies of each DNA molecule. The central challenge in single-cell DNA sequencing is coverage. PCR-based amplification can achieve high coverage, but it introduces uneven amplification and amplification errors, so correcting errors and identifying single-nucleotide variants requires additional statistical methods. In single-cell sequencing, error correction is even harder because good controls are lacking, and we simply do not know how much variation to expect between individual cells.
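As an illustration of the kind of statistical method mentioned above, here is a minimal sketch of one simple approach: a binomial test that asks whether the non-reference reads at a position could be explained by sequencing or amplification errors alone. The per-read error rate, the significance threshold and the function names are hypothetical, and the test assumes that every read errs independently.

```python
from math import comb


def binomial_tail(alt_reads: int, depth: int, error_rate: float) -> float:
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate).

    The chance of seeing at least `alt_reads` non-reference bases at a site
    purely from errors, assuming each read errs independently with
    probability `error_rate` (a simplifying assumption).
    """
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(alt_reads, depth + 1)
    )


def call_variant(alt_reads: int, depth: int,
                 error_rate: float = 0.01, alpha: float = 1e-6) -> bool:
    """Call a candidate SNV only if errors alone are very unlikely to explain it."""
    return binomial_tail(alt_reads, depth, error_rate) < alpha


if __name__ == "__main__":
    # Hypothetical site: 30 reads, 12 carrying the alternative base.
    print(binomial_tail(12, 30, 0.01), call_variant(12, 30))
    # Hypothetical site: 30 reads, 1 alternative base (easily explained by error).
    print(binomial_tail(1, 30, 0.01), call_variant(1, 30))
```

In a single cell, an error made in an early amplification cycle is copied into many reads, so the reads are not independent and a test like this one can call false variants. That is one reason single-cell error correction needs more than a per-read error model.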

For RNA, the biggest problem is preserving the initial abundance ratios between molecules during amplification. The first step in RNA amplification is to use reverse transcriptase (RT) to synthesize complementary DNA (cDNA). This is the most critical step in single-cell sequencing, because the efficiency of the RT reaction directly determines how much of the cell's RNA can ultimately be sequenced. Reverse transcriptase is a retroviral enzyme (the widely used MMLV RT, for example, comes from Moloney murine leukemia virus). In the infected host cell the enzyme is very efficient: even a single copy of viral RNA can be copied into full-length viral nucleic acid. In vitro, however, the native enzyme shows limited processivity (the probability of synthesizing a full-length product is below 10%), whereas after optimization its processivity can reach 90%. Mutant RT enzymes can synthesize longer cDNA products and are better suited to samples with a low RNA concentration.
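To see why processivity matters so much, a simple geometric model helps. It is an assumption added here, not a description of real RT kinetics, which also depend on RNA secondary structure. The model gives the enzyme the same independent chance of dropping off at every nucleotide. The sketch converts the full-length probabilities quoted above into a per-base continuation probability for a hypothetical 2,000-nt transcript, then applies that rate to a transcript twice as long.

```python
def per_base_continuation(p_full_length: float, length_nt: int) -> float:
    """Per-nucleotide chance that RT keeps extending, assuming a constant,
    independent drop-off probability at every base (geometric model)."""
    return p_full_length ** (1.0 / length_nt)


def full_length_probability(q: float, length_nt: int) -> float:
    """Chance of a full-length cDNA of `length_nt` nucleotides under the same model."""
    return q**length_nt


if __name__ == "__main__":
    template = 2_000  # hypothetical transcript length in nucleotides
    for p in (0.10, 0.90):
        q = per_base_continuation(p, template)
        # Same enzyme on a transcript twice as long.
        longer = full_length_probability(q, 2 * template)
        print(f"{p:.0%} full-length at {template} nt: q = {q:.6f}, "
              f"{2 * template} nt -> {longer:.2%}")
```

Under this model, doubling the transcript length squares the full-length probability (10% becomes 1%, 90% becomes 81%), so a more processive enzyme helps most on long transcripts.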

PCR allows these RNA-derived cDNA molecules to be amplified exponentially. Although many studies use PCR to construct sequencing libraries, we should be aware that PCR amplifies certain sequences (such as those with high GC content or stem-loop structures) inefficiently, and this bias is itself compounded exponentially. Most researchers therefore keep the number of PCR cycles as low as possible to limit these errors. However, because the bias arises from specific sequences, and gene expression levels vary constantly, it is difficult to estimate how large the error is. Linear amplification, in which cDNA is transcribed in vitro into amplified RNA (aRNA), can mitigate this kind of amplification error to some extent, even though in vitro transcription is inefficient for some sequences and can produce shorter products or lose parts of sequences. If the goal of a study is only to quantify RNA, rather than to analyze transcript variants, shorter RNA products are not a major problem. When serially diluted in vitro-transcribed control RNAs were sequenced and the results analyzed with a Poisson distribution, aRNA amplification was shown to quantify transcripts with a resolution of 2 to 4 molecules, although the results are also affected by amplification and recovery efficiency.
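The contrast between exponential and linear amplification can be made concrete with a minimal sketch. The efficiencies below are hypothetical, and the linear model is idealized: copies are always made from the original templates. In PCR each cycle multiplies a template by (1 + efficiency), so a small per-cycle difference compounds with the number of cycles. In linear amplification the yield grows in proportion to the rate, so the bias stays at the ratio of the rates.

```python
def pcr_bias(efficiency_a: float, efficiency_b: float, cycles: int) -> float:
    """Fold over-representation of amplicon A relative to B after `cycles` of PCR."""
    return ((1 + efficiency_a) / (1 + efficiency_b)) ** cycles


def linear_bias(rate_a: float, rate_b: float, rounds: int) -> float:
    """Same comparison for idealized linear amplification (e.g. IVT to aRNA).

    Yield grows as rate * rounds, so `rounds` cancels and the bias does not compound.
    """
    return (rate_a * rounds) / (rate_b * rounds)


if __name__ == "__main__":
    # Hypothetical efficiencies: a well-behaved sequence vs. a GC-rich one.
    for n in (10, 20, 30):
        print(n, round(pcr_bias(0.95, 0.80, n), 2), round(linear_bias(0.95, 0.80, n), 2))
```

With these numbers the PCR bias grows steadily with every extra cycle while the linear bias stays fixed, which is the motivation both for minimizing cycle numbers and for linear aRNA amplification.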

One strategy for overcoming this amplification bias is to incorporate a specific sequence tag while synthesizing the first strand of cDNA. Because a very large number of distinct tags is available, every cDNA molecule derived from an RNA molecule can carry a unique tag (commonly called a unique molecular identifier, or UMI). Molecules are then counted by their distinct tags rather than by reads, so amplification bias during PCR does not affect the count (unless a tagged molecule is lost entirely), and the number of distinct tags accurately reflects the number of original RNA molecules in the cell. However, this tagging technique is still complex and is still being optimized.
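The counting idea can be sketched in a few lines. In this minimal example the gene names, tag sequences and read data are hypothetical, and each read is assumed to be already assigned to a gene and a tag. PCR duplicates of one original molecule share a tag, so counting distinct tags per gene collapses them back to a single molecule.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def count_reads_and_molecules(
    reads: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (reads per gene, distinct molecular tags per gene).

    `reads` holds one (gene, tag) pair per sequenced read.
    """
    read_counts: Dict[str, int] = defaultdict(int)
    tags = defaultdict(set)
    for gene, tag in reads:
        read_counts[gene] += 1
        tags[gene].add(tag)
    return dict(read_counts), {gene: len(t) for gene, t in tags.items()}


if __name__ == "__main__":
    # Hypothetical data: two molecules per gene, but one GeneA molecule
    # happened to amplify far more efficiently than the others.
    reads = ([("GeneA", "ACGTAC")] * 6 + [("GeneA", "TTGCAA")]
             + [("GeneB", "GGATCC"), ("GeneB", "CATGCA")])
    print(count_reads_and_molecules(reads))
    # -> ({'GeneA': 7, 'GeneB': 2}, {'GeneA': 2, 'GeneB': 2})
```

Read counts suggest GeneA is far more abundant, while tag counts correctly report two molecules of each. Real pipelines typically also merge tags that differ only by sequencing errors, which this sketch does not attempt.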

5.1.3 Dynamic range and cell number

It is currently estimated that a typical mammalian cell transcribes and expresses about 5,000 to 15,000 different genes. If each gene is treated as a separate variable, then reliably estimating the covariance of the transcriptome ideally requires 10 to 30 times as many samples (cells) as there are degrees of freedom. If the relationships between genes are nonlinear and more complex, even more cells are needed. No one yet knows how many degrees of freedom a single-cell transcriptome has, but there are at least thousands, which means that at least tens of thousands of cells must be sequenced. Studies on this scale are already under way, but they target only a few specific molecules and achieve very low sequencing coverage. How many cells must be sequenced to obtain adequate transcriptome coverage is therefore an important question in single-cell transcriptomics.
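The rule of thumb above reduces to simple arithmetic. The sketch below is hypothetical: the degrees-of-freedom values are illustrative, since, as noted, nobody knows the true number. It turns a degrees-of-freedom estimate and the 10x to 30x sampling ratio into a range of cell counts.

```python
from typing import Tuple


def cells_needed(degrees_of_freedom: int, low: int = 10, high: int = 30) -> Tuple[int, int]:
    """Range of cell counts implied by sampling `low`..`high` times the degrees of freedom."""
    return degrees_of_freedom * low, degrees_of_freedom * high


if __name__ == "__main__":
    # Illustrative guesses for the transcriptome's degrees of freedom.
    for dof in (1_000, 5_000):
        print(dof, cells_needed(dof))
```

The lower bound has a firm basis: with n cells, a sample covariance matrix has rank at most n - 1, so if there are fewer cells than variables it is singular and cannot describe the full covariance structure.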

Several studies suggest that the most highly expressed genes in a cell have on average about 3,000 to 5,000 transcripts. However, from the literature and our own laboratory experience, we find that about 90% of the transcript species in a cell are present at fewer than 50 copies. This raises the question of whether such low expression levels can determine a cell's phenotype and function. Many genes are known to switch between "on" and "off" states, the switching state of these genes differs among the cells of a population, and many genes are expressed at levels too low to be detected in routine experiments. The genes with fewer than 50 transcripts per cell include many important factors, such as transcription factors and signal transduction molecules.
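Why low-copy genes are easily missed can be illustrated with a binomial sketch. The model assumes that each transcript is captured and converted to sequenceable cDNA independently with the same probability. The 10% capture efficiency used below is a hypothetical value, not a figure from this article.

```python
def detection_probability(copies: int, capture_efficiency: float) -> float:
    """Chance that at least one of `copies` transcripts is captured, assuming each
    molecule is captured independently with probability `capture_efficiency`."""
    return 1.0 - (1.0 - capture_efficiency) ** copies


if __name__ == "__main__":
    efficiency = 0.10  # hypothetical per-molecule capture efficiency
    for copies in (1, 5, 10, 50, 3_000):
        print(copies, round(detection_probability(copies, efficiency), 4))
```

Under these assumptions a gene with a handful of copies is often not detected at all, so a gene that is "on" at low level can look "off" in a given cell. This makes the low-expression regime that many regulators occupy especially hard to measure.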

5.2 Spatial issues

Fluorescence in situ hybridization (FISH) is another technique for studying RNA molecules in cells. Current FISH methods typically use many short, fluorescently labeled probes. These small probes can diffuse freely into tissues and cells and bind to their target RNA fragments. Although the sensitivity of FISH has improved greatly, hybridization cannot be made as selective as on a microarray, and we do not know how much RNA remains available for hybridization after the cells are cross-linked. More importantly, because fluorophores have broad emission spectra and only a limited number of them are available, many different RNA species cannot be detected at the same time. It has been reported that about 30 different mRNA species can now be detected simultaneously in a cell. This is a considerable improvement over earlier FISH methods, but it is still not enough.
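The multiplexing limit follows from simple counting. This sketch compares three idealized labeling schemes for a hypothetical palette of 5 distinguishable colors: one color per target, one distinct color combination per target, and sequential rounds of hybridization with one color read out per round. The schemes and numbers are illustrative assumptions. Published methods usually reserve part of the code space for error correction, so the usable number of targets is lower.

```python
def single_color_targets(colors: int) -> int:
    """One fluorophore per target: at most one target per color."""
    return colors


def color_combination_targets(colors: int) -> int:
    """Each target labeled with a distinct non-empty subset of the colors."""
    return 2**colors - 1


def sequential_barcode_targets(colors: int, rounds: int) -> int:
    """One color read out per hybridization round, over several rounds."""
    return colors**rounds


if __name__ == "__main__":
    palette = 5  # hypothetical number of spectrally distinguishable colors
    print(single_color_targets(palette),
          color_combination_targets(palette),
          sequential_barcode_targets(palette, 4))
```

Combinatorial and sequential codes grow much faster than the palette itself, which is why combinatorial labeling is attractive when fluorophores are scarce.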

Several research groups are developing in situ sequencing and combinatorial labeling technologies. However, even if all RNA molecules in a cell were evenly spaced, current microscopic resolution would be insufficient: a standard 20 × 20 µm mammalian cell in a tissue section can be resolved into only about 13,000 spots (pixels, each representing one RNA) with a 250-nm-resolution light microscope, about an order of magnitude fewer than the 100,000 to 300,000 mRNA molecules in a cell. Nevertheless, studying the spatial distribution of RNA molecules within cells also helps us understand cellular function and phenotype.
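The crowding problem can be sketched with the 13,000-spot figure quoted above. The text's best case, perfectly even spacing, gives the average number of molecules per spot. The sketch adds a second, hypothetical assumption for comparison: molecules placed independently and uniformly at random. Under that assumption it computes the chance that a given molecule is alone in its spot and can therefore be resolved.

```python
def fraction_isolated(molecules: int, spots: int) -> float:
    """Probability that a given molecule is the only one in its spot.

    Assumes each molecule lands independently and uniformly in one of
    `spots` diffraction-limited spots, so each of the other molecules
    misses this spot with probability 1 - 1/spots.
    """
    return (1 - 1 / spots) ** (molecules - 1)


if __name__ == "__main__":
    spots = 13_000  # resolvable spots per cell section, as quoted in the text
    for molecules in (1_000, 13_000, 300_000):
        evenly = molecules / spots  # best case: perfectly even spacing
        print(molecules, round(evenly, 2), f"{fraction_isolated(molecules, spots):.2e}")
```

Even a few thousand randomly placed molecules already leave a noticeable fraction sharing spots, and at hundreds of thousands of molecules essentially none are isolated. Random placement is therefore even less favorable than the evenly spaced best case.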

To be continued in Part XI…

Author's Bio: 

CD Genomics