From Human to Data: How DNA is Sequenced
Spit. Spit. Spit. It’s harder than you expect to fill the little sample tube with enough saliva to reach the indicated line; your mouth goes dry twice before you’re done. But once the mildly disgusting task is finished, you can stick the tube back in the return envelope and move on to something else. A lab somewhere will do the hard work of sequencing your DNA and analyzing the data; all you have to do is wait a few weeks for the results.
But for your sample, the journey is only just beginning.
A quick trip through the postal system later, the envelope winds up in the hands of a bored processing technician—the eighty-third sample they’ve worked on today. The tech pulls out the paperwork and spends the next ten minutes typing all the information into a database, then a few more minutes double- and triple-checking everything to make sure they didn’t make any mistakes. This is the only information that’ll identify the sample as it makes its way through the system; whatever documentation you included with your spit kit goes in the trash afterwards. Finally, they print out a barcode sticker, slap it on the side of the tube, and stick the tube in the blue wire test-tube holder labeled “SEQ” for “sequencing.” Then they shut down their computer and go eat lunch.
#
Early the next morning, another tech—this one from the extraction team—grabs the entire test tube rack and carries it back to the lab, where another tech has already pulled up the spreadsheet listing the day’s samples. Each tube has to be scanned and moved to the right spot in a new, labeled test-tube rack (row F, column eight, in this case), and its entry in the database updated from “received” to “processing.” From now on, its position on the rack will correspond to its eventual placement on a 96-well plate—an index-card-sized plastic container with just shy of a hundred individual wells, laid out in eight lettered rows and twelve numbered columns, each capable of holding a few scant drops of fluid. It’s a tedious task, but there’s a reason for the extensive care. This is the last time your sample will be in a readily-identifiable tube, and the last chance to catch any errors in the intake process.
One by one, a tech opens each tube and sucks out some of the blue saliva-preservative mix with a micropipette—sort of a space-age version of a syringe or plastic dropper, capable of transferring incredibly precise volumes with a single press of the plunger. Your sample’s days of having its own individual tube are over. From now on, it’ll live in a single tiny compartment (in your case, F8) of a 96-well plate, along with the rest of today’s samples. Loading the plate is a two-person job—not because it’s difficult, but because it’s important to have a second set of eyes to catch mistakes. One tech opens the sample tube and calls out where it’s going; the other does the pipetting. Then five hundred microliters of a pre-purchased lysis mixture is added to each well, and the plate is sealed and placed on a heating block. Time, warmth, and the chemicals the tech just added will break the cheek cells from your sample apart, freeing the strands of DNA from the confines of the nucleus and turning neatly organized cells into a sort of biological soup.
But while releasing the DNA is straightforward, isolating it is anything but. When done by hand, it’s a long and fiddly process that takes several hours, where the sample is passed from tube to tube as everything that’s not DNA is slowly filtered away. And human error can occur at any one of the endless steps, which is why clinical labs prefer to use robots whenever possible. In this particular facility, the machine in question is a white block the size of a washing machine called a Chemagic 360. The techs place the plate of samples onto the machine’s turntable, along with six other 96-well plates—one filled with a suspension of tiny magnetic beads, one filled with a special (and generally proprietary) solution known as elution buffer, and four that are completely empty—and a rack of 96 sterile plastic tips for the machine’s probes. The humans dutifully set up their robot, double-check that its supplies of buffers and other solutions are full, and press the “start” button on the controller software.
And then, because the machine will take more than an hour to complete its extraction, they’ll go eat lunch.
#
Meanwhile, the Chemagic sets to work, using a set of 96 tiny magnetic rods to process all samples at once. Its first move is to use those rods to transfer magnetic beads from their initial plate to the sample, where the beads’ chemical coating latches onto DNA molecules. After a thorough mixing, the machine switches the magnetic field back on and uses the same rods to lift the bead-DNA conglomerates out of the original plate and into a new one. Once there, the magnets will shut down and leave everything floating freely in a saline solution, where careful stirring will wash the DNA strands clean. After four repetitions of the process, so the manual claims, the DNA is squeaky clean. The final stop is the plate full of elution buffer—a specially designed mixture that coaxes the DNA off the beads and keeps it safe and stable for the rest of its journey—where the rods gather up the now-bare beads and carry them away.
And when the techs return with full stomachs, they’ll be greeted by a neat 96-well plate of purified DNA.
Well, theoretically purified. As usual, they can already tell that some of the magnetic beads were left behind, forming little brown drifts at the bottom of each well. The techs will have to exercise even more care than usual as they transfer the DNA to yet another 96-well plate. If they touch the bead deposits with the tip of their pipette, they’ll stir everything back up, and have to wait for the beads to settle again before attempting another transfer. The good news is that when they’re done, they’re…well, they’re not done with the process, but this plate will be the DNA’s permanent home. Its layout will live on in the computerized database, and any time someone needs to work with your sample, they’ll take a tiny volume from here.
The afternoon is filled with quality assurance tests. DNA extraction is never perfect, and the final solution will almost always have small traces of contaminants—strands of RNA that were captured by the beads along with the DNA, individual proteins still stuck to the strands, even a stubborn magnetic bead or two. The techs will have to measure the purity; if it’s not up to snuff, they’ll have to go back to the original sample tube and hope there’s enough left for a second extraction.
To do so, they transfer tiny volumes of extracted DNA to a new plate—sigh—add a number of controls and standards, and stick it in a new machine, this one called a spectrophotometer. When they push the start button, the smaller-but-still-boxy device will measure how much light passes through the sample in each well, and how much is absorbed—and more importantly, what wavelengths pass through or are absorbed. DNA absorbs ultraviolet light most strongly at a wavelength of about 260 nanometers, while contaminants like proteins peak closer to 280, and comparing the absorbance at one wavelength to the other gives the tech a good idea of the sample’s purity. Comparing the absorbance to that of the standards—solutions with known concentrations of DNA—also lets the techs determine the sample’s concentration.
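For the curious, the arithmetic behind that purity check can be sketched in a few lines of Python. This is a rough illustration, not the lab's actual software: it assumes the two wavelengths being compared are 260 and 280 nanometers, uses the common rule of thumb that one absorbance unit at 260 nm corresponds to roughly 50 ng/µL of double-stranded DNA (instead of the standards-based comparison described above), and the function names and the 1.7 to 2.0 "acceptable" window are made up for the example.

```python
# Rule-of-thumb conversion: 1.0 absorbance unit at 260 nm is about
# 50 ng/uL of double-stranded DNA over a 1 cm light path.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def purity_ratio(a260, a280):
    """Ratio of absorbance at 260 nm to absorbance at 280 nm."""
    return a260 / a280


def estimate_concentration(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration in ng/uL from the 260 nm reading."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def looks_pure(a260, a280, low=1.7, high=2.0):
    """Hypothetical pass/fail window; a real lab sets its own limits."""
    return low <= purity_ratio(a260, a280) <= high


if __name__ == "__main__":
    a260, a280 = 0.9, 0.5
    print(f"A260/A280 ratio: {purity_ratio(a260, a280):.2f}")
    print(f"Estimated concentration: {estimate_concentration(a260):.1f} ng/uL")
    print("Passes purity check" if looks_pure(a260, a280) else "Needs re-extraction")
```

A ratio near 1.8 is conventionally read as reasonably clean DNA; a ratio well below that suggests leftover protein.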
Two birds, one extremely expensive stone.
Once the measurements are in, the techs face the slowest and most exacting of the day’s tasks—going through the sample plate well by well, pipetting just enough of the sample and just enough buffer into a second plate so that each well has the same amount of DNA and total liquid. When they finish, they breathe a sigh of relief, seal both plates, and stick them in the refrigerator. The original plate will be frozen as soon as they can be confident they won’t need any more DNA from it; the new plate will go to the sequencing lab.
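The well-by-well arithmetic is a standard dilution calculation: the amount of DNA going in has to equal the amount coming out (C1 × V1 = C2 × V2). A minimal sketch follows; the function name, the target concentration, and the final volume are invented for illustration and are not the lab's real numbers.

```python
def normalization_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (sample_ul, buffer_ul) needed to hit the target concentration.

    Uses C1 * V1 = C2 * V2, where C1 is the measured stock concentration
    and C2 and V2 are the desired final concentration and volume.
    """
    if stock_ng_per_ul <= 0:
        raise ValueError("stock concentration must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("sample is too dilute to reach the target concentration")
    sample_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    buffer_ul = final_volume_ul - sample_ul
    return sample_ul, buffer_ul


if __name__ == "__main__":
    # Hypothetical well: measured at 80 ng/uL, wanted at 20 ng/uL in 50 uL.
    sample, buffer = normalization_volumes(80.0, 20.0, 50.0)
    print(f"Pipette {sample:.1f} uL of sample and {buffer:.1f} uL of buffer")
```

Every well gets a different answer, because every sample came out of extraction at a different concentration, which is exactly why the job is so slow by hand.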
And then everyone goes home for dinner, because they’ve probably already been working for more than eight hours to process a single plateful of samples.
No matter how effective the DNA extraction process is, there simply won’t be enough DNA in any one sample for effective sequencing. On the second day, a new set of sequencing techs will have to use a technique known as polymerase chain reaction (PCR) to copy the original strands many millions of times. Large quantities of loose nucleotides—the As, Ts, Gs, and Cs that make up the genetic code—are mixed with each sample, along with an enzyme called DNA polymerase. Samples are then heated until the DNA’s characteristic double helix unzips and separates into two corresponding strands. They’re then allowed to cool to a temperature where DNA polymerase can work. The enzymes work their way down each separate strand of DNA, grabbing the matching nucleotide from the pile of loose molecules, sticking it to the template strand, and connecting it to the last nucleotide it added. Eventually, the enzymes detach, leaving complete double helixes in their wake.
And then the sample is heated again. The new helixes fall apart just like the originals did; DNA polymerase builds the matching strands, and the process begins again. And again, and again—PCR typically involves thirty to forty cycles of heating and cooling. But by that point, hundreds of millions—if not billions—of copies of the original DNA have been made.
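The numbers climb that fast because each cycle can, at best, double what the previous cycle left behind. A quick sketch of the arithmetic, assuming perfect doubling unless an efficiency below 1.0 is supplied (the function names and the 0.9 efficiency figure are illustrative, not measured values):

```python
def ideal_pcr_copies(starting_copies, cycles):
    """Copies after the given number of cycles if every cycle doubles perfectly."""
    return starting_copies * 2 ** cycles


def pcr_copies(starting_copies, cycles, efficiency=1.0):
    """Copies when each cycle duplicates only a fraction of the strands.

    efficiency=1.0 means perfect doubling; 0.9 means 90% of strands
    get copied in each cycle.
    """
    return starting_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for cycles in (10, 20, 30, 40):
        print(f"{cycles} cycles: {ideal_pcr_copies(1, cycles):,} copies per starting strand")
    print(f"30 cycles at 90% efficiency: {pcr_copies(1, 30, 0.9):,.0f} copies")
```

Under perfect doubling, a single strand becomes a little over a billion copies after thirty cycles; real reactions fall short of that as ingredients run low.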
Of course, this means that the techs will have to go back to the spectrophotometer to measure concentration and purity a second time. Those numbers get plugged into the database too.
The next step is to label each sample. Not with a barcode or database entry this time, but with a “tag”—a short sequence of meaningless DNA. Each sample gets its own unique label—yours, for instance, is labeled “ATTGATGCAAT.” Once everything has been sequenced, once the complexity of the genome has been converted into a string of A’s and T’s and C’s and G’s, the presence of that little string will let the techs tell one sample from another. Everything that ends “ATTGATGCAAT” came from you.
It’s a surprisingly straightforward process. The tech grabs a new plate and mixes a few microliters of DNA sample, a few microliters of a DNA tag (labs generally buy ready-made plates of DNA tags, with 96 different sequences already sorted one to a well), and a few microliters of a third solution containing an enzyme called DNA ligase. Under the right conditions, the little proteins will find two strands of DNA with matching ends and, essentially, glue them together into a single longer strand.
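Sorting sequences by their tags afterward is, at heart, a string-matching problem. The sketch below is a simplified illustration: it assumes the tag sits at the very end of each sequence and was read without errors. The tag "ATTGATGCAAT" comes from the example above; the second tag, the sample names, and the sequences themselves are invented.

```python
from collections import defaultdict


def demultiplex(reads, tags_by_sample):
    """Group reads by the tag they end with, trimming the tag off.

    reads: list of sequence strings.
    tags_by_sample: dict mapping a sample name to its tag sequence.
    Reads that end with no known tag are filed under "unassigned".
    """
    sorted_reads = defaultdict(list)
    for read in reads:
        for sample, tag in tags_by_sample.items():
            if read.endswith(tag):
                sorted_reads[sample].append(read[: -len(tag)])
                break
        else:
            # The inner loop finished without finding a matching tag.
            sorted_reads["unassigned"].append(read)
    return dict(sorted_reads)


if __name__ == "__main__":
    tags = {"you": "ATTGATGCAAT", "neighbor": "CCGTAGGATCA"}  # second tag is made up
    reads = [
        "GGCATTACG" + "ATTGATGCAAT",
        "TTTACGGA" + "CCGTAGGATCA",
        "ACGTACGT",
    ]
    for sample, sequences in demultiplex(reads, tags).items():
        print(sample, sequences)
```

Real pipelines also tolerate a mismatched letter or two in the tag, since sequencers make occasional mistakes; exact matching keeps the example short.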
At this point, the techs have long, lovely strands of carefully-labeled DNA. So, naturally, the next step is to chop them up into little pieces. The process can be done by simply shaking or stirring the samples hard enough, but this particular lab prefers to use a more biochemical solution. Each well of labeled sample gets a squirt of solution containing a new type of enzyme—restriction enzymes. The new proteins will seek out specific DNA sequences—GGGGTTTT, perhaps—and snip the strand in two wherever they find one. By the time the enzymes are done, those lovely double helixes have been reduced to tens of thousands of short segments of DNA. The techs methodically go through the plate and transfer each sample to a single well on a big plastic container called a sequencing cell, which comes pre-loaded with all of the many solutions the sequencer will need. After all, this lab plans to take advantage of the (relatively) new technology of massively parallel sequencing.
Sequencing cells are slotted into the sleek white NovaSeq sequencers, reserves of saline washes are topped off, and “start sequencing” buttons are clicked. The machines—each of which costs hundreds of thousands of dollars—begin to hum and whir; the humans disperse to finish updating the databases and gather their belongings to head home.
Meanwhile, inside the sequencer, untold billions of DNA fragments wash over a specially-made surface where one end sticks fast, leaving the other to dangle helplessly—and ready to accept new additions. The machine then adds a flood of fluorescently-labeled “A” nucleotides (adenine). Because DNA synthesis must begin at one end of the strand and continue in a single direction, without skipping any bases, only fragments whose first base is “T” (thymine) will manage to grab one of the new fluorescent nucleotides. Ultra-sensitive cameras take note of each spot where a fluorescent molecule was just added, and the excess is washed away. Then the process repeats—As, Ts, Gs, Cs, As, Ts, Gs, Cs, and on and on until each DNA fragment is complete, and the machine knows the exact order of the bases attaching to each of the fragments.
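The flood-photograph-wash loop is easier to follow as a toy simulation. The sketch below makes several simplifying assumptions that are not spelled out above: each fragment accepts at most one nucleotide per flood, every fragment stays perfectly in step, and strand direction is ignored. The function name and the example fragments are invented.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
FLOW_ORDER = "ATGC"  # the order in which labeled nucleotides are flooded in


def sequence_by_synthesis(templates):
    """Simulate flooding one nucleotide type at a time over anchored fragments.

    Returns the newly built strand for each template, which is the
    complement of the template read base by base.
    """
    positions = [0] * len(templates)  # next unpaired base on each fragment
    reads = [""] * len(templates)     # what the "camera" has recorded so far
    while any(pos < len(t) for pos, t in zip(positions, templates)):
        for base in FLOW_ORDER:
            for i, template in enumerate(templates):
                pos = positions[i]
                # A fragment lights up only if its next unpaired base
                # pairs with the nucleotide currently being flooded in.
                if pos < len(template) and COMPLEMENT[template[pos]] == base:
                    reads[i] += base
                    positions[i] += 1
            # The wash step would happen here: unused nucleotides are rinsed away.
    return reads


if __name__ == "__main__":
    fragments = ["TACG", "GGA"]
    for template, read in zip(fragments, sequence_by_synthesis(fragments)):
        print(f"template {template} -> recorded {read}")
```

Note that what the cameras record is the new strand, so the original fragment is recovered by swapping each letter back for its partner.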
That data, in and of itself, isn’t very useful. Knowing that the mix contained fragments ATGCTAGCAT, CGATATGCAT, ATCGTGCAGC, and CCGTAGCTAA is nice, but there’s nothing about those sequences to tell us what order they go in, much less which sample they originally belonged to.
Except—restriction enzymes generally don’t make nice, neat sequences like those above, and they certainly don’t cut every copy of every strand in the same way. In reality, each sample of DNA has been chopped up a million different ways, and fragments of the same genome will overlap with one another. With the guidance of a human programmer, a computer can take those fragments, find those overlapping sequences, and connect piece A to piece B to piece C. Do this enough times—and there are billions of fragments to work with—and eventually it’s possible to reconstruct the entirety of the original, un-fragmented DNA strand.
Including the unique label.
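The stitching step can be sketched with a simple greedy approach: repeatedly find the two pieces that share the longest overlap and merge them. This is a toy illustration under generous assumptions (no sequencing errors, no repeated stretches of DNA, only a handful of fragments); production assemblers use far more sophisticated methods, and the function names and example fragments here are invented.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Merge the best-overlapping pair until nothing overlaps any more."""
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(pieces):
            for j, right in enumerate(pieces):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining pair overlaps enough to merge
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces


if __name__ == "__main__":
    fragments = ["GTATGGCAAT", "AGCTTGACC", "GACCGTATG"]
    print(greedy_assemble(fragments))
```

With these three fragments, the program rebuilds the single 19-letter strand they were cut from, regardless of the order they are supplied in.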
And there, when the progress bar is full and the program has run its course, is your complete genetic sequence. The answers to all your genetic questions are just a few control-F’s away.
References
“Collecting a Saliva Sample.” Ancestry.com. https://support.ancestry.com/s/article/Activating-an-AncestryDNA-Kit?language=en_US
“chemagic™ DNA Blood200 kit Instructions for use.” PerkinElmer. https://resources.perkinelmer.com/lab-solutions/resources/docs/IFU_3095-0030_chemagic_DNA_Blood200_Kit_English_13908624-5.pdf
“Massive Parallel Sequencing.” Encyclopedia.pub, Sep 29, 2022. https://encyclopedia.pub/entry/27973
“What are Magnetic Beads and How Do They Work for Isolation of Biomolecules?” MagBio.com, Feb 22, 2022. https://www.magbiogenomics.com/latest/post/what-are-magnetic-beads-and-how-do-they-work-for-isolation-of-biomolecules
“Polymerase Chain Reaction (PCR) Fact Sheet.” National Human Genome Research Institute. https://www.genome.gov/about-genomics/fact-sheets/Polymerase-Chain-Reaction-Fact-Sheet
Personal Experience: Worked as a DNA extraction technician, June 2019-May 2020.
“DNA and RNA Labeling.” Revvity.com. https://www.perkinelmer.com/lab-products-and-services/application-support-knowledgebase/radiometric/dna-rna-labeling.html
Shokralla, S., Porter, T., Gibson, J. et al. “Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform.” Sci Rep 5, 9687 (2015). https://doi.org/10.1038/srep09687