Inside extremophiles: using computers in DNA sequencing

Kathryn Fromson Science Correspondent

Todd Lowe ’92 spoke to students and faculty for a Computer Science and Biology Seminar on Friday, March 7, in Thompson Biology. Lowe, currently heading a research lab at UC-Santa Cruz, discussed his work and the connections between biology and computer science in “Decoding Archae Genomes: Using Computation Analysis and DNA Microarrays to Understand Life in the Extreme.”

We all learned the five kingdoms of life in school: animals, plants, fungi, protista and monera (bacteria). But what we were not taught is that there is yet another type of organism, called archaea. For years, archaea were considered part of the bacterial kingdom. This seemed to fit, as both are prokaryotic – that is, their DNA is not contained in a membrane-defined cell nucleus. All the other kingdoms, protists included, are eukaryotic: their nuclei are bound by a membrane and their other organelles are also compartmentalized.

In the late 1970s, researchers began comparing the sequences of small strands of RNA from bacteria, archaea and eukaryotes. They found that archaeal sequences were as different from those of bacteria as they were from those of eukaryotes, suggesting that archaea are not truly a subset of bacteria.

Both the archaea and the bacteria include organisms known as extremophiles: those that thrive in environmental extremes such as high or low temperatures, high or low pH, high pressure, lack of nutrients, presence of toxins, extremely salty conditions or a low availability of water. Any student who has studied or performed polymerase chain reactions to replicate large quantities of DNA has heard of Taq, the enzyme taken from the bacterium Thermus aquaticus, which grows best at about 70 degrees Celsius (human body temperature is 37 degrees Celsius). Thermus aquaticus is an example of a thermophile, an extremophile that loves high temperatures; the most heat-loving of these, which grow best above about 80 degrees Celsius, are called hyperthermophiles.

There are actually more species of bacterial and archaeal mesophiles, organisms that prefer standard living conditions, than extremophiles, but much research is centered on extremophiles. Scientists are interested in them for commercial uses – to detoxify extremely toxic environments, for energy production, as antifreeze proteins to protect frozen organs and as the substances in detergents that break up stains – but also for what they add to our knowledge of the origins of life on Earth and the possibilities of life on other planets. Given that these organisms can survive and even prefer such extreme surroundings, Lowe said they provide strong evidence that “life can live just about anywhere.”

Lowe became interested in extremophiles while working with the ribosomal RNA of baker’s yeast. There were many sites on the rRNA that had been chemically modified after the RNA was transcribed. These sites, however, did not appear to share a common regulatory sequence that would code for the modifying agent. What they did have in common were relatively small sequences (four to seven base pairs) that act as flags for snoRNAs, which associate with those sequences and guide the modification machinery to them.

Lowe examined many species of extremophiles and mesophiles and found that as maximum growth temperature decreases, so does the number of modifications guided by the snoRNAs. He also found that in hyperthermophiles, the snoRNAs were responsible for modifying transfer RNAs as well as ribosomal RNAs. Hyperthermophiles must have highly thermostable proteins and thermostable transcription and translation processes to survive in their difficult environments, and that stability is supported by the modifications the snoRNAs guide.

All this research was facilitated by the use of computer programs. Traditional programs could not find the tiny flag sequences that call for snoRNAs because they were written to look for protein-coding sequences, which include specific start and stop sequences. Lowe wrote a new program that finds the flag sequences using probabilistic rules that estimate how often such sequences might occur. The program was highly successful and found about two dozen new genes, all of which had previously been missed because researchers had been ignoring non-coding regions and looking only at regions that code for proteins.
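The talk summary does not spell out how Lowe’s program works, so the following is only a rough sketch of what a probability-based search for short sequences can look like. It scores every window of a DNA string by how much more likely it is under a small motif model than under chance. The motif table, the background frequency and the threshold are invented for illustration and are not taken from Lowe’s work.

```python
import math

# Hypothetical 4-position motif: probability of each base at each position.
# These numbers are made up for illustration only.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumption: all four bases are equally likely by chance


def log_odds(window):
    """Score a window: log2 of (probability under motif / probability by chance)."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(MOTIF, window))


def scan(sequence, threshold=4.0):
    """Return (position, window, score) for each window scoring at or above threshold.

    Expects an uppercase string containing only A, C, G and T.
    """
    width = len(MOTIF)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = log_odds(window)
        if score >= threshold:
            hits.append((i, window, round(score, 2)))
    return hits


if __name__ == "__main__":
    for hit in scan("TTAGCTGGAGCTAA"):
        print(hit)
```

Because the score depends on probabilities rather than on fixed start and stop signals, a search of this kind can flag short candidates in regions that do not code for proteins.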

The next question Lowe asked was, “How consistent are the snoRNAs between distant and closely related organisms?” Here again, he used a combination of genomic research methods and computer science. A microarray is a way of exploring gene function, and it is ideal for archaeal research because most archaeal genomes have not been catalogued and contain few recognizable homologs of bacterial or eukaryotic genes.

The basic array procedure uses fluorescent dyes to label the messenger RNA of either two different organisms or one organism grown two different ways, and then places that mRNA on an array of genes, one gene to each dot on the matrix. The intensity of the fluorescent dye at each dot shows whether the gene in that spot has been activated or repressed and allows for comparisons between the two mRNA samples.
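As a minimal sketch of how the two dye intensities at a spot can be compared, the example below takes the base-2 logarithm of their ratio, a common convention in two-color array analysis. The gene names, intensity values and cutoff are hypothetical, and nothing here is claimed to be the procedure Lowe’s lab uses.

```python
import math


def log_ratio(sample_intensity, reference_intensity):
    """Base-2 log of the intensity ratio; 0 means the two samples match."""
    return math.log2(sample_intensity / reference_intensity)


def classify(sample_intensity, reference_intensity, cutoff=1.0):
    """Label a spot using an assumed cutoff of a two-fold change (log2 = 1)."""
    ratio = log_ratio(sample_intensity, reference_intensity)
    if ratio >= cutoff:
        return "activated"
    if ratio <= -cutoff:
        return "repressed"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical spots: gene name -> (sample dye, reference dye) intensities.
    spots = {
        "geneA": (800.0, 200.0),
        "geneB": (100.0, 400.0),
        "geneC": (300.0, 310.0),
    }
    for gene, (sample, reference) in spots.items():
        print(gene, round(log_ratio(sample, reference), 2), classify(sample, reference))
```

Working on a log scale makes a doubling and a halving equally far from zero, which is why it is often preferred to the raw ratio.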

Using the microarrays in conjunction with computer programs has allowed Lowe to explore the previously unknown archaeal genomes in depth, but microarrays are a difficult tool to use. Lowe has been working on controlling the quality of the arrays and the data they produce. First, he wants to establish standard algorithms for using microarrays so that different labs can replicate each other’s experiments.

He also wants to decrease the variance between duplicate spots on the arrays and find a way to know the error bars on each pixel of the array instead of merely on the data as a whole. These changes should allow researchers using microarrays to “lower the noise” and pick up subtler differences between genomes that arrays are currently unable to detect.
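To make the idea of spot-level error bars concrete, here is a small sketch that summarizes duplicate spots for each gene with a mean and a sample standard deviation, then flags genes whose duplicates disagree. The measurements and the noise limit are invented, and this is an assumption about one simple way to do it, not a description of Lowe’s quality-control method.

```python
import statistics


def summarize(duplicates):
    """Map each gene to (mean, sample standard deviation) of its duplicate spots."""
    return {
        gene: (statistics.mean(values), statistics.stdev(values))
        for gene, values in duplicates.items()
    }


def noisy_genes(duplicates, max_stdev=0.5):
    """Return, in sorted order, genes whose duplicates spread more than the assumed limit."""
    summary = summarize(duplicates)
    return sorted(gene for gene, (_, spread) in summary.items() if spread > max_stdev)


if __name__ == "__main__":
    # Hypothetical log2 ratios measured at duplicate spots for two genes.
    measurements = {
        "geneA": [1.9, 2.1, 2.0],
        "geneB": [0.2, 1.8, -0.9],
    }
    for gene, (mean, spread) in summarize(measurements).items():
        print(f"{gene}: {mean:.2f} +/- {spread:.2f}")
    print("noisy:", noisy_genes(measurements))
```

The smaller the spread between duplicates, the smaller the difference between two samples that can be trusted as real rather than noise.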

Lowe also discussed his lab’s long-term goals. In addition to a standard procedure for microarrays, he would like to develop other shared resources for functional genomics, such as a common data repository, computational tools for integrating sequence and array analysis and a cross-species genome viewer that displays genomes graphically instead of in a huge text file.

These would create a framework with which to integrate information gained from comparing DNA and RNA sequences to improve the prediction of gene function and allow researchers to work together more effectively. He would also like to create full-genome arrays for an additional four to six archaeal species.

Biologists and computer scientists must work closely together in Lowe’s lab. He requires his biology students to not be afraid of computer programs and his computer science students to understand the biological relevance of his experiments. At Williams, he majored in biology and was just two courses short of a computer science double major. “Don’t let people tell you it’s either this track or that track,” he said. “Make your own track.”

Those interested in seeing the genome browser Lowe’s lab has created as well as additional aspects of the project should go to: http://lowelab-browser.ucsc.edu/goldenPath/help/hgTracksHelp.html.