Genome Sequence Compilation

Genome sequencing projects required the development of high-throughput technologies that generate data at a very fast pace. Computers were recruited to manage this flood of information, and this gave rise to a new discipline called bioinformatics.

Bioinformatics deals with storage, analysis, interpretation and utilization of the information about biological systems. For example, it includes activities like compiling genome sequences, identification of genes, assigning functions to the identified genes, preparation of databases, etc.

To ensure that the nucleotide sequence of a genome is complete and error-free, the genome is sequenced more than once. For example, the genome of the bacterium Pseudomonas aeruginosa was sequenced seven times using the shotgun method to make sure that the sequence was accurate. Even so, the assembler software flagged 1,604 regions that required further clarification.
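The idea of sequencing a genome several times and flagging doubtful regions can be sketched in a few lines of Python. This is only an illustration, not the assembler used in the P. aeruginosa project: it assumes the reads are already aligned to the same length, and the function name `consensus` and the `min_agreement` threshold are hypothetical.

```python
from collections import Counter


def consensus(aligned_reads, min_agreement=1.0):
    """Majority-vote consensus over pre-aligned reads of equal length.

    Returns (consensus_string, flagged_positions). A position is flagged
    when the fraction of reads agreeing with the majority base falls
    below min_agreement (hypothetical threshold, for illustration only).
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to equal length")

    bases = []
    flagged = []
    for pos in range(length):
        column = [read[pos] for read in aligned_reads]
        base, count = Counter(column).most_common(1)[0]
        bases.append(base)
        if count / len(column) < min_agreement:
            flagged.append(pos)
    return "".join(bases), flagged


# Seven made-up passes over the same short region; one pass disagrees
# at position 3 (0-based).
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGTACGT",
    "ACGTACGT",
    "ACGTACGT",
    "ACGTACGT",
]
sequence, needs_review = consensus(reads)
print(sequence, needs_review)
```

With the default threshold, any position where the seven passes do not all agree is reported for review, which mirrors the way doubtful regions are singled out for reanalysis and resequencing.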

These regions were reanalyzed and resequenced to complete the genome sequence. The accuracy of the shotgun method was then checked against the sequence of two widely separated genomic regions of P. aeruginosa obtained by the clone-by-clone method. These two regions together were 81,843 nucleotides long.

The sequences obtained by the two methods were in perfect agreement, which confirmed the accuracy of the shotgun method of genome sequencing. The test also illustrates the precautions taken in genome sequencing projects. This level of care is not unusual, and similar precautions are used in all genome projects.
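A comparison of this kind amounts to checking two versions of the same region position by position. The sketch below assumes both sequences cover exactly the same region and have equal length, so no alignment step is needed; the function name `compare_sequences` and the example strings are invented for illustration.

```python
def compare_sequences(seq_a, seq_b):
    """Compare two equal-length sequences position by position.

    Returns (mismatch_positions, identity), where identity is the
    fraction of positions at which the two sequences agree.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    mismatches = [
        i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b
    ]
    identity = 1.0 - len(mismatches) / len(seq_a) if seq_a else 1.0
    return mismatches, identity


# Made-up example: the same region as read by two methods.
shotgun_version = "ATGGCGTACGTTAGC"
clone_version = "ATGGCGTACGTTAGC"
print(compare_sequences(shotgun_version, clone_version))
```

Perfect agreement corresponds to an empty mismatch list and an identity of 1.0.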

The Human Genome Project sequenced the 3.2 billion base pairs of the human genome a total of 12 times. Celera Genomics (U.S.A.) used a strategy of sequencing from both ends of human DNA fragments; it sequenced the human genome 35.6 times. Although a draft of the human genome sequence is finished, several other tasks are yet to be completed.
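The arithmetic behind "sequencing a genome N times" is simple: the total number of bases read is the genome size multiplied by the fold coverage. The sketch below takes the figures in the text at face value; the function names, the read count and the read length are hypothetical values chosen only to make the numbers work out.

```python
def total_bases_sequenced(genome_size_bp, fold_coverage):
    """Total bases read when a genome is covered fold_coverage times."""
    return genome_size_bp * fold_coverage


def fold_coverage(num_reads, read_length_bp, genome_size_bp):
    """Average fold coverage given a number of reads of fixed length."""
    return num_reads * read_length_bp / genome_size_bp


HUMAN_GENOME_BP = 3_200_000_000  # 3.2 billion base pairs, as in the text

# 12-fold coverage of the human genome
print(total_bases_sequenced(HUMAN_GENOME_BP, 12))

# Hypothetical: 64 million reads of 600 bp each
print(fold_coverage(64_000_000, 600, HUMAN_GENOME_BP))
```

Under these assumptions, 12-fold coverage of a 3.2 billion base pair genome means reading 38.4 billion bases in total.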

These include obtaining the remaining sequence, correcting errors (proofreading the genome), filling sequence gaps, and then sequencing the 7-15 per cent of the genome that consists of heterochromatin.

Heterochromatic regions of the genome were not sequenced initially because they contain long stretches of repetitive DNA sequences. Further, it was initially considered that heterochromatin does not contain genes. But the genome sequence of Drosophila revealed that heterochromatic regions do contain a small number of genes (about 50 in Drosophila).
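One way to see why repetitive DNA is hard to sequence and assemble is to count how often each short substring (k-mer) occurs. In a repetitive stretch the same k-mer appears at several positions, so a read containing it cannot be placed unambiguously. The sketch below is an illustration only; the function name `repeated_kmers` and the toy sequences are invented.

```python
from collections import Counter


def repeated_kmers(sequence, k):
    """Return the k-mers that occur more than once, with their counts."""
    counts = Counter(
        sequence[i:i + k] for i in range(len(sequence) - k + 1)
    )
    return {kmer: n for kmer, n in counts.items() if n > 1}


# A toy non-repetitive sequence: every 4-mer is unique.
print(repeated_kmers("ACGTGCA", 4))

# A toy repetitive sequence: the same 4-mers recur at several positions.
print(repeated_kmers("ATATATAT", 4))
```

The first call returns an empty dictionary, while the second reports recurring 4-mers, the situation that long heterochromatic repeats create on a much larger scale.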

As a result of this discovery, heterochromatic regions of the human genome have to be sequenced to ensure that all the genes in the human genome are identified. Once the genome of an organism is sequenced, compiled, and proofread, the next stage of genomics, viz., annotation, begins.