DNA Sequences

The four bases of DNA are denoted by the single letters A (adenine), C (cytosine), G (guanine) and T (thymine). Sequence data often contain ambiguities, however: at certain positions it is not clear which of the four bases is present. For example, the data may indicate that the base at a specific position is either G or A, i.e., a purine; this situation is denoted by R (derived from purine).

Similarly, if a position may have either C or T, i.e., a pyrimidine, it is denoted by Y (derived from pyrimidine). Single-letter codes have also been developed for the other possible combinations of bases (Table 15.4). Ambiguities in a DNA sequence are resolved by sequencing the DNA segment concerned repeatedly.
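The single-letter codes can be held in a small lookup table. The sketch below assumes that Table 15.4 lists the standard IUPAC nucleotide codes (the table itself is not reproduced here); the names IUPAC_BASES and ambiguous_positions are illustrative only.

```python
# Standard IUPAC nucleotide codes: symbol -> set of bases it may stand for.
# Assumption: this matches the codes listed in Table 15.4.
IUPAC_BASES = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"},            # purine
    "Y": {"C", "T"},            # pyrimidine
    "S": {"C", "G"},
    "W": {"A", "T"},
    "K": {"G", "T"},
    "M": {"A", "C"},
    "B": {"C", "G", "T"},       # not A
    "D": {"A", "G", "T"},       # not C
    "H": {"A", "C", "T"},       # not G
    "V": {"A", "C", "G"},       # not T
    "N": {"A", "C", "G", "T"},  # any base
}


def ambiguous_positions(sequence):
    """Return (index, symbol) pairs for every position whose symbol
    stands for more than one base. Indices are 0-based.
    Symbols outside the table raise KeyError."""
    return [
        (i, symbol)
        for i, symbol in enumerate(sequence.upper())
        if len(IUPAC_BASES[symbol]) > 1
    ]


print(ambiguous_positions("ACGRTY"))  # [(3, 'R'), (5, 'Y')]
```

A list of this kind identifies the positions that would need to be sequenced again to resolve the ambiguity.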

The base sequences of the two complementary strands of a DNA molecule are represented using the same symbols, and this system also covers positions that exhibit ambiguity. For example, the letter R denotes A or G. Its complementary symbol is Y, which denotes T or C (A pairs with T, and G pairs with C).

Similarly, the complementary symbol for H (A, C or T) is D (T, G or A). Some symbols are complementary to themselves, e.g., S and W. S represents G or C, and G pairs with C; therefore, both strands of a DNA duplex will have S at the given position. In the same way, W represents A or T, and A pairs with T.
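The complementary symbol can be worked out mechanically: replace each base a symbol stands for by its pairing partner, then look up the symbol for the resulting set. The following is a minimal sketch of that reasoning, assuming the standard IUPAC codes; the names IUPAC_BASES, BASE_PAIR, SYMBOL_FOR and complement_symbol are illustrative.

```python
# Symbol -> the bases it may stand for (standard IUPAC codes, assumed here).
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Watson-Crick pairing: A with T, G with C.
BASE_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Reverse lookup: set of bases -> symbol.
SYMBOL_FOR = {frozenset(bases): symbol for symbol, bases in IUPAC_BASES.items()}


def complement_symbol(symbol):
    """Return the symbol found opposite `symbol` on the complementary strand."""
    paired = frozenset(BASE_PAIR[base] for base in IUPAC_BASES[symbol.upper()])
    return SYMBOL_FOR[paired]


for s in "RHSW":
    print(s, "->", complement_symbol(s))
# R -> Y
# H -> D
# S -> S
# W -> W
```

Running the loop over all fifteen symbols shows that S, W and N are the only self-complementary ones.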

In databases, the base sequence of only one strand is listed. This sequence runs in the 5' to 3' direction, i.e., the 5'-end is at the extreme left and the 3'-end is at the extreme right of the sequence.

The base sequence of the complementary strand is easily derived either manually (for short sequences) or by using an appropriate software package. In the case of an RNA sequence, the symbol U (for uracil) occurs in place of T.
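As an illustration of what such a software package does, the sketch below derives the complementary strand with Python's built-in string translation. It assumes the standard IUPAC codes and that the complementary strand should also be written 5' to 3', which is why the complemented string is reversed. The names COMPLEMENT, reverse_complement and to_rna are illustrative, not taken from any particular package.

```python
# Translation table: each symbol -> its complementary symbol.
# Ambiguity codes are included (R<->Y, K<->M, B<->V, D<->H; S, W, N map to themselves).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(sequence):
    """Return the complementary strand, written 5' to 3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def to_rna(sequence):
    """Write a DNA sequence with RNA symbols: U in place of T."""
    return sequence.upper().replace("T", "U")


print(reverse_complement("ATGCR"))  # YGCAT
print(to_rna("ATGCT"))              # AUGCU
```

Applying reverse_complement twice returns the original sequence, which is a quick check that the table is consistent.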