Whole genome sequencing

Translated from "Donner un sens au génome", La Recherche, n° 332, June 2000

Sequencing a genome means determining the order of the nucleotides that make up the DNA macromolecule. Each nucleotide is referred to by the first letter of the name of its nitrogenous base, and the information carried by the genome is contained in a long text - roughly 3 billion characters for the human genome - written in an alphabet of four letters: A, C, G, and T. The throughput of a sequencing project is measured in kilobases per day, and depends largely on the number of machines used. As these can only sequence relatively short sections of the DNA molecule at a time, powerful computers have to be used to put the partially overlapping sub-sequences obtained into the correct order and so reconstitute the complete genome sequence. The result is not free of errors: as well as point errors, where one base is missing or has been replaced by a different one, there may be errors in the order of the sub-sequences. What is more, certain parts of the DNA molecule are more difficult to sequence, and obtaining a sequence covering 100% of a genome is particularly expensive. This is why the part of the human genome which is now available, or soon will be, is called a "working draft". It is thought that it will be at least two years before a complete, high-quality sequence is available.
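To make the idea of ordering partially overlapping sub-sequences more concrete, here is a minimal sketch in Python. It is not the method used by any actual sequencing project: it is a simple greedy strategy that repeatedly merges the two fragments sharing the longest overlap, and it assumes error-free fragments taken from a single strand. The function names `overlap` and `greedy_assemble`, and the `min_len` parameter, are hypothetical and chosen for illustration only.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` letters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Greedily merge fragments by longest overlap (illustrative sketch).

    Assumes error-free fragments from one strand. Returns the list of
    sequences left when no overlap of at least `min_len` letters remains.
    """
    # Drop exact duplicates, then fragments fully contained in another one.
    seqs = list(dict.fromkeys(fragments))
    seqs = [s for s in seqs if not any(s != t and s in t for t in seqs)]

    while len(seqs) > 1:
        best_k, best_i, best_j = 0, None, None
        # Find the ordered pair (i, j) with the longest suffix/prefix overlap.
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge: the remaining pieces stay separate
        merged = seqs[best_i] + seqs[best_j][best_k:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (best_i, best_j)]
        seqs.append(merged)
    return seqs


if __name__ == "__main__":
    # Three made-up fragments of the made-up sequence ATGCGTACGTTAGC.
    fragments = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
    print(greedy_assemble(fragments))  # ['ATGCGTACGTTAGC']
```

With these three toy fragments the sketch recovers the single sequence they were cut from. On real data a greedy merge of this kind can easily place fragments in the wrong order, for instance when the same stretch of letters occurs at several places in the genome, which is one way the ordering errors mentioned above can arise.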