Structure and function of DNA
DNA molecules are extremely long but also very thin. The DNA molecule in a single mammalian chromosome may be several centimeters long when unraveled, yet it has to fit into a nucleus with a diameter some four orders of magnitude smaller, so it is folded up in the chromosome in a highly organized manner. DNA is a linear polymer composed of four different building blocks, the nucleotides. The genetic information carried by chromosomes resides in the sequence of nucleotides along the polymer. Each nucleotide consists of three parts: (1) a nitrogenous base, either a purine (adenine (A) or guanine (G)) or a pyrimidine (cytosine (C) or thymine (T)); (2) a sugar, deoxyribose; and (3) a phosphate group (see pp. 20-22 of Molecular Biotechnology for the molecular structures of DNA and its components). The nitrogenous base determines the identity of the nucleotide, and individual nucleotides are often referred to by their base (A, C, G, or T). One DNA strand can be up to several hundred million nucleotides in length. T pairs with A through two hydrogen bonds, and C pairs with G through three; two DNA strands wind together in an antiparallel fashion to form a double helix.
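The base-pairing and antiparallel rules can be illustrated with a short Python sketch. It assumes sequences are plain strings of A, C, G and T written 5'->3', and it uses a rise of about 0.34 nm per base pair (the standard figure for B-form DNA, not a value given in this text) to estimate how long a fully stretched molecule would be. The function names are hypothetical.

```python
# Watson-Crick pairing: A pairs with T (two hydrogen bonds), C with G (three).
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Assumed rise per base pair in B-form DNA, in nanometers.
RISE_NM_PER_BP = 0.34


def complementary_strand(strand: str) -> str:
    """Return the partner strand, also written 5'->3'.

    The strands are antiparallel, so the partner read 5'->3' is the
    complement of the input taken in reverse order.
    """
    return "".join(PAIR[base] for base in reversed(strand.upper()))


def contour_length_m(n_base_pairs: int) -> float:
    """Approximate length, in meters, of a fully stretched double helix."""
    return n_base_pairs * RISE_NM_PER_BP * 1e-9


print(complementary_strand("ATGCGTAC"))  # GTACGCAT
# 250 million base pairs is an illustrative chromosome size.
print(f"{contour_length_m(250_000_000):.3f} m")  # 0.085 m
```

Applying `complementary_strand` twice returns the original sequence, reflecting the fact that each strand fully specifies its partner.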
Inside the cell, DNA acts like an "instruction manual": its sequence provides all the information the cell needs to function, but the actual work of translating that information into a form the cell can use directly is done by RNA, ribonucleic acid. The structural difference from DNA is that the ribose ring in RNA carries an -OH group at both the 2' and the 3' position, whereas DNA (deoxyribonucleic acid, named for its sugar, deoxyribose) lacks the hydroxyl group at the 2' position. See http://www.ch.cam.ac.uk/magnus/molecules/nucleic/sugars.html. The bases attached to the sugar in RNA are the same as in DNA, with one exception: RNA does not contain thymine, which is replaced by uracil. Uracil has a hydrogen atom instead of a methyl group at the C-5 position of the pyrimidine ring. The molecular structures of uracil and thymine are compared at http://www.ch.cam.ac.uk/magnus/molecules/nucleic/bases.html. RNA has three main functions: (a) it serves as the messenger that tells the cell (the ribosomes) which protein to make (messenger RNA; mRNA); (b) it forms part of the structure of the ribosome, the protein/RNA complex that synthesizes proteins according to the information presented by the mRNA (ribosomal RNA; rRNA); and (c) it brings amino acids (the building blocks of proteins) to the ribosome when the information on the mRNA "calls for" a specific amino acid to be incorporated into the protein being synthesized; this RNA is called transfer RNA (tRNA).
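To make the DNA-to-RNA relationship concrete, the following Python sketch builds an RNA sequence from a DNA template strand, pairing A on the template with U rather than T. It uses the standard terms "template strand" (the strand read by RNA polymerase) and "coding strand" (its partner), which are not introduced in this text; the function names are illustrative only.

```python
# RNA partner of each DNA template base: A is paired with U, because RNA has no T.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_template(template_3_to_5: str) -> str:
    """RNA (5'->3') synthesized on a DNA template strand given 3'->5'.

    Writing the template 3'->5' lets the antiparallel RNA be built
    position by position without reversing anything.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def coding_strand_to_rna(coding_5_to_3: str) -> str:
    """The RNA matches the coding (non-template) strand, with U in place of T."""
    return coding_5_to_3.upper().replace("T", "U")


coding = "ATGGCATTT"    # 5'->3'
template = "TACCGTAAA"  # the same region of the partner strand, 3'->5'
print(transcribe_template(template))  # AUGGCAUUU
print(coding_strand_to_rna(coding))   # AUGGCAUUU
```

Both routes give the same RNA, which is why sequences are often reported for the coding strand and simply read with U substituted for T.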
An important point is that all vegetative cells of an organism contain the same genetic information. Upon division, each daughter cell receives an "exact" copy of the parent cell's DNA (see http://accessexcellence.org/AB/GG/dna_replicating.html). However, the genes that are expressed at a given time may differ greatly between tissues. These differences in gene expression allow the development of the organism, and of its different tissues, to be regulated. DNA-binding proteins, themselves encoded by the DNA, play a major role in regulating the expression of genes encoded on that same DNA. This poses a very important "chicken-and-egg" problem.
Messenger RNA (mRNA) serves as an intermediate between DNA and protein. Parts of the DNA are "transcribed" into transcripts (single-stranded RNA molecules) that are processed to mRNA. Transcription starts at a specific site on the DNA called a promoter; each gene or operon has its own promoter(s). Transcription ends at a terminator sequence on the DNA. Transcripts are usually 300-50,000 nucleotides long and contain the information needed to make a protein. In prokaryotes, where introns are rare, the transcript generally does not need to be processed and can serve as mRNA right away. In eukaryotes (organisms whose cells contain a nucleus, which includes all higher organisms), transcripts generally need to be processed before they can serve as a blueprint for a protein. Processing involves the removal of intervening sequences (introns). Introns may be anywhere between 50 and 10,000 nucleotides in length, and a single gene may contain more than 100 of them. The sequences that remain in the mature mRNA are called exons. Most introns are spliced out by small ribonucleoprotein particles (consisting of RNA and protein), which appear to pull the two ends of the intron together. However, there are also introns that splice themselves out without the help of a protein: the RNA sequence itself appears to contain sufficient information to determine where the intron must be cut out. In addition to the removal of introns, a poly-A sequence is added to the 3' end of the transcript. The processed transcript is the mRNA, and the information in it can be "translated" into a protein of specific sequence.
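A very simplified sketch of these processing steps: given a primary transcript and the positions of its introns, it joins the exons and appends a poly-A tail. Real cells locate introns through sequence signals recognized by the splicing machinery; here the intron coordinates and the default tail length of 20 are hypothetical inputs chosen for illustration.

```python
from typing import List, Tuple


def splice(pre_mrna: str, introns: List[Tuple[int, int]], poly_a_length: int = 20) -> str:
    """Join the exons of a primary transcript and add a poly-A tail.

    introns: hypothetical 0-based, half-open (start, end) intervals.
    They must not overlap and must lie within the transcript.
    """
    exons = []
    position = 0  # start of the exon currently being collected
    for start, end in sorted(introns):
        if not (position <= start < end <= len(pre_mrna)):
            raise ValueError("introns must be non-overlapping and inside the transcript")
        exons.append(pre_mrna[position:start])  # exon preceding this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons) + "A" * poly_a_length


# Toy transcript: three exons separated by two introns that begin with GU
# and end with AG, as most spliceosomal introns do (splice() does not check this).
exon1, intron1, exon2, intron2, exon3 = "AUGGCC", "GUAAGUCUAG", "UUUGGG", "GUCCAG", "UAA"
pre = exon1 + intron1 + exon2 + intron2 + exon3
mrna = splice(pre, [(6, 16), (22, 28)], poly_a_length=5)
print(mrna)  # AUGGCCUUUGGGUAAAAAAA
```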
The intron splicing process provides an opportunity to increase the amount of usable genetic information without increasing the genome size of the organism, through alternative splicing of a transcript. Alternative splicing means that the introns (and thus the exons) may be recognized differently in different molecules of the same primary transcript; as a result, one gene can give rise to different mRNAs and thereby to different proteins. Because introns are rare in prokaryotes, this process is largely limited to eukaryotes.
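Exon skipping, one form of alternative splicing, can be sketched by enumerating which internal exons are kept. This Python example assumes, for illustration only, that the first and last exons are always retained and that every internal exon can be included or skipped independently, so a gene with n exons yields 2^(n-2) possible mRNAs; real genes are far more constrained. The exon labels and function name are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def exon_skipping_isoforms(exons: List[str]) -> List[Tuple[str, ...]]:
    """Enumerate exon combinations produced by exon skipping alone.

    Assumption (illustrative only): the first and last exons are always
    kept, and each internal exon may be included or skipped independently,
    always in its original order.
    """
    if len(exons) < 2:
        raise ValueError("need at least a first and a last exon")
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal), -1, -1):  # most internal exons first
        for kept in combinations(internal, k):  # preserves exon order
            isoforms.append((first, *kept, last))
    return isoforms


for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(isoform))
# E1-E2-E3-E4
# E1-E2-E4
# E1-E3-E4
# E1-E4
```

Under the same assumption, a 10-exon gene would already allow 256 different mRNAs from a single stretch of DNA.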
Ribosomal RNAs (rRNAs) are essential components of the ribosomes, a central part of the protein synthesis machinery. In addition to rRNA, a ribosome contains some 50 (prokaryotes) to 80 (eukaryotes) different proteins. Eukaryotic genomes typically contain hundreds of copies of the rRNA genes, which makes the production of large amounts of rRNA possible. Eukaryotic ribosomes contain four different rRNAs, each of a different size (e.g., 28S, 18S, 5.8S, and 5S in mammals), and each ribosome contains one molecule of each type; prokaryotic ribosomes contain three (23S, 16S, and 5S). In prokaryotes, ribosomes bind to the mRNA close to the translation start site. This ribosome binding site is referred to as the Shine-Dalgarno sequence or as the ribosome recognition element. In eukaryotes, ribosomes bind at the 5' end of the mRNA and scan down the mRNA until they encounter a suitable start codon.
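The two modes of start-site selection can be contrasted in a toy Python sketch. The eukaryotic function simply returns the first AUG from the 5' end, ignoring the surrounding sequence context that real ribosomes take into account. The prokaryotic function looks for an exact match to AGGAGG (a commonly cited Shine-Dalgarno consensus, assumed here) and then for an AUG a few nucleotides downstream; the 4-12 nucleotide spacing window and both function names are assumptions chosen for illustration.

```python
def first_aug(mrna: str) -> int:
    """Toy eukaryotic scanning: index of the first AUG from the 5' end, or -1."""
    return mrna.upper().find("AUG")


def aug_after_shine_dalgarno(mrna: str, sd: str = "AGGAGG",
                             min_gap: int = 4, max_gap: int = 12) -> int:
    """Toy prokaryotic start-site choice.

    Returns the index of the first AUG that begins min_gap..max_gap
    nucleotides after the end of an exact match to sd, or -1 if none.
    The consensus and the spacing window are assumptions.
    """
    seq = mrna.upper()
    sd_pos = seq.find(sd)
    while sd_pos != -1:
        sd_end = sd_pos + len(sd)
        for gap in range(min_gap, max_gap + 1):
            start = sd_end + gap
            if seq[start:start + 3] == "AUG":
                return start
        sd_pos = seq.find(sd, sd_pos + 1)  # try the next SD-like match
    return -1


mrna = "GCAUGCC" + "AGGAGG" + "UAACU" + "AUG" + "GCUUAA"
print(first_aug(mrna))                 # 2
print(aug_after_shine_dalgarno(mrna))  # 18
```

On the same toy sequence the two rules pick different AUGs: scanning stops at the first AUG it meets, whereas the Shine-Dalgarno rule ignores AUGs that are not positioned just downstream of the binding site.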
Transfer RNA (tRNA) carries amino acids to the ribosomes, which add each amino acid to the growing chain of amino acid residues of the protein being synthesized, using the information on the mRNA to "know" which amino acid comes next. For each kind of amino acid there is at least one specific tRNA that carries that amino acid to the ribosome, where it is added to the protein once the information on the mRNA calls for it.
All tRNAs have the same general shape, resembling a cloverleaf. Parts of the molecule fold back into characteristic loops, which are held in shape by base pairing between different regions of the molecule. Two parts of the tRNA are of particular importance: the aminoacyl attachment site and the anticodon. The aminoacyl attachment site is where the amino acid is attached to the tRNA molecule; each type of tRNA specifically binds only one type of amino acid. The anticodon (three bases) of the tRNA base-pairs with the matching mRNA codon at the mRNA-ribosome complex. This temporarily binds the tRNA to the mRNA, allowing the amino acid carried by the tRNA to be incorporated into the polypeptide at its proper place. Thus, the sequence of the codon (three bases) in the mRNA dictates which amino acid is incorporated into the protein at a specific position. The "dictionary" relating codons to amino acids is called the genetic code. A summary of the amino acids encoded by the 64 possible codons can be found at http://molbio.info.nih.gov/molbio/desk.html (choose "Table of Standard Genetic Code" for a codon table, and "Amino Acid Structure and Properties" for information on the amino acids). The three codons for which there is no matching tRNA (UAA, UGA, and UAG) serve as "stop-translation" signals at which the ribosome falls off.
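The genetic code can be written down compactly as a lookup table. The Python sketch below assumes the standard genetic code (the one used by most nuclear genomes) and builds it from a 64-character string listing the one-letter amino acid codes for the codons in UCAG order, with '*' marking the stop codons. The `anticodon` helper, a hypothetical name, returns the anticodon that would pair perfectly with a codon; it ignores the non-standard pairing that many real tRNAs allow at the third codon position.

```python
from itertools import product
from typing import Dict

BASES = "UCAG"
# Standard genetic code: one-letter amino acid codes for the 64 codons in
# the order UUU, UUC, UUA, UUG, UCU, ..., GGG; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def standard_code() -> Dict[str, str]:
    """Map each of the 64 codons to its amino acid (or '*' for stop)."""
    codons = ("".join(triplet) for triplet in product(BASES, repeat=3))
    return dict(zip(codons, AMINO_ACIDS))


def anticodon(codon: str) -> str:
    """Anticodon, written 5'->3', that pairs perfectly with a codon (5'->3').

    Codon and anticodon pair antiparallel, so this is the reverse complement.
    """
    return "".join(RNA_PAIR[base] for base in reversed(codon))


code = standard_code()
print(len(code))                                         # 64
print(sorted(c for c, aa in code.items() if aa == "*"))  # ['UAA', 'UAG', 'UGA']
print(code["AUG"], anticodon("AUG"))                     # M CAU
```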
With DNA and the various RNAs discussed, the stage is set for protein synthesis. The basic reaction of protein synthesis is the controlled formation of a peptide bond between two amino acids. This reaction is repeated many times, as each amino acid in turn is added to the growing polypeptide. Protein synthesis starts when the mRNA binds to a small ribosomal subunit near an AUG sequence in the mRNA. The AUG codon is called the start codon, since it codes for the first amino acid (a methionine) of the protein. The AUG codon base-pairs with the anticodon of a tRNA carrying methionine. A large ribosomal subunit then binds to the complex, and the reactions of protein synthesis itself can begin. The next aminoacyl-tRNA to be called for is determined by the next codon (the next three bases) on the mRNA. Each amino acid is coded for by one or more (up to six) codons. It might seem more straightforward to have each amino acid coded for by only one codon, but nature appears to have chosen a more complex route. Part of the reason is arithmetic: there are 20 different amino acids, whereas a codon of three bases, each of which can be any of four kinds, allows 4 x 4 x 4 = 64 different combinations (two-base codons would allow only 4 x 4 = 16, too few for 20 amino acids). When the ribosome reaches one of the three codons for which there is no matching tRNA, it falls off and the synthesized protein is released. The degeneracy of the genetic code for certain amino acids could have a function in the regulation of translation; any idea how? The process of protein synthesis is summarized on pages 34-38 of Molecular Biotechnology; for translation (in conjunction with transcription), see also http://accessexcellence.org/AB/GG/protein_synthesis.html and http://accessexcellence.org/AB/GG/dna_molecule.html.
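The translation procedure and the codon arithmetic can be sketched in Python. This is a simplification under stated assumptions: translation starts at the first AUG in the mRNA, proceeds three bases at a time, and stops at the first in-frame stop codon; the standard genetic code is assumed, and `translate` is a hypothetical name. The last lines tally how many codons specify each amino acid, which shows the "one to six codons" degeneracy directly.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop."""
    seq = mrna.upper()
    start = seq.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):  # step through complete codons only
        amino_acid = CODE[seq[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUUUUGGUAAGC"))  # MAFW

# Degeneracy: number of codons for each amino acid (stop codons excluded).
codons_per_aa = Counter(aa for aa in CODE.values() if aa != "*")
print(len(codons_per_aa), max(codons_per_aa.values()))          # 20 6
print(sorted(aa for aa, n in codons_per_aa.items() if n == 6))  # ['L', 'R', 'S']
print(4 ** 3, 4 ** 2)  # 64 16
```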
Amino acids span a broad range of chemical structures; http://www.ch.cam.ac.uk/magnus/molecules/amino/ shows the structures of all the amino acids. With the synthesis of a protein whose amino acid sequence is specified by the genetic information in the DNA, the link between genetic and functional information is complete.
Over the last several years, it has become obvious that the DNA sequence does not always literally dictate the sequence of the protein. In a number of instances, particularly in the small genomes of mitochondria and chloroplasts, "RNA editing" has been observed: transcripts are chemically modified by enzymes before translation takes place (for example, some Cs are changed to Us). In such cases the DNA sequence does not precisely correspond to the sequence of the gene product (the protein). If RNA editing is suspected, one therefore needs to compare sequences from DNA and protein (or from DNA and processed RNA). The function of RNA editing has not been elucidated yet.
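The comparison described here can be sketched for the simplest case. The Python function below assumes the gene's coding-strand sequence and the edited transcript are already aligned base for base, with no insertions or deletions, and lists every position where the transcript differs from what the DNA predicts. The function name is hypothetical.

```python
from typing import List, Tuple


def find_edits(coding_dna: str, transcript: str) -> List[Tuple[int, str, str]]:
    """Return (position, expected, observed) wherever the transcript differs
    from the sequence predicted from the gene.

    Assumes coding_dna is the coding strand and that both sequences are
    aligned base for base with no insertions or deletions.
    """
    expected = coding_dna.upper().replace("T", "U")  # unedited transcript
    observed = transcript.upper()
    if len(expected) != len(observed):
        raise ValueError("this sketch only handles gap-free, equal-length alignments")
    return [(i, e, o) for i, (e, o) in enumerate(zip(expected, observed)) if e != o]


print(find_edits("ATGTCAGCC", "AUGUUAGCC"))  # [(4, 'C', 'U')]
```

In this example the single C-to-U change turns the second codon from UCA (serine) into UUA (leucine) under the standard genetic code, so the protein would differ from the one predicted from the DNA alone.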
Questions, Chapter 3
1. Just to get a sense of how much genetic flexibility there is, calculate how many different sequences 150 nucleotides long could exist that would code for a short protein of 50 amino acids. And how many different ways are there to make a short protein of 50 amino acids? What would the answers be for a large, 6000-nucleotide-long sequence coding for a large protein of 2000 amino acids?
2. Some prokaryotes grow at very high temperatures (70-100 °C) and are called thermophiles. Organisms living in the deep sea, where the pressure (and thus the boiling point of water) is very high, grow at even higher temperatures and are called hyperthermophiles, a kind of extremophile. A group of these prokaryotes, the archaea, have been found to contain histone-like DNA-binding proteins, whereas other thermophilic prokaryotes have a very high GC content (and thus a low AT content) in their DNA. Can you explain why this would be?
Center for Bioenergy & Photosynthesis
Arizona State University
Room PSD 209
Tempe, AZ 85287-1604
14 August 2006
phone: (480) 965-1963
fax: (480) 965-2747