CHAPTER 2 The Cellular and Molecular Basis of Inheritance
The hereditary material is present in the nucleus of the cell, whereas protein synthesis takes place in the cytoplasm. What is the chain of events that leads from the gene to the final product?
The Cell
Within each cell of the body are the cytoplasm and a darkly staining body, the nucleus, both visible with the light microscope; the nucleus contains the hereditary material in the form of chromosomes (Figure 2.1). The phospholipid bilayer of the plasma membrane protects the interior of the cell but remains selectively permeable and has integral proteins involved in recognition and signaling between cells. The nucleus contains a darkly staining area, the nucleolus, and is surrounded by a membrane, the nuclear envelope, which separates it from the cytoplasm but still allows communication through nuclear pores.
The cytoplasm contains the cytosol, which is semifluid in consistency, containing both soluble elements and cytoskeletal structural elements. In addition, in the cytoplasm there is a complex arrangement of very fine, highly convoluted, interconnecting channels, the endoplasmic reticulum. The endoplasmic reticulum, in association with the ribosomes, is involved in the biosynthesis of proteins and lipids. Also situated within the cytoplasm are other even more minute cellular organelles that can be visualized only with an electron microscope. These include the Golgi apparatus, which is responsible for the secretion of cellular products, the mitochondria, which are involved in energy production through the oxidative phosphorylation metabolic pathways, and the peroxisomes (p. 180) and lysosomes, both of which are involved in the degradation and disposal of cellular waste material and toxic molecules.
DNA: The Hereditary Material
Structure
For genes to be composed of DNA, the molecule must have a structure versatile enough to account for the great variety of different genes and yet be able to reproduce itself so that an identical replica is formed at each cell division. In 1953, drawing on x-ray diffraction studies by themselves and others, Watson and Crick proposed a structure for the DNA molecule that fulfilled all the essential requirements. They suggested that the DNA molecule is composed of two chains of nucleotides arranged in a double helix. The backbone of each chain is formed by phosphodiester bonds between the 3′ and 5′ carbons of adjacent sugars, and the two chains are held together by hydrogen bonds between the nitrogenous bases, which point in toward the center of the helix. Each DNA chain has a polarity determined by the orientation of the sugar–phosphate backbone. The chain end terminated by the 5′ carbon atom of the sugar molecule is referred to as the 5′ end, and the end terminated by the 3′ carbon atom is called the 3′ end. In the DNA duplex, the 5′ end of one strand is opposite the 3′ end of the other; that is, the strands have opposite orientations and are said to be antiparallel.
The arrangement of the bases in the DNA molecule is not random. A purine in one chain always pairs with a pyrimidine in the other, and the pairing is specific: guanine always pairs with cytosine, and adenine always pairs with thymine, so that the two strands are complementary (Figure 2.2). For this work Watson and Crick, along with Maurice Wilkins, were awarded the Nobel Prize in Physiology or Medicine in 1962 (p. 10).
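The pairing rules and the antiparallel orientation described above can be illustrated with a minimal Python sketch. The example sequence and the function names are hypothetical and chosen only for illustration.

```python
# Watson-Crick pairing rules from the text: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the paired base for each position, in the same left-to-right order."""
    return "".join(PAIRING[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written 5' to 3'.

    The two strands are antiparallel, so the partner has to be read in the
    opposite direction; that is why the complement is reversed.
    """
    return complement(strand)[::-1]


top = "ATGCGTTA"  # hypothetical sequence, written 5' to 3'
print(complement(top))          # partner read 3' to 5': TACGCAAT
print(reverse_complement(top))  # partner read 5' to 3': TAACGCAT
```

Applying `reverse_complement` twice returns the original strand, which reflects the fact that each strand fully specifies the other.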
Replication
DNA replication, through the action of the enzyme DNA polymerase, begins at multiple points known as origins of replication, forming bifurcated Y-shaped structures known as replication forks. Both complementary antiparallel DNA strands are synthesized in the 5′ to 3′ direction. One strand, known as the leading strand, is synthesized continuously. The other, known as the lagging strand, is synthesized in pieces called Okazaki fragments, which are then joined into a continuous strand by the enzyme DNA ligase (Figure 2.3A).
DNA replication progresses in both directions from these points of origin, forming bubble-shaped structures, or replication bubbles (Figure 2.3B). Neighboring replication origins are approximately 50 to 300 kilobases (kb) apart and occur in clusters or replication units of 20 to 80 origins of replication. DNA replication in individual replication units takes place at different times in the S phase of the cell cycle (p. 39), adjacent replication units fusing until all the DNA is copied, forming two complete identical daughter molecules.
Chromosome Structure
The packaging of DNA into chromosomes involves several orders of DNA coiling and folding. In addition to the primary coiling of the DNA double helix, there is secondary coiling around spherical histone ‘beads’, forming what are called nucleosomes. Tertiary coiling of the nucleosomes forms the chromatin fibers, which form long loops on a scaffold of non-histone acidic proteins. These loops are further wound in a tight coil to make up the chromosome as visualized under the light microscope (Figure 2.4). The whole structure makes up the so-called solenoid model of chromosome structure.
Types of DNA Sequence
Denatured DNA will reassociate as a duplex at a rate that depends on the proportion of unique and repeat sequences present, with repeat sequences reassociating more rapidly. Analysis of the kinetics of reassociation of human DNA has shown that approximately 60% to 70% of the human genome consists of single- or low-copy number DNA sequences. The remaining 30% to 40% consists of moderately or highly repetitive DNA sequences that are not transcribed. This latter portion consists mainly of satellite DNA and interspersed DNA sequences (Box 2.1).
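Why repeat sequences reassociate sooner can be sketched with the idealized second-order model commonly used for such experiments, in which the fraction of DNA still single-stranded is 1 / (1 + k × C0t). The model is not given in the text, and the two rate constants below are hypothetical values chosen only to make the contrast visible.

```python
def fraction_single_stranded(c0t: float, rate_constant: float) -> float:
    """Idealized second-order reassociation: C/C0 = 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + rate_constant * c0t)


# Hypothetical rate constants: a repeated sequence finds a partner far more
# often than a unique one, so its effective rate constant is larger.
K_UNIQUE = 0.001
K_REPEAT = 10.0

for c0t in (0.1, 10.0, 1000.0):
    unique = fraction_single_stranded(c0t, K_UNIQUE)
    repeat = fraction_single_stranded(c0t, K_REPEAT)
    print(f"C0t={c0t:>7}: unique {unique:.3f} single-stranded, repeat {repeat:.3f}")
```

Under these assumptions the repeat fraction is half reassociated at a C0t value of 0.1, whereas the unique fraction reaches the same point only at a C0t value of 1000.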
Nuclear Genes
It is estimated that there are between 25,000 and 30,000 genes in the nuclear genome. The distribution of these genes varies greatly between chromosomal regions. For example, heterochromatic and centromeric (p. 32) regions are mostly non-coding, with the highest gene density observed in subtelomeric regions. Chromosomes 19 and 22 are gene rich, whereas 4 and 18 are relatively gene poor. The size of genes also shows great variability: from small genes with single exons to genes with up to 79 exons (e.g., dystrophin, which occupies 2.5 Mb of the genome).
Multigene Families
Many genes have similar functions, having arisen through gene duplication events followed by evolutionary divergence; together they make up what are known as multigene families. Some are found physically close together in clusters, for example the α- and β-globin gene clusters on chromosomes 16 and 11 (Figure 2.5), whereas others, such as the HOX homeobox gene family (p. 87), are widely dispersed throughout the genome on different chromosomes.
Classic Gene Families
Examples of classic gene families include the numerous copies of genes coding for the various ribosomal RNAs, which are clustered as tandem arrays at the nucleolar organizing regions on the short arms of the five acrocentric chromosomes (p. 32), and the different transfer RNA (p. 20) gene families, which are dispersed in numerous clusters throughout the human genome.
Gene Superfamilies
Examples of gene superfamilies include the HLA (human leukocyte antigen) genes on chromosome 6 (p. 200) and the T-cell receptor genes, which have structural homology with the immunoglobulin (Ig) genes (p. 200). These genes are almost certainly derived from duplication of a precursor gene, with subsequent evolutionary divergence forming the Ig superfamily.
Gene Structure
The original concept of a gene as a continuous sequence of DNA coding for a protein was turned on its head in the early 1980s by detailed analysis of the structure of the human β-globin gene. It was revealed that the gene was much longer than necessary to code for the β-globin protein, containing non-coding intervening sequences, or introns, that separate the coding sequences, or exons (Figure 2.6). Most human genes contain introns, but the number and size of both introns and exons are extremely variable. Individual introns can be far larger than the coding sequences, and some have been found to contain coding sequences for other genes (i.e., genes occurring within genes). Genes in humans do not usually overlap, being separated from each other by an average of 30 kb, although some of the genes in the HLA complex (p. 200) have been shown to overlap.
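The relationship between a gene and its coding sequence can be illustrated with a small Python sketch that joins the exons and discards the introns. The toy gene, its coordinates, and the function name are hypothetical; real genes are far longer and their exon boundaries come from annotation.

```python
from typing import List, Tuple


def splice(gene: str, exons: List[Tuple[int, int]]) -> str:
    """Join exon segments in order, discarding the introns between them.

    Each exon is a (start, end) pair of 0-based, end-exclusive positions.
    """
    return "".join(gene[start:end] for start, end in exons)


# Hypothetical toy gene: uppercase marks exons, lowercase marks introns.
gene = "ATGGCC" + "gtaagtttcag" + "AAAGGT" + "gtgagcctag" + "TTCTAA"
exons = [(0, 6), (17, 23), (33, 39)]

coding = splice(gene, exons)
print(coding)                               # ATGGCCAAAGGTTTCTAA
print(len(coding), "of", len(gene), "bases are coding")  # 18 of 39
```

Even in this toy case less than half of the gene is coding sequence, which mirrors the observation that the gene is much longer than its product requires.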
Extragenic DNA
Tandemly Repeated DNA Sequences
Minisatellite DNA
Telomeric DNA
The terminal portion of the telomeres of the chromosomes (p. 32) contains 10 to 15 kb of tandem repeats of a 6-base pair (bp) DNA sequence known as telomeric DNA. The telomeric repeat sequences are necessary for chromosomal integrity in replication and are added to the chromosome by an enzyme known as telomerase (p. 32).
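The figures above imply a rough number of repeat copies per telomere, which the following sketch works out. It assumes 1 kb equals 1000 bp and counts only whole repeat units; the function name is hypothetical.

```python
REPEAT_UNIT_BP = 6  # length of the telomeric repeat unit given in the text


def repeat_copies(tract_length_bp: int, unit_bp: int = REPEAT_UNIT_BP) -> int:
    """Number of whole repeat units that fit in a tract of the given length."""
    return tract_length_bp // unit_bp


low = repeat_copies(10_000)   # 10 kb tract
high = repeat_copies(15_000)  # 15 kb tract
print(low, high)  # 1666 2500
```

On these assumptions each telomere carries on the order of 1,700 to 2,500 copies of the repeat.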
Hypervariable minisatellite DNA
Hypervariable minisatellite DNA is made up of highly polymorphic DNA sequences consisting of short tandem repeats of a common core sequence. The highly variable number of repeat units in different hypervariable minisatellites forms the basis of the DNA fingerprinting technique developed by Professor Sir Alec Jeffreys in 1984 (p. 69).
Microsatellite DNA
Microsatellite DNA consists of tandem repeats of mono-, di-, tri-, and tetranucleotide sequences located throughout the genome. Microsatellite repeats rarely occur within coding sequences, but trinucleotide repeats in or near genes are associated with certain inherited disorders (p. 59).
Variation in the number of repeats is thought to arise by incorrect pairing of the tandem repeats of the two complementary DNA strands during DNA replication, known as slipped strand mispairing. Duplications or deletions of longer sequences of tandemly repeated DNA are thought to arise through unequal crossover of non-allelic DNA sequences on chromatids of homologous chromosomes or sister chromatids (p. 32).
Nowadays DNA microsatellites are used for forensic and paternity tests (p. 69). They can also be helpful for gene tracking in families with a genetic disorder but no identified mutation (p. 70).
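The idea of distinguishing alleles by their repeat number can be sketched in Python as follows. The two alleles, their flanking sequences, and the CA motif are hypothetical examples, and real genotyping measures fragment lengths or sequence reads rather than searching a known string.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


# Hypothetical alleles at one dinucleotide microsatellite; the flanking
# sequence is identical and only the number of CA repeats differs.
allele_1 = "GGATCC" + "CA" * 12 + "TTGACG"
allele_2 = "GGATCC" + "CA" * 15 + "TTGACG"

print(longest_tandem_run(allele_1, "CA"))  # 12
print(longest_tandem_run(allele_2, "CA"))  # 15
```

A gain or loss of one or more repeat units, as proposed for slipped strand mispairing, would change this count while leaving the flanking sequence untouched.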
Highly Repeated Interspersed Repetitive DNA Sequences
Long Interspersed Nuclear Elements
The function of these interspersed repeat sequences is not clear. Members of the Alu repeat family are flanked by short direct repeat sequences and therefore resemble unstable DNA sequences called transposable elements or transposons. Transposons, originally identified in maize by Barbara McClintock (p. 10), move spontaneously throughout the genome from one chromosome location to another and appear to be ubiquitous in the plant and animal kingdoms. It is postulated that Alu repeats could promote unequal recombination, which could lead to pathogenic mutations (p. 22) or provide selective advantage in evolution by gene duplication. Both Alu and LINE-1 repeat elements have been implicated as a cause of mutation in inherited human disease.
Mitochondrial DNA
In addition to nuclear DNA, the several thousand mitochondria of each cell possess their own 16.6 kb circular double-stranded DNA, mitochondrial DNA (or mtDNA) (Figure 2.7). The mtDNA genome is very compact, containing little repetitive DNA, and contains 37 genes, which encode two types of ribosomal RNA, 22 transfer RNAs (p. 20) and 13 protein subunits for enzymes, such as cytochrome b and cytochrome oxidase, that are involved in the energy-producing oxidative phosphorylation pathways. The genetic code of the mtDNA differs slightly from that of nuclear DNA.
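The gene tally and the slight difference in genetic code can be made concrete with a short Python sketch. The codon reassignments listed are the commonly cited vertebrate mitochondrial ones; they are not given in the text and are included here only as an illustration, and the names used are hypothetical.

```python
# Gene content of mtDNA as given in the text.
MTDNA_GENES = {"ribosomal RNA": 2, "transfer RNA": 22, "protein subunit": 13}
print(sum(MTDNA_GENES.values()))  # 37

# Meanings of four codons in the standard (nuclear) code.
STANDARD_CODE = {"UGA": "Stop", "AUA": "Ile", "AGA": "Arg", "AGG": "Arg"}
# Commonly cited meanings of the same codons in vertebrate mitochondria
# (an assumption brought in for illustration, not stated in the text).
MITOCHONDRIAL_CODE = {"UGA": "Trp", "AUA": "Met", "AGA": "Stop", "AGG": "Stop"}


def decode(codon: str, mitochondrial: bool = False) -> str:
    """Look up one of the four listed codons in the chosen code."""
    table = MITOCHONDRIAL_CODE if mitochondrial else STANDARD_CODE
    return table[codon.upper()]


for codon in STANDARD_CODE:
    print(codon, decode(codon), "->", decode(codon, mitochondrial=True))
```

Only a handful of the 64 codons are affected, which is consistent with the description of the difference as slight.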