Site	Internet Address
NCBI¹ Genetic Disease Websites
GeneTests, GeneReviews²	http://www.ncbi.nlm.nih.gov/sites/GeneTests/
OMIM³	http://www.ncbi.nlm.nih.gov/omim/
NCBI¹ Genome Data Websites
NCBI¹ homepage (Entrez)	http://www.ncbi.nlm.nih.gov/
dbGaP Genotypes and Phenotypes	http://www.ncbi.nlm.nih.gov/gap
dbSNP (SNP database)	http://www.ncbi.nlm.nih.gov/snp/
Other Genome Data Websites
Ensembl Human Genome Browser	http://uswest.ensembl.org/index.html
HUGO⁴	http://www.genenames.org/index.html
DOE⁵ Genomics Websites, includes Human Genome Project	http://genomics.energy.gov/
UCSC Genome Bioinformatics⁶	http://genome.ucsc.edu/

Molecular Basis of Heredity

Modern theories of molecular biology hold that all information needed for function of cells and organisms is contained in macromolecules composed of simple repeating units. The flow of genetic information is (almost) exclusively unidirectional: DNA to RNA to protein. That is, the sequence of deoxyribonucleic acid (DNA) specifies the synthesis and sequence of ribonucleic acid (RNA) by a process known as transcription. Messenger RNA in turn specifies the synthesis and sequence of polypeptides, which are the building blocks of proteins, by a process known as translation. Other forms of RNA function independently. This theory is the central dogma of molecular biology. Accordingly, we begin with a review of the structure and function of these three macromolecules, and continue with reviews of the processes involved in gene and protein expression, including gene structure and organization, RNA processing, and epigenetics. Epigenetics refers to modification of genes other than changes in the DNA sequence, especially by addition of methyl groups to DNA, which alters gene expression. The two most important epigenetic changes found to be relevant to clinical disorders to date are imprinting and X-inactivation.

Structure and Function of DNA

DNA is a large polymer or macromolecule composed of linear sequences of simple repeating units. The specific sequence of these units contains all of the genetic information of an individual cell or organism. The structure of DNA in its native state was deduced by Watson and Crick in 1953 [Watson and Crick, 1953]. The basic repeating unit of DNA is the nucleotide, which consists of a five-carbon sugar known as deoxyribose; a phosphate group; and a nitrogen-containing base, which may be either a purine or a pyrimidine (Figure 30-1A). In DNA, the purine base may be either adenine (A) or guanine (G), and the pyrimidine base may be either thymine (T) or cytosine (C). Nucleotides polymerize into long chains by formation of phosphodiester bonds between the 5′ carbon position of one deoxyribose molecule and the 3′ carbon of the preceding deoxyribose molecule (Figure 30-1B).

Fig. 30-1 The chemical structure of DNA.

A, The four bases of DNA. B, The sugar-phosphate backbone and 3′–5′ phosphodiester bonds.

Each DNA molecule consists of two strands of nucleotides that are held together by weak hydrogen bonds between pairs of bases: A pairs only with T, and G pairs only with C. These paired units are known as basepairs (bp). In the native state, the two strands wind around each other to form a double helix that resembles a right-hand spiral staircase, with two unequal grooves known as the major and minor grooves (Figure 30-2). A single turn of the helix measures 3.4 nm and contains about ten basepairs. Each strand has a directionality imparted by the deoxyribose sugar backbone. Adjacent nucleotides are linked by phosphodiester bonds between the 5′ and 3′ carbon atoms of the sugar residues, so that one end of the DNA strand has an unlinked 5′ carbon (the 5′ end) and the other end of the strand has an unlinked 3′ carbon atom (the 3′ end). The two strands are antiparallel – that is, they run in opposite directions so that the 5′ end of one strand is paired with the 3′ end of the other. Within living cells, DNA is associated with proteins and supercoiled into more complex structures known as chromosomes, which are described later in the chapter.
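The helix dimensions quoted above allow a rough estimate of how long a stretch of DNA would be if laid out straight. The following is a minimal sketch, assuming ten basepairs per 3.4-nm turn as stated in the text; the function names are illustrative only.

```python
# Helix geometry taken from the text: one turn spans 3.4 nm and holds ~10 bp.
NM_PER_TURN = 3.4
BP_PER_TURN = 10


def helix_turns(n_bp: int) -> float:
    """Number of helical turns in a duplex of n_bp basepairs."""
    return n_bp / BP_PER_TURN


def helix_length_nm(n_bp: int) -> float:
    """Approximate contour length, in nanometers, of a duplex of n_bp basepairs."""
    return helix_turns(n_bp) * NM_PER_TURN


if __name__ == "__main__":
    # A 250-Mb chromosome, the upper size quoted later in the chapter.
    length_nm = helix_length_nm(250_000_000)
    print(f"{length_nm / 1e7:.1f} cm")  # 1 cm = 1e7 nm
```

Under these assumptions, each basepair contributes about 0.34 nm, so the DNA of a single large chromosome would extend for several centimeters, which illustrates why the packaging described later in the chapter is necessary.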

Fig. 30-2 Packaging of DNA by structural proteins.

A, The right-handed double helix of DNA. B, This wraps around a histone core to form nucleosomes. C, The nucleosomes are packed into a solenoid structure. D, Loops of solenoids compose an interphase chromosome.

(Modified from Thompson MR et al. Genetics in medicine, 5th edn. Philadelphia: WB Saunders, 1991.)

Thus, when the sequence of one DNA strand is known, the sequence of the opposite or complementary strand may be predicted. Precise replication of DNA is therefore possible, a process that involves initiation, elongation, and termination stages. The process begins with recognition of an “origin of replication.” Such points of origin are specific DNA sequences, recognized by a protein complex known as the primosome, that occur every 50–300 kilobases (kb) of DNA; the unit kb refers to 1000 sequential nucleotides. The two parental DNA strands must first be separated by helicase, an enzyme that unwinds the supercoiled DNA helix to create a replication fork. The process of elongation occurs at the site of the replication fork or replisome. Synthesis of new strands begins with the addition of approximately ten RNA bases by a protein complex known as primase, and then continues with chain elongation using the original strands as templates. This process is known as semiconservative replication. Both initiation or RNA priming and chain elongation involve large protein complexes that include several DNA polymerases.
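The pairing rules (A with T, G with C) and the antiparallel orientation of the two strands mean that the complementary strand can be computed directly from either strand. A minimal sketch follows; the function names are illustrative, and sequences are assumed to be uppercase strings written in the 5′ to 3′ direction.

```python
# Watson-Crick pairing: A<->T and G<->C.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Base-by-base complement, written in the same left-to-right order."""
    return strand.translate(_COMPLEMENT)


def reverse_complement(strand_5_to_3: str) -> str:
    """Opposite strand, also written 5' to 3' (the strands are antiparallel)."""
    return complement(strand_5_to_3)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))  # the partner strand, read 5' to 3'
```

Applying the function twice returns the original sequence, which mirrors the way each parental strand can serve as the template for regenerating its partner during semiconservative replication.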

Five distinct DNA polymerases have been isolated in mammalian systems, including human cell cultures (Table 30-2). They are able to copy DNA only by adding nucleotides to the 3′ end of the growing chain, so DNA can elongate only in the 5′ to 3′ direction. Thus, the template DNA can be read only in the reverse, or 3′ to 5′, direction. As DNA is unwound, the replication fork necessarily unwinds one strand in the 3′ to 5′ direction and the other in the 5′ to 3′ direction. The 3′ to 5′ or leading strand is replicated in a continuous fashion at the replication fork by DNA polymerases α(I), which primes the reaction, and δ(III), which synthesizes the DNA chain. The new strand is complementary and so elongates in the opposite, or 5′ to 3′, direction.

Table 30-2 DNA Polymerases in Mammalian Systems

The 5′ to 3′, or lagging, strand cannot be copied continuously because this would require synthesis of the complementary new strand in a 3′ to 5′ direction, which DNA polymerases cannot do. Thus, the lagging strand must be copied by DNA polymerases α(I) and δ(III) in small segments of 100–1000 bp in the opposite direction from the replication fork. These small DNA molecules are known as Okazaki fragments. DNA replication is described as semidiscontinuous because of the continuous replication of the leading strand and the discontinuous replication of the lagging strand. The Okazaki fragments are then joined by another enzyme, DNA ligase. DNA replication is a long process, requiring about 8 hours in most human cells in culture. Thus, the function of DNA is to encode and store reliably the genetic information needed for the cell and organism to function. It has no direct function itself but rather acts by directing synthesis of both RNA and protein.

Structure and Function of RNA

RNA differs chemically from DNA in the substitution of ribose for deoxyribose in the sugar backbone of the molecule, and of uracil (U) for thymine as one of the pyrimidine bases. Also, RNA normally exists as a single-stranded rather than double-stranded molecule. Recent advances have demonstrated far more diverse functions for RNA than were previously appreciated, particularly involving genes that produce functional RNA products that do not code for proteins. These probably represent at least 5 percent of all human genes, as suggested by current knowledge [Strachan and Read, 2010]. Several distinct classes of RNA molecules have been recognized, most of which are involved with regulating or assisting gene expression.

MicroRNA

MicroRNAs (miRNAs) are a class of small noncoding RNAs that regulate the expression of protein-encoding genes at the post-transcriptional RNA level [Denli et al., 2004]. The process begins with transcription (synthesis) of primary RNA transcripts that range in size from several hundred to several thousand nucleotides. These transcripts are recognized and cut into precursor miRNAs in the nucleus by a protein known as Drosha, moved to the cytoplasm, and processed into mature miRNAs by a protein known as Dicer. The mature miRNAs join the RNA-induced silencing complex (RISC), which recognizes and cleaves (or otherwise silences) a target mRNA. This process has been demonstrated in many organisms, including mammals, and appears likely to play a key role in regulation of many genes.

Structure and Function of Polypeptides and Proteins

Proteins are composed of one or more polypeptide chains. Polypeptides are large polymers or macromolecules composed of linear sequences of repeating units known as amino acids, which are more complex than the repeating units of DNA or RNA. Amino acids consist of a central (alpha) carbon atom bonded to an amino group, a carboxyl group, a hydrogen atom, and a side chain. They differ in the composition of the side chain. With rare exceptions, all polypeptides and proteins in nature are built from different sequences of 20 amino acids (Table 30-3). The side chains may be neutral and hydrophobic, neutral and polar, basic, or acidic. The simplest amino acid is glycine, which has a single hydrogen atom as the side chain.

Table 30-3 Classification of Amino Acids by Side Chain

Amino Acid	3-letter Code	1-letter Code
Neutral and Hydrophobic
Alanine	Ala	A
Isoleucine	Ile	I
Leucine	Leu	L
Methionine	Met	M
Phenylalanine	Phe	F
Proline	Pro	P
Tryptophan	Trp	W
Valine	Val	V
Neutral and Polar
Asparagine	Asn	N
Cysteine	Cys	C
Glutamine	Gln	Q
Glycine	Gly	G
Serine	Ser	S
Threonine	Thr	T
Tyrosine	Tyr	Y
Acidic
Aspartic acid	Asp	D
Glutamic acid	Glu	E
Basic
Arginine	Arg	R
Histidine	His	H
Lysine	Lys	K

The process of information transfer from RNA to polypeptides and proteins is known as translation. It relies on the genetic code, the system by which the nucleotide sequence of mRNA specifies the amino acid sequence of a polypeptide chain. In this nearly universal code, each set of three adjacent bases in the mRNA transcript constitutes a codon, and different combinations of bases within the codon specify the individual amino acids (Table 30-4). The small tRNA molecules serve as the molecular link between mRNA codons and amino acids. One segment of each tRNA transcript contains a three-base anticodon that is complementary to a specific codon on the mRNA, whereas another segment contains a binding site for one of the 20 amino acids.

Table 30-4 The Nuclear Genetic Code

With a total of only 20 amino acids and 64 possible codons, most amino acids are specified by more than one codon. For some of the different amino acids, the base in the third position in the triplet may be either of the purines, either of the pyrimidines, or sometimes any of the four bases. For this reason, the third position in the codon sometimes is called the wobble position. Arginine and leucine are each specified by six codons, whereas only methionine and tryptophan are specified by a single codon. Three codons signal termination of translation and accordingly are called stop codons.
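The redundancy of the code can be checked by tabulating how many codons specify each amino acid. The sketch below builds the standard nuclear genetic code from a compact 64-letter string of one-letter amino acid codes (with `*` marking stop codons). The table is written from general knowledge of the standard code rather than copied from Table 30-4, and the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard nuclear genetic code: codons enumerated with bases in the order
# U, C, A, G at each position (UUU, UUC, UUA, UUG, UCU, ...); '*' marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that specify each amino acid (or stop).
degeneracy = Counter(CODON_TABLE.values())

if __name__ == "__main__":
    for aa, n in sorted(degeneracy.items(), key=lambda item: -item[1]):
        print(aa, n)
```

The tally reproduces the counts given above: six codons each for arginine (R) and leucine (L), a single codon each for methionine (M) and tryptophan (W), and three stop codons.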

Transcription

The process of information transfer from DNA to RNA is known as transcription. Synthesis of RNA begins at a specific transcription start site and continues in a 5′ to 3′ direction with regard to the RNA product. The DNA strand that corresponds to the RNA sequence is known as the coding or sense strand. This strand, however, is not used as the template for synthesis of an RNA molecule. Rather, the complementary DNA strand, known as the noncoding or antisense strand, actually serves as the template and is read in the 3′ to 5′ direction. The RNA product is known as a transcript.
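The relationship among the coding strand, the template strand, and the transcript can be illustrated with a short sketch. The function names and the example sequence are hypothetical; the template strand is assumed to be written in the 3′ to 5′ direction, the order in which it is read.

```python
# Pairing between a DNA template base and the RNA base added opposite it.
_DNA_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe_from_template(template_3_to_5: str) -> str:
    """RNA transcript (5' to 3') made by reading the template strand 3' to 5'."""
    return template_3_to_5.translate(_DNA_TO_RNA)


def transcribe_from_coding(coding_5_to_3: str) -> str:
    """Same transcript derived from the coding strand: identical except U for T."""
    return coding_5_to_3.replace("T", "U")


if __name__ == "__main__":
    coding = "ATGGCC"    # 5'-ATGGCC-3' (sense strand)
    template = "TACCGG"  # 3'-TACCGG-5' (antisense strand)
    print(transcribe_from_template(template))
    print(transcribe_from_coding(coding))
```

Both routes give the same transcript, which is why the coding strand, although not used as the template, is the one conventionally quoted as the sequence of a gene.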

Translation

The process of information transfer from RNA to polypeptide or protein is known as translation. This process takes place in the cytoplasm on small structures known as ribosomes, large complexes composed of the four species of rRNA noted earlier together with ribosomal proteins. They function like small migrating factories that travel along an mRNA template, engaging in rapid cycles of peptide bond synthesis. The process consists of initiation, elongation, and termination stages.

The ribosome contains a large site that binds about 35 nucleotides of mRNA, and two adjacent sites for binding the smaller aminoacyl-tRNA molecules. The first is the acceptor or A site, which holds the incoming aminoacyl-tRNA. The second is the donor or P site, which is occupied by a tRNA carrying the growing polypeptide chain. Translation begins with mRNA binding to the ribosome at the site of the first AUG base triplet, which specifies the amino acid methionine. This triplet also serves as the start signal for synthesis of the polypeptide chain and establishes the reading frame of the mRNA.

The mRNA and tRNA then move in the same direction along the ribosome, with the tRNA moving from the “A” site to the “P” site, and the mRNA sliding over three bases, allowing recognition of the next codon. Bonding between the mRNA codon and tRNA anticodon brings the appropriate amino acid into position on the ribosome to form a new peptide bond to the carboxyl end of the growing polypeptide chain. As part of this reaction, the polypeptide chain is released from the tRNA at the “P” site, but remains bonded to the tRNA at the “A” site. The tRNA and mRNA then move another three bases, and the process is repeated. This reaction continues until one of the stop codons is reached. Thus, proteins are synthesized from the amino to the carboxyl terminus, which corresponds to translation from the 5′ to the 3′ end of the mRNA molecule. Methionine is always the first amino acid of each polypeptide chain, although it usually is removed before protein synthesis is completed.
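The logic of initiation at the first AUG, codon-by-codon elongation, and termination at a stop codon can be sketched in a few lines. This is a simplified illustration only: it ignores the ribosome, tRNAs, and removal of the initial methionine, and the function name and example transcript are hypothetical. The codon table is the standard nuclear code, written from general knowledge.

```python
from itertools import product

# Standard nuclear genetic code, codons enumerated in U, C, A, G order;
# '*' marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna_5_to_3: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon.

    Returns one-letter amino acid codes, amino terminus first.
    """
    start = mrna_5_to_3.find("AUG")  # the first AUG sets the reading frame
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna_5_to_3) - 2, 3):
        amino_acid = CODON_TABLE[mrna_5_to_3[i:i + 3]]
        if amino_acid == "*":  # stop codon: terminate translation
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    # Hypothetical transcript: 5' UTR (GG), AUG GCC UGG, stop (UAA), 3' UTR (GC).
    print(translate("GGAUGGCCUGGUAAGC"))
```

In the example, the two bases before the AUG and the two after the stop codon stand in for untranslated sequence; only the codons between them contribute to the polypeptide.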

Gene Structure and Organization

As noted earlier, a gene traditionally has been defined as a unit of genetic information. This concept has gradually progressed to a more useful definition, which states that a gene is a sequence of DNA on a chromosome that is required for production of a functional product, which can be either a protein or a functional RNA molecule [Nussbaum et al., 2007]. By convention, genetic information is always read in the 5′ to 3′ direction, whether encoded in DNA or RNA – in an upstream to downstream direction. The nomenclature regarding the 5′ and 3′ positions of the sugar backbone can be confusing. The 5′ carbon of the first nucleotide of a sequence is joined by a phosphodiester bond to a nucleotide not involved in the sequence, whereas its 3′ carbon is joined to the 5′ carbon of the second nucleotide, and so on. The last nucleotide of the sequence has a 3′ carbon, which joins another uninvolved nucleotide.

Genes

Genes are composed of a continuous length of DNA with definable start and end points, which include the sequence that codes for the RNA or polypeptide product and is thus known as the coding region. It has become clear, however, that the structure of a gene is complex and includes much more than the coding sequence of the protein. All genes include additional sequences on either end of the coding region – designated the 5′ and 3′ untranslated regions (UTRs) – that do not code for an RNA product or polypeptide. These regions function to regulate transcription and RNA stability. The gene is considered to include the entire sequence represented in the RNA product because some mutations within noncoding regions can impair gene function.

A model of a typical human gene is shown in Figure 30-3. Promoter sequences required for regulation and initiation of RNA transcription (red diamonds in Figure 30-3) are present at the 5′ end of the gene, such as the CAT and TATA boxes whose sequences are tightly conserved among many different genes and species. Downstream from the promoter sequences is a specific sequence that signals the start of transcription. A short way further downstream is an initiator codon, AUG, which codes for methionine. This triplet is the translation start site, which signals the start of the coding sequence for the polypeptide product. The region between the transcription and translation start sites is the 5′ UTR.

Fig. 30-3 The structure of a typical human gene.

The gene includes a primary regulatory region known as the promoter just upstream of the transcription start site that is required for binding of RNA polymerase (red diamonds), as well as several types of distant regulatory elements that protect the gene from regulation of other nearby genes (insulator), increase or decrease gene expression (enhancers and silencers), or regulate several genes in the region (locus control region).

The next segment of the gene is the coding region. The coding regions of most genes in prokaryotes and lower eukaryotes are colinear, which means that the coding sequence corresponds exactly to the sequence of amino acids in the polypeptide. By contrast, most higher eukaryotic genes, including human genes, contain additional sequences that lie within the coding region, interrupting the sequence that represents the polypeptide. The regions that code for the final polypeptide (or functional RNA) product are known as exons, whereas the regions that are missing from the final mRNA product are introns. The removal of introns from the final mRNA product is known as splicing, a complex process that is regulated by a large number of proteins and functional RNA transcripts.

The coding sequence ends at one of three specific stop codons: UAA, UAG, or UGA. The last segment of the gene is the 3′ UTR, which contains a polyadenylation signal and presumably a signal to end transcription, although no transcription stop sequence has been identified. The length of a gene may vary, ranging from less than 1 kb to several hundred kb. The longest gene known, which codes for dystrophin, spans more than 2000 kb of genomic sequence, although this is not the largest protein produced in the cell.

Regulatory Regions

Many genes have highly conserved sequences, a longer distance upstream and downstream of the transcribed gene, that are involved in regulating expression, including enhancers, silencers, locus control regions, and insulators (see Figure 30-3). Enhancer elements function to increase gene expression, while silencers reduce gene expression. Locus control regions may regulate expression of several genes within a chromosome region, while insulators prevent co-regulation of more distant genes and gene regions. All of these are sequences that bind proteins called transcription factors, which can be ubiquitous, tissue-specific, and/or temporally expressed. Promoters are located immediately 5′ of the gene and bind to RNA polymerase II, a necessary step for transcription. Other transcription factors bind upstream of the promoter and activate transcription. Enhancers and silencers are often located at a distance from the promoter, and increase or decrease transcription in a tissue-specific or temporal manner. Overall, the transcription of each gene is tightly regulated, with multiple transcription factors involved.

RNA Processing

Transcription of DNA gives rise to a precursor RNA that corresponds exactly to the genome sequence but must be modified in several ways to become functional, especially for mRNA. The first modification to mRNA is the addition of a cap structure to the 5′ end, which is followed by the removal or splicing of introns. The mechanism of mRNA splicing depends on the specific nucleotide sequences at the exon/intron boundaries called splice junctions (Figure 30-4). The most important of these is the GT-AG rule: introns almost always start with GT (actually GU, because this occurs in RNA), which is therefore called the splice-donor site, and end with AG, which is called the splice-acceptor site. Several additional specific sequences are also needed, including sequences within the intron just after the GT splice-donor site, a highly conserved branch site located about 40 bp before the end of the intron, and sequences just before the AG splice-acceptor site. The splicing mechanism produces the following:

1. cleavage at the 5′ donor site splice junction just before the invariant G

2. nucleophilic attack by the invariant A of the branch site on the terminal G of the splice-donor site, joining the two to form a “lariat”-shaped structure

3. cleavage at the 3′ splice-acceptor site at the 3′ splice junction, leading to release of the intronic RNA as a lariat or loop, and splicing of the two exons.
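The net result of these steps, removal of an intron that obeys the GT-AG rule and joining of the flanking exons, can be sketched as simple string handling. This is an illustration under stated assumptions, not a model of the spliceosome: intron positions are supplied by the caller as 0-based, half-open coordinates, the branch site is not checked, and the function names and example sequence are hypothetical.

```python
def obeys_gt_ag_rule(intron: str) -> bool:
    """True if the intron starts with GT (GU in RNA) and ends with AG."""
    dna_style = intron.upper().replace("U", "T")
    return (
        len(dna_style) >= 4
        and dna_style.startswith("GT")
        and dna_style.endswith("AG")
    )


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove the given introns and join the remaining exons in order."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        if not obeys_gt_ag_rule(pre_mrna[start:end]):
            raise ValueError(f"intron at {start}-{end} violates the GT-AG rule")
        exons.append(pre_mrna[position:start])  # exon preceding this intron
        position = end                          # resume after the intron
    exons.append(pre_mrna[position:])           # final exon
    return "".join(exons)


if __name__ == "__main__":
    exon1, intron, exon2 = "AUGGCC", "GUAAGUCCCUAACAG", "UGGUAA"
    pre_mrna = exon1 + intron + exon2
    start = len(exon1)
    print(splice(pre_mrna, [(start, start + len(intron))]))
```

Raising an error for an intron that lacks the GU and AG dinucleotides is a design choice made for this sketch; rare introns that depart from the rule do exist, as the phrase "almost always" in the text indicates.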

Fig. 30-4 Consensus sequences at the splice-donor, branch, and splice-acceptor sites in introns of higher eukaryotes.

The GT dinucleotide at the start of the intron, the A near the end of the branch site, and the AG dinucleotide that ends the intron are invariant, whereas most others represent only the most common nucleotide. When two nucleotides are depicted at a single position, no preference is shown as to which is listed on the top or on the bottom. Abbreviations: A, adenine; C, cytosine; G, guanine; N, any nucleotide; T, thymine.

(Modified from Strachan T, Read AP. Human molecular genetics. New York: Wiley-Liss, 1996.)

These reactions are catalyzed by large complexes composed of snRNA and specific proteins. The snRNAs involved have specific sequences that allow binding with conserved intronic sequences or the recognition sites of other snRNAs. The snRNA–protein–target RNA complexes form large particles known as spliceosomes. Once a 5′ splice site is recognized, the complex scans the RNA sequence until it encounters a branch site, which aids in identifying the nearby 3′ splice-acceptor site. This process does not necessarily happen in linear order along the RNA. Rather, the order likely is determined by the vagaries of RNA folding. The last steps involve cleavage of part of the 3′ UTR, which occurs at a specific point downstream from the end of the coding sequence, and addition of a long sequence of adenosine nucleotides that is called the polyA tail. The site of the polyA tail is specified in part by the sequence AAUAAA, which is located within the 3′ UTR.

Imprinting and X-Inactivation

Several regions of the genome are subject to inactivation under special circumstances, with no changes to the DNA sequence. The processes involved thus represent a form of “epigenetic” modification. The two processes reviewed here, imprinting and X-chromosome inactivation, both can result in a phenotype when disrupted.

Imprinting

The process by which certain genes in specific chromosomal regions are expressed from only one chromosome, depending on the parental origin of the chromosome, is known as “imprinting.” Although the mechanism is only partly understood, a key component involves allele-specific DNA methylation, found predominantly at the carbon 5 position of about 80 percent of all cytosines that are part of symmetrical cytosine-guanine (CpG) dinucleotides [Jiang et al., 2004; Strachan and Read, 2010; Weksberg et al., 2003].
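The description of CpG dinucleotides as symmetrical can be made concrete: because C pairs with G, every CpG on one strand lies opposite a CpG on the complementary strand when both are read 5′ to 3′. The following minimal sketch locates CpG sites on both strands; the function names and example sequence are illustrative only.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand_5_to_3: str) -> str:
    """Opposite strand, written 5' to 3'."""
    return strand_5_to_3.translate(_COMPLEMENT)[::-1]


def cpg_positions(strand_5_to_3: str) -> list[int]:
    """0-based positions of the C in each CpG dinucleotide."""
    seq = strand_5_to_3.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


if __name__ == "__main__":
    top = "ACGTCGGA"
    bottom = reverse_complement(top)
    print(cpg_positions(top), cpg_positions(bottom))
```

A CpG at position i on one strand corresponds to a CpG at position (length − 2 − i) on the other, so both strands carry a cytosine at each site that is available for methylation.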

This process is controlled by regulatory imprinting “centers,” located nearby on the same chromosome as that of the silenced or “imprinted” gene. In effect, then, two alleles of the same gene that are identical in nucleotide sequence but derived from opposite parents are regulated differently in the same nucleus. This process is reversible, so that the silent, imprinted allele can be reactivated and the active allele silenced when passed through the germline of the opposite-sex parent. Most imprinted genes are found in large clusters of greater than 1 Mb (megabase pairs) in length. Imprinted clusters have been identified in chromosomes 6q24, 7p11.2, 11p15.5, 14q32, 15q11–q13, and 20q13.2, and others may exist as well [Cavaille et al., 2002; Gardner et al., 2000; Hall, 1990; Jiang et al., 2004; Weksberg et al., 2003; Wylie et al., 2000]. Imprinted regions share several common characteristics, including differential DNA methylation, allele-specific RNA transcription, antisense transcripts, histone modifications, and differences in timing of replication.

X-Inactivation

In mammalian cells with two (or more) X chromosomes, all but one undergo widespread gene silencing by methylation. This phenomenon, known as X-chromosome inactivation (Xi), causes one of the two X chromosomes in cells of female mammals to become transcriptionally inactive early in embryonic development, as first proposed in the Lyon hypothesis [Lyon, 1961, 2002]. In abnormal cells with more than two X chromosomes, all but one become inactivated. This has the effect of balancing gene dosage of X-linked genes between male and female cells. The process of Xi is random, so that on average the maternally and paternally derived X chromosomes are each inactivated in approximately 50 percent of cells. Changes in this pattern are seen in female carriers of some X-linked diseases, resulting in skewing of Xi. This alteration can be favorable, with decreased severity of the phenotype, or unfavorable, with increased severity of the phenotype [Dobyns et al., 2004].
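The statement that random inactivation leaves each parental X chromosome silenced in about half of all cells can be illustrated with a simple simulation. This is a toy model under stated assumptions: each cell is treated as an independent choice, whereas in vivo the choice is made early and inherited clonally. The function name, the probability parameter, and the seed are hypothetical.

```python
import random


def fraction_maternal_x_inactive(
    n_cells: int, p_maternal: float = 0.5, seed: int = 0
) -> float:
    """Simulate n_cells independent choices of which X chromosome to inactivate.

    p_maternal is the probability that the maternally derived X is silenced;
    0.5 models random inactivation, other values model skewing.
    """
    rng = random.Random(seed)  # seeded so that the sketch is reproducible
    maternal_inactive = sum(rng.random() < p_maternal for _ in range(n_cells))
    return maternal_inactive / n_cells


if __name__ == "__main__":
    print(fraction_maternal_x_inactive(10_000))       # random inactivation
    print(fraction_maternal_x_inactive(10_000, 0.9))  # skewed inactivation
```

With a probability of 0.5 the simulated fraction settles close to one half, whereas other values model the skewed patterns described for carriers of some X-linked diseases.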

Cell Cycle and Chromosomal Basis of Heredity

Current knowledge regarding the chromosomal basis of heredity and that concerning the cell cycle are inextricably linked because the intracellular structures now known as chromosomes were first seen in cells undergoing cell division. The existence of chromosomes was foreshadowed by Gregor Mendel’s work. For years after he described the independent assortment of genetic traits, occasional exceptions to this law were discovered. Certain traits were found that were typically inherited as a group. These observations were eventually explained by the discovery of chromosomes. The nuclear material of a cell, or chromatin, appears homogeneous during most of the cell cycle, but condenses into distinct rod-shaped organelles during cell division. These tiny structures were called chromosomes because they stain darkly with various biologic dyes.

Cell Cycle

Humans begin life as a single diploid cell or zygote, which gives rise to all of the cells of the body by a combination of cell growth and cell division, with the latter including both asexual (mitosis) and sexual (meiosis) cell division. The life cycle of somatic cells is divided into four stages. After cell division, the cell enters the G₁ (gap 1) resting phase, during which DNA synthesis does not occur. Some differentiated cells, such as neurons, stop growth in a modified G₁ phase known as G₀. Late in G₁, the cell passes a critical point, after which it proceeds through the rest of the cell cycle at a standard rate. G₁ is followed by the S phase, during which DNA synthesis or replication occurs. The genetic material is duplicated in the form of two chromatids (future chromosomes), joined by attachment to a single centromere. The cell then enters the G₂ (gap 2) resting phase, which is much shorter than G₁. The G₁, S, and G₂ phases together constitute interphase.

Mitosis

Somatic cell division, or mitosis, is an elaborate mechanism that distributes one chromatid of each duplicated chromosome to each of the two daughter cells. The process is continuous but has been divided into the following five stages: prophase, prometaphase, metaphase, anaphase, and telophase (Figure 30-5).

Fig. 30-5 Diagram of mitosis demonstrating two chromosome pairs.

In prophase, the chromatin begins to condense, the nucleolus disappears, and the mitotic spindle begins to form. Prophase is followed by prometaphase, during which the nuclear membrane disappears, allowing the chromosomes to disperse in the cell and attach to the spindle by paired kinetochores located at the centromere. In metaphase, the chromosomes are maximally contracted and arranged at the equatorial plane of the cell. In anaphase, the replicated chromosomes separate at the centromere, allowing the two chromatids to become daughter chromosomes, which move to opposite ends of the cell. In telophase, the chromosomes decondense, the nuclear membrane reforms, and the nucleus returns to the interphase appearance. Shortly afterward, the cytoplasm divides to form two daughter cells. For routine studies, chromosomes are examined during metaphase. For high-resolution studies, they are examined before the point of maximal contraction, during prophase or prometaphase.

Meiosis

Reproductive cell division, or meiosis, is an even more complex mechanism in which two successive cell divisions, known as meiosis 1 and meiosis 2, give rise to the haploid germ cells (Figure 30-6). Meiosis is of critical importance in understanding many of the methods of modern molecular genetics and the pathogenesis of many genetic diseases.

Fig. 30-6 Diagram of meiosis depicting two chromosome pairs.

In meiosis 1, the chromosome number is reduced from the diploid to the haploid number. The key step consists of close pairing of homologous chromosomes during prophase 1, which is further divided into several stages. During leptotene, the chromosomes first become visible, with homologs located close together. During zygotene, the homologs begin to pair closely along their entire length, held together by a thin protein-containing structure known as a synaptonemal complex. During pachytene, synapsis or pairing is completed, and the homologs appear as a bivalent. Pachytene is the stage during which exchange of homologous segments between nonsister chromatids occurs, which is known as recombination or crossing over. The remaining steps are similar to mitosis, except that it is the paired homologs that are pulled apart rather than the centromeres. In meiosis 2, which closely resembles mitosis, the chromatids separate at the centromere to form daughter chromosomes. Ova and sperm have remarkably different timing, but the sequence of meiosis is the same.

Chromosomal Basis of Heredity

Chromosome Structure

In humans, the nuclear DNA is dispersed among 46 separate linear structures or chromosomes, each of which consists of a single, uninterrupted double helix that contains 50–250 Mb of DNA, and a group of associated proteins that form the support structure or scaffolding. The scaffolding consists of five basic proteins called histones and several more acidic nonhistone proteins. Two copies of each of four histones – H2A, H2B, H3, and H4 – join to form an octamer. The DNA double helix wraps almost twice around the octamer, which involves about 140 bp. Adjacent octamers are separated by a short spacer segment of 20–60 bp that is associated with histone H1. The complex of DNA and core histones is known as a nucleosome (see Figure 30-2).
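The figures above permit a rough estimate of the number of nucleosomes on a chromosome. The sketch below assumes that each nucleosome occupies about 140 bp of core DNA plus one spacer of 20–60 bp, as stated in the text; the function name is illustrative, and the result is an order-of-magnitude estimate only.

```python
CORE_BP = 140  # DNA wrapped around one histone octamer (from the text)


def nucleosome_count_range(
    dna_bp: int, spacer_range: tuple[int, int] = (20, 60)
) -> tuple[int, int]:
    """Return (fewest, most) nucleosomes that fit along dna_bp of DNA."""
    shortest_spacer, longest_spacer = spacer_range
    fewest = dna_bp // (CORE_BP + longest_spacer)  # long spacers, fewer units
    most = dna_bp // (CORE_BP + shortest_spacer)   # short spacers, more units
    return fewest, most


if __name__ == "__main__":
    for size_mb in (50, 250):  # chromosome size range quoted in the text
        print(size_mb, "Mb:", nucleosome_count_range(size_mb * 1_000_000))
```

Under these assumptions, even the smallest human chromosome would carry several hundred thousand nucleosomes.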

Strings of nucleosomes are further compacted into a secondary helical structure known as a solenoid. These structures have a diameter of about 30 nm (see Figure 30-2) and contain six nucleosomes per turn. The solenoids are packed into large loops of 10–100 kb of DNA, which are attached to a nonhistone protein scaffolding. These loops pack together loosely to form interphase chromosomes. During early prophase, they pack together more closely to form knoblike thickenings known as chromomeres, which then coalesce further to form the bands observed in prometaphase and metaphase chromosomes when stained with appropriate dyes.

The alternating light and dark bands that characterize all nuclear chromosomes with a variety of staining methods likely reflect the compartmentalization of the genome into isochores, defined as large regions with variation in base composition or variable spacing of scaffold attachment regions. The dark bands observed with Giemsa staining are AT-rich, replicate late in the DNA synthesis phase of the cell cycle, and contain relatively few genes. The light bands observed with Giemsa are GC-rich, replicate early, and contain many genes. Some are greatly enriched for GC and contain high concentrations of genes. Most, although not all, such bands are located near the ends or telomeres of chromosomes and therefore are known as T bands.

Specialized Regions

All nuclear chromosomes have specialized regions that are required for chromosome integrity and function, including centromeres, telomeres, and origins of replication. Centromeres are DNA sequences that act in cis. That is, they act on the chromosome on which they are located and are responsible for the segregation of chromosomes during cell division. Centromeres contain extensive repeats of an approximately 171-bp unit known as alpha-satellite DNA, the sequence of which differs slightly between each chromosome. Fragments of chromosomes that lack a centromere, known as acentric fragments, are lost during cell division.

The two ends of a chromosome are called telomeres and also are required for chromosome stability. In humans, they consist of long arrays of tandem repeats of the sequence TTAGGG, which extend about 5–20 kb. DNA polymerases are unable to replicate the telomeres because of the lack of a template. This problem is resolved by the enzyme telomerase, which contains an RNA component to serve as a template to prime further synthesis on the leading strand. Further extension of the leading strand provides the needed template for the lagging strand.
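As a rough illustration of the tandem TTAGGG arrays described above, the sketch below counts uninterrupted repeat units at the end of a sequence and converts an array length in bases into a number of repeat units. The example sequence is invented for illustration, and the function names are not taken from any library.

```python
import re

TELOMERE_UNIT = "TTAGGG"  # human telomeric repeat given in the text


def count_terminal_repeats(seq: str, unit: str = TELOMERE_UNIT) -> int:
    """Count uninterrupted copies of `unit` at the 3' end of `seq`."""
    match = re.search(f"(?:{re.escape(unit)})+$", seq.upper())
    return 0 if match is None else len(match.group(0)) // len(unit)


def repeats_in_array(array_bp: int, unit: str = TELOMERE_UNIT) -> int:
    """Number of whole repeat units that fit in an array of `array_bp` bases."""
    return array_bp // len(unit)


# Hypothetical chromosome end: unique sequence followed by five repeats.
toy_end = "GATCAC" + TELOMERE_UNIT * 5
print(count_terminal_repeats(toy_end))                     # 5
print(repeats_in_array(5_000), repeats_in_array(20_000))   # 833 3333
```

An array of 5–20 kb therefore corresponds to roughly 800 to 3300 copies of the hexamer.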

Origins of replication are specialized sequences where DNA replication begins, and thus are important in maintaining chromosome number and integrity. They consist of autonomously replicating sequence elements that contain a core consensus sequence and some imperfect copies with a length of about 50 nucleotides. A consensus human autonomously replicating sequence has been identified [Strachan and Read, 2010].

Regions of variable staining known as heterochromatin consist of long arrays of repeat sequences as short as 5 bp. These regions are located primarily in the pericentromeric regions of chromosomes 1, 9, and 16, and in distal Yq. The five human acrocentric chromosomes have small satellites attached to the short arm by short stalks or secondary constrictions that contain the rRNA genes.

Chromosome Number

Each human somatic cell contains 46 chromosomes that consist of 22 matched pairs known as autosomes and two sex chromosomes: XX in females and XY in males (Figure 30-7). In contrast, human germ cells contain only 23 chromosomes, consisting of 22 unpaired autosomes and a single sex chromosome. The former is known as the diploid or 2n number, and the latter is known as the haploid or 1n number. The autosomes were numbered according to length, with chromosome 1 the longest and chromosome 22 thought to be the shortest. Although chromosome 21 later proved to be shorter than chromosome 22, the numbers were retained for historical reasons. The two members of each pair of autosomes and the two X chromosomes in females carry the same genes and are known as homologous chromosomes, or homologs. Although they appear similar under the microscope, homologs are not strictly identical. They contain the same genes, but the nucleotide sequence differs at thousands of positions.

Fig. 30-7 Standardized diagram or idiogram of human chromosomes at the 400-band stage.

Chromosome Identification

Individual chromosomes may be seen only when tightly contracted during cell division. Since DNA replication is complete, each chromosome consists of two chromatids that are joined at the primary constriction or centromere. In standard cytogenetic nomenclature, the centromere divides the chromosome into two arms, with the shorter designated the “p” arm and the longer the “q” arm. The tip of each arm is the telomere. Human chromosomes are classified into three types according to the position of the centromere:

1. metacentric, in which the centromere is centrally placed and the two arms are of about equal length

2. submetacentric, in which the centromere is off center and the arms are of unequal length

3. acrocentric, in which the centromere is near one end.

Organization of the Human Genome

The human genome comprises the total of all genetic information in the cell. It is divided into two separate compartments – a large and complex nuclear genome and a much smaller and simpler mitochondrial genome. The mitochondrial genome consists of a single circular DNA molecule that is present in many copies in each mitochondrion, while the nuclear genome is distributed among the 46 nuclear chromosomes. The available data regarding the genome have become much more extensive and accurate with completion of the Human Genome Project. A few of the most useful Human Genome Project-related websites are listed in Table 30-1.

The Nuclear Genome

The human nuclear genome consists of approximately 3 × 10⁹ bp, or 3000 Mb of DNA. About 75 percent of this represents unique or single-copy DNA, which includes genes and some important regulatory elements. The remaining 25 percent consists of several classes of repetitive DNA [Lander et al., 2001; Nussbaum et al., 2007; Venter et al., 2001].

Genes and Conserved Noncoding DNA

Somewhat surprisingly, recent estimates predict that the human genome contains fewer than 30,000 protein-coding genes (possibly closer to 20,000) and an uncertain number of other genes producing functional RNA products. This is far fewer than earlier estimates, and accounts for only about 1.2 percent of nuclear DNA [Lander et al., 2001; Venter et al., 2001]. Another 5 percent of the human genome is more conserved than would be expected from estimates of neutral evolution, which suggests that many of these regions have specific regulatory functions [Chiaromonte et al., 2003; Waterston et al., 2002]. Studies of these highly conserved regions of DNA have used different thresholds, such as stretches of more than 100 bp with 70–80 percent conservation between mouse and human. Some of these regions have been found to contain important noncoding elements [Dermitzakis et al., 2002, 2003; Frazer et al., 2004; Hardison, 2000]. More stringent analysis demonstrates that the human genome contains 481 sequences of 200 or more bp that are 100 percent conserved among human, mouse, and rat [Bejerano et al., 2004]. These segments were designated “ultra-conserved elements,” and are preferentially located near genes involved in RNA processing or regulation of transcription and development. Similarly, about 5000 sequences of 100 bp or more are conserved among these three species, which emphasizes that noncoding sequences are common and important.

Repetitive DNA

Repetitive DNA in the human genome consists of several classes of DNA whose nucleotide sequence is repeated, either exactly or with minor variations, hundreds to millions of times. Some classes are clustered, whereas others are dispersed throughout the genome. Clustered, repeated sequences constitute 10–15 percent of the genome and are collectively called satellite DNA because of their separation from other DNA on density centrifugation. Satellite DNA consists of head-to-tail or tandem arrayed repeat sequences that can extend for several thousand kb. Dispersed, repeat sequences constitute 6–10 percent of the genome and belong to several different classes. Minisatellite or variable number of tandem repeat (VNTR) sequences are dispersed, intermediate-length (15–65 bp) repeats that usually span only several kb. The Alu family of DNA repeats includes about 500,000 related sequences that are each about 300 bp in length and together make up about 3 percent of the genome. The L1 family of repeats includes about 10,000 related sequences that extend up to 6 kb in length and make up another 3 percent of the genome. The origin of these sequences is not known and no functions have been identified; it appears likely that they simply exploit cellular processes to propagate themselves. Several classes have been useful as polymorphic DNA markers.

Low Copy Repeats

Segmental duplications, also known as low copy repeats (LCRs), are DNA sequences of 10–250 kb, present in multiple copies with greater than 95 percent sequence identity, that make up approximately 5 percent of the human genome [Babcock et al., 2003; Bailey et al., 2002; Cheung et al., 2001; Stankiewicz and Lupski, 2002]. LCRs are dynamic regions of the genome because specific repeats tend to cluster within the same genomic regions, where they mediate unequal nonallelic homologous recombination events, producing segmental deletions and duplications that are collectively designated “copy number variants” (CNVs). Several of these have been associated with well-known developmental disorders in humans, such as Williams’ syndrome in 7q11.23, Angelman’s syndrome and Prader–Willi syndrome in 15q12, hereditary neuropathy with predisposition to pressure palsies and Charcot–Marie–Tooth neuropathy type 1A in 17p12, Smith–Magenis syndrome in 17p11.2, and DiGeorge’s syndrome in 22q11.2 [Babcock et al., 2003]. Many new CNV-associated developmental brain disorders have been described over the past few years.

Polymorphisms

A mutation is a permanent change in the DNA of an individual organism, specifically a change in the nucleotide sequence anywhere in the genome [Nussbaum et al., 2007]. Genetic diseases and many cancers are caused by mutations that adversely affect function of one or more genes, although most mutations have little or no effect on gene function and therefore do not change the survival or reproductive fitness of an individual. Some of these persist in the population as morphologic variants known as polymorphisms. Sequence changes that have frequencies of less than 1 percent are known as rare variants, whereas those with frequencies of 1 percent or more are known as polymorphisms. By convention, a genetic polymorphism is defined as the occurrence of two or more variants or alleles in a region of DNA where at least two alleles appear with frequencies greater than 1 percent. Several different classes of polymorphisms occur in the genome, and several methods in molecular biology take advantage of the normal variation between individuals.

Minisatellites

One of the most useful classes of polymorphisms in the genome is that of the minisatellite or VNTR DNA sequences. These are intermediate-length (15–65 bp) DNA sequences that are repeated one to several dozen times in tandem and usually span several kb in total length. They are highly polymorphic, and their extreme polymorphic nature, coupled with the complexity of multilocus minisatellites, makes them valuable for DNA fingerprinting applications, such as forensic, paternity, and zygosity testing and linkage mapping. They also are inherently unstable and susceptible to mutation at a higher rate than observed for other sequences of DNA.

Microsatellites

Microsatellites, also known as short tandem repeats, are tandem arrays of repeat units 2–5 nucleotides in length (dinucleotide, trinucleotide, tetranucleotide, or pentanucleotide repeats) that are scattered throughout the genome in noncoding regions between genes or within genes (in introns). They often are used as markers for linkage analysis because of the naturally occurring high variability in repeat number between individuals. These regions are inherently unstable and susceptible to mutations.

The most common microsatellite family consists of 50,000–100,000 cytosine-adenine (CA) repeats, which consist of short tandem repeats of the dinucleotide CA on one strand and guanine-thymine on the complementary strand. They thus take the form (CA)n/(GT)n, with n in the range of 6–30 [Weber and May, 1989]. The number of repeats within a (CA)n block varies greatly among different members of a species, producing a set of alleles that always differ in size by multiples of two bases. About 70 percent of the human population is heterozygous at any given (CA)n repeat locus, making these highly polymorphic. The human genome contains about 50,000–100,000 interspersed (CA)n blocks, which is enough to place 1 block every 30–60 kb, if evenly spaced.
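The spacing estimate and the two-base allele ladder described above follow from simple arithmetic. The sketch below works through both, assuming the approximate 3 × 10⁹ bp genome size used in this chapter; the 100-bp flanking length is an invented value for illustration only.

```python
GENOME_BP = 3_000_000_000  # approximate nuclear genome size used in this chapter


def mean_spacing_kb(n_blocks: int, genome_bp: int = GENOME_BP) -> float:
    """Average distance between blocks, in kb, if they were evenly spaced."""
    return genome_bp / n_blocks / 1_000


def ca_allele_length(n_repeats: int, flank_bp: int = 0) -> int:
    """Length of a (CA)n allele plus any fixed flanking sequence."""
    return flank_bp + 2 * n_repeats


print(mean_spacing_kb(50_000), mean_spacing_kb(100_000))  # 60.0 30.0

# Alleles with n = 6..30 and a hypothetical 100-bp flank differ in steps of 2 bp.
sizes = [ca_allele_length(n, flank_bp=100) for n in range(6, 31)]
print(sizes[:4])  # [112, 114, 116, 118]
```

Because each allele differs from the next by one dinucleotide unit, the alleles at a locus form a ladder of fragment sizes two bases apart.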

For both VNTR and CA repeat sequences, the combination of high frequency in the genome and a high rate of polymorphism has made them very useful for genetic mapping and association studies. Some microsatellite repeats, most often trinucleotide repeats, present within coding regions of genes or, less often, the 5′ or 3′ UTR, can expand to an abnormal length and are the basis of triplet repeat diseases such as Huntington’s disease, some forms of spinocerebellar ataxia, and fragile X syndrome.

Single-Nucleotide Polymorphisms

Single-nucleotide polymorphisms (SNPs, pronounced “snips”) are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is changed. For example, a SNP might change the DNA sequence TCACG to TTACG. The most common sequence change involves replacement of cytosine (C) with thymine (T), which accounts for about two-thirds of all SNPs. As with other types of sequence variation, a SNP must occur in at least 1 percent of the population to be classified as a polymorphism. SNPs occur in both unique-sequence (coding and noncoding) and repetitive DNA, and are responsible for about 90 percent of human genetic variation. On average, SNPs are found approximately every 100–300 bases along the entire human genome. Although most SNPs likely have no function, some are known to influence disease predisposition or responses to drugs, and thus are proving to be very valuable in studying the causes of common human diseases. The current inventory of known SNPs can be found in the Human SNP database (dbSNP) on the NCBI Entrez website (see Table 30-1).
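The TCACG to TTACG example can be expressed as a position-by-position comparison of two aligned sequences. The sketch below is illustrative only: it assumes the sequences are already aligned and of equal length, and the function names are hypothetical. It also converts the stated density of one SNP per 100–300 bases into an approximate genome-wide count.

```python
def find_snps(reference: str, sample: str) -> list:
    """Return (position, reference_base, sample_base) for each single-base difference.

    Positions are 1-based. The two sequences are assumed to be aligned already.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


def expected_snp_count(genome_bp: int, bases_per_snp: int) -> int:
    """Approximate number of SNPs at a density of one per `bases_per_snp` bases."""
    return genome_bp // bases_per_snp


print(find_snps("TCACG", "TTACG"))                 # [(2, 'C', 'T')]
print(expected_snp_count(3_000_000_000, 300))      # 10000000
print(expected_snp_count(3_000_000_000, 100))      # 30000000
```

At the stated density, a genome of 3 × 10⁹ bp would be expected to carry on the order of 10–30 million SNP positions.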

Restriction enzymes are DNA-cutting enzymes or endonucleases derived from bacteria that cut DNA at specific short sequences found at locations across the entire genome. SNPs can alter the sequences recognized by restriction enzymes, thus adding or removing a cutting site. This is the biological basis for restriction enzyme fragment length polymorphisms (RFLPs). Depending on the location of restriction enzyme sites, specific DNA fragment lengths are obtained on digestion with restriction endonucleases. The presence of a SNP at one of these restriction enzyme sites will affect cleavage and produce DNA fragments of different sizes; this difference in fragment length is the RFLP. RFLPs also can be produced by any change that alters the size of the DNA fragment on which the restriction site is located, such as deletions or duplications. RFLPs are a measure of naturally occurring variations or polymorphisms of normal DNA, and are inherited according to mendelian principles. RFLPs have been useful for gene mapping.
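The way a SNP abolishes a restriction site and changes the fragment pattern can be sketched with a toy digest. The example below uses the BamHI recognition sequence GGATCC, cut after the first G; the two alleles are invented sequences, and the `digest` function is a simplified illustration that considers only the top strand of a linear molecule.

```python
def digest(seq: str, site: str, cut_offset: int) -> list:
    """Return fragment lengths after cutting `seq` at every occurrence of `site`.

    `cut_offset` is the number of bases of the site that lie 5' of the cut
    on the top strand (1 for a cut after the first base of the site).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical 56-bp allele with two recognition sites.
allele_a = "A" * 10 + "GGATCC" + "T" * 20 + "GGATCC" + "C" * 14
# A single C-to-T change inside the second site (GGATCC -> GGATTC) removes it.
allele_b = allele_a[:40] + "T" + allele_a[41:]

print(digest(allele_a, "GGATCC", 1))  # [11, 26, 19]
print(digest(allele_b, "GGATCC", 1))  # [11, 45]
```

The two alleles yield different fragment sizes on digestion, which is the difference an RFLP assay detects.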

Mitochondrial Genome

Mitochondria are cellular organelles that are primarily responsible for cellular respiration and production of adenosine triphosphate. Each cell contains numerous mitochondria, and each mitochondrion contains many copies of a small 16.5-kb circular chromosome, adding up to thousands per cell. The mitochondrial chromosome contains 37 genes that code for two types of rRNA, 22 types of tRNA, and 13 polypeptides. The two DNA strands differ significantly in base composition, with a heavy strand rich in guanines that codes for 28 genes, and a light strand rich in cytosines that codes for 9 genes. It is very densely packed, with 93 percent comprising coding sequence [Strachan and Read, 2010].

All of the genes coded by the mitochondrial chromosome are expressed only in the mitochondria. The rRNA genes differ in size from those in nuclear DNA. The genetic code by which tRNAs decipher mRNAs differs slightly from nuclear DNA. The 13 polypeptides function as subunits of the mitochondrial oxidative phosphorylation system. The nuclear genome encodes the remaining 80 or more subunits and also encodes all mitochondrial ribosomal proteins and many other essential genes, such as mitochondrial DNA and RNA polymerases.

Human Genome Project

The importance of DNA, including both genes and noncoding regions, became increasingly apparent during the 1970s and 1980s, leading to one of the most ambitious scientific research projects ever undertaken – a plan to sequence the entire human genome. This project, which was begun in 1990, came to be known as the Human Genome Project. The goals of the project, as taken from the Human Genome Project website (see Table 30-1), were as follows:

to identify all of the approximately 20,000–25,000 genes in human DNA

to determine the sequences of the 3 billion chemical bp that make up human DNA

to store this information in databases

to improve tools for data analysis

to address the ethical, legal, and social issues (ELSIs) that may arise from the project.

The successful completion of the Human Genome Project has had the effect of changing genetic research from “bottom up” to “top down” research. That is, a major goal of research before completion of the Human Genome Project was to determine the nucleotide sequence of genes associated with the disease under study. Following completion of the project, research now typically begins with the nucleotide sequence. Although the Human Genome Project has been officially completed, numerous difficult regions of duplicated DNA remain to be sequenced correctly, and data analysis of the entire project is ongoing. The effects of the Human Genome Project have already been enormous. Research projects that once required several years now can be done in several weeks or months.

Technology of Cytogenetics

The modern field of cytogenetics began in the 1950s, when methods for arresting cells during mitosis were developed. This is a stage of the cell cycle when chromosomes are maximally contracted and can be visualized under the microscope with various stains. The human diploid chromosome number of 46 was discovered, and many different defects in chromosome number and structure were found, such as Down syndrome. The field has expanded, with development of new computerized image recognition systems for chromosome identification and a variety of methods that make use of molecular genetics methodologies. Thus, the distinction between cytogenetics and molecular genetics has become blurred. In general, cytogenetics tests examine large regions of the genome, such as chromosomes or regions of chromosomes, whereas standard molecular genetics methods focus on smaller regions of the genome, from single nucleotides to genes and gene regions.

Chromosome Analysis

When methods for examining chromosomes under the microscope were first developed, individual chromosomes could not be identified because of solid staining. Instead, they were separated into seven groups (A to G), based on their length and centromere position. It is now possible to identify all 24 human chromosomes individually, using several different staining techniques that take advantage of differences in chromatin structure and composition to produce a recognizable pattern of bands, as shown in the diagram in Figure 30-7. These methods are now used to examine the entire chromosome complement of an individual, which is known as the karyotype. The same term is used to describe the normal chromosome complement of a species.

The three most commonly used staining methods are G-banding, R-banding, and Q-banding. For Giemsa or G-banding, the chromosomes are treated with trypsin and then Giemsa stain to produce the alternating light and dark bands known as G bands. For reverse or R-banding, the chromosomes are pretreated with heat and then stained with Giemsa. The resulting R bands are the exact reverse of those produced by G-banding. For quinacrine or Q-banding, chromosomes are stained with quinacrine mustard and examined under fluorescent light. A specific pattern of bright and dim Q bands is seen, with the bright Q bands corresponding to the dark G bands.

For standard chromosome analysis, cell division is arrested in metaphase, when 400–550 bands per haploid set can be seen. Analysis should be performed on cells with at least 550-band resolution. For high-resolution chromosome analysis, cell division is arrested in prophase before full contraction has occurred, when 550–850 bands per haploid set can be seen. This technique is labor-intensive but may be useful for finding very small chromosome rearrangements.

A uniform system of human chromosome classification and nomenclature was developed at a series of international conferences, and most recently revised in 2009 [ISCN, 2009]. In this system, the chromosomes are separated into regions and subregions, based on the banding pattern. For example, band 17p13.3 (read as “17-p-one-three-point-three”) is found near the telomere of the short arm of chromosome 17. During the past decade, computer image analysis systems have been developed that can locate chromosome spreads on the slide, recognize and automatically sort chromosomes, and help with analysis. However, review by trained cytogenetic technicians is still required.
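Band designations such as 17p13.3 have a regular structure: chromosome, arm, region, band, and an optional sub-band after the decimal point. The sketch below parses that structure with a regular expression. It is a simplified illustration that handles only single-band labels of this form, not the full ISCN grammar, and the names `BAND` and `parse_band` are hypothetical.

```python
import re

# Simplified pattern: chromosome, arm, one region digit, one band digit,
# and an optional sub-band after the decimal point.
BAND = re.compile(
    r"^(?P<chromosome>[1-9]|1[0-9]|2[0-2]|X|Y)"
    r"(?P<arm>[pq])"
    r"(?P<region>[0-9])(?P<band>[0-9])"
    r"(?:\.(?P<subband>[0-9]+))?$"
)
ARM_NAMES = {"p": "short", "q": "long"}


def parse_band(label: str) -> dict:
    """Split a band label such as '17p13.3' into its components."""
    match = BAND.match(label.strip())
    if match is None:
        raise ValueError(f"not a recognized band label: {label!r}")
    return {
        "chromosome": match["chromosome"],
        "arm": ARM_NAMES[match["arm"]],
        "region": int(match["region"]),
        "band": int(match["band"]),
        "subband": match["subband"],  # kept as text; None when absent
    }


print(parse_band("17p13.3"))
print(parse_band("15q12"))
```

The sub-band is kept as text because sub-band digits are read individually rather than as a number, as in 7q11.23.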

Fluorescence In Situ Hybridization

Fluorescence in situ hybridization (FISH) is a technique used to detect specific chromosomes or chromosomal regions through hybridization (attachment) of fluorescently labeled DNA probes to denatured chromosomal DNA. Examination under fluorescent lighting detects the presence or absence of the hybridized fluorescent signal (and hence presence or absence of the chromosome material).

This study usually is performed on metaphase chromosomes (Figure 30-8) but also can be used on cells in interphase. Interphase FISH often is used for rapid detection of specific types of aneuploidy in fetal cells and for detection of certain deletions, duplications, and other abnormalities in tumor cells. In contrast with metaphase FISH, interphase FISH does not permit visualization of the actual chromosomes, so that most types of structural rearrangements cannot be detected. FISH can be used to examine a small set of chromosomal regions at once, usually 1 or 2, although study of 8–10 is possible with special fluorescent markers. Telomere-specific FISH analysis is an example of hybridization with multiple probes simultaneously. Telomere-specific probes that correspond to the telomeres of all of the chromosomes are hybridized to metaphase chromosomes in groups and used to detect abnormalities at the ends of chromosomes that are not visible by routine chromosome analysis.

Fig. 30-8 Fluorescence in situ hybridization (FISH) of a standard metaphase spread using a set of three overlapping cosmids at D17S379.

The top arrow points to two distinct D17S379 probe signals on the two chromatids of one chromosome 17 homolog. The bottom arrow points to the tip of the other chromosome 17 homolog, which lacks the normal signal and is thus deleted for this probe. The chromosome 17 centromeres are marked by a larger signal just below (top) or above (bottom) the arrows. Different colors are used for the D17S379 and 17 centromere probes so that they can be differentiated easily under the microscope.

(Courtesy of David H. Ledbetter, Emory University, Atlanta, GA.)

Chromosome Microarrays

Several new methods have been developed to test for loss or gain of DNA sequence that have much higher resolution than chromosome analysis. These include comparative genomic hybridization (CGH) using either bacterial artificial chromosomes or short DNA molecules called oligonucleotides or “oligos,” or SNP arrays modified to detect dosage of individual markers.

CGH is a molecular cytogenetics method developed to detect changes in copy number between two genomes, typically those of a control and an experimental subject. The alterations are classified as DNA gains (duplications) and losses (deletions), and reveal a characteristic pattern that includes mutations at chromosomal and subchromosomal levels. Equal amounts of DNA from two different sources (control and experimental) are labeled with two different fluorescent labels and hybridized to normal metaphase chromosome spreads. For example, control DNA may be labeled with red, and experimental subject DNA with green. When the control and the subject samples both contain a DNA fragment of interest in equal amounts, both labels are seen; this produces yellow fluorescence on the metaphase chromosome spread. When a given DNA fragment is deleted in the experimental subject, the red control label predominates. When a given DNA fragment is duplicated in the subject, the green subject label predominates.
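The comparison of subject and control signal is commonly summarized as a log2 ratio. The sketch below computes the idealized ratio expected for a given copy number and classifies it as a gain or loss. The ±0.3 threshold is an arbitrary value chosen for illustration, not a recommended cutoff, and real data contain noise that this sketch ignores.

```python
import math


def log2_ratio(subject_copies: int, control_copies: int = 2) -> float:
    """Idealized log2 of subject signal over control signal for one segment."""
    if subject_copies == 0:
        return float("-inf")  # homozygous deletion: no subject signal at all
    return math.log2(subject_copies / control_copies)


def classify(ratio: float, threshold: float = 0.3) -> str:
    """Label a segment using a hypothetical symmetric threshold."""
    if ratio >= threshold:
        return "gain"
    if ratio <= -threshold:
        return "loss"
    return "balanced"


for copies in (1, 2, 3):
    ratio = log2_ratio(copies)
    print(copies, round(ratio, 3), classify(ratio))
# 1 -1.0 loss
# 2 0.0 balanced
# 3 0.585 gain
```

A heterozygous deletion (one copy instead of two) gives a ratio of -1, whereas a heterozygous duplication (three copies) gives a smaller shift of about +0.58, which is one reason duplications are harder to detect than deletions.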

Array formats for CGH have been developed that have increased the resolution of this technique for detecting smaller deletions and duplications. Instead of hybridizing onto metaphase chromosome spreads, a set of DNA probes across the entire genome is used. The probes are placed on microarrays that can detect DNA fragments with the same sequence as the probe. Several different methods have been developed using different probes, such as bacterial artificial chromosome (BAC) DNA, complementary DNA (cDNA), or DNA fragments produced by cleavage of genomic DNA by the restriction enzyme BglII [Ishkanian et al., 2004; Lucito et al., 2003; Sebat et al., 2004]. These offer different resolution, with probes every 15–100 kb approximately. The total number of probes has increased from approximately 40,000 to 1 million on commercially available arrays. CGH technology has numerous advantages over FISH, including coverage of the entire genome, finer resolution, and lower cost per probe tested. Thus, clinical tests based on CGH technology have begun to replace FISH technology.

DNA gains and losses may also be detected using the same SNP-based microarrays in common use for genotyping. They are used to measure intensity differences and ratios of alleles at up to 1 million single nucleotides across the genome, detecting many CNVs, as well as detecting intercellular mosaicism and copy-neutral loss of heterozygosity, as occurs with uniparental disomy and other rare mechanisms. In general, the cost per probe is lower with SNP-based microarrays than for CGH-based microarrays, but the results are easier to interpret for CGH-based arrays due to more favorable signal-to-noise ratio. An example of a 14-Mb deletion of human chromosome 1 is shown in Figure 30-9.

Fig. 30-9 Data from a SNP-based chromosome microarray shows a 14-Mb deletion of human chromosome 1p31.

The copy number loss is shown here by reduced dosage (red arrow). These SNP data were generated from a Human660W-Quad v1 DNA Analysis BeadChip® from Illumina, Inc., and the figure generated from Nexus Copy Number® software from Biodiscovery, Inc.

Technology of Molecular Genetics

Molecular genetics is that branch of genetics concerned with the structure and function of genes at the molecular DNA level. The rapid gains in this field during the past decade have resulted from discovery of several new techniques that have made detailed analysis of both normal and abnormal genes possible. These discoveries have in turn led to better understanding of many important biologic processes, as well as the molecular basis for many genetic diseases. Several of these methods have proved to be of particular importance and are commonly used in research studies. Some familiarity with these procedures is helpful in understanding the nature and significance of new discoveries in this area. This section presents a brief introduction to some of the more important procedures. More detailed information can be found in several laboratory manuals, especially Current Protocols in Human Genetics [Haines et al., 2010].

DNA Clones

A vector is a DNA molecule that can replicate itself in a host cell, such as a bacterium or yeast. Integration of DNA fragments into the vector with restriction endonucleases and DNA ligase results in propagation of the DNA fragment along with the vector, producing large quantities of the fragment of interest. Vectors with inserted recombinant DNA fragments of interest are known as clones, and the methods used to generate them are collectively known as cloning. Clones are chosen at random from clone libraries, which are large collections of clones originating from a specific source, such as the total genomic DNA or chromosome-specific DNA of a human or other organism.

Several common types of vectors have been used, including phage (bacterial virus) (up to 20-kb insert DNA), plasmids (accessory circular bacterial chromosomes, used to clone several kb of DNA), cosmids (approximately 35–45 kb), BACs (approximately 100–150 kb), P1 plasmid artificial chromosomes (PACs, approximately 100–150 kb), and yeast artificial chromosomes (YACs, up to 1000 kb). The most commonly used at present are BACs and PACs [Stein, 1997], which have proved to be useful for FISH, CGH, and many other technologies. Creation of chromosome-specific DNA libraries by cloning followed by mapping and sequencing is the basis of the information obtained through the efforts of the Human Genome Project.

Restriction Enzymes

Restriction enzymes or endonucleases are bacterial enzymes that recognize short, double-stranded DNA sequences and cut the DNA molecule at or near the recognition site [Lewin, 2007]. When a mutation occurs that changes as few as one of the basepairs in the sequence, it is no longer recognized and cut by the enzyme. Several hundred restriction endonucleases have been isolated. Most of the recognition sites are palindromes, which means that they read the same in the 5′ to 3′ direction on both strands, and most of the enzymes leave short overhangs of single-stranded DNA that are known as sticky ends. For example, the enzyme BamHI recognizes the sequence GGATCC and cuts it between the two G bases, leaving the following 4-base overhang:

5′-G     GATCC-3′
3′-CCTAG     G-5′

Restriction endonucleases have several important uses in molecular biology. First, they are used to cut or “digest” large DNA molecules into a reproducible collection of a million or more smaller and more manageable DNA fragments that can be identified on the basis of their size. Second, a mutation at any of the recognition sites that changes the sequence, or a mutation elsewhere that creates a new recognition site, can potentially be detected. Finally, DNA molecules cut with the same restriction endonuclease all have the same sticky ends and may be joined using the enzyme DNA ligase. This condition allows specific DNA sequences of interest to be inserted into vectors and introduced into cells such as the bacterium Escherichia coli or the yeast Saccharomyces cerevisiae (common bakers’ yeast), which then can be propagated to produce large amounts of the sequence of interest. DNA sequences inserted into a vector are known as recombinant DNA. This is the basis of DNA cloning.
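Two properties described in this section, the palindromic nature of recognition sites and the compatibility of sticky ends, can be checked with a few lines of code. The sketch below is illustrative: the function names are hypothetical, and `overhang` assumes a symmetric staggered cut that leaves a 5′ overhang, as BamHI does at GGATCC. The EcoRI site GAATTC, cut after the first G, is included only as a second well-known example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True when both strands read the same in the 5' to 3' direction."""
    return site.upper() == reverse_complement(site)


def overhang(site: str, cut_offset: int) -> str:
    """Single-stranded bases left by a symmetric staggered cut.

    `cut_offset` is the number of bases 5' of the cut on each strand and is
    assumed to be less than half the site length (a 5' overhang).
    """
    return site.upper()[cut_offset:len(site) - cut_offset]


def can_anneal(overhang_a: str, overhang_b: str) -> bool:
    """True when two 5' overhangs are complementary and can be ligated."""
    return overhang_a.upper() == reverse_complement(overhang_b)


bamhi_end = overhang("GGATCC", 1)   # 'GATC'
ecori_end = overhang("GAATTC", 1)   # 'AATT'
print(is_palindromic("GGATCC"), bamhi_end)       # True GATC
print(can_anneal(bamhi_end, bamhi_end))          # True
print(can_anneal(bamhi_end, ecori_end))          # False
```

Fragments cut with the same enzyme carry matching overhangs and can be joined, whereas ends produced by different enzymes generally cannot.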

Polymerase Chain Reaction

The polymerase chain reaction (PCR) technique has revolutionized the field of molecular genetics. It is a simple but elegant method to amplify a small amount of DNA greater than a million-fold within a matter of hours. PCR results in the enrichment and amplification of a particular DNA region of interest from the total genome, making it more amenable to study, without the use of cloning or Southern blots (described later). The region of DNA with known base sequence to be amplified, such as part of a gene, is selected, and two short DNA sequences flanking the region of interest are synthesized to serve as primers for amplification.

To construct the primers, a short sequence of about 20–25 bp just upstream or 5′ of the target sequence on the DNA “sense” strand is chosen as a starting site, and an oligonucleotide (or primer) that is complementary to this short upstream sequence is synthesized. Another short sequence upstream or 5′ of the target sequence on the complementary (or “antisense”) strand also is chosen, and a second complementary oligonucleotide is synthesized. The two primers thus flank the region of interest on opposite strands. The DNA is denatured to separate the strands, after which the oligonucleotides are hybridized to the complementary sequences. The short oligonucleotides then serve as primers for synthesis of a complete complementary DNA strand with appropriate deoxynucleotide triphosphate molecules (adenosine, cytosine, guanosine, and thymidine triphosphate [dATP, dCTP, dGTP, and dTTP]) being added; this is mediated by the enzyme DNA polymerase. Because both strands are copied, one round of amplification results in a complete second copy of the original target sequence. Repeated cycles of heat denaturation, hybridization of the primers, and DNA synthesis result in the exponential amplification of the target sequence. Within a few hours, more than a million copies of the sequence may be made (Figure 30-10).

Fig. 30-10 The polymerase chain reaction.

Depiction of the cycling process of denaturation, annealing, and extension that results in the exponential amplification of DNA.
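The arithmetic behind the "million-fold" figure can be checked with a short calculation. The sketch below assumes an idealized reaction in which every template is copied in every cycle (efficiency of 1.0); the function names and the efficiency parameter are illustrative assumptions, and real reactions plateau as reagents are consumed.

```python
import math


def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Copies of the target after a number of cycles.

    efficiency is the fraction of templates copied per cycle
    (1.0 means perfect doubling).
    """
    return start_copies * (1 + efficiency) ** cycles


def cycles_for_fold(fold, efficiency=1.0):
    """Smallest whole number of cycles giving at least `fold` amplification."""
    return math.ceil(math.log(fold) / math.log(1 + efficiency))


print(copies_after(20))           # copies from one template after 20 ideal cycles
print(cycles_for_fold(1e6))       # cycles needed for a million-fold with perfect doubling
print(cycles_for_fold(1e6, 0.9))  # cycles needed at an assumed 90 percent efficiency
```

With perfect doubling, 20 cycles already exceed a million copies per starting template, which is why a run of a few hours is sufficient.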

Methods of General Mutation Detection

DNA Sequence Analysis

Sanger sequencing

DNA sequence analysis is the most sensitive and direct method to detect mutations at the level of individual nucleotides [Haines et al., 2010]. The most widely used method of DNA sequencing is the Sanger method, also known as dideoxy sequencing or chain termination. It is based on the use of synthetic nucleotide analogs – 2′,3′-dideoxynucleoside triphosphates (ddNTPs). Dideoxy NTPs differ from nucleotides found in natural DNA in that they lack the 3′-hydroxyl group. When integrated into a sequence, they prevent the addition of further nucleotides because phosphodiester bonds cannot form between a dideoxynucleotide and the next incoming nucleotide. Thus, the DNA chain is terminated.

DNA sequencing most commonly is performed by the method of cycle sequencing, in which the DNA region to be sequenced (which is first generated by PCR) is denatured and a short oligonucleotide is annealed to one of the template strands. DNA synthesis occurs in the presence of DNA polymerase, ddNTPs, and nucleotides and starts from the 3′ end of the annealed oligonucleotide. As the DNA is synthesized, nucleotides are added on to the growing chain by the DNA polymerase; however, on occasion, a ddNTP is incorporated into the chain in place of a normal nucleotide, resulting in a chain-terminating event. At the end of the sequencing reaction, multiple DNA molecules are present such that, at each nucleotide position, a proportion of molecules are terminated owing to the incorporation of a ddNTP. These products are separated by size on capillary or polyacrylamide gel electrophoresis systems, and the fluorescently labeled ddNTPs are detected. Each ddNTP is labeled with a different fluorophore. Shorter DNA molecules migrate faster than longer molecules on electrophoresis, and by analyzing the different fluorescent signal of all of the different-sized molecules, the DNA sequence can be determined. For example, ddCTP is labeled with a blue fluorophore. Everywhere a G residue exists in the template DNA, either a dCTP or a ddCTP will be incorporated into the synthesized strand. For every G residue in the template DNA, a proportion of molecules with a ddCTP at that site will be present. Each of these molecules will be of a different size, depending on where a G residue resides in the sequence, and will be distinguished by electrophoresis. The same applies for the other ddNTPs. Specialized DNA sequencing software exists that can convert the different fluorescent signal to different-color peaks that constitute a DNA sequence chromatogram.
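The logic of reading a sequence from chain-terminated products can be illustrated with a small simulation. This is a simplified sketch under stated assumptions: the primer is ignored, exactly one terminated product is generated per position, and apart from blue for ddCTP (the example used above) the dye colors are arbitrary placeholders.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Blue for ddCTP follows the example in the text; the other colors are assumed.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}


def terminated_fragments(template_3to5):
    """Return (length, dye) for each chain-terminated product.

    The template is written 3'->5', so the new strand is built 5'->3'
    by pairing each template base with its complement.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    return [(i + 1, DYE[base]) for i, base in enumerate(new_strand)]


def read_sequence(fragments):
    """Order products by size (shortest first) and read off the dyes."""
    dye_to_base = {dye: base for base, dye in DYE.items()}
    return "".join(dye_to_base[dye] for _, dye in sorted(fragments))


template = "GATTGCA"  # hypothetical template, written 3'->5'
products = terminated_fragments(template)
blue_lengths = [length for length, dye in products if dye == "blue"]
print(blue_lengths)             # sizes of ddCTP-terminated products (one per template G)
print(read_sequence(products))  # sequence of the newly synthesized strand
```

Sorting by length stands in for electrophoresis: the order of the products, not the order in which they were made, determines the sequence that is read.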

High-throughput sequencing

High-throughput sequencing, also known as next-generation or second-generation sequencing, is a massively parallel form of DNA sequencing that is revolutionizing the field of genetics, making it possible to sequence entire genomes at a fraction of the cost and time of Sanger-based sequencing [Haines et al., 2010]. The basis of second-generation sequencing is cyclic-array sequencing, which is the sequencing of a dense array of DNA features by repetitive cycles of enzymatic reactions and imaging-based data collection. At the time of writing, there are three main commercially available platforms for second-generation sequencing: Solexa technology (used by Illumina), 454 sequencing (used by Roche Applied Science), and the SOLiD platform (used by Life Technologies). While each of these platforms differs with regard to the biochemistry of the sequencing reaction and the generation of the array, the overall concept is similar. This is as follows:

1. DNA is randomly fragmented and common adaptor sequences are ligated to the ends to form “libraries.”

2. Each library is clonally amplified by approaches such as emulsion PCR or bridge PCR to form clusters/colonies of sequence features.

3. Each clonally amplified product is spatially clustered or arrayed on a solid surface.

4. Sequencing by synthesis of the clonally amplified products is performed by alternating cycles of enzyme-mediated nucleotide extension and imaging.

Specialized software present for each of the platforms converts the images obtained into DNA sequence. The array-based format of second-generation sequencing, as compared to the capillary-based format of Sanger sequencing, allows for a much higher degree of parallel processing in second-generation sequencing; this results in a throughput ranging from hundreds of megabases to gigabases of sequence per run at a dramatic reduction in cost per base sequenced.

Second-generation sequencing produces short reads of DNA sequence ranging from approximately 36 bp to approximately 400 bp, depending on the platform used. The huge amount of sequence information generated requires intensive bioinformatics and computational approaches for mapping and aligning the sequence data to the appropriate genomic reference sequence. Several different computational pipelines are currently used for mapping, aligning, and base-calling of second-generation sequencing data. The per-base accuracy of second-generation sequencing is still relatively low; therefore, the more times a base is sequenced, or the “deeper” the coverage at that base, the higher the accuracy of the base call. Currently, 8× coverage per base is considered the minimum requirement for base calling; however, for accurate base-calling comparable to Sanger-based sequencing, a coverage of 20–30× per base is probably required.
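Coverage is commonly estimated as the total number of sequenced bases divided by the size of the target. The sketch below applies that relationship to the figures quoted above; it assumes a round genome size of 3 billion bp and uniformly distributed reads, neither of which holds exactly in practice.

```python
GENOME_SIZE = 3_000_000_000  # assumed round figure for the human genome, in bp


def mean_coverage(num_reads, read_length, target_size):
    """Average number of times each base of the target is sequenced."""
    return num_reads * read_length / target_size


def reads_needed(coverage, read_length, target_size):
    """Smallest number of reads giving at least the requested mean coverage."""
    total_bases = coverage * target_size
    return -(-total_bases // read_length)  # integer ceiling division


print(reads_needed(30, 100, GENOME_SIZE))  # 100-bp reads for 30x whole-genome coverage
print(reads_needed(8, 36, GENOME_SIZE))    # 36-bp reads for the 8x minimum
print(mean_coverage(900_000_000, 100, GENOME_SIZE))
```

The same functions apply to an exome or a gene panel by substituting the size of the captured target for the genome size, which is why targeted sequencing reaches a given depth with far fewer reads.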

Second-generation sequencing can be used for the sequencing of whole genomes, exomes (coding exons in the genome), or targeted subsets of genes. For targeted sequencing of exomes or a subset of genes, an enrichment/amplification of this sequence prior to sequencing is necessary. Enrichment can be performed using oligo capture platforms that are available in solution or solid phase. Exome capture kits are commercially available that contain oligos specific to the human exome. Custom oligo kits can be manufactured for specific genes of interest. Targeted amplification of select genes for second-generation sequencing can also be performed by standard PCR; however, the number of genes sequenced by this approach will be limited to the number of genes amplifiable. Newer methods of droplet-based PCR are available that increase the throughput of the number of genes that can be amplified per reaction.

Mutation Scanning

Mutation scanning refers to methods used to determine the presence of a sequence change in a region of DNA (such as an exon of a gene). These methods need to be followed up by DNA sequencing, however, to determine the exact nature of the sequence change. Mutation scanning generally is less labor-intensive, faster, and more cost-effective to perform than DNA sequencing and may be the method of choice when a large gene needs to be analyzed for the presence of mutations. The exons of the entire gene are subjected to mutation scanning, and only those exons that demonstrate the presence of a sequence change need to be sequenced to determine the precise nature of the sequence change.

Various methods of mutation scanning exist that differ in their sensitivities of mutation detection [Cotton, 1997; Eng and Vijg, 1997; Grompe, 1993]. The general basis of most mutation scanning methods is the abnormal migration of a DNA fragment that contains a sequence change from a normal “wild-type” sequence. All are PCR-based methods – the DNA region to be studied is amplified by PCR before the different mutation scanning methods are performed.

Single-stranded conformational polymorphism

With the single-stranded conformational polymorphism (SSCP) technique, DNA fragments are denatured and made single-stranded. The single-stranded DNA takes on a specific conformation, depending on its sequence. A change in the DNA sequence will result in a change in the single-stranded conformation structure. Denatured fragments are separated on polyacrylamide gels under a series of differing conditions, and fragments with different conformation structures will migrate differently and can be detected. This method has an approximate sensitivity of 80 percent for detecting DNA sequence changes.

Denaturing gradient gel electrophoresis

With denaturing gradient gel electrophoresis (DGGE), DNA fragments are denatured and allowed to re-anneal slowly. In the absence of a sequence change, the only DNA molecules that will be formed are homoduplexes (i.e., with no mismatch of DNA sequence between the complementary strands). In the presence of a sequence change, molecules representing both homoduplexes and heteroduplexes (i.e., a normal-sense strand binds to a mutant antisense strand, or vice versa, to create a mismatch of DNA sequence at the position of the mutation) will be formed. These products are allowed to migrate on a polyacrylamide gel with an increasing gradient of denaturant, and at a particular point in the gradient, the structure of the heteroduplex molecules changes significantly and affects their migration, compared with that of the homoduplex molecules. For this technique it is important to have specialized GC clamps at the ends of the DNA fragment that enhance the difference between the heteroduplex and homoduplex molecules, and these are included in the PCR primers used to generate the DNA fragment. This method has a high sensitivity, approximately 98–100 percent.

Denaturing high-performance liquid chromatography

Denaturing high-performance liquid chromatography (DHPLC) also is based on the separation of DNA homoduplex from heteroduplex molecules. The medium of separation is a column composed of a polystyrene-divinylbenzene copolymer to which DNA binds and is released through interaction with specific buffers. At increased temperatures, heteroduplex molecules are released from the column faster than are homoduplex molecules, and therefore can be detected. This method does not require the sophisticated GC clamps of DGGE and has a high sensitivity, close to 100 percent.

Protein truncation test

As the name suggests, the protein truncation test (PTT) is used for the detection of mutations that result in a protein truncation, such as a frameshift or nonsense mutation. The starting material is generally RNA that is converted to cDNA by reverse transcription PCR. In vitro transcription and translation are performed, and the protein products are labeled and separated by sodium dodecyl sulfate (SDS) polyacrylamide gel electrophoresis. A DNA fragment that contains a truncation mutation will result in a protein product that will be shorter in length than a normal product, and therefore can be detected. This technique will not detect mutations that do not result in protein truncation, such as missense mutations.
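The principle of the test, that a premature stop codon yields a shorter product while a missense change does not, can be shown with a toy translation. The sketch below uses the standard genetic code; the three 24-bp cDNA sequences are invented for illustration, and comparing protein lengths stands in for comparing band positions on the SDS gel.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order; "*" marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cdna):
    """Translate from the first base until the first stop codon."""
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        amino_acid = CODON_TABLE[cdna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def is_truncated(normal_cdna, test_cdna):
    """True if the test product is shorter than the normal product."""
    return len(translate(test_cdna)) < len(translate(normal_cdna))


normal = "ATGGCCGAACAGAAGTGGCTGTAA"    # hypothetical cDNA
nonsense = "ATGGCCGAATAGAAGTGGCTGTAA"  # CAG -> TAG introduces a stop
missense = "ATGGCCGTACAGAAGTGGCTGTAA"  # GAA -> GTA changes one residue

print(translate(normal), translate(nonsense), translate(missense))
print(is_truncated(normal, nonsense), is_truncated(normal, missense))
```

The missense product has the same length as the normal product, so it would co-migrate with it and go undetected, as noted above.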

Southern blot analysis

Southern blot analysis, named after its inventor, EM Southern, has been used extensively for DNA analysis, particularly for the detection of DNA abnormalities. With the advent of PCR techniques, the use of this analytic method is decreasing. It still remains useful, however, for the detection of large deletions or duplications that affect a part or the whole of a gene.

Genomic DNA is digested with a restriction enzyme, which results in the production of different-sized DNA fragments that are separated on an agarose gel by electrophoresis. The separated DNA fragments are made single-stranded by treatment with an alkali and then transferred and fixed on to a nylon membrane. The single-stranded fixed DNA is hybridized with a radioactively labeled probe specific for a certain gene. A probe can be a cloned fragment of a gene or a PCR product of a gene. The probe will hybridize to that region of the DNA where it finds its complementary sequence. Excess probe is washed off, and the nylon membrane is exposed to an x-ray film to reveal where the probe has bound. When a large deletion, duplication, or other form of DNA rearrangement occurs, either deletion or creation of restriction enzyme sites can result. Using a probe that binds to the deleted, duplicated, or rearranged area will reveal a different pattern of hybridization, based on which restriction enzyme sites have been affected. The presence of a different hybridization pattern, compared with a normal control, is indicative of a change in the DNA structure.

Methods for Detecting Specific Sequence Changes (Genotyping)

Different methods exist for the detection of specific mutations in genes, especially point mutations or insertions and deletions of a few basepairs. These methods are useful for the detection of the common mutations present in diseases such as sickle cell anemia, cystic fibrosis, and hereditary hemochromatosis. These techniques are possible only when the base sequence and precise point mutation responsible for the disease phenotype are known.

Allele-Specific Oligonucleotide Hybridization

For allele-specific oligonucleotide (ASO) hybridization, separate ASO probes complementary to either the normal or a specific mutant allele are synthesized. The probes consist of the point mutation or its normal counterpart and 9–12 bp flanking it on either side, for a total length of 19–25 bp. The probes then are hybridized to the DNA source under stringent conditions. Because the probes are short, they are highly sensitive to sequence changes at even a single nucleotide. A probe complementary to the normal allele will hybridize to the normal allele but not to the mutant allele, and vice versa. Thus, DNA from a person homozygous for the normal gene will hybridize to the ASO probe complementary to the normal DNA sequence, but not to the probe complementary to the point mutation. DNA from persons homozygous for the mutation will hybridize to the ASO probe complementary to the mutation, but not to the normal probe. Only DNA from heterozygotes will hybridize to both probes. It is important to remember that DNA from persons who are heterozygous for different mutations in the same gene – compound heterozygotes – also will hybridize to the normal probe. Several different formats have been developed to detect ASO probes, ranging from radioactivity to chemiluminescence to fluorescence, and can be performed in both solid and liquid phases.
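The interpretation of the two hybridization signals can be written out as a small decision rule. This is a sketch under simplifying assumptions: stringent hybridization is modeled as an exact match only, the probes are written in the same sense as the allele rather than as its complement, and all sequences are hypothetical.

```python
def hybridizes(probe, allele):
    """Model stringent conditions: the probe binds only on an exact match."""
    return probe in allele


def call_genotype(alleles, normal_probe, mutant_probe):
    """Interpret the signals from the normal and mutant ASO probes."""
    normal = any(hybridizes(normal_probe, a) for a in alleles)
    mutant = any(hybridizes(mutant_probe, a) for a in alleles)
    if normal and mutant:
        return "heterozygous"
    if normal:
        return "homozygous normal"
    if mutant:
        return "homozygous mutant"
    return "no hybridization"


# Hypothetical 19-base probes: 9 flanking bases on each side of the tested base.
LEFT, RIGHT = "CTGACTCCT", "GAGAAGTCT"
NORMAL_PROBE = LEFT + "A" + RIGHT
MUTANT_PROBE = LEFT + "T" + RIGHT
normal_allele = "GGTGCAT" + NORMAL_PROBE + "GCCGTTA"
mutant_allele = "GGTGCAT" + MUTANT_PROBE + "GCCGTTA"

print(call_genotype((normal_allele, normal_allele), NORMAL_PROBE, MUTANT_PROBE))
print(call_genotype((normal_allele, mutant_allele), NORMAL_PROBE, MUTANT_PROBE))
print(call_genotype((mutant_allele, mutant_allele), NORMAL_PROBE, MUTANT_PROBE))
```

The rule only describes the tested site. An allele carrying a different mutation elsewhere in the gene still matches the normal probe here, which is why a compound heterozygote gives the same two signals as a simple heterozygote.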

Single-Base Extension

Oligonucleotide primers of approximately 20 bp are designed to hybridize just upstream or downstream of the nucleotide to be genotyped, and an extension reaction – more specifically, a single-base extension (SBE) reaction – is performed in the presence of ddNTPs. In the case of a normal sequence, a ddNTP corresponding to the complementary normal nucleotide is added to the primer; in the case of a mutation, a ddNTP corresponding to the complementary mutant nucleotide is added to the primer. The ddNTPs generally are labeled with different fluorochromes, allowing their detection. In a person homozygous for a normal gene, extension will occur only with the ddNTP corresponding to the normal allele, whereas in a person homozygous for a mutated gene, extension will occur only with the ddNTP corresponding to the mutated allele. A heterozygous person will have both normal and mutated extension products. Several different formats have been developed to detect SBE products and can be performed in both solid and liquid phases.

DNA Arrays

In DNA arrays, hundreds to thousands of DNA targets are arranged (arrayed) on a solid medium such as a glass slide or microchip. DNA arrays are of two main types – genotyping arrays and sequencing arrays. Genotyping arrays consist of hundreds to thousands of different ASO probes or SBE primers arrayed on a chip that allows the genotyping of a large number of different loci simultaneously. DNA sequencing arrays consist of a series of hundreds of thousands of approximately 20-bp oligonucleotides, spanning the length of a gene, that are arranged with equal spacing, or “tiled,” across the microchip. For each nucleotide to be sequenced, four oligonucleotides are present that differ only at the central position and have an A, C, G, or T. Digested DNA is fluorescently labeled and hybridized to the tiled oligonucleotide chip under stringent conditions. The DNA hybridizes to those oligonucleotides that correspond to the correct sequence, and the fluorescent signal is read, from which the DNA sequence is deciphered. DNA sequencing and genotyping arrays are available for several genes and are still largely used for research, although some clinical applications have been developed. Expression arrays are different: they are cDNA arrays to which RNA is hybridized, and they are used to determine the expression profile of hundreds of genes simultaneously. This technique is currently used only in the research setting.

Restriction Enzyme Analysis

Some mutations result in the destruction or creation of a restriction enzyme site in the DNA sequence. PCR amplification of the region of interest, followed by digestion of the amplified DNA with the appropriate restriction enzyme and gel electrophoresis, can be performed to determine the presence or absence of a mutation that affects a restriction enzyme site. Destruction or creation of a restriction enzyme site will result in a different-sized DNA fragment compared with a normal control fragment. The sickle cell anemia mutation and the hereditary hemochromatosis mutation both affect restriction enzyme sites.
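The effect of a site-destroying mutation on fragment sizes can be sketched as follows. The 40-bp amplicon, the recognition pattern CCTNAGG (N = any base), and the cut position are illustrative assumptions chosen to show the principle; they are not a validated assay design.

```python
import re


def fragment_sizes(amplicon, site_regex, cut_offset):
    """Sizes of the pieces produced by cutting at every match of the site."""
    cuts = [m.start() + cut_offset for m in re.finditer(site_regex, amplicon)]
    bounds = [0] + cuts + [len(amplicon)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


SITE = "CCT.AGG"  # assumed recognition pattern; "." stands for any base
CUT_OFFSET = 2    # assumed cut after the second base of the site

normal = "ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTG"  # hypothetical amplicon
mutant = normal[:19] + "T" + normal[20:]             # A -> T within the site

print(fragment_sizes(normal, SITE, CUT_OFFSET))  # site present: two fragments
print(fragment_sizes(mutant, SITE, CUT_OFFSET))  # site destroyed: uncut amplicon
```

A heterozygote would show all three bands, the uncut amplicon plus the two digestion products, so the gel pattern alone distinguishes the three genotypes.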

DNA Methylation Analysis

DNA methylation analysis can be performed for the detection of abnormalities in imprinted genes. Of the two alleles of an imprinted gene, one allele is methylated at the promoter (silenced allele), and the other is unmethylated (expressed). Any aberration that affects the imprinting status (e.g., deletion or uniparental disomy) will affect the methylation pattern. Methylation can be assayed with the use of methylation-sensitive restriction enzymes, followed by Southern blot analysis or by methylation-specific PCR techniques. For methylation-specific PCR assays, genomic DNA is treated with the chemical sodium bisulfite, which converts unmethylated cytosine residues to uracil (read as thymine after PCR amplification). Methylated cytosine residues are left unchanged. As a result, after bisulfite treatment a methylated sequence differs in its nucleotide content from the corresponding unmethylated sequence, and the two can be distinguished by the use of specific PCR primers. Genes subject to X-inactivation also are methylated and can be detected using the same method.
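The sequence difference created by bisulfite treatment can be modeled directly. In this sketch, each unmethylated cytosine is replaced by thymine, as it would be read after PCR amplification, and methylated cytosines are left alone; the 9-base sequence and the set of methylated positions are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Return the sequence as read after bisulfite treatment and PCR.

    Cytosines at methylated positions are protected; all others read as T.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


promoter = "ACGTCGACG"  # hypothetical sequence with cytosines at positions 1, 4, 7
methylated_allele = bisulfite_convert(promoter, {1, 4, 7})
unmethylated_allele = bisulfite_convert(promoter)

print(methylated_allele)    # cytosines retained on the methylated allele
print(unmethylated_allele)  # cytosines read as T on the unmethylated allele
```

Primers designed against one converted sequence will not match the other, so amplification with each primer pair reports the methylation status of the allele.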

Clinical Cytogenetics

Chromosome abnormalities may involve either the number or the structure of chromosomes. The former are considered genome mutations, and the latter, chromosome mutations. The mechanisms involved in the two major types of mutations are quite different, but both may result in loss or gain of DNA in the nucleus. Both types may involve all of the cells of an organism or only a proportion. When only a proportion of cells are involved, the abnormality is termed mosaic, discussed in detail later in this section.

Abnormalities of Chromosome Number

For germ and somatic cells, the normal chromosome complement consists of the haploid and the diploid number, respectively. Any deviation from these numbers is associated with significant abnormalities. The most common types result from abnormal segregation of chromosomes in germ cells; epidemiologic studies have estimated a rate of abnormal segregation of approximately 1 per 25–50 meiotic cell divisions.

Triploidy and Tetraploidy

Occasionally, fetuses with three or four times the normal haploid number of chromosomes have been observed. These abnormal chromosome complements are called triploidy (3n) and tetraploidy (4n). The few children who are liveborn survive only briefly after birth, unless the abnormality is mosaic (i.e., involves only a proportion of their cells). Failure of a maturational division in either egg or sperm results in triploidy, whereas failure of completion of an early division of the zygote causes tetraploidy.

Aneuploidy

Aneuploidy is defined as any chromosome complement that deviates from a multiple of the haploid number. In most cases, it consists of either monosomy, which is defined as loss of an entire chromosome, or trisomy, which refers to gain of an entire chromosome. Aneuploidy is the most common and clinically significant type of chromosome disorder, occurring in 3–4 percent of all recognized pregnancies.

Both monosomy and trisomy of autosomes are lethal during early pregnancy in a large majority of affected fetuses. Autosomal monosomy is uniformly lethal, except for a few reports of liveborn children with monosomy 21. Monosomy X is prenatally lethal in most affected fetuses, but many survive and will have the phenotype of Turner’s syndrome. The effects of trisomy vary, depending on the chromosome involved. Trisomy 16 is the most frequent autosomal trisomy at conception but is uniformly lethal before birth. The most common type of trisomy in liveborn infants is trisomy 21, which is the chromosome abnormality observed in 95 percent of children with Down syndrome. The only other autosomal trisomies observed at appreciable frequencies are trisomy 13 and trisomy 18, although trisomy 8 may be observed in mosaic form.

The most common mechanism of aneuploidy is nondisjunction, which is the failure of a pair of chromosomes to separate correctly during one of the two stages of meiosis, usually meiosis 1. The consequences of nondisjunction during meiosis 1 and 2 are somewhat different. If the error occurs during meiosis 1, the unbalanced gamete with 24 chromosomes contains both the maternal and the paternal members of the pair. If the error occurs during meiosis 2, the unbalanced gamete will contain two copies of either the maternally or the paternally derived chromosome, but not both.
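The different outcomes of the two types of error can be traced with a simple model of one chromosome pair. The sketch labels the maternally and paternally derived homologs M and P, ignores recombination, and places the meiosis 2 error in the cell that received the maternal homolog; these are simplifying assumptions made only for illustration.

```python
def meiosis(nondisjunction=None):
    """Return the chromosome content of the four meiotic products.

    nondisjunction may be None, "meiosis1", or "meiosis2".
    """
    # After replication each homolog consists of two sister chromatids.
    if nondisjunction == "meiosis1":
        cells = [["M", "M", "P", "P"], []]  # both homologs go to one cell
    else:
        cells = [["M", "M"], ["P", "P"]]    # homologs separate normally

    gametes = []
    for index, cell in enumerate(cells):
        if nondisjunction == "meiosis2" and index == 0:
            gametes += [cell, []]                # sister chromatids fail to separate
        else:
            gametes += [cell[0::2], cell[1::2]]  # one chromatid of each to each gamete
    return ["".join(g) for g in gametes]


print(meiosis())            # four balanced gametes
print(meiosis("meiosis1"))  # disomic gametes carry both M and P
print(meiosis("meiosis2"))  # disomic gamete carries two copies of M
```

In both error cases the gametes with an empty string are nullisomic for this chromosome and would give rise to monosomy after fertilization.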

Abnormalities of Chromosome Structure

Structural chromosome rearrangements consist of loss, gain, or altered position of segments of chromosomes, and many different types have been recognized. The estimated frequency is about 1 per 1700 cell divisions, making them much less frequent than aneuploidy. Rearrangements are termed balanced if the chromosome complement has a normal amount of genetic information, regardless of its location. They are termed unbalanced if there has been either loss or gain of DNA sequence. The phenotypic effects often are severe. The chromosomes involved in the reconfiguration are known as derivative chromosomes. Many different types of rearrangements have been reported, as described in the following sections. Only those derivative chromosomes that contain a functioning centromere and telomeres, however, are stable and capable of being transmitted unaltered to daughter cells during mitosis or meiosis. Derivatives lacking a centromere or telomere are unstable and are lost during cell division. Some regions of the genome, such as 8p, contain a noncentromeric sequence that is sufficiently similar to function as a “neocentromere” during cell division [Giglio et al., 2001].

Mechanisms

Some progress has been made recently in understanding the mechanisms that cause, or at least predispose to, structural chromosome rearrangements. Many appear to occur randomly, with no evidence of recurrent breakpoints. For example, no consistent breakpoints have been found for the interstitial deletions and reciprocal translocations involving chromosome band 17p13.3 [Cardoso et al., 2003]. In many other locations, the small duplicated regions known as LCRs can mediate several different types of rearrangements. These were first identified as the cause of common deletion or microdeletion syndromes, such as deletion (del) 2q13 with juvenile nephronophthisis and Joubert’s syndrome-related disorder [Parisi et al., 2004; Saunier et al., 2000], del 7q11.23 in Williams’ syndrome [Osborne et al., 2001; Urban et al., 1996], del 15q11.2–q13 in Angelman’s syndrome and Prader–Willi syndrome [Amos-Landgraf et al., 1999], del 17p12 in hereditary neuropathy with predisposition to pressure palsies [Chance et al., 1994], del 17p11.2 in Smith–Magenis syndrome [Chen et al., 1997], and del 22q11.2 in DiGeorge’s syndrome/velocardiofacial syndrome [McDermid and Morrow, 2002]. The same LCRs are associated with duplications of the same regions, including duplication 15q11.2–q13 [Mohandas et al., 1999], 17p12 [Pentao et al., 1992], and 17p11.2 [Potocki et al., 2000]. A simple diagram of this mechanism is shown in Figure 30-11.

Fig. 30-11 Drawing of a recombination event mediated by low copy repeats.

The black boxes represent low copy repeats and the long four-armed linear structures represent chromosomes with two chromatids each during meiosis 1. In the drawing on the left, the outermost chromatids have paired correctly, while the inner two chromatids have paired incorrectly (long arrow pointing to site of crossover). In the drawing on the right, the outermost chromatids appear normal, but in the innermost pair, the crossover event within the mispaired region results in one daughter chromosome with a duplication and another with a deletion.

Many of the LCRs are inverted with respect to each other, which can lead to very complex combinations of deletions and duplications. This can occur for LCRs on the same chromosome, as seen with Williams’ syndrome and some X chromosome rearrangements [Giglio et al., 2000; Osborne et al., 2001], or between homologous chromosomes. The latter appears to be more common when two matching LCRs on homologous chromosomes are inverted with respect to each other, a novel type of polymorphism [Giglio et al., 2001]. Similar mechanisms also can predispose to structural rearrangements between completely different chromosomes involving either an LCR or a gene cluster, such as olfactory receptor clusters [Giglio et al., 2002; Spiteri et al., 2003].

Balanced and Unbalanced Chromosomal Rearrangements

As noted previously, structural chromosome rearrangements that result in no net loss or gain of genomic sequence are balanced, whereas those that do result in a net loss or gain of material are unbalanced. Persons with balanced rearrangements usually are normal, unless one of the chromosomal breaks disrupts an important gene. More recent studies, however, have found that chromosome rearrangements that appear balanced often have submicroscopic loss or gain of material and so are actually unbalanced [Astbury et al., 2004b], and some chromosome rearrangements are more complex than standard chromosome analysis suggests [Astbury et al., 2004a].

Specific Types of Chromosome Rearrangements

The most common structural rearrangements include terminal and interstitial deletions and duplications, reciprocal and robertsonian translocations, inversions, and rings. Examples of most of these are shown in Figure 30-12.

Fig. 30-12 Partial karyotypes of structural chromosome rearrangements.

A, Pericentric inversion: 46,XX,inv(1)(p36.1q32). The pair on the left are stained for G bands, and the pair on the right for C bands (centromere bands). B, Reciprocal translocation: 46,XX,t(2;4)(p22.2;q35.2). C, Robertsonian translocation: 45,XX,t(13q14q). D, Interstitial deletion: 46,XY,del(13)(q21.3q31). E, Ring: 46,XY,r(17)(p13.3q25.3).

(Karyotypes in A from Johnson DD et al. Hum Genet 1988;78:315; those in B, C, and D courtesy of BA Hirsch, Department of Laboratory Medicine and Pathology, University of Minnesota Medical School; karyotype in E from Dobyns WB et al. J Pediatr 1983;102:552.)

Deletions and Duplications

Deletions and duplications consist of loss or gain, respectively, of a chromosome segment. Deletions may be either interstitial or terminal, with terminal deletions including the telomere (see Figure 30-8 and Figure 30-12D), whereas most duplications are interstitial. Most interstitial deletions and duplications result from unequal crossing over in LCRs (see Figure 30-11). Terminal deletions are more likely to result from simple chromosome breakage, although some apparently terminal deletions prove to be interstitial deletions in which one breakpoint happens to be close to the telomere. Any carrier of a deletion is hemizygous for the information on the corresponding segment of the normal homolog. Thus, small but cytogenetically visible deletions involving critical genes occasionally produce single-gene phenotypes, such as lissencephaly or retinoblastoma.

Duplicated segments usually are adjacent to each other and may be in the same orientation (direct dup), or inverted (inverted dup) with respect to one another. In general, the phenotypic effects of duplications are less severe than the effects of deletion of a similar segment. Small duplications also can result in single-gene phenotypes, such as Charcot–Marie–Tooth neuropathy [Chance et al., 1994], although this is recognized less often than with deletions.

Inversions

Inversions are segments within a chromosome that are inverted with respect to the normal orientation; they result from crossovers within existing duplicated segments (LCRs) or from two breaks within a single chromosome, followed by inversion of the intervening segment and repair of the breaks. When the inverted segment includes the centromere, the rearrangement is described as a pericentric inversion (see Figure 30-12A), whereas rearrangements in which the inverted segment does not include the centromere are designated as paracentric inversions. Both types of inversions may result in production of unbalanced gametes because of the effects of recombination within the inverted segment. With pericentric inversions, a loop is formed between the inverted chromosome and its homolog during meiosis 1 (Figure 30-13). Recombination is somewhat, but not completely, suppressed within inversion loops, so crossovers are common in larger loops.

Fig. 30-13 The effect of recombination within the loop of a pericentric inversion.

A, A normal chromosome is depicted on the left, with loci 1 to 7 in order and the centromere located between loci 3 and 4. A pericentric inversion is depicted on the right, with the segment containing loci 3–5 and the centromere inverted. B, Pairing of the normal and inverted chromosomes during meiosis 1, with a crossover occurring within the inversion loop in the middle two chromatids. C, The four types of gametes produced after completion of meiosis include a normal chromosome, a derivative chromosome with duplication of the distal short arm and deletion of the distal long arm (dup p), the reverse derivative chromosome with duplication of the distal long arm and deletion of the distal short arm (dup q), and a balanced pericentric inversion.

Recombination within a pericentric inversion loop produces derivative chromosomes in which segments distal to the breaks are either duplicated or deleted (see Figure 30-13C). The effects on the phenotype are inversely proportional to the size of the inversion. Thus, the distal segments typically are large with small pericentric inversions, and most unbalanced offspring are spontaneously aborted. Liveborn children with birth defects are more likely with larger inversions that lead to relatively small distal segments.

Crossovers within a paracentric inversion loop, which commonly result from crossovers within LCRs, result in acentric or dicentric chromosomes. Acentric chromosomes are quickly lost during subsequent rounds of cell division. Dicentric chromosomes inactivate one of the two centromeres and are retained. For both, the loss or gain of chromosomal material and genes is so great that almost all affected embryos are spontaneously aborted, unless a large part of the derivative chromosome happens to break off and become lost. This mechanism has been proved and may be more common than appreciated [Giglio et al., 2002].

Reciprocal Translocations

Reciprocal translocations consist of breaks in nonhomologous chromosomes, with a reciprocal exchange of the broken segments (see Figure 30-12B). Usually, only two chromosomes are involved, but complex translocations involving three or more chromosomes have been described and likely are more common than standard chromosome analysis has suggested [Astbury et al., 2004a]. Population studies have detected reciprocal or robertsonian translocations in about 1 in 500 newborns.

Reciprocal translocations often result in the production of unbalanced gametes. During meiosis 1, the derivative chromosomes and their normal homologs form a quadriradial shape that may separate into pairs in one of three ways: alternate, adjacent 1, and adjacent 2 segregation. Alternate segregation produces balanced gametes that carry either both normal chromosomes or both derivatives. Adjacent 1 segregation produces unbalanced gametes in which homologous centromeres separate into different daughter cells. It results in duplication of the distal segment of one derivative chromosome and deletion of the distal tip of the other. In most translocation carriers, alternate and adjacent 1 segregation account for a large majority of the gametes (Figure 30-14). Adjacent 2 segregation also produces unbalanced gametes. In this uncommon mechanism, homologous centromeres pass to the same daughter cell. Nondisjunction also may occur, producing 3:1 or even 4:0 segregation.

Fig. 30-14 Diagram of the alternative types of segregation and gametes produced in the carrier of a reciprocal translocation between the short arms of chromosomes 3 (light purple) and 6 (dark purple).

The top line represents the parental chromosome pairs, the middle line represents the four types of gametes produced by the father, and the bottom line represents four possible chromosome combinations that may be observed in offspring. Alternate segregation (depicted on the right) produces offspring with either normal chromosomes or the balanced translocation. Adjacent 1 segregation (depicted on the left) produces offspring with unbalanced karyotypes. Children with the derivative 6 karyotype (der[6]) have deletion of the distal segment of 6p and duplication of the distal segment of 3p. Children with the derivative 3 karyotype (der[3]) have deletion of 3p and duplication of 6p. In both alternate and adjacent 1 segregation, homologous centromeres pass to different daughter cells. In adjacent 2 segregation (not depicted), homologous centromeres pass to the same daughter cells, leading to even greater chromosomal imbalance.
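The segregation patterns in Figure 30-14 can be enumerated in a short Python sketch. It assumes a deliberately simplified model in which each chromosome is just the pair of segments it carries (a distal short-arm segment and the rest of the chromosome, including the centromere); the names `CHROMOSOMES`, `SEGREGATION`, and `gamete_imbalance` are illustrative only.

```python
from collections import Counter

# Simplified model (an assumption): each chromosome = (distal p segment, rest incl. centromere)
CHROMOSOMES = {
    "3":      ("3p_distal", "3_rest"),
    "6":      ("6p_distal", "6_rest"),
    "der(3)": ("6p_distal", "3_rest"),  # centromere of 3, distal 6p attached
    "der(6)": ("3p_distal", "6_rest"),  # centromere of 6, distal 3p attached
}

# The two gametes produced by each 2:2 segregation pattern
SEGREGATION = {
    "alternate":  [("3", "6"), ("der(3)", "der(6)")],
    "adjacent 1": [("3", "der(6)"), ("der(3)", "6")],   # homologous centromeres separate
    "adjacent 2": [("3", "der(3)"), ("6", "der(6)")],   # homologous centromeres stay together
}

ALL_SEGMENTS = ("3p_distal", "3_rest", "6p_distal", "6_rest")

def gamete_imbalance(gamete):
    """Return {segment: +1 (extra copy) or -1 (missing)}; empty dict means balanced."""
    counts = Counter(seg for chrom in gamete for seg in CHROMOSOMES[chrom])
    return {seg: counts[seg] - 1 for seg in ALL_SEGMENTS if counts[seg] != 1}

for pattern, gametes in SEGREGATION.items():
    for gamete in gametes:
        print(pattern, gamete, gamete_imbalance(gamete) or "balanced")
```

In this model the gamete carrying der(6) with a normal 3 has an extra copy of distal 3p and lacks distal 6p, matching the der(6) karyotype in the legend. Adjacent 2 gametes gain or lose the large centromere-bearing segments, which is why the imbalance is greater.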

Reciprocal translocations often are detected in normal adults evaluated because of repeat fetal loss, or after the birth of a child with multiple congenital anomalies caused by transmission of the translocation in unbalanced form. Apparently balanced reciprocal translocations sometimes are found in children with birth defects or abnormal development. In these instances, the abnormal phenotype usually results from either submicroscopic loss of genetic material or disruption of a gene at one of the breakpoints [Astbury et al., 2004b].

Robertsonian Translocations

Robertsonian translocations involve two acrocentric chromosomes that fuse in or near the centromere region, with loss of the short arms (see Figure 30-12C). Because the short arms contain repetitive DNA elements, especially rRNA genes, no phenotypic effects result. Carriers of a robertsonian translocation involving chromosome 21 have a high risk of producing a child with translocation Down syndrome.

Insertions

Insertions occur when a small segment of a chromosome is removed and inserted into a different region on the same or another chromosome. If the segment is inserted with the same orientation with respect to the centromere, it is known as a direct insertion. If it is inserted with the reverse orientation, it is called an inverted insertion. Insertions are rare because three separate chromosomal breaks are required. Segregation during meiosis can produce either abnormal offspring, with duplication or deletion of the inserted segment, or normal offspring and balanced carriers.

Rings

Rings – ring chromosomes – are formed when a chromosome undergoes two breaks, usually one in each arm, and the broken ends are rejoined (see Figure 30-12E). The two segments distal to the breaks are lost, resulting in deletion of both telomeres and adjacent regions of both the short and the long arms of the chromosome. Rings may not segregate properly during mitosis and meiosis, especially if a crossover occurs. Crossover often results in breakage followed by fusion, which may produce larger or smaller rings.

Isochromosomes and Dicentrics

Isochromosomes are chromosomes in which one arm is missing and the other is duplicated as the result of misdivision of the centromere during meiosis 2. Isochromosomes also can result from translocation of an entire arm to its homolog with a breakpoint adjacent to the centromere. The most commonly observed isochromosome involves Xq. Dicentrics are rare chromosomes in which two segments, each containing a centromere, fuse end to end. They tend to break during mitosis because of the double centromeres.

Cytogenetic Nomenclature

Detailed rules regarding nomenclature of chromosomes and chromosomal abnormalities have been published [ISCN, 2009]. Examples of most of the major types of abnormalities are listed in Table 30-5 using standard nomenclature. Note that breakpoints on the same chromosome are not separated by any punctuation, whereas breakpoints on different chromosomes are separated by a semicolon.
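The punctuation rule can be made concrete with a few lines of Python. This is a simplified sketch that handles only a breakpoint field such as those in Table 30-5; the function name `parse_breakpoints` and the regular expression are our own illustration and are not part of ISCN.

```python
import re

# A band is an arm (p or q) followed by digits, optionally with a sub-band after a dot.
BAND = re.compile(r"[pq]\d+(?:\.\d+)?")

def parse_breakpoints(field):
    """Split a breakpoint field into one list of bands per chromosome.

    Semicolons separate chromosomes; bands on the same chromosome are
    written back to back with no punctuation.
    """
    return [BAND.findall(part) for part in field.split(";")]

print(parse_breakpoints("p11.2p11.2"))   # two breakpoints on one chromosome
print(parse_breakpoints("p22.3;p13.3"))  # one breakpoint on each of two chromosomes
```

A real parser would need to handle the full grammar (centromere designations, uncertain bands, and so on); this sketch only illustrates the separator convention.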

Table 30-5 Examples of Chromosomal Abnormalities Using Standard Nomenclature (Short System)

Rearrangement	Karyotype
Genome Mutations
Triploidy	69,XXX
Monosomy	45,X (Turner’s syndrome)
Trisomy	47,XX,+21 (Down syndrome)
Deletion
Terminal	46,XY,del(8)(p21.1)
Interstitial	46,XX,del(17)(p11.2p11.2)
Ring	46,XY,r(17)(p13.3q22.3)
Duplication
Direct	46,XY,dir dup (2)(p14p23)
Inverted	46,XX,inv dup (11)(p12p15)
Paracentric Inversion
Balanced	46,XX,inv(1)(p32p36.1)
Unbalanced	46,XX, dup q, inv(1)(p32p36.1)
Pericentric Inversion
Balanced	46,XX,inv(1)(p36.1q32)
Unbalanced	46,XY,dup(q),inv(1)(p36.1q32)mat
Reciprocal Translocation
Balanced	46,XY,t(7;17)(p22.3;p13.3)
Unbalanced	46,XY,−17,+der(17),t(7;17)(p22.3;p13.3)pat
Robertsonian Translocation
Balanced	45,XX,rob(13;21)
Unbalanced	46,XY, −13,+der(13),rob(13;21)mat

del, deletion; der, derivative; dir, direct; dup, duplication; inv, inversion; mat, maternal; pat, paternal; r, ring; rob, robertsonian translocation; t, translocation.

Mutations and Genetic Diseases

The number of genetic changes causing disease, including neurologic disorders, is far too great to cover here, although many of these are reviewed in other chapters of this book. Here we review the general mechanisms that lead to genetic diseases, starting with a definition of mutation.

As a very basic definition, a mutation is simply a permanent change in the DNA of an individual. The change most often is a change in the nucleotide sequence anywhere in the genome, although some chemical modifications of DNA can result in mutations as well. Such modifications are known as epigenetic mechanisms. Genetic diseases are caused by mutations that adversely affect function of one or more genes. The same is true for many types of cancer.

Classes of Mutations

Mutations have been subdivided into three main types: genome, chromosome, and gene mutations. They may occur in either somatic or germ cells, although only germ cell mutations can be transmitted to offspring. All three occur often enough for affected individuals to be observed in clinical practice.

Genome and Chromosome Mutations

Abnormalities of chromosome number, including triploidy, tetraploidy, and aneuploidy, are classified as genome mutations. Similarly, structural rearrangements are classified as chromosome mutations. Both of these were reviewed in the preceding section. Genome and chromosome mutations are rarely perpetuated to the next generation because most result in spontaneous abortions. Thus, the frequencies cited are probably underestimates.

Gene Mutations

Gene mutations differ from genome and chromosome mutations because the segment of DNA involved is much smaller and the mechanisms are different. The most common types are basepair substitutions and small deletions or insertions that can be caused by an error during DNA replication or by base changes induced by extrinsic agents referred to as mutagens. Because genome and chromosome mutations usually are lethal, most significant heritable mutations are gene mutations.

DNA Replication Errors

DNA replication normally is an accurate process. The DNA polymerases (see Table 30-2) insert an incorrect base only once in every 10 million bp. A series of DNA repair enzymes exist that are able to recognize and replace noncomplementary bases, correcting more than 99.9 percent of errors. The overall mutation rate is therefore only 10⁻¹⁰ per basepair per cell division. The human genome consists of about 6 × 10⁹ bp, so this mutation rate results in less than 1 bp mutation per cell division. Nevertheless, an estimated 10¹⁵ cell divisions occur during the lifetime of an adult human. Thus, thousands of new mutations occur at virtually every position in the genome. Not surprisingly, inherited defects in DNA replication and repair enzymes lead to a striking increase in the frequencies of all types of mutations.
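The arithmetic in this paragraph can be checked with a short calculation. The Python sketch below simply multiplies the round figures quoted above; the variable names are illustrative, and the inputs are order-of-magnitude estimates rather than measured values.

```python
# Round figures quoted in the text (order-of-magnitude estimates)
polymerase_error_rate = 1e-7    # one incorrect base per 10 million bp
repair_escape_fraction = 1e-3   # repair corrects more than 99.9 percent of errors
genome_size_bp = 6e9            # diploid human genome
lifetime_divisions = 1e15       # estimated cell divisions in an adult lifetime

# Net mutation rate per basepair per cell division
net_rate = polymerase_error_rate * repair_escape_fraction

# Expected number of new mutations in a single cell division
per_division = net_rate * genome_size_bp

# Expected number of times any one position is mutated somewhere in the body
per_position_lifetime = net_rate * lifetime_divisions

print(f"net rate per bp per division: {net_rate:.0e}")
print(f"new mutations per division:   {per_division:.1f}")
print(f"hits per position, lifetime:  {per_position_lifetime:.0e}")
```

The second figure is below 1, and the third runs well into the thousands, in line with the statements in the text.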

Most of the mutations occur in somatic cells, where they may cause cancer or a genetic disease affecting only part of the body, such as segmental neurofibromatosis. Fewer mutations occur in germ cells. During oogenesis, female germ cells undergo mitosis approximately 22 times and begin meiosis only once during fetal life. The cells are suspended in meiosis from fetal life until shortly before ovulation during the reproductive years. Spermatogenesis consists of approximately 30 mitoses from conception until puberty and approximately 20–25 per year thereafter. Thus, the opportunity for mutations is expected to be far greater for sperm than for ova. This phenomenon has been confirmed in several genetic disorders, such as neurofibromatosis type 1, achondroplasia, and hemophilia A. It has been estimated that as many as 1 in 10 sperm in healthy males may carry a new deleterious mutation. Most are recessive or lethal, and therefore not apparent in liveborn children.
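To see how quickly the difference between sperm and ova grows with age, the division counts can be tabulated. The sketch below is a rough illustration only: the age at puberty (15 years) and the yearly figure of 23 mitoses are assumptions chosen for the example, and `sperm_divisions` is a hypothetical helper.

```python
OOCYTE_DIVISIONS = 22  # mitoses before meiotic arrest; does not increase with maternal age

def sperm_divisions(age_years, puberty_age=15, pre_puberty=30, per_year=23):
    """Approximate number of mitoses behind a sperm cell at a given paternal age.

    puberty_age and per_year are illustrative assumptions, not measured values.
    """
    years_after_puberty = max(0, age_years - puberty_age)
    return pre_puberty + per_year * years_after_puberty

for age in (20, 30, 40, 50):
    n = sperm_divisions(age)
    print(f"age {age}: about {n} divisions, {n / OOCYTE_DIVISIONS:.0f}-fold the oocyte count")
```

Under these assumptions the number of replication cycles, and hence the opportunity for replication errors, rises linearly with paternal age while the oocyte figure stays fixed.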

Mutation Rate

The mutation rate for any given gene or other DNA segment depends on both its size and its location. The location may be important because certain areas of the genome are known to be “hot spots” for mutation. The average mutation rate is about 1 × 10⁻⁶ mutation per locus per generation, but the rate varies by more than a thousand-fold for different genes. For example, the number of new mutations per 10⁶ gametes is 40–100 for Duchenne muscular dystrophy and neurofibromatosis 1, but only 2–5 for aniridia and hemophilia B. These statistics include only mutations causing genetic diseases. The rate of change of protein polymorphisms suggests a rate as high as 6 × 10⁻⁶ per locus per generation. (A locus is the position of a gene on a chromosome.)

Specific Types of Gene Mutations

The development and widespread use of modern molecular techniques have led to the discovery of specific mutations at many different loci. From among these, many different types of mutations have been recognized, all of which have the potential for causing genetic diseases. They may be divided by size into single- and multiple-base changes. The latter may involve only a few bases or may involve millions of basepairs. In any specific gene, mutations are almost always heterogeneous, although some types may be more common than others. Thus, the specific mutations in unrelated persons with the same genetic disease often are different. A few notable exceptions to this rule have been identified, such as achondroplasia, which almost always is caused by a specific single-base change in the fibroblast growth factor receptor 3 gene [Bellus et al., 1995].

Nucleotide Substitutions

Point mutations or single-base substitutions represent one of the most common types of mutation. Most are related to an error in DNA synthesis by the enzyme DNA polymerase that was not corrected by DNA repair enzymes. Some combinations of nucleotides are mutation-prone, however. More than 30 percent of point mutations found in some genetic diseases are the result of cytosine to thymine transitions, which are caused by methylation of cytosine residues to 5-methylcytosine, especially cytosine residues occurring as the first base in a 5′-CG-3′ dinucleotide pair. The 5-methylcytosine then undergoes spontaneous deamination to thymine. Thus, the 5′-CG-3′ doublet represents a “hot spot” for mutation in the human genome.

Deletions, Duplications, and Insertions

The remainder of mutations consist of loss or gain of nucleotide bases somewhere in the genome; a variety of mechanisms for such changes are known. A deletion consists of any loss of DNA sequence, whereas a duplication consists of a second copy of a DNA sequence that is usually located immediately adjacent to the first copy. An insertion consists of a DNA sequence that has been removed or copied from one location and moved to a nonhomologous region elsewhere on the same chromosome or to a different chromosome. Deletions, duplications, and insertions that involve one or a few basepairs can be detected only by nucleotide sequencing. Larger deletions and duplications may be detected by several of the methods described earlier, including direct sequencing, Southern blot analysis, FISH, and CGH.

Effects of Mutations on Gene Function

The effect of mutations on gene function depends as much or more on the specific location of the mutation as on the size. Mutations that occur in DNA outside functioning genes usually have no consequences. Mutations within the boundaries of a gene may inactivate it or have little or no effect, depending on the nature of the change.

Missense Mutations

A point mutation within the coding region of a gene can alter the genetic code by changing the nucleotide triplet and cause the replacement of one amino acid by another in the gene product, thus altering function of the gene product. Such mutations are called missense mutations and do not change the reading frame of the DNA sequence. The best-known example is the A to T substitution in the sixth codon of the β-globin gene, which causes sickle cell anemia, by substituting valine for glutamic acid in the β-globin protein chain. Not all mutations within coding regions of a gene result in a missense mutation, however. All but 2 of the 20 amino acids are specified by more than one codon, most often differing in the third or “wobble” position of the triplet. The gene product will be identical if the new triplet codes for the same amino acid.
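The distinction between missense and silent (synonymous) substitutions can be illustrated with the standard genetic code. The Python sketch below builds the 64-codon table from a compact string and classifies a single-base change; `classify_substitution` is a hypothetical helper, and codons are written in DNA (coding-strand) form with one-letter amino acid codes.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_substitution(codon, position, new_base):
    """Classify a single-base change at position 0, 1, or 2 of a codon."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        kind = "synonymous"
    elif after == "*":
        kind = "nonsense"
    elif before == "*":
        kind = "stop-loss"
    else:
        kind = "missense"
    return mutant, before, after, kind

# Sickle cell change: A to T in the middle of the glutamic acid codon GAG
print(classify_substitution("GAG", 1, "T"))
# A change in the third ("wobble") position of the same codon
print(classify_substitution("GAG", 2, "A"))

# Amino acids specified by only one codon (stop codons excluded)
codons_per_aa = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
single_codon_amino_acids = sorted(aa for aa, n in codons_per_aa.items() if n == 1)
print(single_codon_amino_acids)
```

The last line lists methionine (M) and tryptophan (W), the two amino acids with a single codon.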

Nonsense (Chain Termination) Mutations

Mutations that generate one of the three stop codons result in premature termination of translation, whereas those that alter a stop codon allow translation to continue until the next stop codon is reached. Those mutations that result in a premature stop codon are called nonsense mutations. In general, these mutations have no effect on transcription (DNA to RNA), but the shortened polypeptide may have lost critical functional domains of the protein, or the mRNA may be so unstable that it is rapidly degraded in the cell. The latter process is known as nonsense-mediated mRNA decay, a process by which mRNA species containing premature termination codons are recognized and degraded before translation, although this typically spares truncation mutations in the last coding exon [Frischmeyer and Dietz, 1999]. Both base substitutions and nucleotide loss or gain mutations may result in nonsense mutations.

RNA Splicing Mutations

The sequence surrounding intron splice sites is highly conserved, and mutations of key nucleotides frequently prevent or reduce efficiency of splicing. The key nucleotide sequences at most splice junctions are shown in Figure 30-4.

Splicing mutations may either inactivate existing splice sites or create new ones. In the first type, the mutation alters the splice-donor, branch, or splice-acceptor site, resulting in failure to splice the intron correctly at that site. This failure results in a large insertion of nucleotides that normally are not translated into the processed mRNA. This insertion is almost certain to introduce a stop codon within the next hundred or so codons, because 3 of the possible 64 triplet combinations are stop codons. In the second type, mutations within the intron create alternative splice-donor or acceptor sites that compete with the normal splice sites during mRNA processing. Thus, a proportion of the mature mRNA will contain incorrectly spliced intron sequences. Both base substitutions and nucleotide loss or gain mutations may result in splicing mutations.
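The claim that a retained intron is almost certain to introduce a stop codon within a hundred codons follows from simple probability. The sketch below assumes, as a simplification, that each codon in the retained sequence is an independent, uniformly random triplet; real intronic sequence is not random, so the result is only a rough guide. The function name `prob_stop_within` is illustrative.

```python
STOP_CODONS = 3
ALL_CODONS = 64

def prob_stop_within(n_codons):
    """Probability that at least one of n random codons is a stop codon."""
    p_sense = (ALL_CODONS - STOP_CODONS) / ALL_CODONS
    return 1 - p_sense ** n_codons

for n in (10, 50, 100):
    print(f"{n:>3} codons: {prob_stop_within(n):.3f}")

# Average number of random codons read before a stop codon is met
print(f"mean distance to a stop: {ALL_CODONS / STOP_CODONS:.1f} codons")
```

Under this assumption the probability exceeds 99 percent by 100 codons, and a stop codon is expected on average about every 21 codons.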

One example of the first type is a G to C transversion in the first position of the intron at the donor splice site in the hexosaminidase A gene found in many Ashkenazi Jewish patients with Tay–Sachs disease [Nussbaum et al., 2007]. In this example, the bases in the exon are capitalized, whereas those in the intron are not, and the mutation is underlined.

Frameshift Mutations

Small nucleotide loss or gain mutations may alter the reading frame of the mRNA product from the point of the mutation on, which results in a completely different amino acid sequence at the carboxyl end of the protein product, or premature chain termination if a stop codon is encountered in the new reading frame. Any loss or gain mutation that involves a multiple of three bases maintains the reading frame, whereas a mutation that does not involve a multiple of three nucleotides changes the reading frame. Larger deletions that include one or more introns also may cause a frameshift mutation, because exon/intron splice sites may occur at any point in the reading frame, thereby splitting codons. If the exon just downstream from the deletion normally begins at a different position in the triplet than the deleted intron, the reading frame will be changed. By contrast, base substitutions do not cause frameshift mutations. Deletions and insertions cause dysfunction of the gene more often than point mutations because of the possibility of a frameshift.
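The multiple-of-three rule is easy to demonstrate on a toy sequence. In the Python sketch below, the 15-base sequence and the helper names `preserves_reading_frame` and `codons` are invented for illustration and do not correspond to any real gene.

```python
def preserves_reading_frame(net_length_change_bp):
    """True if a loss (negative) or gain (positive) of bases keeps the frame."""
    return net_length_change_bp % 3 == 0

def codons(seq):
    """Split a sequence into complete triplets, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

toy = "ATGGCTGAAGTTCCA"                # invented sequence: ATG GCT GAA GTT CCA
one_base_deletion = toy[:4] + toy[5:]   # remove a single base from the second codon
in_frame_deletion = toy[:3] + toy[6:]   # remove the entire second codon

print(codons(toy))
print(codons(one_base_deletion))   # every codon after the deletion is altered
print(codons(in_frame_deletion))   # downstream codons are unchanged
```

The single-base deletion changes every downstream triplet, whereas the three-base deletion removes one codon and leaves the rest of the reading frame intact.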

One of the best-known frameshift mutations is a single-base deletion in the ABO blood group locus that results in the nonfunctional O allele. The deletion alters the reading frame at codon 86 until a premature stop codon is reached 30 codons later. The stop codon is normally out of frame and is therefore not read.

With some intermediate-size deletions or duplications of approximately 1 kb to 1 Mb, one or more exons of a gene may be duplicated or deleted. About two-thirds of these will change the reading frame and result in a frameshift mutation. Those that maintain the reading frame produce truncated products that may or may not retain function. These relatively small deletions and duplications are too small to be seen with chromosome analysis or FISH and cannot be found by sequencing, which is not sensitive to dosage (recall that all autosomal genes and X-linked genes in females have two copies of each gene). These may be rare or common mechanisms of mutation. For example, small deletions and duplications are the most common mutational types for Duchenne muscular dystrophy and Becker muscular dystrophy, both caused by mutations of the dystrophin or DMD gene. These are simple to detect for X-linked diseases in males, because only one gene copy is present. When any autosome or the X chromosome in a female is involved, the mutation can be detected by other methods, such as quantitative PCR assay.

Transcriptional Control Mutations

Mutations involving promoter sequences in the 5′ UTR or other regulatory sequences in the 3′ UTR of a gene may result in a significant decrease in the amount of mature, processed mRNA produced. Both base substitutions and nucleotide loss or gain mutations may result in transcriptional control abnormalities.

Principles of Medical Genetics

Several principles of genetics derived from the chromosomal and molecular basis of heredity form the basis for the different patterns of inheritance observed in genetic diseases. A working understanding of the principles of inheritance is important for understanding the genetic diseases encountered in pediatric neurology clinics, for formulating an optimal management approach to patients with these diseases, and for providing accurate genetic counseling. The simplest and best-known patterns of inheritance involve mutations of single genes; however, more complex patterns of inheritance likely are more common. Here we review some of these principles and examine the basis for genetic counseling.

Patterns of Inheritance

A discussion of inheritance requires familiarity with a special vocabulary. As reviewed previously, a gene is a sequence of DNA that is required for production of a functional product. The position of a gene on a chromosome is known as its locus. The alternative forms of a gene that may occupy a given locus are known as alleles. Different alleles typically result from one or more minor differences in nucleotide sequence. When both alleles at a given locus are identical, the person is said to be homozygous for that trait. When the alleles are different, the person is described as heterozygous. When only one allele is present, the person is hemizygous.

The genetic constitution of an individual is the genotype. At any given locus, the normal genotype consists of either a single allele or a pair of alleles. Only a single allele is present for most genes on the X chromosome in males, who have only one X chromosome. A pair of alleles is present for all genes on the autosomes, and for a subset of genes on the X chromosome located in “pseudoautosomal” regions, which have functional homologs on the Y chromosome. The observable expression of the genotype is the phenotype. Penetrance has been defined as the percentage of persons with a particular genotype who have the expected phenotype; this is an all-or-none phenomenon. Expressivity is defined as the extent to which a genetic trait or disease is expressed and may vary greatly between affected persons. The proband is the affected family member through whom a family is identified; the consultand is the person in the family who seeks advice, regardless of whether affected or not. A pedigree is a diagram of the family history that shows the family members, their relationships to the proband, and their status with regard to the hereditary condition. Some of the symbols used for pedigrees in medical genetics are illustrated in Figure 30-15. A more detailed standardized nomenclature has been proposed for publication of pedigrees [Bennet et al., 1995].

Fig. 30-15 Common symbols used in pedigrees.

The most widely recognized patterns of inheritance are single-gene or “mendelian” patterns, which include autosomal-dominant, autosomal-recessive, and X-linked modes of inheritance; example pedigrees are shown in Figure 30-16. All of these result from mutations in a single gene. The disorder or trait is autosomal if located on human chromosomes 1–22, and X-linked when located on the X chromosome. The only true Y-linked trait is male sex determination (the SRY gene). Autosomal traits are considered dominant when expressed in both heterozygotes and homozygotes, and recessive when expressed only in homozygotes; neither of these terms really fits with X-linked inheritance, as reviewed later on. The pattern of single-gene inheritance is modified in special cases in which mutations involve genes subject to imprinting or X-inactivation, or involve only a proportion of cells in the body or affected tissue. Finally, many diseases have more complicated inheritance.

Fig. 30-16 Examples of autosomal-dominant (AD), autosomal-recessive (AR), and X-linked (XL) pedigrees.

Autosomal-Dominant Inheritance

The most important attributes of autosomal-dominant inheritance are expression of the trait in heterozygotes and male-to-male transmission (see Figure 30-16). The autosomal-dominant pattern may be recognized because:

1. The trait or disease typically appears in every generation, except that it may arise by new mutation in the first affected family member.

2. Any child of an affected person has a 50 percent risk of inheriting the trait.

3. The offspring of unaffected family members also are unaffected.

4. The trait may be transmitted by a parent of either sex to a child of either sex, and specifically may be transmitted from father to son, which distinguishes it from X-linked inheritance.

Autosomal-dominant inheritance is readily identified in most families but may be difficult to discern in others. When the disease occurs as a result of a new mutation, no relatives are affected. With reduced penetrance, low expressivity, and late age at onset, other affected family members may go unrecognized. Among the best examples for each of these characteristics are myotonic dystrophy and Huntington’s disease. Finally, incorrect information regarding family relationships, such as false paternity, may complicate interpretation of the pedigree.

Most persons affected by a disorder of autosomal-dominant inheritance are heterozygous, but rarely a homozygous person is encountered. Generally, the phenotype in homozygous persons is significantly more severe than in heterozygous persons. For example, one child born to parents who each had hereditary motor and sensory neuropathy type I had a much more severe neuropathy consistent with Dejerine–Sottas disease, or hereditary motor and sensory neuropathy type III [Killian and Kloepfer, 1979]. The best-known exception to this rule is Huntington’s disease, in which persons heterozygous for this trait cannot be distinguished from homozygotes.

Autosomal-Recessive Inheritance

The most important attributes of autosomal-recessive inheritance are expression in homozygotes and equal gender distribution (see Figure 30-16). This pattern may be recognized by the following four characteristics:

1. The trait or disease may affect multiple siblings but not parents, children, or other relatives, except in highly inbred populations.

2. Each full sibling of an affected person has a 25 percent chance of inheriting the trait.

3. The parents are more likely than usual to be related.

4. With rare exceptions, males and females are equally likely to be affected.
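The recurrence risks quoted for autosomal-recessive and autosomal-dominant inheritance follow directly from the segregation of two alleles in each parent. The Python sketch below enumerates the four equally likely allele combinations; the genotype notation (capital letter for one allele, lowercase for the other) and the function name `offspring_genotypes` are illustrative conventions, not standard nomenclature.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent1, parent2):
    """Return {genotype: probability} for two-allele parental genotypes such as 'Aa'."""
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

# Autosomal recessive: two carrier parents (a = disease allele)
print(offspring_genotypes("Aa", "Aa"))   # aa is the affected genotype

# Autosomal dominant: affected heterozygote and unaffected partner (D = disease allele)
print(offspring_genotypes("Dd", "dd"))   # Dd is the affected genotype
```

For two carriers, one quarter of offspring are expected to be homozygous for the disease allele; for an affected heterozygote with a dominant trait, one half of offspring are expected to inherit it.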

In Western societies, a child with a disorder of autosomal-recessive inheritance may be the only affected person in the family, owing to small family size and a tendency for parents of affected children to have fewer children after the birth of a child with a genetic disease. This practice does not hold true in many other cultures, especially those with inbred populations.

When the frequency of a rare recessive allele is relatively high within a family or population, the disease may appear in more than one generation. This pattern is known as pseudodominant inheritance. Some genes on the X chromosome have functional homologs on the Y chromosome, and traits or diseases associated with these genes will behave in the same manner as for autosomal loci. This pattern is known as pseudoautosomal inheritance.

The risk of bearing a child with an autosomal-recessive disease or trait is increased when the parents are consanguineous or related by descent. More formally, the probability that a homozygote has received both alleles of a pair from an identical ancestral source is known as the coefficient of inbreeding. It also is the proportion of loci at which a person is homozygous by descent. For example, any child born to first cousins is homozygous at 1/16 of all loci. Although the relative risk of abnormal offspring is higher for first cousins than for unrelated parents, it is still low, at about 5 percent.
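The figure of 1/16 for the offspring of first cousins can be reproduced with the standard path-counting formula for the coefficient of inbreeding, which is not derived in this chapter: each common ancestor of the parents contributes (1/2) raised to the power n1 + n2 + 1, where n1 and n2 are the numbers of generations separating each parent from that ancestor. The sketch assumes the common ancestors are not themselves inbred, and the function name is illustrative.

```python
from fractions import Fraction

def inbreeding_coefficient(paths):
    """Coefficient of inbreeding of a child.

    paths: one (n1, n2) pair per common ancestor of the parents, giving the
    generations from each parent up to that ancestor. Assumes the common
    ancestors are not themselves inbred.
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)

# First cousins share two grandparents, each two generations above both parents
first_cousins = inbreeding_coefficient([(2, 2), (2, 2)])
# Second cousins share two great-grandparents, three generations above both parents
second_cousins = inbreeding_coefficient([(3, 3), (3, 3)])

print(first_cousins, second_cousins)
```

The same function gives 1/8 for an uncle-niece union (paths of one and two generations through each of two shared ancestors), which shows how quickly the coefficient falls as the relationship becomes more distant.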

X-Linked Inheritance

The inheritance of diseases and traits associated with genes located on the X chromosome differs markedly from autosomal forms of inheritance because females have two X chromosomes, whereas males have only one. Thus, when mutations of an X-linked gene occur in males, no genetic “backup” is available. The situation in heterozygous (carrier) females is more complicated, owing to the phenomenon of X chromosome inactivation (Xi), which ensures that dosage for X-linked genes is the same in male and in female cells. Because Xi is random in most females, the maternally derived and paternally derived X chromosomes usually are active in about half of the cells in a female organism. Thus, mutation of one gene should cause no more than a 50 percent loss of function of the protein or other gene product. With this background, several mechanisms have been described that lead to disease expression in female carriers of X-linked mutations. First, some genes are dosage-sensitive, so that 50 percent expression is not enough for normal function. Next, by chance or because of cell selection (usually favoring the normal allele), some females have skewing of X-inactivation, such that one X chromosome is inactivated in a high proportion (80–100 percent) of cells. Unfavorable skewing will cause or worsen disease, whereas favorable skewing will prevent or reduce disease expression. Finally, skewing of X-inactivation also may result from mutations of the genes that actually control inactivation, especially the XIST gene, which is responsible for inactivation of one of the two X chromosomes. Not surprisingly, affected females usually have a less severe phenotype than that seen in affected males.

The important characteristics of X-linked inheritance result from differential segregation of the X and Y chromosomes in males and females, and the differences in gene dosage. The most consistent characteristics include more severe phenotype in males than in females, transmission of disease through carrier females who are unaffected or less affected than males, and lack of male-to-male transmission (see Figure 30-16). The last is explained by transmission of the Y chromosome from fathers to sons, whereas the disease genes are located on the X chromosome. X-linked disorders traditionally have been divided into dominant and recessive subtypes, just as for autosomal single-gene disorders. This distinction was first made in fruit flies under experimental conditions but has never worked very well for human disorders. In a recent survey of more than 30 X-linked diseases, a remarkably wide range of penetrance was found, with many disorders intermediate between so-called X-linked dominant and recessive patterns [Dobyns et al., 2004]. On the basis of these and other arguments, use of these subtypes should be discontinued. The rules for X-linked inheritance have been modified to reflect this change (Box 30-1).

Box 30-1 Rules for X-Linked Inheritance in Humans

Rules Related to Segregation of the X and Y Chromosomes

Hemizygous males transmit X chromosomes to daughters and Y chromosomes to sons

Male-to-male transmission of X–linked disorders cannot occur

Sons of hemizygous males never inherit the disorder

Daughters of affected males all are heterozygous (carriers or affected)

All affected males in a family are related through heterozygous females

Heterozygous females transmit X chromosomes to both sons and daughters

Fifty percent of sons of heterozygous females will be hemizygous males

Fifty percent of daughters of heterozygous females also will be heterozygous females
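The segregation rules in Box 30-1 can be verified by enumerating the possible combinations of parental sex chromosomes. In the Python sketch below, "X" denotes a normal allele, "x" a mutant allele, and "Y" the Y chromosome; this notation and the function name `x_linked_offspring` are conventions adopted only for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def x_linked_offspring(mother, father):
    """Return (sons, daughters) as {genotype: probability} dictionaries.

    mother: her two X alleles, e.g. ("X", "x"); father: (his X allele, "Y").
    """
    sons, daughters = Counter(), Counter()
    for from_mother, from_father in product(mother, father):
        if from_father == "Y":
            sons[from_mother + "Y"] += 1
        else:
            daughters["".join(sorted(from_mother + from_father))] += 1

    def as_probabilities(counts):
        total = sum(counts.values())
        return {g: Fraction(n, total) for g, n in counts.items()}

    return as_probabilities(sons), as_probabilities(daughters)

# Heterozygous (carrier) mother and unaffected father
print(x_linked_offspring(("X", "x"), ("X", "Y")))
# Unaffected mother and hemizygous (affected) father
print(x_linked_offspring(("X", "X"), ("x", "Y")))
```

The first case gives half of sons hemizygous for the mutation and half of daughters heterozygous; the second gives no affected sons and all daughters heterozygous, in agreement with the rules above.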

In many instances, children with X-linked diseases present primarily or frequently to pediatric neurologists, including those with adrenoleukodystrophy, the Duchenne and Becker forms of muscular dystrophy, fragile X syndrome, many X-linked mental retardation syndromes, two forms of X-linked lissencephaly, and many others. These diseases all result in more severe phenotypes in males than in females. Another class of X-linked disorders, “X-linked, male lethal,” have proved to be particularly important in pediatric neurology. These diseases are observed almost exclusively in females and are thought to cause prenatal lethality in males or to cause a much more severe phenotype that is not recognized as the same disorder. Examples are Aicardi’s, Goltz’s, and Rett’s syndromes and orofaciodigital syndrome type I.

Genomic Imprinting

In most single-gene disorders, the expression of a trait or disease is expected to be the same, regardless of whether the gene was inherited from the mother or the father. Significant differences in expression based on the gender of the transmitting parent, however, have been observed in several disorders. This phenomenon is known as imprinting and reflects differences in the state of the maternal and paternal contributions to the genome, especially differential methylation of the maternally and paternally derived chromosomes. An imprinted gene can be imprinted or differentially methylated (i.e., differentially silenced) in all cells of the body, or only in selected tissues, such as brain.

This differential silencing means that imprinted genes will be expressed from the maternally derived gene or from the paternally derived gene, but not from both. So if the functioning copy of an imprinted gene is lost owing to a deletion or mutation, the affected person is left with no functioning gene, and a disease phenotype will result. The underlying mechanisms are under study but still are not well understood [Hall, 1990; Jiang et al., 2004]. Imprinting disorders result from deletions or other types of mutations of genes within imprinted regions (or in the imprinting control regions). The most common of these result in infantile developmental disorders relevant for pediatric neurologists, such as Beckwith–Wiedemann syndrome due to defects of imprinted genes on 11p15.5 [Weksberg et al., 2003], and Angelman’s and Prader–Willi syndromes due to defects of imprinted genes on 15q11.2–q13 [Amos-Landgraf et al., 2006]. These disorders are reviewed elsewhere in this book.

Uniparental Disomy

Defects of imprinted genes also may result from a rare mutation type known as uniparental disomy (UPD). UPD is defined as the presence of a diploid cell line containing two chromosome homologs inherited from the same parent. It is believed to result from nondisjunction, which produces trisomy for a particular chromosome. Trisomy is followed by loss of one of the three homologs, reducing the chromosome number back to normal. This mutation is known as uniparental isodisomy when the two homologs are identical, and uniparental heterodisomy when they are different (as a result of crossing over, different regions of the affected chromosomes usually are involved).

When UPD involves a chromosome with an imprinted region, problems occur. For example, a child may inherit two paternally derived chromosomes, in which case no maternally derived chromosome will be present. Any genes that are normally expressed only from the maternally derived copy will not be expressed at all. This is one cause of Angelman’s syndrome. The same type of problem occurs in reverse if the child inherits two maternally derived chromosomes. No paternally derived copy will be present, so genes that are normally expressed only from the paternally derived chromosome will not be expressed, and a disease will occur. This is one cause of Prader–Willi syndrome [Amos-Landgraf et al., 2006].

Very rarely, UPD can result in a genetic disease by causing homozygosity of a recessive disease gene. In uniparental isodisomy, the two chromosomes are identical and therefore homozygous at all loci. If the involved chromosome carries a recessive disease allele, the person with the mutation will be homozygous for it and therefore affected.

Mosaicism

Mosaic is a term used to refer to an individual organism or tissue that contains two or more cell lines that differ in DNA sequence, although they are derived from a single zygote. All organisms begin with a specific DNA sequence in the cell of origin or zygote. As cell division proceeds, some mutations occur that produce small differences among different cell lines. The presence of two or more cell lines differing in their DNA sequence but derived from a single zygote is known as mosaicism. Mosaicism is clinically important in many disorders and probably explains some unusual diseases in which only part of the body appears to be affected with a birth defect or genetic disease. A good example is segmental neurofibromatosis.

This phenomenon can involve any tissue or group of tissues in the body. When mosaicism is found in lymphocytes, fibroblasts, or other somatic cells of the body, it is designated somatic mosaicism. The mosaic individual typically demonstrates at least mild signs of disease. When mosaicism is found only in germ cells (egg, sperm), it is known as gonadal mosaicism. The mosaic individual usually is identified as the parent of two or more children with a genetic disease, despite having no signs of the disease clinically or on mutation analysis. Mosaicism may begin with a somatic mutation in the germline of the affected person, which then persists in the clonal descendants of that cell, including a proportion of the ova or sperm. When the mutation exists only in the germline, the parent has no signs of the disease but may conceive multiple affected children. This phenomenon has been seen frequently in Duchenne muscular dystrophy due to mutations of the DMD gene, the autosomal-dominant form of osteogenesis imperfecta associated with mutations of the COL1A1 or COL1A2 gene, and X-linked lissencephaly due to mutations of the DCX gene. The distinction between somatic and germline mosaicism is most likely artificial, however, because standard evaluations examine very few tissues.

Complex Inheritance

The most important attributes of complex or multifactorial inheritance are lack of a clear pattern of inheritance in single families, although more than one relative may be affected, and a relatively low risk for first-degree relatives (approximately the square root of the population risk), typically in the range of 1–5 percent. This form of inheritance results from variation at two or more loci with two or more alleles each, often with a prominent environmental influence. This pattern also is referred to as polygenic (we prefer oligogenic) or multifactorial inheritance. Examples of traits inherited in this pattern are head circumference, autism, neural tube defects, and common forms of epilepsy. Some are continuous traits that can be measured, such as head circumference, whereas others fall into non-overlapping groups, such as autism and epilepsy. For continuous traits, such as head circumference, children are likely to be intermediate between their parents, or closer to the mean than either parent (so-called regression to the mean).
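
The square-root rule of thumb quoted above can be illustrated with a short calculation. This is a minimal sketch, assuming the approximation exactly as stated in the text. The function name `first_degree_risk` and the example population risks are illustrative assumptions, not data from the chapter.

```python
import math


def first_degree_risk(population_risk):
    """Approximate risk for first-degree relatives of an affected person
    under multifactorial inheritance (square root of the population risk)."""
    if not 0 < population_risk <= 1:
        raise ValueError("population_risk must be a probability in (0, 1]")
    return math.sqrt(population_risk)


# Hypothetical population risks chosen only to show the arithmetic.
for risk in (1 / 10_000, 1 / 1_000, 1 / 500):
    print(f"population risk {risk:.4f} -> first-degree relatives ~{first_degree_risk(risk):.3f}")
```

Population risks between 1 in 10,000 and 1 in 500 give estimates between 1 and about 4.5 percent, consistent with the 1–5 percent range cited. Empirical recurrence figures for a specific disorder should take precedence over this approximation.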

Mitochondrial Inheritance

Mitochondria are the cellular organelles that are primarily responsible for cellular respiration and production of adenosine triphosphate, both essential for cellular energy management. Each mitochondrion contains multiple copies of a small 16.5-kb circular chromosome that codes for 13 proteins, 2 rRNAs, and 22 tRNAs; the mitochondrial rRNA and tRNA genes differ in sequence from the nuclear rRNA and tRNA genes. The 13 proteins all are components of the respiratory chain, and the remainder of the respiratory pathway enzymes are encoded by nuclear genes.

The mitochondria in any one person are derived almost exclusively from the mother through the ovum. Each ovum contains hundreds of mitochondria, and each mitochondrion contains many copies of the circular mitochondrial chromosome. Sperm contain a few mitochondria, most of which are degraded rapidly by the proteasome-dependent protein degradation pathway of the ubiquitin system within the ovum after fertilization [Sutovsky et al., 2003]. A small paternal contribution of mitochondria, however, has been demonstrated in several species, such as sheep [Zhao et al., 2004]. The same likely is true for humans, as suggested by one example of paternal inheritance of a mitochondrial disorder [Schwartz and Vissing, 2002]. This must be very rare, however, owing to the small proportion of paternal compared with maternal mitochondria, and as indicated by studies in humans with mitochondrial diseases [Filosto et al., 2003].

In patients with mutations of mitochondrial genes, a variable proportion of the mitochondrial chromosomes carry the mutation. Thus, diseases caused by mutations in mitochondrial DNA exhibit strict maternal inheritance (with very rare exceptions) and usually will exhibit phenotypic variation within a family owing to variation in the proportion of mutant and normal mitochondria between individuals [Zeviani et al., 1989].

Maternal inheritance may be recognized because:

1. The incidence of the disease is equal in males and females.

2. The disease is transmitted from mother to offspring of both genders, but never from father to offspring.

3. Variable expression is common.

These criteria have been met for several diseases associated with mitochondrial DNA mutations, including Kearns–Sayre syndrome, Leber’s hereditary optic neuropathy, MELAS (mitochondrial encephalomyopathy, lactic acidosis, and strokelike episodes), and MERRF (myoclonic epilepsy with ragged-red fibers).

Genetic Counseling

Medical genetics differs from other specialties because family members other than the patient may be at high risk for a disease first recognized in the patient, who then becomes the proband. The person or persons actually seeking advice may not themselves be affected. Ideally, the patient or the parents or guardians of minor children or incompetent adults, and all other family members at risk, should be made aware of both the clinical and the reproductive consequences of a genetic disease. Genetic counseling is the process of providing this information. Although any physician may provide genetic counseling as part of overall patient management, it is more commonly conducted in genetics clinics.

Standard of Care

All physicians have a professional responsibility to make certain that genetic counseling has been provided in appropriate situations and to ensure that the counseling meets current standards of practice [Directors and Directors, 1995; Parker, 2010]. Failure to provide this information may have tragic results. Perhaps the best example for pediatric neurologists is the birth of a second or even a third male with Duchenne muscular dystrophy in a family. Courts have upheld the principle of physician responsibility to provide accurate counseling on several occasions. For example, the parents of a child with Down syndrome claimed negligence because the mother had not been referred for prenatal diagnosis. In another case, parents who were tested for Tay–Sachs disease carrier status both were told that they were not carriers. They later had an affected child and filed a claim. Cases of this type are known as wrongful life claims.

Responsibility to Relatives

The responsibility to provide genetic counseling does not formally extend beyond the consultand, or person seeking genetic advice. People with genetic disorders are entitled to the same confidentiality as persons with any other type of disease. Nevertheless, the need for confidentiality does not mean that no effort should be made to inform relatives of a common risk. Whenever the genetic evaluation suggests that other family members or their future children may be at risk, the consultand should be encouraged to contact those persons or ask the physician (or designee) to contact them and advise them to seek genetic evaluation.

A majority of people act responsibly in this regard, but exceptions do arise that may present an ethical dilemma for health-care practitioners providing the counseling.

Relevance for Pediatric Neurology

The obligation to provide accurate genetic counseling is particularly important for pediatric neurologists. Many neurologic and neuromuscular diseases in children have a genetic basis or a genetic component, including some of the most common problems seen in clinics. For disorders such as Duchenne muscular dystrophy, neurofibromatosis type 1, and tuberous sclerosis, the genetic basis and need for genetic counseling are well known. For others, the genetic basis or contribution is not widely recognized. Febrile seizures, benign rolandic epilepsy, and some types of primary generalized epilepsy may affect many persons in a family and appear to have autosomal-dominant inheritance, although penetrance is not complete. Mental retardation, microcephaly, and cerebral palsy have a significant genetic component, with recurrence risks of 5–10 percent. Most brain malformations are sporadic, but familial recurrence of almost every known type of brain malformation has been described.

Because of the possibility of recurrence in relatives, pediatric neurologists should take a genetic history and advise patients and parents of their genetic risks. Referrals from pediatric neurology clinics to genetics clinics should occur frequently. Even for single-gene disorders with a known risk of recurrence, referral usually is needed to provide an accurate presentation of methods of prevention, such as prenatal diagnosis when this is available, artificial insemination by donor, contraception, sterilization, and adoption. If the parents decide to terminate a pregnancy, continued professional support is an appropriate and important part of genetic counseling.

Genetic Risk

One of the most crucial steps in offering accurate genetic counseling is to estimate the risk of recurrence of a genetic disorder in other family members. Estimation is not difficult for most diseases with a single-gene pattern of inheritance, but even these may be complex because of late age at onset or incomplete penetrance. For many other disorders, empirical recurrence risk estimates are used. These risk figures are derived from previous experience with the same disorder. Although such figures generally are accurate, exceptions occur because of causal heterogeneity and lack of knowledge regarding many rare disorders. The recurrence risks for some of the more common diseases seen in pediatric neurology clinics were reviewed by Baraitser [Baraitser, 1997]. In any given family, the recurrence risk may be different; consultation with a geneticist or genetic counselor may be helpful in providing this information.

Prenatal Diagnosis

Prenatal diagnosis can now be performed for hundreds of genetic disorders, including many of those discussed in this chapter. The major methods used include chromosome analysis, enzyme assays and other biochemical tests, molecular genetic tests, and direct examination of the fetus by ultrasonography. The last is a far more rigorous procedure than routine prenatal ultrasonography and usually is referred to as high-resolution, level 2, or genetic ultrasonography. The purposes of prenatal diagnosis are to provide:

1. a range of informed choices for parents at risk for having a child with an abnormality

2. reassurance to reduce anxiety, especially among parents at high risk

3. an opportunity for parents, who otherwise would choose not to have children, to conceive and bear healthy children.

The results of prenatal tests are normal in more than 98 percent of pregnancies evaluated, and parents are reassured that the infant will be unaffected by the condition in question. Of course, the infant remains at risk for other disorders, just as do children born to any other parents. In a small proportion of cases, the fetus is indeed found to have a serious defect. Because effective prenatal therapy is not possible for most disorders, the parents then have the option of terminating the pregnancy.

Genetics and Medicine

Genetics has become one of the most rapidly expanding fields in all of biology, and it is likely that this trend will continue. The past few years have seen completion of the Human Genome Project and identification of many genes relevant to neurologic disorders of childhood. The next few years will see the isolation of new genes responsible for common and complex diseases and many more single-gene disorders.

References

The complete list of references for this chapter is available online at www.expertconsult.com.


Site	Internet Address
NCBI¹ Genetic Disease Websites
GeneTests, GeneReviews²	http://www.ncbi.nlm.nih.gov/sites/GeneTests/
OMIM³	http://www.ncbi.nlm.nih.gov/omim/
NCBI¹ Genome Data Websites
NCBI¹ homepage (Entrez)	http://www.ncbi.nlm.nih.gov/
dbGaP Genotypes and Phenotypes	http://www.ncbi.nlm.nih.gov/gap
dbSNP (SNP database)	http://www.ncbi.nlm.nih.gov/snp/
Other Genome Data Websites
Ensembl Human Genome Browser	http://uswest.ensembl.org/index.html
HUGO⁴	http://www.genenames.org/index.html
DOE⁵ Genomics Websites, includes Human Genome Project	http://genomics.energy.gov/
UCSC Genome Bioinformatics⁶	http://genome.ucsc.edu/

Molecular Basis of Heredity

Modern theories of molecular biology hold that all information needed for function of cells and organisms is contained in macromolecules composed of simple repeating units. The flow of genetic information is (almost) exclusively unidirectional: DNA to RNA to protein. That is, the sequence of deoxyribonucleic acid (DNA) specifies the synthesis and sequence of ribonucleic acid (RNA) by a process known as transcription. Messenger RNA in turn specifies the synthesis and sequence of polypeptides, which are the building blocks of proteins, by a process known as translation. Other forms of RNA function independently. This theory is the central dogma of molecular biology. Accordingly, we begin with a review of the structure and function of these three macromolecules, and continue with reviews of the processes involved in gene and protein expression, including gene structure and organization, RNA processing, and epigenetics. Epigenetics refers to modification of genes other than changes in the DNA sequence, especially by addition of methyl groups to DNA, which alters gene expression. The two most important epigenetic changes found to be relevant to clinical disorders to date are imprinting and X-inactivation.

Structure and Function of DNA

DNA is a large polymer or macromolecule composed of linear sequences of simple repeating units. The specific sequence of these units contains all of the genetic information of an individual cell or organism. The structure of DNA in its native state was deduced by Watson and Crick in 1953 [Watson and Crick, 1953]. The basic repeating unit of DNA is the nucleotide, which consists of a five-carbon sugar known as deoxyribose; a phosphate group; and a nitrogen-containing base, which may be either a purine or a pyrimidine (Figure 30-1A). In DNA, the purine base may be either adenine (A) or guanine (G), and the pyrimidine base may be either thymine (T) or cytosine (C). Nucleotides polymerize into long chains by formation of phosphodiester bonds between the 5′ carbon position of one deoxyribose molecule and the 3′ carbon of the preceding deoxyribose molecule (Figure 30-1B).

Fig. 30-1 The chemical structure of DNA.

A, The four bases of DNA. B, The sugar-phosphate backbone and 3′–5′ phosphodiester bonds.

Each DNA molecule consists of two strands of nucleotides that are held together by weak hydrogen bonds between pairs of bases: A pairs only with T, and G pairs only with C. These paired units are known as basepairs (bp). In the native state, the two strands wind around each other to form a double helix that resembles a right-hand spiral staircase, with two unequal grooves known as the major and minor grooves (Figure 30-2). A single turn of the helix measures 3.4 nm and contains about ten basepairs. Each strand has a directionality imparted by the deoxyribose sugar backbone. Adjacent nucleotides are linked by phosphodiester bonds between the 5′ and 3′ carbon atoms of the sugar residues, so that one end of the DNA strand has an unlinked 5′ carbon (the 5′ end) and the other end of the strand has an unlinked 3′ carbon atom (the 3′ end). The two strands are antiparallel – that is, they run in opposite directions so that the 5′ end of one strand is paired with the 3′ end of the other. Within living cells, DNA is associated with proteins and supercoiled into more complex structures known as chromosomes, which are described later in the chapter.

Fig. 30-2 Packaging of DNA by structural proteins.

A, The right-handed double helix of DNA. B, This wraps around a histone core to form nucleosomes. C, The nucleosomes are packed into a solenoid structure. D, Loops of solenoids compose an interphase chromosome.

(Modified from Thompson MR et al. Genetics in medicine, 5th edn. Philadelphia: WB Saunders, 1991.)

Thus, when the sequence of one DNA strand is known, the sequence of the opposite or complementary strand may be predicted. Precise replication of DNA is therefore possible, a process that involves initiation, elongation, and termination stages. The process begins with recognition of an “origin of replication.” Such points of origin are specific DNA sequences, recognized by a protein complex known as the primosome, that occur every 50–300 kilobases (kb) of DNA; the unit kb refers to 1000 sequential nucleotides. The two parental DNA strands must first be separated by helicase, an enzyme that unwinds the supercoiled DNA helix to create a replication fork. The process of elongation occurs at the site of the replication fork or replisome. Synthesis of new strands begins with the addition of approximately ten RNA bases by a protein complex known as primase, and then continues with chain elongation using the original strands as templates. This process is known as semiconservative replication. Both initiation or RNA priming and chain elongation involve large protein complexes that include several DNA polymerases.
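
Because A pairs only with T and G only with C, and because the two strands are antiparallel, the complementary strand can be derived mechanically from either strand. The following is a minimal Python sketch of that prediction. The function name `complementary_strand` and the example sequence are illustrative assumptions.

```python
# Watson-Crick pairing: A<->T and G<->C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complementary_strand(strand):
    """Return the complementary strand, written 5' to 3'.

    The input is also read 5' to 3'. The strands are antiparallel,
    so the complement is reversed after the bases are paired.
    """
    strand = strand.upper()
    if set(strand) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, and T")
    return strand.translate(COMPLEMENT)[::-1]


print(complementary_strand("ATGGCC"))  # GGCCAT
```

Applying the function twice returns the original sequence, which is the property that makes semiconservative replication possible: each parental strand carries enough information to rebuild its partner.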

Five distinct DNA polymerases have been isolated in mammalian systems, including human cell cultures (Table 30-2). They are able to copy DNA only by adding nucleotides to the 3′ end of the growing chain, so DNA can elongate only in the 5′ to 3′ direction. Thus, the template DNA can be read only in the reverse, or 3′ to 5′, direction. As DNA is unwound, the replication fork necessarily unwinds one strand in the 3′ to 5′ direction and the other in the 5′ to 3′ direction. The 3′ to 5′ or leading strand is replicated in a continuous fashion at the replication fork by DNA polymerases α(I), which primes the reaction, and δ(III), which synthesizes the DNA chain. The new strand is complementary and so elongates in the opposite, or 5′ to 3′, direction.

Table 30-2 DNA Polymerases in Mammalian Systems

The 5′ to 3′, or lagging, strand cannot be copied continuously because this would require synthesis of the complementary new strand in a 3′ to 5′ direction, which is not possible, because DNA polymerases are able to synthesize DNA only in the 5′ to 3′ direction. Thus, the lagging strand must be copied by DNA polymerases α(I) and δ(III) in small segments of 100–1000 bp in the opposite direction from the replication fork. These small DNA molecules are known as Okazaki fragments. DNA replication is described as semidiscontinuous because of the continuous replication of the leading strand and the discontinuous replication of the lagging strand. The Okazaki fragments are then joined by another enzyme, DNA ligase. DNA replication is a long process, requiring about 8 hours in most human cells in culture. Thus, the function of DNA is reliably to encode and store the genetic information needed for the cell and organism to function. It has no direct functions itself but rather acts by directing synthesis of both RNA and protein.

Structure and Function of RNA

RNA differs chemically from DNA in the substitution of ribose for deoxyribose in the sugar backbone of the molecule, and of uracil (U) for thymine as one of the pyrimidine bases. Also, RNA normally exists as a single-stranded rather than double-stranded molecule. Recent advances have demonstrated far more diverse functions for RNA than were previously appreciated, particularly involving genes that produce functional RNA products that do not code for proteins. These probably represent at least 5 percent of all human genes, as suggested by current knowledge [Strachan and Read, 2010]. Several distinct classes of RNA molecules have been recognized, most of which are involved with regulating or assisting gene expression.

Ribosomal RNA

Ribosomal RNAs (rRNAs) are functional RNA transcripts that constitute one of the main components of cytoplasmic ribosomes. The genes coding for the major form of cytoplasmic rRNA are located in multiple copies on the short arms of the acrocentric chromosomes: 13, 14, 15, 21, and 22. They code for a single large 45S primary transcript that is cleaved into 28S, 18S, and 5.8S rRNA classes, designated by their separation in centrifugation gradients and by several associated proteins. Multiple copies of another gene on chromosome 1 produce 5S rRNA.

Transfer RNA

Transfer RNAs (tRNAs) are small RNA transcripts that bind specific amino acids and transport them to ribosomes for use during protein synthesis. More than 40 subfamilies of tRNA genes are known, dispersed across the genome. The mitochondrial genome uses a separate set of tRNAs.

Messenger RNA

Messenger RNAs (mRNAs) are the RNA transcripts of all genes that encode polypeptides and some other genes that encode unprocessed functional RNA molecules. Most are large. All mRNA transcripts undergo further processing, including excision of large segments of noncoding RNA known as introns, the addition of 7-methylguanosine to the first 5′ nucleotide, forming a CAP structure, cleavage of the 3′ end at a specific point downstream from the end of the coding sequence, and addition of the polyA tail at a site specified in part by the sequence AAUAAA, which is located in the 3′ untranslated region (3′ UTR) of the gene. The polyA tail appears to increase the stability of mRNA. The fully processed mRNA is transported to the cytoplasm, where translation occurs.

Small Nuclear RNA

Small nuclear RNA (snRNA) transcripts are small, uridine-rich RNA transcripts that associate with specific proteins to form ribonucleoprotein particles (RNPs). Some of them function in RNA splicing (removing introns from mRNA). They comprise a large family of genes dispersed across the genome.

Small Nucleolar RNA

Small nucleolar RNAs (snoRNAs) are small RNA transcripts that are present in the nucleolus and have important roles in specific cleavage reactions and base-specific modifications during maturation of ribosomal RNA. About 200 snoRNA genes have been identified.

MicroRNA

MicroRNAs (miRNAs) are another class of small noncoding RNAs that regulate the expression of protein-encoding genes at the post-transcriptional RNA level [Denli et al., 2004]. The process begins with transcription (synthesis) of primary RNA transcripts that range in size from several hundred to several thousand nucleotides. These transcripts are recognized and cut into precursor miRNAs in the nucleus by a protein known as Drosha, moved to the cytoplasm, and processed into mature miRNAs by a protein known as Dicer. The mature miRNAs join the RNA-induced silencing complex (RISC), which recognizes and cleaves (or otherwise silences) the mRNA of a target gene. This process has been demonstrated in many organisms, including mammals, and appears likely to play a key role in regulation of many genes.

Structure and Function of Polypeptides and Proteins

Proteins are composed of one or more polypeptide chains. Polypeptides are large polymers or macromolecules composed of linear sequences of repeating units known as amino acids, which are more complex than the repeating units of DNA or RNA. Amino acids share a common backbone: a central (α) carbon bonded to an amino group, a carboxyl group, and a hydrogen atom. They differ in the composition of a side chain attached to the same carbon. With rare exceptions, all polypeptides and proteins in nature are built from different sequences of 20 amino acids (Table 30-3). The side chains may be neutral and hydrophobic, neutral and polar, basic, or acidic. The simplest amino acid is glycine, which has a hydrogen atom as the side chain.

Table 30-3 Classification of Amino Acids by Side Chain

Amino Acid	3-letter Code	1-letter Code
Neutral and Hydrophobic
Alanine	Ala	A
Isoleucine	Ile	I
Leucine	Leu	L
Methionine	Met	M
Phenylalanine	Phe	F
Proline	Pro	P
Tryptophan	Trp	W
Valine	Val	V
Neutral and Polar
Asparagine	Asn	N
Cysteine	Cys	C
Glutamine	Gln	Q
Glycine	Gly	G
Serine	Ser	S
Threonine	Thr	T
Tyrosine	Tyr	Y
Acidic
Aspartic acid	Asp	D
Glutamic acid	Glu	E
Basic
Arginine	Arg	R
Histidine	His	H
Lysine	Lys	K

The process of information transfer from RNA to polypeptides and proteins is known as translation. It relies on the genetic code, the system by which the nucleotide sequence of mRNA specifies the amino acid sequence of a polypeptide chain. In this nearly universal code, each set of three adjacent bases in the mRNA transcript constitutes a codon, and different combinations of bases within the codon specify the individual amino acids (Table 30-4). The small tRNA molecules serve as the molecular link between mRNA codons and amino acids. One segment of each tRNA transcript contains a three-base anticodon that is complementary to a specific codon on the mRNA, whereas another segment contains a binding site for one of the 20 amino acids.

Table 30-4 The Nuclear Genetic Code

With a total of only 20 amino acids and 64 possible codons, most amino acids are specified by more than one codon. For some of the different amino acids, the base in the third position in the triplet may be either of the purines, either of the pyrimidines, or sometimes any of the four bases. For this reason, the third position in the codon sometimes is called the wobble position. Arginine and leucine are each specified by six codons, whereas only methionine and tryptophan are specified by a single codon. Three codons signal termination of translation and accordingly are called stop codons.
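
The degeneracy described above can be verified by building the standard nuclear genetic code and counting the codons assigned to each amino acid. The following is a minimal Python sketch. The names `BASES`, `AMINO_ACIDS`, `GENETIC_CODE`, and `codons_per_residue` are illustrative, and the compact string encoding of the code table is a common convention rather than something taken from this chapter.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes ('*' = stop) for all 64 codons, listed in the
# order UUU, UUC, UUA, UUG, UCU, ... GGG (third base varies fastest).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
GENETIC_CODE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that specify each amino acid (or stop).
codons_per_residue = Counter(GENETIC_CODE.values())

for residue, count in sorted(codons_per_residue.items(), key=lambda kv: -kv[1]):
    print(residue, count)
```

The counts show six codons each for arginine (R) and leucine (L), a single codon each for methionine (M) and tryptophan (W), and three stop codons. Serine (S) also has six codons.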

Transcription

The process of information transfer from DNA to RNA is known as transcription. Synthesis of RNA begins at a specific transcription start site and continues in a 5′ to 3′ direction with regard to the RNA product. The DNA strand that corresponds to the RNA sequence is known as the coding or sense strand. This strand, however, is not used as the template for synthesis of an RNA molecule. Rather, the complementary DNA strand, known as the noncoding or antisense strand, actually serves as the template and is read in the 3′ to 5′ direction. The RNA product is known as a transcript.
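
The relationship between the coding strand, the template strand, and the transcript can be made concrete with a short example. This is a minimal Python sketch. The function name `transcribe` and the nine-base example sequences are illustrative assumptions.

```python
# Base pairing used when RNA is synthesized on a DNA template.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the RNA transcript (5'->3') for a template (antisense)
    strand written in the 3'->5' direction, the direction in which it is read."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


coding_strand = "ATGGCCTAA"    # sense strand, written 5'->3'
template_strand = "TACCGGATT"  # antisense strand, written 3'->5'

transcript = transcribe(template_strand)
print(transcript)  # AUGGCCUAA
```

The transcript matches the coding strand with U in place of T, which is why the coding strand, not the template, is conventionally used to report a gene sequence.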

Translation

The process of information transfer from RNA to polypeptide or protein is known as translation. This process takes place in the cytoplasm on small structures known as ribosomes, macromolecules composed of the four species of rRNA noted earlier together with many associated proteins. They function like small migrating factories that travel along an mRNA template, engaging in rapid cycles of peptide bond synthesis. The process consists of initiation, elongation, and termination stages.

The ribosome contains a large site that binds about 35 nucleotides of mRNA, and two adjacent sites for binding the smaller aminoacyl-tRNA molecules. The first is the acceptor or A site, which holds the incoming aminoacyl-tRNA. The second is the donor or P site, which is occupied by a tRNA carrying the growing polypeptide chain. Translation begins with mRNA binding to the ribosome at the site of the first AUG base triplet, which specifies the amino acid methionine, serves as the start signal for synthesis of the polypeptide chain, and establishes the reading frame of the mRNA.

The mRNA and tRNA then move in the same direction along the ribosome, with the tRNA moving from the “A” site to the “P” site, and the mRNA sliding over three bases, allowing recognition of the next codon. Bonding between the mRNA codon and tRNA anticodon brings the appropriate amino acid into position on the ribosome to form a new peptide bond to the carboxyl end of the growing polypeptide chain. As part of this reaction, the polypeptide chain is released from the tRNA at the “P” site, but remains bonded to the tRNA at the “A” site. The tRNA and mRNA then move another three bases along the ribosome, and the process is repeated. This reaction continues until one of the stop codons is reached. Thus, proteins are synthesized from the amino to the carboxyl terminus, which corresponds to translation from the 5′ to the 3′ end of the mRNA molecule, and methionine is always the first amino acid of each polypeptide chain, although it usually is removed before protein synthesis is completed.
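
The reading-frame logic described above (start at the first AUG, advance three bases at a time, stop at the first in-frame stop codon) can be sketched in a few lines of Python. This is a simplified illustration, not a model of ribosome biochemistry. The function name `translate` and the example mRNA are assumptions, and treating the first AUG in the sequence as the start site is a deliberate simplification.

```python
from itertools import product

BASES = "UCAG"
# Standard nuclear genetic code, codons ordered UUU, UUC, UUA, UUG, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
GENETIC_CODE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA (5'->3') from its first AUG to the first
    in-frame stop codon; returns one-letter amino acid codes."""
    mrna = mrna.upper()
    start = mrna.find("AUG")  # sets the reading frame
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):  # slide three bases per cycle
        residue = GENETIC_CODE[mrna[i:i + 3]]
        if residue == "*":  # stop codon: release the chain
            break
        peptide.append(residue)
    return "".join(peptide)


# Hypothetical transcript: 5' UTR "GG", then AUG GCC UGG UAA, then 3' UTR "GC".
print(translate("GGAUGGCCUGGUAAGC"))  # MAW
```

The peptide is built from the amino terminus (methionine) toward the carboxyl terminus as the mRNA is read 5′ to 3′, and the bases before the AUG and after the stop codon are not translated.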

Gene Structure and Organization

As noted earlier, a gene traditionally has been defined as a unit of genetic information. This concept has gradually progressed to a more useful definition, which states that a gene is a sequence of DNA on a chromosome that is required for production of a functional product, which can be either a protein or a functional RNA molecule [Nussbaum et al., 2007]. By convention, genetic information is always read in the 5′ to 3′ direction, whether encoded in DNA or RNA – in an upstream to downstream direction. The nomenclature regarding the 5′ and 3′ positions of the sugar backbone can be confusing. The 5′ carbon of the first nucleotide of a sequence is joined by a phosphodiester bond to a nucleotide not involved in the sequence, whereas its 3′ carbon is joined to the 5′ carbon of the second nucleotide, and so on. The last nucleotide of the sequence has a 3′ carbon, which joins another uninvolved nucleotide.

Genes

Genes are composed of a continuous length of DNA with definable start and end points, which include the sequence that codes for the RNA or polypeptide product and is thus known as the coding region. It has become clear, however, that the structure of a gene is complex and includes much more than the coding sequence of the protein. All genes include additional sequences on either end of the coding region – designated the 5′ and 3′ UTRs – that are transcribed but are not translated into polypeptide. These regions function to regulate translation and RNA stability. The gene is considered to include the entire sequence represented in the RNA product because some mutations within noncoding regions can impair gene function.

A model of a typical human gene is shown in Figure 30-3. Promoter sequences required for regulation and initiation of RNA transcription (red diamonds in Figure 30-3) are present at the 5′ end of the gene, such as the CAAT and TATA boxes, whose sequences are tightly conserved among many different genes and species. Downstream from the promoter sequences is a specific sequence that signals the start of transcription. A short way further downstream is an initiator codon, AUG, which codes for methionine. This triplet is the translation start site, which signals the start of the coding sequence for the polypeptide product. The region between the transcription and translation start sites is the 5′ UTR.

Fig. 30-3 The structure of a typical human gene.

The gene includes a primary regulatory region known as the promoter just upstream of the transcription start site that is required for binding of both DNA and RNA polymerases (red diamonds), as well as several types of distant regulatory elements that protect the gene from regulation of other nearby genes (insulator), increase or decrease gene expression (enhancers and silencers), or regulate several genes in the region (locus control region).

The next segment of the gene is the coding region. The coding regions of most genes in prokaryotes and lower eukaryotes are colinear, which means that the coding sequence corresponds exactly to the sequence of amino acids in the polypeptide. By contrast, most higher eukaryotic genes, including human genes, contain additional sequences that lie within the coding region, interrupting the sequence that represents the polypeptide. The regions that code for the final polypeptide (or functional RNA) product are known as exons, whereas the regions that are missing from the final mRNA product are introns. The removal of introns from the final mRNA product is known as splicing, a complex process that is regulated by a large number of proteins and functional RNA transcripts.

The coding sequence ends at one of three specific stop codons: UAA, UAG, or UGA. The last segment of the gene is the 3′ UTR, which contains a polyadenylation signal and presumably a signal to end transcription, although no transcription stop sequence has been identified. The length of a gene may vary, ranging from less than 1 kb to several hundred kb. The longest gene known, which codes for dystrophin, spans more than 2000 kb of genomic sequence, although dystrophin is not the largest protein produced in the cell.
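The layout of 5′ UTR, coding sequence, and 3′ UTR described above can be sketched in Python for a mature (already spliced) mRNA. This is an illustration under simplifying assumptions: the first AUG is taken as the translation start site, the first in-frame UAA, UAG, or UGA as the end of the coding sequence, and the function name split_mrna and the example sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna):
    """Split a mature mRNA (5' to 3') into (5' UTR, coding sequence, 3' UTR).

    The coding sequence runs from the first AUG through the first
    in-frame stop codon, inclusive.
    """
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no AUG start codon found")
    # Walk the reading frame set by the AUG, one codon at a time.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")
```

For the hypothetical message GGAUGGCUUAAGGAAUAAAGG, the function returns GG as the 5′ UTR, AUGGCUUAA as the coding sequence, and GGAAUAAAGG as the 3′ UTR, which in this example contains the polyadenylation signal AAUAAA.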

Regulatory Regions

Many genes have highly conserved sequences, a longer distance upstream and downstream of the transcribed gene, that are involved in regulating expression, including enhancers, silencers, locus control regions, and insulators (see Figure 30-3). Enhancer elements function to increase gene expression, while silencers reduce gene expression. Locus control regions may regulate expression of several genes within a chromosome region, while insulators prevent co-regulation of more distant genes and gene regions. All of these are sequences that bind proteins called transcription factors, which can be ubiquitous, tissue-specific, and/or temporally expressed. Promoters are located immediately 5′ of the gene and bind to RNA polymerase II, a necessary step for transcription. Other transcription factors bind upstream of the promoter and activate transcription. Enhancers and silencers are often located at a distance from the promoter, and increase or decrease transcription in a tissue-specific or temporal manner. Overall, the transcription of each gene is tightly regulated, with multiple transcription factors involved.

RNA Processing

Transcription of DNA gives rise to a precursor RNA that corresponds exactly to the genome sequence but must be modified in several ways to become functional, especially for mRNA. The first modification to mRNA is the addition of a cap structure to the 5′ end, and this is followed by the removal or splicing of introns. The mechanism of mRNA splicing depends on the specific nucleotide sequences at the exon/intron boundaries called splice junctions (Figure 30-4). The most important of these is the GT-AG rule: introns almost always start with GT (actually GU, because this occurs in RNA), which is therefore called the splice-donor site, and end with AG, which is called the splice-acceptor site. Several additional specific sequences are also needed, including sequences within the intron just after the GT splice-donor site, a highly conserved branch site located about 40 bp before the end of the intron, and sequences just before the AG splice-acceptor site. The splicing mechanism produces the following:

1. cleavage at the 5′ donor site splice junction just before the invariant G

2. nucleolytic attack by the terminal G of the splice-donor site at the invariant A of the branch site to form a “lariat”-shaped structure

3. cleavage at the 3′ splice-acceptor site at the 3′ splice junction, leading to release of the intronic RNA as a lariat or loop, and splicing of the two exons.

Fig. 30-4 Consensus sequences at the splice-donor, branch, and splice-acceptor sites in introns of higher eukaryotes.

The GT dinucleotide at the start of the intron, the A near the end of the branch site, and the AG dinucleotide that ends the intron are invariant, whereas most others represent only the most common nucleotide. When two nucleotides are depicted at a single position, no preference is shown as to which is listed on the top or on the bottom. Abbreviations: A, adenine; C, cytosine; G, guanine; N, any nucleotide; T, thymine.

(Modified from Strachan T, Read AP. Human molecular genetics. New York: Wiley-Liss, 1996.)

These reactions are catalyzed by large complexes composed of snRNA and specific proteins. The snRNAs involved have specific sequences that allow binding with conserved intronic sequences or the recognition sites of other snRNAs. The snRNA–protein–target RNA complexes form large particles known as spliceosomes. Once a 5′ splice site is recognized, the complex scans the RNA sequence until it encounters a branch site, which aids in identifying the nearby 3′ splice-acceptor site. This process does not necessarily happen in linear order along the RNA. Rather, the order likely is determined by the vagaries of RNA folding. The last steps involve cleavage of part of the 3′ UTR, which occurs at a specific point downstream from the end of the coding sequence, and addition of a long sequence of adenosine nucleotides that is called the polyA tail. The site of the polyA tail is specified in part by the sequence AAUAAA, which is located within the 3′ UTR.
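The GT-AG rule can be illustrated with a short Python sketch that removes introns from a precursor RNA and joins the flanking exons. It is a simplification: intron coordinates are supplied by the caller rather than recognized by a spliceosome, only the invariant GU and AG dinucleotides are checked, and the function name splice and the example sequences are hypothetical.

```python
def splice(pre_mrna, introns):
    """Remove introns from a precursor RNA and join the exons.

    introns is a list of (start, end) positions, 0-based with end
    exclusive, sorted and non-overlapping. Each intron must begin with
    GU (splice-donor site) and end with AG (splice-acceptor site).
    """
    exons = []
    position = 0
    for start, end in introns:
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} violates the GU-AG rule")
        exons.append(pre_mrna[position:start])  # exon upstream of this intron
        position = end
    exons.append(pre_mrna[position:])  # final exon
    return "".join(exons)
```

For example, with the hypothetical exons AUGGCC and GAUUAA separated by the intron GUAAGUCCUAACUUUCAG, calling splice on the joined precursor with the intron coordinates (6, 24) returns AUGGCCGAUUAA.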

Imprinting and X-Inactivation

Several regions of the genome are subject to inactivation under special circumstances, with no changes to the DNA sequence. The processes involved thus represent a form of “epigenetic” modification. The two processes reviewed here, imprinting and X-chromosome inactivation, both can result in a phenotype when disrupted.

Imprinting

The process by which certain genes in specific chromosomal regions are expressed from only one chromosome, depending on the parental origin of the chromosome, is known as “imprinting.” Although the mechanism is only partly understood, a key component involves allele-specific DNA methylation, found predominantly at the carbon 5 position of about 80 percent of all cytosines that are part of symmetrical cytosine-guanine (CpG) dinucleotides [Jiang et al., 2004; Strachan and Read, 2010; Weksberg et al., 2003].

This process is controlled by regulatory imprinting “centers,” located nearby on the same chromosome as that of the silenced or “imprinted” gene. In effect, then, two alleles of the same gene that are identical in nucleotide sequence but derived from opposite parents are regulated differently in the same nucleus. This process is reversible, so that the silent, imprinted allele can be reactivated and the active allele silenced when passed through the germline of the opposite-sex parent. Most imprinted genes are found in large clusters of greater than 1 Mb (megabase pairs) in length. Imprinted clusters have been identified in chromosomes 6q24, 7p11.2, 11p15.5, 14q32, 15q11–q13, and 20q13.2, and others may exist as well [Cavaille et al., 2002; Gardner et al., 2000; Hall, 1990; Jiang et al., 2004; Weksberg et al., 2003; Wylie et al., 2000]. Imprinted regions share several common characteristics, including differential DNA methylation, allele-specific RNA transcription, antisense transcripts, histone modifications, and differences in timing of replication.

X-Inactivation

In mammalian cells with two (or more) X chromosomes, all but one undergo widespread gene silencing by methylation. This process, known as X-chromosome inactivation (Xi), causes one of the two X chromosomes in cells of female mammals to become transcriptionally inactive early in embryonic development, as proposed in the Lyon hypothesis [Lyon, 1961, 2002]. In abnormal cells with more than two X chromosomes, all but one are likewise inactivated. This has the effect of balancing gene dosage of X-linked genes between male and female cells. The process of Xi is random, so that on average the maternally and paternally derived X chromosomes are each inactivated in approximately 50 percent of cells. Changes in this pattern are seen in female carriers of some X-linked diseases, resulting in skewing of Xi. This alteration can be favorable, with decreased severity of the phenotype, or unfavorable, with increased severity of the phenotype [Dobyns et al., 2004].
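The random nature of Xi, and the meaning of skewing, can be illustrated with a small Python simulation. This is a deliberately simplified sketch: it assumes that each cell makes an independent choice and ignores the clonal inheritance of the inactive X through later cell divisions. The function name and the parameter p_maternal_inactive are hypothetical.

```python
import random


def simulate_x_inactivation(n_cells, p_maternal_inactive=0.5, seed=None):
    """Return the fraction of cells in which the maternal X is inactivated.

    Each cell independently inactivates the maternal X with probability
    p_maternal_inactive and the paternal X otherwise. A value of 0.5
    models random Xi; values far from 0.5 model skewed Xi.
    """
    rng = random.Random(seed)
    maternal_inactive = sum(
        rng.random() < p_maternal_inactive for _ in range(n_cells)
    )
    return maternal_inactive / n_cells
```

With the default probability and a large number of cells, the returned fraction is close to 0.5, whereas setting p_maternal_inactive to a value such as 0.9 models a skewed pattern in which most cells express the paternally derived X.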

Cell Cycle and Chromosomal Basis of Heredity

Current knowledge regarding the chromosomal basis of heredity and that concerning the cell cycle are inextricably linked because the intracellular structures now known as chromosomes were first seen in cells undergoing cell division. The existence of chromosomes was foreshadowed by Gregor Mendel’s work. For years after he described independent assortment of genetic traits, occasional exceptions to this law were discovered. Certain traits were found that were typically inherited as a group. These observations were eventually explained by the discovery of chromosomes. The nuclear material of a cell, or chromatin, appears homogeneous during most of the cell cycle, but condenses into distinct rod-shaped organelles during cell division. These tiny structures were called chromosomes because they stain darkly with various biologic dyes.

Cell Cycle

Humans begin life as a single diploid cell or zygote, which gives rise to all of the cells of the body by a combination of cell growth and cell division, with the latter including both asexual (mitosis) and sexual (meiosis) cell division. The life cycle of somatic cells is divided into four stages. After cell division, the cell enters the G₁ (gap 1) resting phase, during which DNA synthesis does not occur. Some differentiated cells, such as neurons, stop growth in a modified G₁ phase known as G₀. Late in G₁, the cell passes a critical point, after which it proceeds through the rest of the cell cycle at a standard rate. G₁ is followed by the S phase, during which DNA synthesis or replication occurs. The genetic material is duplicated in the form of two chromatids (future chromosomes), joined by attachment to a single centromere. The cell then enters the G₂ (gap 2) resting phase, which is much shorter than G₁. The G₁, S, and G₂ phases together constitute interphase.

Mitosis

Somatic cell division, or mitosis, is an elaborate mechanism that distributes one chromatid of each duplicated chromosome to each of the two daughter cells. The process is continuous but has been divided into the following five stages: prophase, prometaphase, metaphase, anaphase, and telophase (Figure 30-5).

Fig. 30-5 Diagram of mitosis demonstrating two chromosome pairs.

In prophase, the chromatin begins to condense, the nucleolus disappears, and the mitotic spindle begins to form. Prophase is followed by prometaphase, during which the nuclear membrane disappears, allowing the chromosomes to disperse in the cell and attach to the spindle by paired kinetochores located at the centromere. In metaphase, the chromosomes are maximally contracted and arranged at the equatorial plane of the cell. In anaphase, the replicated chromosomes separate at the centromere, allowing the two chromatids to become daughter chromosomes, which move to opposite ends of the cell. In telophase, the chromosomes decondense, the nuclear membrane reforms, and the nucleus returns to the interphase appearance. Shortly afterward, the cytoplasm divides to form two daughter cells. For routine studies, chromosomes are examined during metaphase. For high-resolution studies, they are examined before the point of maximal contraction, during prophase or prometaphase.

Meiosis

Reproductive cell division, or meiosis, is an even more complex mechanism in which two successive cell divisions, known as meiosis 1 and meiosis 2, give rise to the haploid germ cells (Figure 30-6). Meiosis is of critical importance in understanding many of the methods of modern molecular genetics and the pathogenesis of many genetic diseases.

Fig. 30-6 Diagram of meiosis depicting two chromosome pairs.

In meiosis 1, the chromosome number is reduced from the diploid to the haploid number. The key step consists of close pairing of homologous chromosomes during prophase 1, which is further divided into several stages. During leptotene, the chromosomes first become visible, with homologs located close together. During zygotene, the homologs begin to pair closely along their entire length, held together by a thin protein-containing structure known as a synaptonemal complex. During pachytene, synapsis or pairing is completed, and the homologs appear as a bivalent. Pachytene is the stage during which exchange of homologous segments between nonsister chromatids occurs, which is known as recombination or crossing over. The remaining steps are similar to mitosis, except that it is the paired homologs that are pulled apart rather than the centromeres. In meiosis 2, which closely resembles mitosis, the chromatids separate at the centromere to form daughter chromosomes. Ova and sperm have remarkably different timing, but the sequence of meiosis is the same.

Chromosomal Basis of Heredity

Chromosome Structure

In humans, the nuclear DNA is dispersed among 46 separate linear structures or chromosomes, each of which consists of a single, uninterrupted double helix that contains 50–250 Mb of DNA, and a group of associated proteins that form the support structure or scaffolding. The scaffolding consists of five basic proteins called histones and several more acidic nonhistone proteins. Two copies of each of four histones – H2A, H2B, H3, and H4 – join to form an octamer. The DNA double helix wraps almost twice around the octamer, which involves about 140 bp. Adjacent octamers are separated by a short spacer segment of 20–60 bp that is associated with histone H1. The complex of DNA and core histones is known as a nucleosome (see Figure 30-2).

Strings of nucleosomes are further compacted into a secondary helical structure known as a solenoid. These structures have a diameter of about 30 nm (see Figure 30-2) and contain six nucleosomes per turn. The solenoids are packed into large loops of 10–100 kb of DNA, which are attached to a nonhistone protein scaffolding. These loops pack together loosely to form interphase chromosomes. During early prophase, they pack together more closely to form knoblike thickenings known as chromomeres, which then coalesce further to form the bands observed in prometaphase and metaphase chromosomes when stained with appropriate dyes.
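The figures given above allow a rough estimate of how many nucleosomes and solenoid turns a chromosome contains. The Python sketch below is simple arithmetic under stated assumptions: about 140 bp wrapped around each octamer, a spacer of 40 bp (an assumed midpoint of the 20–60 bp range), and six nucleosomes per solenoid turn. The function name is hypothetical and the results are order-of-magnitude estimates only.

```python
def nucleosome_estimates(chromosome_mb, core_bp=140, spacer_bp=40,
                         nucleosomes_per_turn=6):
    """Estimate (nucleosomes, solenoid turns) for a chromosome of given size.

    chromosome_mb is the chromosome length in megabases. spacer_bp=40 is
    an assumed midpoint of the 20-60 bp spacer range.
    """
    repeat_bp = core_bp + spacer_bp  # DNA per nucleosome plus its spacer
    nucleosomes = int(chromosome_mb * 1_000_000) // repeat_bp
    solenoid_turns = nucleosomes // nucleosomes_per_turn
    return nucleosomes, solenoid_turns
```

Under these assumptions, a 50-Mb chromosome contains roughly 280,000 nucleosomes and a 250-Mb chromosome roughly 1.4 million.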

The alternating light and dark bands that characterize all nuclear chromosomes with a variety of staining methods likely reflect the compartmentalization of the genome into isochores, defined as large regions with variation in base composition or variable spacing of scaffold attachment regions. The dark bands observed with Giemsa staining are AT-rich, replicate late in the DNA synthesis phase of the cell cycle, and contain relatively few genes. The light bands observed with Giemsa are GC-rich, replicate early, and contain many genes. Some are greatly enriched for GC and contain high concentrations of genes. Most, although not all, such bands are located near the ends or telomeres of chromosomes and therefore are known as T bands.
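The distinction between AT-rich and GC-rich regions rests on base composition, which is straightforward to compute. The following Python sketch returns the GC fraction of a DNA sequence; the function name gc_fraction is hypothetical, and no cutoff for calling a region AT-rich or GC-rich is implied, because the text does not specify one.

```python
def gc_fraction(sequence):
    """Return the fraction of bases in a DNA sequence that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence is empty")
    return (sequence.count("G") + sequence.count("C")) / len(sequence)
```

Applied to successive windows along a chromosome, a calculation of this kind would be expected to give lower values in Giemsa-dark bands and higher values in Giemsa-light bands.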

Specialized Regions

All nuclear chromosomes have specialized regions that are required for chromosome integrity and function, including centromeres, telomeres, and origins of replication. Centromeres are DNA sequences that act in cis. That is, they act on the chromosome on which they are located and are responsible for the segregation of chromosomes during cell division. Centromeres contain extensive repeats of an approximately 171-bp unit known as alpha-satellite DNA, the sequence of which differs slightly between each chromosome. Fragments of chromosomes that lack a centromere, known as acentric fragments, are lost during cell division.

The two ends of a chromosome are called telomeres and also are required for chromosome stability. In humans, they consist of long arrays of tandem repeats of the sequence TTAGGG, which extend about 5–20 kb. DNA polymerases are unable to replicate the telomeres because of the lack of a template. This problem is resolved by the enzyme telomerase, which contains an RNA component to serve as a template to prime further synthesis on the leading strand. Further extension of the leading strand provides the needed template for the lagging strand.
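The tandem arrangement of the telomeric repeat can be illustrated with a short Python sketch that counts uninterrupted copies of TTAGGG at the 3′ end of a sequence. The function name count_terminal_repeats is hypothetical, and the sketch counts only perfect copies, whereas real telomeric arrays may contain variant repeats.

```python
import re


def count_terminal_repeats(sequence, unit="TTAGGG"):
    """Count uninterrupted tandem copies of unit at the end of sequence."""
    match = re.search(rf"(?:{re.escape(unit)})+$", sequence)
    if match is None:
        return 0
    return len(match.group(0)) // len(unit)
```

An array of 5–20 kb corresponds to roughly 800 to 3300 copies of the six-base repeat.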

Origins of replication are specialized sequences where DNA replication begins, and thus are important in maintaining chromosome number and integrity. They consist of autonomously replicating sequence elements that contain a core consensus sequence and some imperfect copies with a length of about 50 nucleotides. A consensus human autonomously replicating sequence has been identified [Strachan and Read, 2010].

Regions of variable staining known as heterochromatin consist of long arrays of repeat sequences as short as 5 bp. These regions are located primarily in the pericentromeric regions of chromosomes 1, 9, and 16, and in distal Yq. The five human acrocentric chromosomes have small satellites attached to the short arm by short stalks or secondary constrictions that contain the rRNA genes.

Chromosome Number

Each human somatic cell contains 46 chromosomes that consist of 22 matched pairs known as autosomes and two sex chromosomes: XX in females and XY in males (Figure 30-7). In contrast, human germ cells contain only 23 chromosomes, consisting of 22 unpaired autosomes and a single sex chromosome. The former is known as the diploid or 2n number, and the latter is known as the haploid or 1n number. The autosomes were numbered according to length, with chromosome 1 the longest and chromosome 22 thought to be the shortest. Although chromosome 21 later proved to be shorter than chromosome 22, the numbers were retained for historical reasons. The two members of each pair of autosomes and the two X chromosomes in females carry the same genes and are known as homologous chromosomes, or homologs. Although they appear similar under the microscope, homologs are not strictly identical. They contain the same genes, but the nucleotide sequence differs at thousands of positions.

Fig. 30-7 Standardized diagram or idiogram of human chromosomes at the 400-band stage.

Chromosome Identification

Individual chromosomes may be seen only when tightly contracted during cell division. Since DNA replication is complete, each chromosome consists of two chromatids that are joined at the primary constriction or centromere. In standard cytogenetic nomenclature, the centromere divides the chromosome into two arms, with the shorter designated the “p” arm and the longer the “q” arm. The tip of each arm is the telomere. Human chromosomes are classified into three types according to the position of the centromere:

1. metacentric, in which the centromere is centrally placed and the two arms are of about equal length

2. submetacentric, in which the centromere is off center and the arms are of unequal length

3. acrocentric, in which the centromere is near one end.

Organization of the Human Genome

The human genome comprises the total of all genetic information in the cell. It is divided into two separate compartments – a large and complex nuclear genome and a much smaller and simpler mitochondrial genome. The mitochondrial genome consists of a single circular DNA molecule that is present in many copies in each mitochondrion, while the nuclear genome is distributed among the 46 nuclear chromosomes. The available data regarding the genome have become much more extensive and accurate with completion of the Human Genome Project. A few of the most useful Human Genome Project-related websites are listed in Table 30-1.

The Nuclear Genome

The human nuclear genome consists of approximately 3 × 10⁹ bp, or 3000 Mb of DNA. About 75 percent of this represents unique or single-copy DNA, which includes genes and some important regulatory elements. The remaining 25 percent consists of several classes of repetitive DNA [Lander et al., 2001; Nussbaum et al., 2007; Venter et al., 2001].

Genes and Conserved Noncoding DNA

Somewhat surprisingly, recent estimates predict that the human genome contains fewer than 30,000 protein-coding genes (possibly closer to 20,000) and an uncertain number of other genes producing functional RNA products. This is far fewer than earlier estimates, and accounts for only about 1.2 percent of nuclear DNA [Lander et al., 2001; Venter et al., 2001]. Another 5 percent of the human genome is more conserved than would be expected from estimates of neutral evolution, which suggests that many of these regions have specific, regulatory functions [Chiaromonte et al., 2003; Waterston et al., 2002]. Studies of these highly conserved regions of DNA have used different thresholds, such as stretches of more than 100 bp with 70–80 percent conservation between mouse and human. Some of these regions have been found to contain important noncoding elements [Dermitzakis et al., 2002, 2003; Frazer et al., 2004; Hardison, 2000]. More stringent analysis demonstrates that the human genome contains 481 sequences of 200 or more bp that are 100 percent conserved among human, mouse, and rat [Bejerano et al., 2004]. These segments were designated “ultra-conserved elements,” and are preferentially located near genes involved in RNA processing or regulation of transcription and development. Similarly, about 5000 sequences of 100 bp or more are conserved among these three species, which emphasizes that noncoding sequences are common and important.
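The percentages quoted for the nuclear genome are easier to compare when converted to absolute amounts of DNA. The Python sketch below performs this conversion for a genome of 3000 Mb; the dictionary labels and the function name megabases are hypothetical, and the percentages are simply those given in the text.

```python
GENOME_MB = 3000  # approximately 3 x 10^9 bp

# Percent of the nuclear genome, as quoted in the text.
PERCENT_OF_GENOME = {
    "single-copy DNA": 75,
    "repetitive DNA": 25,
    "protein-coding sequence": 1.2,
    "other conserved sequence": 5,
}


def megabases(percent, genome_mb=GENOME_MB):
    """Convert a percentage of the genome into megabases of DNA."""
    return genome_mb * percent / 100
```

On these figures, protein-coding sequence amounts to only about 36 Mb, compared with about 150 Mb of other conserved sequence and about 750 Mb of repetitive DNA.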

Repetitive DNA

Repetitive DNA in the human genome consists of several classes of DNA whose nucleotide sequence is repeated, either exactly or with minor variations, hundreds to millions of times. Some classes are clustered, whereas others are dispersed throughout the genome. Clustered, repeated sequences constitute 10–15 percent of the genome and are collectively called satellite DNA because of their separation from other DNA on density centrifugation. Satellite DNA consists of head-to-tail or tandem arrayed repeat sequences that can extend for several thousand kb. Dispersed, repeat sequences constitute 6–10 percent of the genome and belong to several different classes. Minisatellite or variable number of tandem repeat (VNTR) sequences are dispersed, intermediate-length (15–65 bp) repeats that usually span only several kb. The Alu family of DNA repeats includes about 500,000 related sequences that are each about 300 bp in length and together make up about 3 percent of the genome. The L1 family of repeats includes about 10,000 related sequences that extend up to 6 kb in length and make up another 3 percent of the genome. The origin of these sequences is not known and no functions have been identified, so it appears likely that they simply exploit cellular processes to propagate themselves. Several classes have been useful as polymorphic DNA markers.

Low Copy Repeats

Segmental duplications, also known as low copy repeats (LCRs), are DNA sequences of 10–250 kb, present in multiple copies with greater than 95 percent sequence identity, that make up approximately 5 percent of the human genome [Babcock et al., 2003; Bailey et al., 2002; Cheung et al., 2001; Stankiewicz and Lupski, 2002]. LCRs are dynamic regions of the genome because specific repeats tend to cluster within the same genomic regions, where they mediate unequal (nonallelic) homologous recombination events, producing segmental deletions and duplications that are collectively designated “copy number variants” (CNVs). Several of these have been associated with well-known developmental disorders in humans, such as Williams’ syndrome in 7q11.23, Angelman’s syndrome and Prader–Willi syndrome in 15q12, hereditary neuropathy with predisposition to pressure palsies and Charcot–Marie–Tooth neuropathy type 1A in 17p12, Smith–Magenis syndrome in 17p11.2, and DiGeorge’s syndrome in 22q11.2 [Babcock et al., 2003]. Many new CNV-associated developmental brain disorders have been described over the past few years.
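The 95 percent identity threshold used to define LCRs can be illustrated with a short Python sketch that compares two aligned copies of equal length. The function name percent_identity is hypothetical, and the sketch assumes that the copies have already been aligned without gaps, which real analyses of segmental duplications do not.

```python
def percent_identity(copy_a, copy_b):
    """Return the percentage of matching positions in two aligned sequences.

    Both sequences must be non-empty and of equal length (no gaps).
    """
    if len(copy_a) != len(copy_b) or not copy_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(copy_a, copy_b))
    return 100 * matches / len(copy_a)
```

Two copies would meet the definition given above when the returned value exceeds 95.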

Polymorphisms

A mutation is a permanent change in the DNA of an individual organism, specifically a change in the nucleotide sequence anywhere in the genome [Nussbaum et al., 2007]. Genetic diseases and many cancers are caused by mutations that adversely affect function of one or more genes, although most mutations have little or no effect on gene function and therefore do not change the survival or reproductive fitness of an individual. Some of these persist in the population as morphologic variants known as polymorphisms. Sequence changes that have frequencies of less than 1 percent are known as rare variants, whereas those with frequencies of 1 percent or more are known as polymorphisms. By convention, a genetic polymorphism is defined as the occurrence of two or more variants or alleles in a region of DNA where at least two alleles appear with frequencies greater than 1 percent. Several different classes of polymorphisms occur in the genome, and several methods in molecular biology take advantage of the normal variation between individuals.
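The frequency convention described above amounts to a simple rule, sketched here in Python. The function name classify_variant and the threshold parameter are hypothetical; the 1 percent cutoff is the one stated in the text.

```python
def classify_variant(allele_frequency, threshold=0.01):
    """Classify a sequence change by its population allele frequency.

    Frequencies below the threshold (1 percent by default) are rare
    variants; frequencies at or above it are polymorphisms.
    """
    if not 0 <= allele_frequency <= 1:
        raise ValueError("allele frequency must lie between 0 and 1")
    if allele_frequency >= threshold:
        return "polymorphism"
    return "rare variant"
```

For example, an allele found on 0.5 percent of chromosomes in a population would be classified as a rare variant, and one found on 20 percent as a polymorphism.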