From Figure 8.2 the total frequency for each genotype in the second generation can be derived (Table 8.1). This shows that the relative frequency or proportion of each genotype is the same in the second generation as in the first. In fact, no matter how many generations are studied, the relative frequencies will remain constant. The actual numbers of individuals with each genotype will change as the population size increases or decreases, but their relative frequencies or proportions remain constant. This is the fundamental tenet of the Hardy-Weinberg principle. When studies confirm that the relative proportions of each genotype remain constant with frequencies of p², 2pq, and q², then that population is said to be in Hardy-Weinberg equilibrium for that particular genotype.

Table 8.1 Frequency of the Various Types of Offspring from the Matings Shown in Figure 8.2
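The constancy of genotype proportions across generations can be checked numerically. The following is a minimal sketch, assuming an ideal randomly mating population and an arbitrary starting allele frequency (p = 0.7, chosen only for illustration and not taken from Figure 8.2); the function names are hypothetical.

```python
def genotype_frequencies(p):
    """Return the (AA, Aa, aa) proportions expected under random mating."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def next_generation_p(p):
    """Frequency of allele A among the gametes produced by the current generation."""
    hom_dominant, het, _ = genotype_frequencies(p)
    # AA parents transmit only A; heterozygotes transmit A in half of their gametes.
    return hom_dominant + 0.5 * het


p = 0.7  # illustrative starting value (an assumption)
history = [p]
for _ in range(5):
    p = next_generation_p(p)
    history.append(p)
```

Every entry in `history` equals the starting value to within floating-point error, so the proportions p², 2pq, and q² are reproduced in each generation.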

Factors that Can Disturb Hardy-Weinberg Equilibrium

So far, this relates to an ‘ideal’ population. By definition such a population is large and shows random mating with no new mutations and no selection for or against any particular genotype. For some human characteristics, such as neutral genes for blood groups or enzyme variants, these criteria can be fulfilled. However, several factors can disturb Hardy-Weinberg equilibrium, either by influencing the distribution of genes in the population or by altering the gene frequencies. These factors include:

1 Non-random mating

2 Mutation

3 Selection

4 Small population size

5 Gene flow (migration).

Selection can act in the opposite direction by increasing fitness. For some autosomal recessive disorders there is evidence that heterozygotes show a slight increase in biological fitness compared with unaffected homozygotes, a phenomenon referred to as heterozygote advantage. The best understood example is sickle-cell disease, in which affected homozygotes have severe anemia and often show persistent ill-health (p. 159). However, heterozygotes are relatively immune to infection with Plasmodium falciparum malaria because their red blood cells undergo sickling and are rapidly destroyed when invaded by the parasite. In areas where this form of malaria is endemic, carriers of sickle-cell anemia (sickle cell trait) have a biological advantage compared with unaffected homozygotes. Therefore, in these regions the proportion of heterozygotes tends to increase relative to the proportions of normal and affected homozygotes, and Hardy-Weinberg equilibrium is disturbed.

Small Population Size

In a large population, the numbers of children produced by individuals with different genotypes, assuming no alteration in fitness for any particular genotype, will tend to balance out, so that gene frequencies remain stable. However, in a small population it is possible that by random statistical fluctuation one allele could be transmitted to a high proportion of offspring by chance, resulting in marked changes in allele frequency from one generation to the next, so that Hardy-Weinberg equilibrium is disturbed. This is known as random genetic drift. If one allele is lost altogether, it is said to be extinguished and the other allele is described as having become fixed (Figure 8.3).

FIGURE 8.3 Possible effects of random genetic drift in large and small populations.
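Random genetic drift can be illustrated with a small simulation. The sketch below uses the Wright-Fisher sampling model, which is an assumption introduced here for illustration and is not described in the text: each generation, 2N allele copies are drawn at random from the previous generation's allele pool. The function name, population sizes, and seed are hypothetical.

```python
import random


def simulate_drift(pop_size, p0, generations, seed):
    """Track one allele's frequency under random sampling of 2N gametes per generation."""
    rng = random.Random(seed)
    n_alleles = 2 * pop_size
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each allele copy in the next generation is drawn independently from the current pool.
        copies = sum(1 for _ in range(n_alleles) if rng.random() < p)
        p = copies / n_alleles
        trajectory.append(p)
    return trajectory


small = simulate_drift(pop_size=10, p0=0.5, generations=50, seed=1)
large = simulate_drift(pop_size=1000, p0=0.5, generations=50, seed=1)
```

Once a trajectory reaches 0 or 1 it stays there, which corresponds to the allele being extinguished or fixed. Plotting `small` against `large` would be expected to show much wider swings in the small population, although the outcome of any single run depends on the seed.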

Gene Flow (Migration)

If new alleles are introduced into a population as a consequence of migration, with later intermarriage, a change in the relevant allele frequencies will result. This slow diffusion of alleles across racial or geographical boundaries is known as gene flow. The most widely quoted example is the gradient shown by the incidence of the B blood group allele throughout the world (Figure 8.4). This allele is thought to have originated in Asia and spread slowly westward as a result of admixture through invasion.

FIGURE 8.4 Distribution of blood group B throughout the world.

(From Mourant AE, Kopeć AC, Domaniewska-Sobczak K 1976 The distribution of the human blood groups and other polymorphisms, 2nd ed. Oxford University Press, London, with permission.)

Validity of Hardy-Weinberg Equilibrium

It is relatively simple to establish whether a population is in Hardy-Weinberg equilibrium for a particular trait if all possible genotypes can be identified. Consider a system with two alleles, A and a, with three resulting genotypes, AA, Aa/aA, and aa. Among 1000 individuals selected at random, the following genotype distributions are observed:

AA	800
Aa/aA	185
aa	15

From these figures, the incidence of the A allele (p) equals [(2 × 800) + 185]/2000 = 0.8925 and the incidence of the a allele (q) equals [185 + (2 × 15)]/2000 = 0.1075.

Now consider what the expected genotype frequencies would be if the population were in Hardy-Weinberg equilibrium, and compare these with the observed values:

Genotype	Observed	Expected
AA	800	796.5 (p² × 1000)
Aa/aA	185	192 (2pq × 1000)
aa	15	11.5 (q² × 1000)

These observed and expected values correspond closely and formal statistical analysis with a χ² test would confirm that the observed values do not differ significantly from those expected if the population is in equilibrium.

Next consider a different system with two alleles, B and b. Among 1000 randomly selected individuals the observed genotype distributions are:

BB	430
Bb/bB	540
bb	30

From these values, the incidence of the B allele (p) equals [(2 × 430) + 540]/2000 = 0.7 and the incidence of the b allele (q) equals [540 + (2 × 30)]/2000 = 0.3.

Using these values for p and q, the observed and expected genotype distributions can be compared:

Genotype	Observed	Expected
BB	430	490 (p² × 1000)
Bb/bB	540	420 (2pq × 1000)
bb	30	90 (q² × 1000)

These values differ considerably, with an increased number of heterozygotes at the expense of homozygotes. Such deviation from Hardy-Weinberg equilibrium should prompt a search for factors that could result in increased numbers of heterozygotes, such as heterozygote advantage or negative assortative mating—i.e., the attraction of opposites!
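The two comparisons above can be reproduced, and the χ² statistic computed, with a short calculation. This is a sketch rather than a full statistical treatment: the function name is hypothetical, and the test is assumed to have 1 degree of freedom (three genotype classes, minus one, minus one allele frequency estimated from the data), for which the 5% critical value is 3.841.

```python
def hardy_weinberg_check(n_hom1, n_het, n_hom2):
    """Return allele frequencies, expected genotype counts, and the chi-square statistic."""
    total = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * total)
    q = 1.0 - p
    observed = [n_hom1, n_het, n_hom2]
    expected = [p * p * total, 2 * p * q * total, q * q * total]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, q, expected, chi2


CRITICAL_1DF_5PCT = 3.841  # chi-square critical value, 1 degree of freedom, 5% level

p_a, q_a, exp_a, chi2_a = hardy_weinberg_check(800, 185, 15)  # A/a system
p_b, q_b, exp_b, chi2_b = hardy_weinberg_check(430, 540, 30)  # B/b system
```

For the A/a system the statistic is roughly 1.3, below the critical value, whereas for the B/b system it is roughly 82, far above it. This matches the conclusions drawn in the text.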

Despite the number of factors that can disturb Hardy-Weinberg equilibrium, most populations are in equilibrium for most genetic traits, and significant deviations from expected genotype frequencies are unusual.

Applications of Hardy-Weinberg Equilibrium

Estimation of Carrier Frequencies

If the incidence of an AR disorder is known, it is possible to calculate the carrier frequency using some relatively simple algebra. For example, if the disease incidence is 1 in 10,000, then q² = 1/10,000 and q = 1/100. Because p + q = 1, p = 99/100. The carrier frequency can then be calculated as 2 × 99/100 × 1/100 (i.e., 2pq), which approximates to 1 in 50. Thus, a rough approximation of the carrier frequency can be obtained by doubling the square root of the disease incidence. Approximate values for gene frequency and carrier frequency derived from the disease incidence can be extremely useful in genetic risk counseling (p. 266) (Table 8.2). However, if the disease incidence includes cases resulting from consanguineous relationships, then it is not valid to use the Hardy-Weinberg principle to calculate heterozygote frequencies because a high incidence of consanguinity disturbs the equilibrium by leading to a relative increase in the proportion of affected homozygotes.

Table 8.2 Approximate Values for Gene Frequency and Carrier Frequency Calculated from the Disease Incidence Assuming Hardy-Weinberg Equilibrium

Disease Incidence (q²)	Gene Frequency (q)	Carrier Frequency (2pq)
1/1000	1/32	1/16
1/2000	1/45	1/23
1/5000	1/71	1/36
1/10,000	1/100	1/50
1/50,000	1/224	1/112
1/100,000	1/316	1/158
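The values in Table 8.2 can be regenerated from the disease incidence alone. The sketch below assumes Hardy-Weinberg equilibrium and returns both the exact carrier frequency (2pq) and the rough estimate obtained by doubling the square root of the incidence; the function name is hypothetical.

```python
import math


def carrier_frequency(incidence):
    """Return (q, exact 2pq, rough 2 * sqrt(incidence)) for an AR disorder."""
    q = math.sqrt(incidence)
    p = 1.0 - q
    return q, 2.0 * p * q, 2.0 * q


for denom in (1000, 2000, 5000, 10_000, 50_000, 100_000):
    q, exact, rough = carrier_frequency(1 / denom)
    print(f"1/{denom}: q = 1/{1 / q:.0f}, 2pq = 1/{1 / exact:.0f}, 2q = 1/{1 / rough:.0f}")
```

The exact and rough estimates differ by at most about one unit in the denominator over this range, which is why the tabulated values are described as approximate.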

For an X-linked recessive (XLR) disorder, the frequency of affected males equals the frequency of the mutant allele, q. Thus, for a trait such as red-green color blindness, which affects approximately 1 in 12 male western European whites, q = 1/12 and p = 11/12. This means that the frequencies of affected females (q²) and carrier females (2pq) are 1/144 and 22/144, respectively.
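The X-linked arithmetic can be reproduced with exact fractions. This short sketch takes the 1 in 12 figure for affected males from the text and assumes Hardy-Weinberg proportions among females; the variable names are illustrative.

```python
from fractions import Fraction

q = Fraction(1, 12)          # frequency of affected males = mutant allele frequency
p = 1 - q                    # frequency of the normal allele
affected_females = q * q     # q^2
carrier_females = 2 * p * q  # 2pq
```

Here `affected_females` is 1/144 and `carrier_females` is 22/144 (11/72 in lowest terms), so carrier females are expected to be 22 times more common than affected females.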

Estimation of Mutation Rates

Direct Method

If an autosomal dominant (AD) disorder shows full penetrance, and is therefore always expressed in heterozygotes, an estimate of its mutation rate can be made relatively easily by counting the number of new cases in a defined number of births. Consider a sample of 100,000 children, 12 of whom have a particular AD disorder such as achondroplasia (p. 93). Only two of these children have an affected parent, so that the remaining 10 must have acquired their disorder as a result of new mutations. Therefore 10 new mutations have occurred among the 200,000 genes inherited by these children (because each child inherits two copies of each gene), giving a mutation rate of 1 per 20,000 gametes per generation. In fact, this example is unusual because all new mutations in achondroplasia occur on the paternally derived chromosome 4; therefore the mutation rate is 1 per 10,000 in spermatogenesis and, as far as we know, zero in oogenesis.
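The direct estimate amounts to simple counting, as the following sketch shows using the figures from the example above. The variable names are illustrative.

```python
children = 100_000
affected = 12
with_affected_parent = 2

new_mutations = affected - with_affected_parent  # 10 sporadic cases
gametes = 2 * children                           # each child inherits two copies of the gene
rate_per_gamete = new_mutations / gametes        # 1 per 20,000 gametes per generation

# If every new mutation arises in spermatogenesis, only the paternal gametes are at risk.
rate_paternal = new_mutations / children         # 1 per 10,000 sperm
```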

Indirect Method

For an AD disorder with reproductive fitness (f) equal to zero, all cases must result from new mutations. If the incidence of a disorder is denoted as I and the mutation rate as µ, then as each child inherits two alleles, either of which can mutate to cause the disorder, the incidence equals twice the mutation rate (i.e., I = 2µ).

If fitness is greater than zero, and the disorder is in Hardy-Weinberg equilibrium, then genes lost through reduced fitness must be counterbalanced by new mutations. Therefore, 2µ = I(1 – f) or µ = [I(1 – f)]/2.

Thus, if an estimate of genetic fitness can be made by comparing the average number of offspring born to affected parents, to the average number of offspring born to controls such as their unaffected siblings, it will be possible to calculate the mutation rate.

For an XLR condition with an incidence in males equal to I^M, three X chromosomes are transmitted per couple per generation. Therefore, 3µ = I^M(1 – f) or µ = [I^M(1 – f)]/3.
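The indirect formulas translate directly into code. In the sketch below the function names are hypothetical, and the incidence and fitness values passed in are invented purely to exercise the formulas; they are not estimates for any real disorder.

```python
def mutation_rate_ad(incidence, fitness):
    """mu = I(1 - f) / 2 for an autosomal dominant disorder."""
    return incidence * (1.0 - fitness) / 2.0


def mutation_rate_xlr(incidence_males, fitness):
    """mu = I_M(1 - f) / 3 for an X-linked recessive disorder."""
    return incidence_males * (1.0 - fitness) / 3.0


# Hypothetical inputs (assumptions, for illustration only).
mu_ad = mutation_rate_ad(incidence=1 / 10_000, fitness=0.2)
mu_xlr = mutation_rate_xlr(incidence_males=1 / 3_500, fitness=0.0)
```

Setting `fitness=0.0` in `mutation_rate_ad` recovers the special case I = 2µ, in which every case is a new mutation.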

Why is it Helpful to Know Mutation Rates?

There is a tendency to either love or hate mathematical formulae but the link between mutation rates, disease incidence, and fitness does hold practical value.

Estimation of Gene Size

If a disorder has a high mutation rate the gene may be large. Alternatively, it may contain a high proportion of GC residues and be prone to copy error, or contain a high proportion of repeat sequences (p. 23), which could predispose to misalignment in meiosis resulting in deletion and duplication.

Determination of Mutagenic Potential

Accurate methods for determining mutation rates may be useful in relation to predicted and observed differences in disease incidence in the aftermath of events such as nuclear accidents, for example Chernobyl in 1986 (p. 26).

Consequences of Treatment of Genetic Disease

As discussed later, improved treatment for serious genetic disorders may increase biological fitness, which may result in an increase in disease incidence.

Why are Some Genetic Disorders More Common than Others?

It follows that if a gene has a high mutation rate, the disease incidence may be relatively high. However, factors other than the mutation rate and biological fitness may be involved, as mentioned previously. These are now considered in the context of population size.

Small Populations

Several rare AR disorders show a relatively high incidence in certain population groups (Table 8.3). High allele frequencies are usually explained by the combination of a founder effect together with social, religious, or geographical isolation—hence the term genetic isolates. In some situations, genetic drift may have played a role.

Table 8.3 Rare Recessive Disorders that Are Relatively Common in Certain Groups of People

Group	Disorder	Clinical Features
Finns	Congenital nephrotic syndrome	Edema, proteinuria, susceptibility to infection
	Aspartylglycosaminuria	Progressive mental and motor deterioration, coarse features
	Mulibrey nanism	Muscle, liver, brain and eye involvement
	Congenital chloride diarrhea	Reduced Cl⁻ absorption, diarrhea
	Diastrophic dysplasia	Progressive epiphyseal dysplasia with dwarfism and scoliosis
Amish	Cartilage–hair hypoplasia	Dwarfism, fine, light-colored and sparse hair
	Ellis–van Creveld syndrome	Dwarfism, polydactyly, congenital heart disease
	Glutaric aciduria type 1	Episodic encephalopathy and cerebral palsy-like dystonia
Hopi and San Blas Indians	Albinism	Lack of pigmentation
Ashkenazi Jews	Tay-Sachs disease	Progressive mental and motor deterioration, blindness
	Gaucher disease	Hepatosplenomegaly, bone lesions, skin pigmentation
	Dysautonomia	Indifference to pain, emotional lability, lack of tears, hyperhidrosis
Karaite Jews	Werdnig-Hoffmann disease	Infantile spinal muscular atrophy
Afrikaners	Sclerosteosis	Tall stature, overgrowth of craniofacial bones with cranial nerve palsies, syndactyly
	Lipoid proteinosis	Thickening of skin and mucous membranes
Ryukyu Islands (off Japan)	‘Ryukyan’ spinal muscular atrophy	Muscle weakness, club foot, scoliosis

For example, several very rare AR disorders occur at relatively high frequency in the Old Order Amish living in Pennsylvania—Christians originating from the Anabaptist movement who fled Europe during religious persecution in the eighteenth century. Original founders of the group must have carried abnormal alleles that became established at relatively high frequency due to the restricted number of partners available to members of the community.

Founder effects can also be observed in AD disorders. Variegate porphyria, which is characterized by photosensitivity and drug-induced neurovisceral disturbance, has a high incidence in the Afrikaner population of South Africa, believed to be due to one of the early Dutch settlers having transmitted the condition to a large number of descendants (p. 109).

Interestingly, the Hopi Indians of Arizona show a high incidence of albinism. Affected males were excused from outdoor farming activities because of the health and visual problems caused by bright sunlight, which gave them more opportunity to reproduce than unaffected group members.

Large Populations

When a serious AR disorder, resulting in reduced fitness in affected homozygotes, has a high incidence in a large population, the explanation is presumed to lie in a very high mutation rate, a heterozygote advantage, or both. The latter explanation is the more probable for most AR disorders (Table 8.4).

Table 8.4 Presumed Increased Resistance in Heterozygotes that could Account for the Maintenance of Various Genetic Disorders in Certain Populations

Heterozygote Advantage

For sickle cell (SC) anemia (p. 159) and thalassemia (p. 161), there is very good evidence that heterozygote advantage results from reduced susceptibility to Plasmodium falciparum malaria, as explained in Chapter 10. Americans of Afro-Caribbean origin are no longer exposed to malaria, so it would be expected that the frequency of the SC allele in this group would gradually decline. However, the predicted rate of decline is so slow that it will be many generations before it is detectable.

For several AR disorders the mechanisms proposed for heterozygote advantage are largely speculative (see Table 8.4). The discovery of the cystic fibrosis (CF) gene, with the subsequent elucidation of the role of its protein product in membrane permeability (p. 301), supports the hypothesis of selective advantage through increased resistance to the effects of gastrointestinal infections, such as cholera and dysentery, in the heterozygote. This relative resistance could result from reduced loss of fluid and electrolytes. It is likely that this selective advantage was of greatest value several hundred years ago when these infections were endemic in Western Europe. If so, a gradual decline in the incidence of CF would be expected. However, if this theory is correct one has to ask why CF has not become relatively common in other parts of the world where gastrointestinal infections are endemic, particularly the tropics; in fact, the opposite is the case, for CF is rare in these regions.

An alternative, but speculative, mechanism for the high incidence of a condition such as CF is that the mutant allele is preferentially transmitted at meiosis. This type of segregation distortion, whereby an allele at a particular locus is transmitted more often than would be expected by chance (i.e., in more than 50% of gametes), is referred to as meiotic drive. Firm evidence for this phenomenon in CF is lacking, although it has been demonstrated in the AD disorder myotonic dystrophy (p. 295).

A major practical problem when studying heterozygote advantage is that even a tiny increase in heterozygote fitness, compared with the fitness of unaffected homozygotes, can be sufficient to sustain a high allele frequency. For example, in CF, with an allele frequency of approximately 1 in 50, a heterozygote advantage of 2% to 3% would be sufficient to account for the high allele frequency.
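The size of the advantage needed can be estimated with the standard balanced-polymorphism model, which is an assumption introduced here and is not derived in the text. If the relative fitnesses of unaffected homozygotes, heterozygotes, and affected homozygotes are 1 − s, 1, and 1 − t, the equilibrium frequency of the mutant allele is s/(s + t). The sketch below further assumes that affected homozygotes historically did not reproduce (t = 1); the function names are hypothetical.

```python
def equilibrium_q(s, t):
    """Equilibrium mutant allele frequency when heterozygotes are the fittest genotype."""
    return s / (s + t)


def required_advantage(q, t):
    """Selection coefficient s against unaffected homozygotes needed to hold q at equilibrium."""
    return q * t / (1.0 - q)


# Allele frequency of 1 in 50 from the text; t = 1.0 is an assumption.
s = required_advantage(q=1 / 50, t=1.0)
```

Under these assumptions `s` is 1/49, or about 2%, which is consistent with the 2% to 3% figure quoted above.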

Genetic Polymorphism

Polymorphism is the occurrence in a population of two or more genetically determined forms (alleles, sequence variants) in such frequencies that the rarest of them could not be maintained by mutation alone. By convention, a polymorphic locus is one at which there are at least two alleles, each with a frequency greater than 1%. Alleles with frequencies of less than 1% are referred to as rare variants.

In humans, at least 30% of structural gene loci are polymorphic, with each individual being heterozygous at between 10% and 20% of all loci. Known polymorphic protein systems include the ABO blood groups (p. 205) and many serum proteins, which may exhibit polymorphic electrophoretic differences—or isozymes.

DNA polymorphisms, including SNPs, have been crucial to positional cloning, gene mapping, and isolation of many disease genes (p. 75). They are also used in gene tracking (p. 70) in the clinical context of presymptomatic tests, prenatal diagnosis and carrier detection for many single-gene disorders where direct mutation analysis may not be possible. The value of a particular polymorphic system is assessed by determining its polymorphic information content (PIC). The higher the PIC value, the more likely it is that a polymorphic marker will be of value in linkage analysis and gene tracking.
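The text does not give a formula for PIC. The sketch below uses the commonly cited definition, PIC = 1 − Σpᵢ² − Σ 2pᵢ²pⱼ² (the second sum taken over all pairs of distinct alleles), which should be treated as an assumption here; the function name and the example allele frequencies are illustrative.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphic information content for a marker with the given allele frequencies."""
    homozygosity = sum(f * f for f in freqs)
    # Correction for matings in which the offspring genotype is uninformative.
    correction = sum(2 * (fi ** 2) * (fj ** 2) for fi, fj in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


two_allele = pic([0.5, 0.5])      # a biallelic marker such as a SNP
four_allele = pic([0.25] * 4)     # a marker with four equally frequent alleles
```

With these inputs the two-allele marker scores 0.375 and the four-allele marker about 0.70, illustrating why markers with several common alleles tend to be more useful for linkage analysis and gene tracking.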

Segregation Analysis

Segregation analysis refers to the study of the way in which a disorder is transmitted in families so as to establish the underlying mode of inheritance. The mathematical aspects of segregation analysis are very complex and far beyond the scope of this book—and most doctors! However, it is important that those who encounter families with genetic disease have some understanding of the principles involved and some awareness of the pitfalls and problems.

Autosomal Dominant Inheritance

For an AD disorder, the simplest approach is to compare the observed numbers of affected offspring born to affected parents with what would be expected based on the disease penetrance (i.e., 50% if penetrance is complete). A χ² test can be used to see whether the observed and expected numbers differ significantly. Care must be taken to ensure that a bias is not introduced by excluding parents who were ascertained through an affected child.

Autosomal Recessive Inheritance

For disorders thought to follow AR inheritance, formal segregation analysis is much more difficult. This is because some couples who are both carriers will by chance not have affected children and will therefore not feature in ascertainment. To illustrate this, consider 64 possible sibships of size 3 in which both parents are carriers, drawn from a large hypothetical population (Table 8.5). The sibship structure shown in Table 8.5 is that which would be expected, on average.

Table 8.5 Expected Sibship Structure in a Hypothetical Population that Contains 64 Sibships Each of Size 3, in Which Both Parents are Carriers of an Autosomal Recessive Disorder. If No Allowance Is Made for Truncate Ascertainment, in that the 27 Sibships with No Affected Cases Will Not Be Ascertained, Then a Falsely High Segregation Ratio of 48/111 (= 0.43) Will Be Obtained

In this population, on average, 27 of the 64 sibships will not contain any affected individuals. This can be calculated simply by cubing 3/4 (i.e., 3/4 × 3/4 × 3/4 = 27/64). Therefore, when the families are analyzed, these 27 sibships containing only healthy individuals will not be ascertained; this is referred to as incomplete ascertainment. If this is not taken into account, a falsely high segregation ratio of 0.43 will be obtained instead of the correct value of 0.25.
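The expected sibship structure and the resulting bias can be reproduced from the binomial distribution. This sketch uses exact fractions and the figures from the example (64 sibships of size 3, with a 1 in 4 risk for each child); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

SIBSHIP_SIZE = 3
N_SIBSHIPS = 64
RISK = Fraction(1, 4)  # chance that each child of two carriers is affected

# Expected number of sibships containing k affected children, for k = 0..3.
expected = {
    k: N_SIBSHIPS * comb(SIBSHIP_SIZE, k) * RISK**k * (1 - RISK) ** (SIBSHIP_SIZE - k)
    for k in range(SIBSHIP_SIZE + 1)
}

affected_total = sum(k * n for k, n in expected.items())
# Only sibships with at least one affected child come to attention.
ascertained_children = sum(SIBSHIP_SIZE * n for k, n in expected.items() if k > 0)

naive_ratio = affected_total / ascertained_children          # biased upward
true_ratio = affected_total / (SIBSHIP_SIZE * N_SIBSHIPS)    # all 64 sibships included
```

The expected counts are 27, 27, 9, and 1 sibships with 0, 1, 2, and 3 affected children, giving `naive_ratio` = 48/111 (about 0.43) and `true_ratio` = 1/4.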

Mathematical methods have been devised to cater for incomplete ascertainment, but analysis is usually further complicated by problems associated with achieving full or complete ascertainment. In practice ‘proof’ of AR inheritance requires accurate molecular or biochemical markers for carrier detection. Affected siblings (especially when at least one is female) born to unaffected parents usually suggest AR inheritance, but somatic and germline parental mosaicism (p. 121), non-paternity, and other possibilities need to be considered. There are some good examples of conditions originally reported to follow AR inheritance but subsequently shown to be dominant with germline or somatic mosaicism; for example, osteogenesis imperfecta and pseudoachondroplasia. However, a high incidence of parental consanguinity undoubtedly provides strong supportive evidence for AR inheritance, as first noted by Bateson and Garrod in 1902 (pp. 7, 113).

Genetic Linkage

Mendel’s third law—the principle of independent assortment—states that members of different gene pairs assort to gametes independently of one another (p. 5). Stated more simply, the alleles of genes at different loci segregate independently. Although this is true for genes on different chromosomes, it is not always true for genes that are located on the same chromosome (i.e., close together, or syntenic).

Two loci positioned adjacent, or close, to each other on the same chromosome, will tend to be inherited together, and are said to be linked. The closer they are, the less likely they will be separated by a crossover, or recombination, during meiosis I (Figure 8.5).

FIGURE 8.5 Segregation at meiosis of alleles at two loci. In A the loci are on different chromosomes and in B they are on the same chromosome but widely separated. Hence these loci are not linked and there is independent assortment. In C the loci are closely adjacent so that separation by a cross-over is unlikely (i.e., the loci are linked).

Linked alleles on the same chromosome are said to be in coupling, whereas those on opposite homologous chromosomes are described as being in repulsion. This is known as the linkage phase. Thus in the parental chromosomes in Figure 8.5, C, A and B, as well as a and b, are in coupling, whereas A and b, as well as a and B, are in repulsion.

Recombination Fraction

The recombination fraction, usually designated as θ (Greek theta), is a measure of the distance separating two loci, or more precisely an indication of the likelihood that a cross-over will occur between them. If two loci are not linked then θ equals 0.5 because, on average, genes at unlinked loci will segregate together during 50% of all meioses. If θ equals 0.05, this means that on average the syntenic alleles will segregate together 19 times out of 20 (i.e., a crossover will occur between them during, on average, only 1 in 20 meioses).

Centimorgans

The unit of measurement for genetic linkage is known as a map unit or centimorgan (cM). If two loci are 1 cM apart, a crossover occurs between them, on average, only once in every 100 meioses (i.e., θ = 0.01). Centimorgans are a measure of the genetic, or linkage, distance between two loci. This is not the same as physical distance, which is measured in base pairs (kb, kilobases: 1000 base pairs; Mb, megabases: 1,000,000 base pairs).

The human genome has been estimated by recombination studies to be about 3000 cM in length in males. Because the physical length of the haploid human genome is approximately 3 × 10⁹ bp, 1 cM corresponds to approximately 10⁶ bp (1 Mb or 1000 kb). However, the relationship between linkage map units and physical length is not linear. Some chromosome regions appear to be particularly prone to recombination—so-called ‘hotspots’—and recombination occurs less often during meiosis in males than in females, in whom the genome ‘linkage’ length has been estimated to be 4200 cM. Generally, in humans one or two recombination events take place between each pair of homologous chromosomes in meiosis I, with a total of ~40 across the entire genome. Recombination events are rare close to the centromeres but relatively common in telomeric regions.

Linkage Analysis

Linkage analysis has proved invaluable for mapping genes (see Chapter 5). It is based on studying the segregation of the disease with polymorphic markers from each chromosome—preferably in large families. Eventually a marker will be identified that co-segregates with the disease more often than would be expected by chance (i.e., the marker and disease locus are linked). The mathematical analysis tends to be very complex, particularly if many closely adjacent markers are being used, as in multipoint linkage analysis. However, the underlying principle is relatively straightforward and involves the use of likelihood ratios, the logarithms of which are known as LOD scores (logarithm of the odds).

LOD Scores

When studying the segregation of alleles at two loci that could be linked, a series of likelihood ratios is calculated for different values of the recombination fraction (θ), ranging from θ = 0 to θ = 0.5. The likelihood ratio at a given value of θ equals the likelihood of the observed data if the loci are linked at a recombination fraction of θ, divided by the likelihood of the observed data if the loci are not linked (θ = 0.5). The logarithm to the base 10 of this ratio is known as the LOD score (Z); that is, LOD(θ) = log₁₀ [L(θ)/L(0.5)]. Logarithms are used because they allow results from different families to be added together.

For example, when a research paper reports that linkage of a disease with a DNA marker has been identified with a LOD score (Z) of 4 at recombination fraction (θ) 0.05, this means that the results, in the families studied, indicate that it is 10,000 (10⁴) times more likely that the disease and marker loci are closely linked (i.e., 5 cM apart) than that they are not linked. It is generally agreed that a LOD score of +3 or more is confirmation of linkage. This would yield a ratio of 1000 to 1 in favor of linkage; however, because there is a prior probability of only 1 in 50 that any two given loci are linked, a LOD score of +3 means that the overall probability that the loci are linked is approximately 20 to 1 (i.e., [1000 × 1/50]:1). The importance of taking prior probabilities into account in probability theory is discussed in the section on Bayes’ theorem (p. 339).
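The adjustment for prior probability is a single multiplication. The sketch below follows the arithmetic used in the text, in which the prior probability of 1 in 50 is treated as the prior odds; the variable names are illustrative.

```python
lod = 3
likelihood_ratio = 10 ** lod  # 1000 to 1 in favor of linkage, from the family data alone
prior_linked = 1 / 50         # prior chance that two given loci are linked (from the text)

posterior_odds = likelihood_ratio * prior_linked  # about 20 to 1
```

Using prior odds of 1 to 49 instead would change the result only slightly, to about 20.4 to 1.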

A ‘Simple’ Example

Consider a three-generation family in which several members have an AD disorder (Figure 8.6). A and B are alleles at a locus that is being tested for linkage to the disease locus.

FIGURE 8.6 Three-generation pedigree showing segregation of an autosomal dominant disorder and alleles (A and B) at a locus that may or may not be linked to the disease locus.

To establish whether it is likely that these two loci are linked, the LOD score is calculated for various values of θ. The value of θ that gives the highest LOD score is taken as the best estimate of the recombination fraction. This is known as a maximum likelihood method.

To demonstrate the underlying principle, the LOD score is calculated for a value of θ equal to 0.05. If θ equals 0.05 then the loci are linked, in which case the disease gene and the B marker must be on the same chromosome in II2, as both of these characteristics have been inherited from the mother. Thus in II2 the linkage phase is known: the disease allele and the B allele are in coupling. Therefore the probability that III1 will be affected and will also inherit the B marker equals 0.95 (i.e., 1 – θ). A similar result is obtained for the remaining three members of the sibship in generation III, giving a value for the numerator of (0.95)⁴. If the loci are not linked, the likelihood of observing both the disease and marker B in III1 equals 0.5. A similar result is obtained for his three siblings, giving a value for the denominator of (0.5)⁴.

Therefore the LOD score for this family, given a value of θ = 0.05, equals log₁₀ (0.95⁴/0.5⁴) = log₁₀ 13.032 = 1.12. For a value of θ = 0, the LOD score equals log₁₀ (1⁴/0.5⁴) = log₁₀ 16 = 1.20. For a value of θ = 0.1, the LOD score equals log₁₀ (0.9⁴/0.5⁴) = log₁₀ 10.498 = 1.02. The highest LOD score is obtained for θ = 0, which is consistent with the fact that if the disease and marker loci are linked then no recombination has occurred between the two loci in members of generation III.
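The LOD scores for this family can be recomputed over a grid of θ values. This is a minimal sketch for phase-known meioses only, assuming four non-recombinant offspring and no recombinants as in the example; the function name and the grid of θ values are hypothetical.

```python
import math


def lod_score(theta, nonrecombinants, recombinants):
    """LOD score for phase-known meioses at recombination fraction theta."""
    n = nonrecombinants + recombinants
    likelihood_linked = (1.0 - theta) ** nonrecombinants * theta ** recombinants
    likelihood_unlinked = 0.5 ** n
    # Note: with theta = 0 and at least one recombinant the likelihood is zero,
    # so log10 raises ValueError; that case does not arise in this family.
    return math.log10(likelihood_linked / likelihood_unlinked)


thetas = [0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
scores = {t: lod_score(t, nonrecombinants=4, recombinants=0) for t in thetas}
best_theta = max(scores, key=scores.get)  # maximum likelihood estimate on this grid
```

On this grid the maximum falls at θ = 0, and the score at θ = 0.5 is zero because the numerator and denominator are then identical. Scores from further families could be added to these values before comparing the total with the +3 threshold.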

To confirm linkage, other families would have to be studied and their results pooled until a LOD score of +3 or greater was obtained. A LOD score of −2 or less is taken as proof that the loci are not linked. This less stringent requirement for proof of non-linkage (i.e., a LOD score of −2 compared with +3 for proof of linkage) is due to the high prior probability that any two loci are not linked.

Multipoint Linkage Analysis

Two-point linkage analysis is often used to map a disease locus to a specific chromosome region. This gives a rather rough or ‘coarse’ indication of the location of the disease locus. The next step often involves multipoint linkage analysis using a series of polymorphic markers that are known to map to the disease region. This process allows fine tuning of the probable position of the disease locus within the rough interval defined by the small number of polymorphic marker loci.

Using this approach the results of linkage studies with the various markers are analyzed by a computer program that calculates the overall likelihood of the position of the disease locus in relation to the marker loci. The results are presented in the form of a likelihood ratio known as a location score. This is calculated for different positions of the disease locus and a graph is drawn up of location score against map distance (Figure 8.7). On this graph the peaks represent possible positions of the disease locus, with the tallest peak being the most probable location. The troughs represent the positions of the polymorphic marker loci.

FIGURE 8.7 Multipoint linkage analysis. A, B, and C represent the known linkage relationships of three polymorphic marker loci. X, Y, and Z represent in descending order of likelihood the probable position of the disease locus.

Multipoint linkage analysis is used to define the smallest possible interval in which a disease locus is located, so that physical mapping methods can then be applied to isolate the disease gene (see Chapter 5).

Autozygosity Mapping

This ingenious form of linkage analysis has been used to map many rare AR disorders. Autozygosity occurs when individuals are homozygous at particular loci by descent from a common ancestor. In an inbred pedigree containing two or more children with a rare AR disorder, it is very likely that the children will be homozygous not only at the disease locus but also at closely linked loci. In other words, all affected relatives in an inbred family will be homozygous for markers within the region surrounding the disease locus. Thus a search can be made for shared areas of homozygosity in affected relatives using highly polymorphic markers such as microsatellites (p. 69). In a pedigree with a relatively large number of affected individuals, only a small number of shared homozygous regions will be identified; one of these can be expected to harbor the relevant disease locus, which can then be isolated using physical mapping strategies.

Autozygosity mapping can be applied both in small inbred families (Figure 8.8) and in genetic isolates (p. 133) with a common genetic ancestry (e.g., the Old Order Amish). It is a particularly powerful technique in large inbred families in which more than one branch has affected individuals. Several of the genes that cause AR sensorineural hearing loss have been mapped in this way, as have the genes for a number of skeletal dysplasias and primary microcephalies.

FIGURE 8.8 Autozygosity mapping in a family with spondylocostal dysostosis. The father of individual I₁ is the brother of I₂’s grandfather. The region of homozygosity is defined by markers D15S155 and D15S127. A mutation in the MESP2 gene was subsequently shown to be the cause of spondylocostal dysostosis in this pedigree.

Linkage Disequilibrium

Linkage disequilibrium is defined formally as the association of two alleles at linked loci more frequently than would be expected by chance, and is also referred to as allelic association. The concept and the term relate to the study of diseases in populations rather than families. In families, an association between specific alleles and the disease in question holds true only within an individual family; in a separate affected family a different pattern of alleles, or markers, at the same locus may show association with the disease, because the alleles themselves are polymorphic.

The rationale for studying allelic association in populations is based on the assumption that a mutation occurred in a founder case some generations previously and is still causative of the disease. If this is true, the pattern of markers in a small region close to the mutation will have been maintained and thus constitutes what is termed the founder haplotype. The underlying principles used in mapping are the same as those for linkage analysis in families, the difference being the degree of relatedness of the individuals under study. In the pedigree shown in Figure 8.6, support was obtained for linkage of the disease gene with the B marker allele. Assume that further studies confirm linkage of these loci and that the A and B alleles have an equal frequency of 0.5. It would be reasonable to expect that the disease gene would be in coupling with allele A in approximately 50% of families and with allele B in the remaining 50%. If, however, the disease allele was found to be in coupling exclusively with one particular marker allele, this would be an example of linkage disequilibrium.
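The contrast between the two outcomes can be made concrete with the standard disequilibrium coefficient D, which is the observed haplotype frequency minus the product of the two allele frequencies. The coefficient is not introduced in the text above, and the haplotype counts, the allele labels `d` and `+`, and the function name `ld_coefficient` are hypothetical, chosen only so that marker alleles A and B each have a frequency of 0.5 as in the example.

```python
from typing import Dict, Tuple

Haplotype = Tuple[str, str]  # (allele at disease locus, allele at marker locus)


def ld_coefficient(
    hap_counts: Dict[Haplotype, int], disease: str = "d", marker: str = "B"
) -> float:
    """D = f(disease allele with marker allele) - f(disease allele) * f(marker allele)."""
    total = sum(hap_counts.values())
    p_disease = sum(n for (d, _), n in hap_counts.items() if d == disease) / total
    p_marker = sum(n for (_, m), n in hap_counts.items() if m == marker) / total
    p_both = hap_counts.get((disease, marker), 0) / total
    return p_both - p_disease * p_marker


# Hypothetical counts of chromosomes; "d" is the disease allele, "+" the normal one.
# Case 1: disease allele in coupling with A and B equally often (no association).
equilibrium = {("d", "A"): 5, ("d", "B"): 5, ("+", "A"): 45, ("+", "B"): 45}
# Case 2: disease allele found exclusively with marker allele B.
exclusive = {("d", "A"): 0, ("d", "B"): 10, ("+", "A"): 50, ("+", "B"): 40}

d_equilibrium = ld_coefficient(equilibrium)
d_exclusive = ld_coefficient(exclusive)
print(round(d_equilibrium, 3), round(d_exclusive, 3))
```

In the first set of counts D is zero, because the disease allele travels with B exactly as often as the allele frequencies predict. In the second set D is positive, reflecting the excess of disease chromosomes carrying B.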

The demonstration of linkage disequilibrium in a particular disease suggests that the mutation causing the disease has occurred relatively recently and that the marker locus studied is very closely linked to the disease locus. There may be pitfalls, however, in interpreting haplotype data that suggest linkage disequilibrium. Other possible reasons for linkage disequilibrium include: (1) the rapid growth of genetically isolated populations leading to large regions of allelic association throughout the genome; (2) selection, whereby particular alleles enhance or diminish reproductive fitness; and (3) population admixture, where population subgroups with different patterns of allele frequencies are combined into a single study. Allowance for the last of these problems can be made by using family-based controls and analyzing the transmission of alleles using a method called the transmission/disequilibrium test. This uses the fact that transmitted and non-transmitted alleles from a given parent are paired observations, and examines the preferential transmission of one allele over the other in all heterozygous parents. The technique has been applied, amongst others, to studies based on sibling pairs that are discordant for the disease or condition under study.
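The paired comparison used by the transmission/disequilibrium test can be sketched as follows. The statistic shown, (b − c)² / (b + c), is the usual McNemar-type form of the test for a marker with two alleles and is not given in the text above. Here b and c are the numbers of heterozygous parents who did and did not transmit the allele of interest to an affected child. The parental data and the function name `tdt_statistic` are hypothetical.

```python
from typing import List, Tuple

# One record per parent of an affected child:
# (the parent's two marker alleles, the allele passed to the affected child)
Transmission = Tuple[Tuple[str, str], str]


def tdt_statistic(parents: List[Transmission], allele: str) -> float:
    """McNemar-type TDT statistic for one allele of a two-allele marker."""
    b = 0  # heterozygous parents who transmitted `allele`
    c = 0  # heterozygous parents who transmitted the other allele
    for (a1, a2), transmitted in parents:
        if a1 == a2:
            continue  # homozygous parents are uninformative
        if allele not in (a1, a2):
            continue  # parent does not carry the allele being tested
        if transmitted == allele:
            b += 1
        else:
            c += 1
    if b + c == 0:
        raise ValueError("no informative heterozygous parents")
    return (b - c) ** 2 / (b + c)


# Hypothetical data: 40 heterozygous parents, 30 of whom transmitted allele A,
# plus 25 homozygous parents who are skipped by the test.
parents = (
    [(("A", "B"), "A")] * 30
    + [(("A", "B"), "B")] * 10
    + [(("A", "A"), "A")] * 25
)
statistic = tdt_statistic(parents, "A")
print(statistic)  # 10.0
```

Under the null hypothesis of no linkage or no association the statistic is approximately chi-squared distributed with one degree of freedom, so a value above 3.84 corresponds to P < 0.05. Because each heterozygous parent supplies his or her own control allele, differences in allele frequency between population subgroups do not by themselves inflate the result.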

Medical and Societal Intervention

Recent developments in molecular biology, such as the Human Genome Project (p. 9) and pilot studies using gene therapy (p. 350), have reawakened concern that future generations could have to cope with an ever-increasing burden of genetic disease. The term eugenics was first used by Charles Darwin’s cousin, Francis Galton, to refer to the improvement of a population by selective breeding. The notion that this should be applied to human populations became popular during the early years of the twentieth century, culminating in the horrifying practices of Nazi Germany. Ensuing revulsion led to the abandonment of eugenic programs in humans, with universal condemnation and agreement that such programs have no place in modern medical practice. Sadly, however, these practices have been continued by groups engaged in territorial conflicts, somewhat sanitized by the term ‘ethnic cleansing.’

Doctors caring for patients and families with hereditary disease inevitably give priority to treatment and improving survival. By so doing they may increase biological fitness, leading to increased numbers of ‘bad genes’ in society and potentially adding to humanity’s future genetic load. Such long-term consequences generally carry no weight, but the approach has sometimes been interpreted as dysgenic.

The ethical debate is important, but it is also worth considering the possible long-term effects of artificial selection for or against genetic disorders, according to their pattern of inheritance.