Population and Mathematical Genetics

Published on 16/03/2015 by admin

Filed under Basic Science

Last modified 22/04/2025

CHAPTER 8 Population and Mathematical Genetics

In this chapter, some of the more mathematical aspects of gene inheritance are considered, together with how genes are distributed and maintained at particular frequencies in populations. This subject constitutes what is known as population genetics. Genetics lends itself to a numerical approach, with many of the most influential and pioneering figures in human genetics having come from a mathematical background. They were particularly attracted by the challenges of trying to determine the frequencies of genes in populations and the rates at which they mutate. Much of this early work impinges on the specialty of medical genetics, and in particular on genetic counseling, and by the end of this chapter it is hoped that the reader will have gained an understanding of the following.

Allele Frequencies in Populations

On first reflection, it would be reasonable to predict that dominant genes and traits in a population would tend to increase at the expense of recessive ones. On average, three-quarters of the offspring of two heterozygotes will manifest the dominant trait, but only one-quarter will have the recessive trait. It might be thought, therefore, that eventually almost everyone in the population would have the dominant trait. However, it can be shown that in a large randomly mating population, in which there is no disturbance by outside influences, dominant traits do not increase at the expense of recessive ones. In fact, in such a population, the relative proportions of the different genotypes (and phenotypes) remain constant from one generation to another. This is known as the Hardy-Weinberg principle, as it was proposed, independently, by an English mathematician, G. H. Hardy, and a German physician, W. Weinberg, in 1908. This is a very important principle in human genetics.

The Hardy-Weinberg Principle

Consider an ‘ideal’ population in which there is an autosomal locus with two alleles, A and a, that have frequencies of p and q, respectively. These are the only alleles found at this locus, so that p + q = 100%, or 1. The frequency of each genotype in the population can be determined by construction of a Punnett square, which shows how the different genes can combine (Figure 8.1).

From Figure 8.1, it can be seen that the frequencies of the different genotypes are:

Genotype   Phenotype   Frequency
AA         A           p²
Aa         A           2pq
aa         a           q²

If there is random mating of sperm and ova, the frequencies of the different genotypes in the first generation will be as shown. If these individuals mate with one another to produce a second generation, a Punnett square can again be used to show the different matings and their frequencies (Figure 8.2).

From Figure 8.2 the total frequency for each genotype in the second generation can be derived (Table 8.1). This shows that the relative frequency or proportion of each genotype is the same in the second generation as in the first. In fact, no matter how many generations are studied, the relative frequencies will remain constant. The actual numbers of individuals with each genotype will change as the population size increases or decreases, but their relative frequencies or proportions remain constant. This is the fundamental tenet of the Hardy-Weinberg principle. When studies confirm that the relative proportions of each genotype remain constant with frequencies of p², 2pq, and q², then that population is said to be in Hardy-Weinberg equilibrium for that particular genotype.
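The constancy of genotype proportions can be checked numerically. The following is a minimal sketch, assuming an idealized random-mating population; the function names and the starting value p = 0.7 are illustrative choices, not taken from the chapter.

```python
def genotype_frequencies(p):
    """Return the (AA, Aa, aa) frequencies expected under random mating."""
    q = 1.0 - p
    return p * p, 2 * p * q, q * q


def next_generation_p(freqs):
    """Frequency of allele A among the gametes produced by a population
    with genotype frequencies (AA, Aa, aa)."""
    f_AA, f_Aa, _f_aa = freqs
    # AA parents always transmit A; Aa parents transmit A half of the time.
    return f_AA + 0.5 * f_Aa


p = 0.7  # illustrative starting frequency of allele A
for generation in range(1, 4):
    freqs = genotype_frequencies(p)
    print(generation, [round(f, 4) for f in freqs])
    p = next_generation_p(freqs)
```

Each pass through the loop prints the same three proportions, because the allele frequency recovered from the genotypes is again p.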

Factors that Can Disturb Hardy-Weinberg Equilibrium

So far, this relates to an ‘ideal’ population. By definition such a population is large and shows random mating with no new mutations and no selection for or against any particular genotype. For some human characteristics, such as neutral genes for blood groups or enzyme variants, these criteria can be fulfilled. However, several factors can disturb Hardy-Weinberg equilibrium, either by influencing the distribution of genes in the population or by altering the gene frequencies. These factors include:

Selection

In the ‘ideal’ population there is no selection for or against any particular genotype. In reality, for deleterious characteristics there is likely to be negative selection, with affected individuals having reduced reproductive (= biological = ‘genetic’) fitness. This implies that they do not have as many offspring as unaffected members of the population. In the absence of new mutations, this reduction in fitness will lead to a gradual reduction in the frequency of the mutant gene, and hence disturbance of Hardy-Weinberg equilibrium.

Selection can act in the opposite direction by increasing fitness. For some autosomal recessive disorders there is evidence that heterozygotes show a slight increase in biological fitness compared with unaffected homozygotes, which is referred to as heterozygote advantage. The best understood example is sickle-cell disease, in which affected homozygotes have severe anemia and often show persistent ill-health (p. 159). However, heterozygotes are relatively immune to infection with Plasmodium falciparum malaria because their red blood cells undergo sickling and are rapidly destroyed when invaded by the parasite. In areas where this form of malaria is endemic, carriers of sickle-cell anemia (sickle cell trait) have a biological advantage compared with unaffected homozygotes. Therefore, in these regions the proportion of heterozygotes tends to increase relative to the proportions of normal and affected homozygotes, and Hardy-Weinberg equilibrium is disturbed.

Validity of Hardy-Weinberg Equilibrium

It is relatively simple to establish whether a population is in Hardy-Weinberg equilibrium for a particular trait if all possible genotypes can be identified. Consider a system with two alleles, A and a, with three resulting genotypes, AA, Aa/aA, and aa. Among 1000 individuals selected at random, the following genotype distributions are observed:

AA 800
Aa/aA 185
aa 15

From these figures, the frequency of the A allele (p) equals [(2 × 800) + 185]/2000 = 0.8925 and the frequency of the a allele (q) equals [185 + (2 × 15)]/2000 = 0.1075.

Now consider what the expected genotype frequencies would be if the population were in Hardy-Weinberg equilibrium, and compare these with the observed values:

Genotype   Observed   Expected
AA         800        796.5 (p² × 1000)
Aa/aA      185        192 (2pq × 1000)
aa         15         11.5 (q² × 1000)

These observed and expected values correspond closely, and formal statistical analysis with a χ² test would confirm that the observed values do not differ significantly from those expected if the population is in equilibrium.

Next consider a different system with two alleles, B and b. Among 1000 randomly selected individuals the observed genotype distributions are:

BB 430
Bb/bB 540
bb 30

From these values, the frequency of the B allele (p) equals [(2 × 430) + 540]/2000 = 0.7 and the frequency of the b allele (q) equals [540 + (2 × 30)]/2000 = 0.3.

Using these values for p and q, the observed and expected genotype distributions can be compared:

Genotype   Observed   Expected
BB         430        490 (p² × 1000)
Bb/bB      540        420 (2pq × 1000)
bb         30         90 (q² × 1000)

These values differ considerably, with an increased number of heterozygotes at the expense of homozygotes. Such deviation from Hardy-Weinberg equilibrium should prompt a search for factors that could result in increased numbers of heterozygotes, such as heterozygote advantage or negative assortative mating—i.e., the attraction of opposites!
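Both comparisons can be reproduced with a short calculation. This is a sketch rather than a full statistical treatment: the function name is hypothetical, and the test assumes one degree of freedom (three genotype classes, minus one, minus one estimated allele frequency) with the conventional 5% critical value of 3.84.

```python
CRITICAL_VALUE = 3.84  # chi-square critical value, 1 degree of freedom, 5% level


def hardy_weinberg_check(n_hom1, n_het, n_hom2):
    """Return (p, q, expected counts, chi-square) for observed genotype counts."""
    total = n_hom1 + n_het + n_hom2
    # Each homozygote carries two copies of its allele, each heterozygote one.
    p = (2 * n_hom1 + n_het) / (2 * total)
    q = 1.0 - p
    expected = (p * p * total, 2 * p * q * total, q * q * total)
    observed = (n_hom1, n_het, n_hom2)
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, q, expected, chi_square


for label, counts in (("A/a", (800, 185, 15)), ("B/b", (430, 540, 30))):
    p, q, expected, chi_square = hardy_weinberg_check(*counts)
    verdict = "consistent with" if chi_square < CRITICAL_VALUE else "deviates from"
    print(label, round(p, 4), round(q, 4),
          [round(e, 1) for e in expected], round(chi_square, 2),
          verdict, "Hardy-Weinberg equilibrium")
```

The expected counts agree with the tabulated values to within rounding. The statistic for the A/a system falls well below the critical value, whereas that for the B/b system exceeds it many times over.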

Despite the number of factors that can disturb Hardy-Weinberg equilibrium, most populations are in equilibrium for most genetic traits, and significant deviations from expected genotype frequencies are unusual.

Applications of Hardy-Weinberg Equilibrium

Estimation of Carrier Frequencies

If the incidence of an autosomal recessive (AR) disorder is known, it is possible to calculate the carrier frequency using some relatively simple algebra. For example, if the disease incidence is 1 in 10,000, then q² = 1/10,000 and q = 1/100. Because p + q = 1, p = 99/100. The carrier frequency can then be calculated as 2 × 99/100 × 1/100 (i.e., 2pq), which approximates to 1 in 50. Thus, a rough approximation of the carrier frequency can be obtained by doubling the square root of the disease incidence. Approximate values for gene frequency and carrier frequency derived from the disease incidence can be extremely useful in genetic risk counseling (p. 266) (Table 8.2). However, if the disease incidence includes cases resulting from consanguineous relationships, then it is not valid to use the Hardy-Weinberg principle to calculate heterozygote frequencies because a high incidence of consanguinity disturbs the equilibrium by leading to a relative increase in the proportion of affected homozygotes.

Table 8.2 Approximate Values for Gene Frequency and Carrier Frequency Calculated from the Disease Incidence Assuming Hardy-Weinberg Equilibrium

Disease Incidence (q²)   Gene Frequency (q)   Carrier Frequency (2pq)
1/1000                   1/32                 1/16
1/2000                   1/45                 1/23
1/5000                   1/71                 1/36
1/10,000                 1/100                1/50
1/50,000                 1/224                1/112
1/100,000                1/316                1/158
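The values in Table 8.2 can be approximated with a few lines of code. This is a sketch assuming Hardy-Weinberg equilibrium and no consanguinity; the function name is illustrative. It returns both the exact 2pq and the shortcut 2q (twice the square root of the incidence), which differ slightly, so the results match the table only to within rounding.

```python
import math


def carrier_frequency(incidence):
    """Return (q, exact 2pq, shortcut 2q) for an AR disorder of given incidence."""
    q = math.sqrt(incidence)  # incidence of affected homozygotes is q squared
    p = 1.0 - q
    return q, 2 * p * q, 2 * q


for denominator in (1000, 2000, 5000, 10_000, 50_000, 100_000):
    q, exact, shortcut = carrier_frequency(1 / denominator)
    print(f"incidence 1/{denominator}: q = {q:.4f}, "
          f"2pq = {exact:.4f}, 2q = {shortcut:.4f}")
```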

For an X-linked recessive (XLR) disorder, the frequency of affected males equals the frequency of the mutant allele, q. Thus, for a trait such as red-green color blindness, which affects approximately 1 in 12 male western European whites, q = 1/12 and p = 11/12. This means that the frequencies of affected females (q²) and carrier females (2pq) are 1/144 and 22/144, respectively.
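The X-linked case can be worked with exact fractions. A small sketch, assuming Hardy-Weinberg proportions in females; the function name is hypothetical.

```python
from fractions import Fraction


def x_linked_recessive(affected_male_frequency):
    """Return (affected females, carrier females) given the frequency of
    affected males, which equals the mutant allele frequency q."""
    q = Fraction(affected_male_frequency)
    p = 1 - q
    return q * q, 2 * p * q


affected_females, carrier_females = x_linked_recessive(Fraction(1, 12))
# Fraction reduces automatically, so 22/144 is shown as 11/72.
print(affected_females, carrier_females)
```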

Estimation of Mutation Rates

Direct Method

If an autosomal dominant (AD) disorder shows full penetrance, and is therefore always expressed in heterozygotes, an estimate of its mutation rate can be made relatively easily by counting the number of new cases in a defined number of births. Consider a sample of 100,000 children, 12 of whom have a particular AD disorder such as achondroplasia (p. 93). Only two of these children have an affected parent, so that the remaining 10 must have acquired their disorder as a result of new mutations. Therefore 10 new mutations have occurred among the 200,000 genes inherited by these children (because each child inherits two copies of each gene), giving a mutation rate of 1 per 20,000 gametes per generation. In fact, this example is unusual because all new mutations in achondroplasia occur on the paternally derived chromosome 4; therefore the mutation rate is 1 per 10,000 in spermatogenesis and, as far as we know, zero in oogenesis.
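The arithmetic of the direct method is simple enough to state as a function. This sketch assumes full penetrance and complete ascertainment of affected children, as in the example above; the function and parameter names are illustrative.

```python
from fractions import Fraction


def direct_mutation_rate(births, affected, affected_with_affected_parent):
    """Mutation rate per gamete per generation for a fully penetrant AD disorder."""
    new_mutations = affected - affected_with_affected_parent
    gene_copies = 2 * births  # each child inherits two copies of the gene
    return Fraction(new_mutations, gene_copies)


rate = direct_mutation_rate(births=100_000, affected=12,
                            affected_with_affected_parent=2)
print(f"1 per {rate.denominator // rate.numerator:,} gametes per generation")
```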

Why are Some Genetic Disorders More Common than Others?

It follows that if a gene has a high mutation rate, the disease incidence may be relatively high. However, factors other than the mutation rate and biological fitness may be involved, as mentioned previously. These are now considered in the context of population size.

Small Populations

Several rare AR disorders show a relatively high incidence in certain population groups (Table 8.3). High allele frequencies are usually explained by the combination of a founder effect together with social, religious, or geographical isolation—hence the term genetic isolates. In some situations, genetic drift may have played a role.

Table 8.3 Rare Recessive Disorders that Are Relatively Common in Certain Groups of People

Group                        Disorder                           Clinical Features
Finns                        Congenital nephrotic syndrome      Edema, proteinuria, susceptibility to infection
                             Aspartylglycosaminuria             Progressive mental and motor deterioration, coarse features
                             Mulibrey nanism                    Muscle, liver, brain and eye involvement
                             Congenital chloride diarrhea       Reduced Cl absorption, diarrhea
                             Diastrophic dysplasia              Progressive epiphyseal dysplasia with dwarfism and scoliosis
Amish                        Cartilage–hair hypoplasia          Dwarfism, fine, light-colored and sparse hair
                             Ellis–van Creveld syndrome         Dwarfism, polydactyly, congenital heart disease
                             Glutaric aciduria type 1           Episodic encephalopathy and cerebral palsy-like dystonia
Hopi and San Blas Indians    Albinism                           Lack of pigmentation
Ashkenazi Jews               Tay-Sachs disease                  Progressive mental and motor deterioration, blindness
                             Gaucher disease                    Hepatosplenomegaly, bone lesions, skin pigmentation
                             Dysautonomia                       Indifference to pain, emotional lability, lack of tears, hyperhidrosis
Karaite Jews                 Werdnig-Hoffmann disease           Infantile spinal muscular atrophy
Afrikaners                   Sclerosteosis                      Tall stature, overgrowth of craniofacial bones with cranial nerve palsies, syndactyly
                             Lipoid proteinosis                 Thickening of skin and mucous membranes
Ryukyan islands (off Japan)  ‘Ryukyan’ spinal muscular atrophy  Muscle weakness, club foot, scoliosis

For example, several very rare AR disorders occur at relatively high frequency in the Old Order Amish living in Pennsylvania—Christians originating from the Anabaptist movement who fled Europe during religious persecution in the eighteenth century. Original founders of the group must have carried abnormal alleles that became established at relatively high frequency due to the restricted number of partners available to members of the community.

Founder effects can also be observed in AD disorders. Variegate porphyria, which is characterized by photosensitivity and drug-induced neurovisceral disturbance, has a high incidence in the Afrikaner population of South Africa, believed to be due to one of the early Dutch settlers having transmitted the condition to a large number of descendants (p. 109).

Interestingly, the Hopi Indians of Arizona show a high incidence of albinism. Affected males were excused from outdoor farming activities because of the health and visual problems caused by bright sunlight, which gave them more opportunity to reproduce relative to unaffected group members.

Large Populations

When a serious AR disorder, resulting in reduced fitness in affected homozygotes, has a high incidence in a large population, the explanation is presumed to lie in a very high mutation rate, a heterozygote advantage, or both. The latter explanation is the more probable for most AR disorders (Table 8.4).

Heterozygote Advantage

For sickle cell (SC) anemia (p. 159) and thalassemia (p. 161), there is very good evidence that heterozygote advantage results from reduced susceptibility to Plasmodium falciparum malaria, as explained in Chapter 10. Americans of Afro-Caribbean origin are no longer exposed to malaria, so it would be expected that the frequency of the SC allele in this group would gradually decline. However, the predicted rate of decline is so slow that it will be many generations before it is detectable.

For several AR disorders the mechanisms proposed for heterozygote advantage are largely speculative (see Table 8.4). The discovery of the cystic fibrosis (CF) gene, with the subsequent elucidation of the role of its protein product in membrane permeability (p. 301), supports the hypothesis of selective advantage through increased resistance to the effects of gastrointestinal infections, such as cholera and dysentery, in the heterozygote. This relative resistance could result from reduced loss of fluid and electrolytes. It is likely that this selective advantage was of greatest value several hundred years ago when these infections were endemic in Western Europe. If so, a gradual decline in the incidence of CF would be expected. However, if this theory is correct one has to ask why CF has not become relatively common in other parts of the world where gastrointestinal infections are endemic, particularly the tropics; in fact, the opposite is the case, for CF is rare in these regions.

An alternative, but speculative, mechanism for the high incidence of a condition such as CF is that the mutant allele is preferentially transmitted at meiosis. This type of segregation distortion, whereby an allele at a particular locus is transmitted more often than would be expected by chance (i.e., in more than 50% of gametes), is referred to as meiotic drive. Firm evidence for this phenomenon in CF is lacking, although it has been demonstrated in the AD disorder myotonic dystrophy (p. 295).

A major practical problem when studying heterozygote advantage is that even a tiny increase in heterozygote fitness, compared with the fitness of unaffected homozygotes, can be sufficient to sustain a high allele frequency. For example, in CF, with an allele frequency of approximately 1 in 50, a heterozygote advantage of 2% to 3% would be sufficient to account for the high allele frequency.
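The size of the advantage quoted above can be illustrated with the standard one-locus model of heterozygote advantage, which is brought in here as an assumption and is not derived in this chapter. In that model the fitnesses of unaffected homozygotes, heterozygotes, and affected homozygotes are 1 − s, 1, and 1 − t, and the mutant allele settles at q = s/(s + t). The sketch below further assumes that affected homozygotes historically did not reproduce (t = 1); all names are hypothetical.

```python
def required_advantage(q, t=1.0):
    """Selection coefficient s against unaffected homozygotes (equivalently,
    the approximate heterozygote advantage) that maintains allele frequency q."""
    # Rearranged from the equilibrium condition q = s / (s + t).
    return q * t / (1.0 - q)


def next_q(q, s, t=1.0):
    """Mutant allele frequency after one generation of selection."""
    p = 1.0 - q
    mean_fitness = p * p * (1.0 - s) + 2 * p * q + q * q * (1.0 - t)
    return (p * q + q * q * (1.0 - t)) / mean_fitness


s = required_advantage(1 / 50)
print(f"advantage needed: {s:.2%}")
print(f"allele frequency after one generation: {next_q(1 / 50, s):.4f}")
```

Under these assumptions the required advantage comes out at about 2%, and the allele frequency is unchanged after a generation of selection, in line with the 2% to 3% figure given in the text.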

Genetic Polymorphism

Polymorphism is the occurrence in a population of two or more genetically determined forms (alleles, sequence variants) in such frequencies that the rarest of them could not be maintained by mutation alone. By convention, a polymorphic locus is one at which there are at least two alleles, each with a frequency greater than 1%. Alleles with frequencies of less than 1% are referred to as rare variants.

In humans, at least 30% of structural gene loci are polymorphic, with each individual being heterozygous at between 10% and 20% of all loci. Known polymorphic protein systems include the ABO blood groups (p. 205) and many serum proteins, which may exhibit polymorphic electrophoretic differences—or isozymes.

DNA polymorphisms, including SNPs, have been crucial to positional cloning, gene mapping, and isolation of many disease genes (p. 75). They are also used in gene tracking (p. 70) in the clinical context of presymptomatic tests, prenatal diagnosis and carrier detection for many single-gene disorders where direct mutation analysis may not be possible. The value of a particular polymorphic system is assessed by determining its polymorphic information content (PIC). The higher the PIC value, the more likely it is that a polymorphic marker will be of value in linkage analysis and gene tracking.
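The chapter does not give a formula for PIC. One widely used definition, assumed here for illustration, is PIC = 1 − Σ pᵢ² − Σ 2pᵢ²pⱼ² (the second sum taken over all pairs of distinct alleles), where pᵢ are the allele frequencies at the marker. The function name below is hypothetical.

```python
from itertools import combinations


def polymorphic_information_content(allele_frequencies):
    """PIC of a marker from its allele frequencies (which must sum to 1)."""
    if abs(sum(allele_frequencies) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_frequencies)
    # Correction for matings in which both parents are the same heterozygote.
    correction = sum(2 * (pi * pi) * (pj * pj)
                     for pi, pj in combinations(allele_frequencies, 2))
    return 1.0 - homozygosity - correction


print(polymorphic_information_content([0.5, 0.5]))
print(polymorphic_information_content([0.25, 0.25, 0.25, 0.25]))
```

A marker with four equally common alleles scores higher than one with two, which matches the statement that markers with a higher PIC are more useful for linkage analysis and gene tracking.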

Segregation Analysis

Segregation analysis refers to the study of the way in which a disorder is transmitted in families so as to establish the underlying mode of inheritance. The mathematical aspects of segregation analysis are very complex and far beyond the scope of this book—and most doctors! However, it is important that those who encounter families with genetic disease have some understanding of the principles involved and some awareness of the pitfalls and problems.

Autosomal Recessive Inheritance

For disorders thought to follow AR inheritance, formal segregation analysis is much more difficult. This is because some couples who are both carriers will by chance not have affected children and will therefore not feature in ascertainment. To illustrate this, consider 64 possible sibships of size 3 in which both parents are carriers, drawn from a large hypothetical population (Table 8.5). The sibship structure shown in Table 8.5 is that which would be expected, on average.

In this population, on average, 27 of the 64 sibships will not contain any affected individuals. This can be calculated simply by cubing 3/4 (i.e., 3/4 × 3/4 × 3/4 = 27/64). Therefore, when the families are analyzed, these 27 sibships containing only healthy individuals will not be ascertained, a situation referred to as incomplete ascertainment. If this is not taken into account, a falsely high segregation ratio of 0.43 (48 affected among the 111 children in the 37 ascertained sibships) will be obtained instead of the correct value of 0.25.
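The bias introduced by incomplete ascertainment can be computed exactly from the binomial distribution. This is a sketch under the simplest assumption, namely that every sibship with at least one affected child is ascertained and no others are; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def sibship_distribution(size, risk=Fraction(1, 4)):
    """Probability of k affected children (k = 0..size) in a sibship."""
    return [comb(size, k) * risk**k * (1 - risk)**(size - k)
            for k in range(size + 1)]


def apparent_segregation_ratio(size, risk=Fraction(1, 4)):
    """Proportion of affected children among ascertained sibships only."""
    dist = sibship_distribution(size, risk)
    ascertained = 1 - dist[0]  # sibships with no affected child are missed
    affected_per_sibship = sum(k * pk for k, pk in enumerate(dist))
    return affected_per_sibship / (size * ascertained)


missed = sibship_distribution(3)[0]
ratio = apparent_segregation_ratio(3)
print(missed, ratio, round(float(ratio), 2))
```

For sibships of three this gives 27/64 missed sibships and an apparent ratio of 16/37, or about 0.43. The bias shrinks as sibship size grows, because fewer carrier couples escape ascertainment.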

Mathematical methods have been devised to cater for incomplete ascertainment, but analysis is usually further complicated by problems associated with achieving full or complete ascertainment. In practice ‘proof’ of AR inheritance requires accurate molecular or biochemical markers for carrier detection. Affected siblings (especially when at least one is female) born to unaffected parents usually suggest AR inheritance, but somatic and germline parental mosaicism (p. 121), non-paternity, and other possibilities need to be considered. There are some good examples of conditions originally reported to follow AR inheritance but subsequently shown to be dominant with germline or somatic mosaicism; for example, osteogenesis imperfecta and pseudoachondroplasia. However, a high incidence of parental consanguinity undoubtedly provides strong supportive evidence for AR inheritance, as first noted by Bateson and Garrod in 1902 (pp. 7, 113).

Genetic Linkage

Mendel’s third law—the principle of independent assortment—states that members of different gene pairs assort to gametes independently of one another (p. 5). Stated more simply, the alleles of genes at different loci segregate independently. Although this is true for genes on different chromosomes, it is not always true for genes that are located on the same chromosome (i.e., close together, or syntenic).

Two loci positioned adjacent, or close, to each other on the same chromosome, will tend to be inherited together, and are said to be linked. The closer they are, the less likely they will be separated by a crossover, or recombination, during meiosis I (Figure 8.5).

Linked alleles on the same chromosome are said to be in coupling, whereas those on opposite homologous chromosomes are described as being in repulsion. This is known as the linkage phase. Thus in the parental chromosomes in Figure 8.5C, A and B, as well as a and b, are in coupling, whereas A and b, as well as a and B, are in repulsion.

Linkage Analysis

Linkage analysis has proved invaluable for mapping genes (see Chapter 5). It is based on studying the segregation of the disease with polymorphic markers from each chromosome—preferably in large families. Eventually a marker will be identified that co-segregates with the disease more often than would be expected by chance (i.e., the marker and disease locus are linked). The mathematical analysis tends to be very complex, particularly if many closely adjacent markers are being used, as in multipoint linkage analysis. However, the underlying principle is relatively straightforward and involves the use of likelihood ratios, the logarithms of which are known as LOD scores (logarithm of the odds).

LOD Scores

When studying the segregation of alleles at two loci that could be linked, a series of likelihood ratios is calculated for different values of the recombination fraction (θ), ranging from θ = 0 to θ = 0.5. The likelihood ratio at a given value of θ equals the likelihood of the observed data, if the loci are linked at a recombination value of θ, divided by the likelihood of the observed data if the loci are not linked (θ = 0.5). The logarithm to the base 10 of this ratio is known as the LOD score (Z); that is, LOD(θ) = log₁₀[L(θ)/L(0.5)]. Logarithms are used because they allow results from different families to be added together.

For example, when a research paper reports that linkage of a disease with a DNA marker has been identified with a LOD score (Z) of 4 at recombination fraction (θ) 0.05, this means that the results, in the families studied, indicate that it is 10,000 (10⁴) times more likely that the disease and marker loci are closely linked (i.e., 5 cM apart) than that they are not linked. It is generally agreed that a LOD score of +3 or more is confirmation of linkage. This would yield a ratio of 1000 to 1 in favor of linkage; however, because there is a prior probability of only 1 in 50 that any two given loci are linked, a LOD score of +3 means that the overall probability that the loci are linked is approximately 20 to 1 (i.e., [1000 × 1/50]:1). The importance of taking prior probabilities into account in probability theory is discussed in the section on Bayes’ theorem (p. 339).

A ‘Simple’ Example

Consider a three-generation family in which several members have an AD disorder (Figure 8.6). A and B are alleles at a locus that is being tested for linkage to the disease locus.

To establish whether it is likely that these two loci are linked, the LOD score is calculated for various values of θ. The value of θ that gives the highest LOD score is taken as the best estimate of the recombination fraction. This is known as a maximum likelihood method.

To demonstrate the underlying principle, the LOD score is calculated for a value of θ equal to 0.05. If θ equals 0.05 then the loci are linked, in which case the disease gene and the B marker must be on the same chromosome in II2, as both of these characteristics have been inherited from the mother. Thus in II2 the linkage phase is known: the disease allele and the B allele are in coupling. Therefore the probability that III1 will be affected and will also inherit the B marker equals 0.95 (i.e., 1 – θ). A similar result is obtained for the remaining three members of the sibship in generation III, giving a value for the numerator of (0.95)⁴. If the loci are not linked, the likelihood of observing both the disease and marker B in III1 equals 0.5. A similar result is obtained for his three siblings, giving a value for the denominator of (0.5)⁴.

Therefore the LOD score for this family, given a value of θ = 0.05, equals log₁₀(0.95⁴/0.5⁴) = log₁₀ 13.032 = 1.12. For a value of θ = 0, the LOD score equals log₁₀(1⁴/0.5⁴) = log₁₀ 16 = 1.20. For a value of θ = 0.1, the LOD score equals log₁₀(0.9⁴/0.5⁴) = log₁₀ 10.498 = 1.02. The highest LOD score is obtained for a value of θ equal to 0, which is consistent with the fact that if the disease and marker loci are linked then no recombination has occurred between the two loci in members of generation III.
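The same calculation can be written as a small function and evaluated over a range of θ. This is a sketch for the phase-known situation described above, in which each child is scored as a recombinant or a non-recombinant; the function and parameter names are illustrative.

```python
import math


def lod_score(theta, non_recombinants, recombinants=0):
    """Two-point LOD score for phase-known meioses.

    Raises ValueError at theta = 0 if any recombinant was observed,
    because the likelihood of the data is then zero.
    """
    likelihood_linked = (1 - theta) ** non_recombinants * theta ** recombinants
    likelihood_unlinked = 0.5 ** (non_recombinants + recombinants)
    return math.log10(likelihood_linked / likelihood_unlinked)


# Four children in generation III, none of them recombinant.
for theta in (0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(theta, round(lod_score(theta, non_recombinants=4), 3))
```

Because LOD scores are logarithms, the scores from further families can simply be added to these at each value of θ until the total reaches +3 or falls to −2.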

To confirm linkage other families would have to be studied by pooling all the results until a LOD score of +3 or greater was obtained. A LOD score of −2 or less is taken as proof that the loci are not linked. This less stringent requirement for proof of non-linkage (i.e., a LOD score of −2 compared with +3 for proof of linkage) is due to the high prior probability of 49/50 that any two loci are not linked.

Multipoint Linkage Analysis

Two-point linkage analysis is often used to map a disease locus to a specific chromosome region. This gives a rather rough or ‘coarse’ indication of the location of the disease locus. The next step often involves multipoint linkage analysis using a series of polymorphic markers that are known to map to the disease region. This process allows fine tuning of the probable position of the disease locus within the rough interval defined by the small number of polymorphic marker loci.

Using this approach the results of linkage studies with the various markers are analyzed by a computer program that calculates the overall likelihood of the position of the disease locus in relation to the marker loci. The results are presented in the form of a likelihood ratio known as a location score. This is calculated for different positions of the disease locus and a graph is drawn up of location score against map distance (Figure 8.7). On this graph the peaks represent possible positions of the disease locus, with the tallest peak being the most probable location. The troughs represent the positions of the polymorphic marker loci.

Multipoint linkage analysis is used to define the smallest possible interval in which a disease locus is located, so that physical mapping methods can then be applied to isolate the disease gene (see Chapter 5).

Autozygosity Mapping

This ingenious form of linkage analysis has been used to map many rare AR disorders. Autozygosity occurs when individuals are homozygous at particular loci by descent from a common ancestor. In an inbred pedigree containing two or more children with a rare AR disorder, it is very likely that the children will be homozygous not only at the disease locus but also at closely linked loci. In other words, all affected relatives in an inbred family will be homozygous for markers within the region surrounding the disease locus. Thus a search can be made for shared areas of homozygosity in affected relatives using highly polymorphic markers such as microsatellites (p. 69). In a pedigree with a relatively large number of affected individuals, only a small number of shared homozygous regions will be identified; one of these can be expected to harbor the relevant disease locus, which can then be isolated using physical mapping strategies.

Autozygosity mapping can be applied in both small inbred families (Figure 8.8) and in genetic isolates (p. 133) with a shared common genetic ancestry (e.g., the Old Order Amish). It is a particularly powerful technique in large inbred families in which more than one branch has affected individuals. Several of the genes that cause AR sensorineural hearing loss have been mapped in this way, as well as a number of skeletal dysplasias and primary microcephalies, for example.

Linkage Disequilibrium

Linkage disequilibrium is defined formally as the association of two alleles at linked loci more frequently than would be expected by chance, and is also referred to as allelic association. The concept and the term relate to the study of diseases in populations rather than families. In the latter, an association between specific alleles and the disease in question holds true only within an individual family; in a separate affected family a different pattern of alleles, or markers, at the same locus may show association with the disease—because the alleles themselves are polymorphic.

The rationale for studying allelic association in populations is based on the assumption that a mutation occurred in a founder case some generations previously and is still causative of the disease. If this is true, the pattern of markers in a small region close to the mutation will have been maintained and thus constitutes what is termed the founder haplotype. The underlying principles used in mapping are the same as those for linkage analysis in families, the difference being the degree of relatedness of the individuals under study. In the pedigree shown in Figure 8.6, support was obtained for linkage of the disease gene with the B marker allele. Assume that further studies confirm linkage of these loci and that the A and B alleles have an equal frequency of 0.5. It would be reasonable to expect that the disease gene would be in coupling with allele A in approximately 50% of families and with allele B in the remaining 50%. If, however, the disease allele was found to be in coupling exclusively with one particular marker allele, this would be an example of linkage disequilibrium.

The demonstration of linkage disequilibrium in a particular disease suggests that the mutation causing the disease has occurred relatively recently and that the marker locus studied is very closely linked to the disease locus. There may be pitfalls, however, in interpreting haplotype data that suggest linkage disequilibrium. Other possible reasons for linkage disequilibrium include: (1) the rapid growth of genetically isolated populations leading to large regions of allelic association throughout the genome; (2) selection, whereby particular alleles enhance or diminish reproductive fitness; and (3) population admixture, where population subgroups with different patterns of allele frequencies are combined into a single study. Allowance for the latter problem can be made by using family-based controls and analyzing the transmission of alleles using a method called the transmission/disequilibrium test. This uses the fact that transmitted and non-transmitted alleles from a given parent are paired observations, and examines the preferential transmission of one allele over the other in all heterozygous parents. The technique has been applied, amongst others, to studies based on sibling pairs that are discordant for the disease or condition under study.

Medical and Societal Intervention

Recent developments in molecular biology, such as the Human Genome Project (p. 9) and pilot studies using gene therapy (p. 350), have reawakened concern that future generations could have to cope with an ever increasing burden of genetic disease. The term eugenics was first used by Charles Darwin’s cousin, Francis Galton, to refer to the improvement of a population by selective breeding. The notion that this should be applied to human populations became popular during the early years of the twentieth century, culminating in the horrifying practices of Nazi Germany. Ensuing revulsion led to the abandonment of eugenic programs in humans, with universal condemnation and agreement that such programs have no place in modern medical practice. Sadly, however, these practices have been continued by groups engaged in territorial conflicts, somewhat sanitized by the term ‘ethnic cleansing.’

Doctors caring for patients and families with hereditary disease inevitably give priority to treatment and improving survival. By so doing, biological fitness may be increased, leading to increased numbers of ‘bad genes’ in society and potentially adding to humanity’s future genetic load. Such long-term consequences generally carry no weight in clinical decisions, but the approach has sometimes been interpreted as dysgenic.

The ethical debate is important, but it is also worth considering the possible long-term effects of artificial selection for or against genetic disorders, according to their pattern of inheritance.

AD Disorders

If everyone with an AD disorder were successfully encouraged not to reproduce, the incidence of that disorder would decline rapidly, with all future cases being the result only of new mutations. This would have a particularly striking effect on the incidence of relatively mild conditions such as familial hypercholesterolemia, in which genetic fitness is close to 1.

Alternatively, if successful treatment became available for all patients with a serious AD disorder that at present is associated with a marked reduction in genetic fitness, there would be an immediate increase in the frequency of the disease gene followed by a more gradual leveling off at a new equilibrium level. If, at one time, all those with a serious AD disorder died in childhood (f = 0), then the incidence of affected individuals would be 2µ. If treatment raised the fitness from 0 to 0.9, the incidence of affected children in the next generation would rise to 2µ due to new mutations plus 1.8µ inherited (0.9 × 2µ), which equals 3.8µ. Eventually a new equilibrium would be reached, by which time the disease incidence would have risen tenfold to 20µ. This can be calculated relatively easily with the formula µ = [I(1 − f)]/2 (p. 133), which can also be expressed as I = 2µ/(1 − f). The net result would be that the proportion of affected children who died would be lower (from 100% down to 10%), but the total number affected would be much greater, although the actual number who died from the disease would remain unchanged at 2µ (10% of 20µ).
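The arithmetic above can be checked with a minimal sketch. It assumes a simple recurrence in which each generation's incidence is 2µ from new mutations plus f times the previous generation's incidence, which matches the 3.8µ figure for the first generation after treatment. The function names and the mutation rate of 1 in 100,000 are hypothetical and serve only as illustration.

```python
def incidence_trajectory(mu, fitness, generations):
    """Incidence per generation for an AD disorder after fitness changes.

    Starts from the equilibrium for fitness 0 (incidence 2 * mu) and applies
    the assumed recurrence: next = 2 * mu (new mutations) + fitness * current.
    """
    incidence = 2 * mu
    trajectory = [incidence]
    for _ in range(generations):
        incidence = 2 * mu + fitness * incidence
        trajectory.append(incidence)
    return trajectory


def equilibrium_incidence(mu, fitness):
    """Equilibrium incidence from I = 2 * mu / (1 - f)."""
    if fitness >= 1:
        raise ValueError("fitness must be below 1 for an equilibrium to exist")
    return 2 * mu / (1 - fitness)


mu = 1e-5       # hypothetical mutation rate per gene per generation
fitness = 0.9   # fitness after treatment, as in the text

trajectory = incidence_trajectory(mu, fitness, generations=200)
for generation in (0, 1, 10, 50, 200):
    print(generation, round(trajectory[generation] / mu, 2), "x mu")

print("equilibrium:", round(equilibrium_incidence(mu, fitness) / mu, 2), "x mu")
```

Under this model the incidence climbs quickly at first and then levels off toward 20µ, while the number dying from the disease at equilibrium, (1 − f) × I, returns to 2µ.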

Conclusion

In reality it is extremely difficult to predict the long-term impact of medical intervention on the incidence and burden of genetic disease. Although it is true that improvements in medical treatment could result in an increased genetic load in future generations, it is equally possible that successful gene therapy will ease the overall burden of these disorders in terms of human suffering. Some of these arguments could have been made many years ago for other major medical developments, such as the discovery of insulin and antibiotics, which have had immeasurable financial implications for the pharmaceutical industry and have contributed to an aging population. Ultimately, how society copes with these advances and challenges provides a measure of civilization.

Further Reading

Allison AC. Protection afforded by sickle-cell trait against subtertian malarial infection. BMJ. 1954;i:290-294.

A landmark paper providing clear evidence that the sickle-cell trait provides protection against parasitemia by falciparum malaria.

Emery AEH. Methodology in medical genetics, 2nd ed. Edinburgh: Churchill Livingstone; 1986.

A useful handbook of basic population genetics and mathematical methods for analyzing the results of genetic studies.

Francomano CA, McKusick VA, Biesecker LG, eds. Medical genetic studies in the Amish: historical perspective. Am J Med Genet C Semin Med Genet. 2003;121:1-4.

Haldane JBS. The rate of spontaneous mutation of a human gene. J Genet. 1935;31:317-326.

The first estimate of the mutation rate for hemophilia using an indirect method.

Hardy GH. Mendelian proportions in a mixed population. Science. 1908;28:49-50.

A short letter in which Hardy pointed out that in a large randomly mating population dominant ‘characters’ would not increase at the expense of recessives.

Khoury MJ, Beaty TH, Cohen BH. Fundamentals of genetic epidemiology. New York: Oxford University Press; 1993.

A comprehensive textbook of population genetics and its areas of overlap with epidemiology.

Ott J. Analysis of human genetic linkage. Baltimore: Johns Hopkins University Press; 1991.

A detailed mathematical explanation of linkage analysis.

Vogel F, Motulsky AG. Human genetics, problems and approaches, 3rd ed. Berlin: Springer; 1997.

The definitive textbook of human genetics with extensive coverage of mathematical aspects.

Elements