CHAPTER 5 Mapping and Identifying Genes for Monogenic Disorders
The identification of the gene associated with an inherited single-gene (monogenic) disorder has immediate clinical diagnostic application and also enables an understanding of the developmental basis of the pathology, with the prospect of possible therapeutic interventions. The molecular basis for more than 2700 disease phenotypes is now known.
In the 1990s a genome-wide set of microsatellites was constructed with approximately 1 marker per 10 centimorgans (cM). These 350 markers could be amplified by polymerase chain reaction (PCR) and facilitated genetic mapping studies that led to the identification of thousands of genes. This approach has been superseded by DNA microarrays or ‘single nucleotide polymorphism (SNP) chips’. Although SNPs (p. 67) are less informative than microsatellites, they can be scored automatically and microarrays are commercially available with several million SNPs distributed throughout the genome.
A common step in all approaches to identifying human disease genes is the identification of a candidate gene (Figure 5.1). Candidate genes may be suggested from animal models of disease or by homology, either to a paralogous human gene (e.g., where multigene families exist) or to an orthologous gene in another species. With the sequencing of the human genome now complete, it is also possible to find new disease genes by searching through genetic databases (i.e., ‘in silico’).
Position-Independent Identification of Human Disease Genes
Functional Cloning
Functional cloning describes the identification of a human disease gene through knowledge of its protein product. From the amino-acid sequence of a protein, oligonucleotides could be synthesized to act as probes for screening complementary DNA (cDNA) libraries (p. 56).
Use of Animal Models
The recognition in a model organism, such as the mouse, of phenotypic features similar to those seen in persons affected with an inherited disorder raised the possibility that cloning the gene in the model organism could lead to more rapid identification of the responsible gene in humans. An example of this approach was the mapping of the gene responsible for the inherited disorder of pigmentation and deafness known as Waardenburg syndrome (p. 91) to the long arm of human chromosome 2. This region of chromosome 2 shows extensive homology, or what is known as synteny, to the region of mouse chromosome 1 to which the gene for the murine pigmentary mutant known as Splotch had been assigned. The mapping of the murine Pax3 gene, which codes for a transcription factor expressed in the developing nervous system, to this region suggested it as a positional candidate gene for the disorder. The pigmentary abnormalities were thought to arise because melanocytes, in which melanin synthesis takes place, are derived from the neural crest. Identification of mutations in PAX3, the human homolog, confirmed it as the gene responsible for Waardenburg syndrome.
Next-Generation ‘Clonal’ Sequencing
This new sequencing technology shows great promise for elucidating the remaining ~55% of single-gene disorders where the genetic aetiology remains unknown (Figure 5.2). The first success was the identification, by ‘exome’ sequencing, of mutations in the DHODH gene that cause Miller syndrome. Around 164,000 regions encompassing exons and their conserved splice sites (a total of 27 Mb) were sequenced in a pair of affected siblings and in probands from two additional families. Non-synonymous variants, splice donor/acceptor variants, or coding insertion/deletion mutations were identified in nearly 5000 genes in each of the two affected siblings. Filtering these variants against public databases (dbSNP and HapMap) yielded novel variants in fewer than 500 genes. Analysis of pooled data from the four affected patients revealed just one gene, DHODH, which contained two mutated alleles in each of the four individuals.
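The filtering logic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used in the Miller syndrome study: the function name, the variant identifiers, the placeholder genes GENE_A to GENE_C and all of the toy genotype data are invented for the example. It assumes a recessive model in which a candidate gene must carry at least two novel mutated alleles (homozygous or compound heterozygous) in every affected individual.

```python
from collections import Counter

def candidate_genes(variants_by_individual, known_variants, min_alleles=2):
    """Return genes with >= min_alleles novel variant alleles in EVERY individual.

    variants_by_individual: dict mapping an individual's label to a list of
        (gene, variant_id, n_alleles) tuples; n_alleles is 1 for a
        heterozygous call and 2 for a homozygous call.
    known_variants: set of variant ids already present in public databases.
    """
    per_person = []
    for variants in variants_by_individual.values():
        allele_count = Counter()
        for gene, variant_id, n_alleles in variants:
            if variant_id in known_variants:
                continue  # discard variants already catalogued as known
            allele_count[gene] += n_alleles
        per_person.append({g for g, n in allele_count.items() if n >= min_alleles})
    # A candidate must satisfy the recessive model in all affected individuals.
    return set.intersection(*per_person) if per_person else set()

# Toy data (entirely hypothetical): two siblings and two unrelated probands.
known = {"known_1", "known_2"}
exomes = {
    "sib1": [("DHODH", "v1", 1), ("DHODH", "v2", 1),
             ("GENE_A", "known_1", 2), ("GENE_B", "v3", 1)],
    "sib2": [("DHODH", "v1", 1), ("DHODH", "v2", 1),
             ("GENE_B", "v3", 1), ("GENE_C", "v4", 2)],
    "proband2": [("DHODH", "v5", 2), ("GENE_C", "v4", 2)],
    "proband3": [("DHODH", "v6", 1), ("DHODH", "v7", 1),
                 ("GENE_A", "known_1", 2)],
}
result = candidate_genes(exomes, known)
print(result)
```

In this toy data set only DHODH survives both the database filter and the requirement for two mutated alleles in all four individuals, mirroring the way that pooling data across unrelated families narrows several hundred genes down to one.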
Positional Cloning
Linkage Analysis
Genetic mapping, or linkage analysis (p. 137), is based on genetic distances that are measured in centimorgans (cM). A genetic distance of 1 cM is the distance between two genes that show 1% recombination; that is, in 1% of meioses the genes will not be co-inherited. On average, 1 cM is equivalent to approximately 1 Mb (1 million bases). Linkage analysis is the first step in positional cloning and defines a genetic interval for further analysis.
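As a worked illustration of these units, the short Python sketch below converts a count of recombinant meioses into an approximate genetic and physical distance. The function names and the example counts are assumptions made for this illustration. The direct proportionality between recombination fraction and centimorgans holds only for closely spaced loci, and the figure of 1 Mb per cM is a genome-wide average rather than a constant.

```python
def recombination_fraction(recombinants, meioses):
    """Proportion of informative meioses in which two loci are not co-inherited."""
    if meioses <= 0:
        raise ValueError("meioses must be a positive count")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    return recombinants / meioses

def approx_map_distance_cm(theta):
    """Approximate genetic distance in cM: 1% recombination is about 1 cM.

    Only reasonable for small recombination fractions (closely linked loci).
    """
    return theta * 100

def approx_physical_distance_mb(distance_cm, mb_per_cm=1.0):
    """Rough physical distance, assuming the average of about 1 Mb per cM."""
    return distance_cm * mb_per_cm

# Hypothetical example: 3 recombinants observed in 100 informative meioses.
theta = recombination_fraction(3, 100)
distance_cm = approx_map_distance_cm(theta)
distance_mb = approx_physical_distance_mb(distance_cm)
print(f"theta = {theta:.2f}, about {distance_cm:.1f} cM, roughly {distance_mb:.1f} Mb")
```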
Linkage analysis can be performed for a single, large family or for multiple families, although this assumes that there is no genetic heterogeneity (p. 378). The use of genetic markers located throughout the genome is described as a genome-wide scan. In the 1990s, genome-wide scans used microsatellite markers (a commercial set of 350 markers was popular), but microarrays with several million SNPs now provide greater statistical power.
Autozygosity mapping (also known as homozygosity mapping) is a powerful form of linkage analysis used to map autosomal recessive disorders in consanguineous pedigrees (p. 269). Autozygosity occurs when affected members of a family are homozygous at particular loci because they are identical by descent from a common ancestor.
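The core idea of autozygosity mapping can be sketched as a search for stretches of consecutive markers at which all affected relatives are homozygous for the same allele. The Python below is a simplified illustration under stated assumptions: the function name, the marker names and the genotypes are invented, genotypes are written as two-letter strings such as "AA" or "AB", and markers are assumed to be listed in chromosomal order. Shared homozygosity alone does not prove identity by descent, so a real analysis would also take allele frequencies and pedigree structure into account.

```python
def shared_homozygous_runs(markers, genotypes, min_markers=3):
    """Find runs of consecutive markers where all individuals share one homozygous genotype.

    markers: list of marker names in chromosomal order.
    genotypes: dict mapping each affected individual to a list of two-letter
        genotype strings aligned with `markers`.
    Returns a list of (first_marker, last_marker) tuples for each run of at
    least `min_markers` markers.
    """
    runs = []
    start = None
    for i in range(len(markers)):
        calls = {calls_for_person[i] for calls_for_person in genotypes.values()}
        # Shared: everyone has the same genotype, and that genotype is homozygous.
        shared = len(calls) == 1 and len(set(next(iter(calls)))) == 1
        if shared and start is None:
            start = i
        elif not shared and start is not None:
            if i - start >= min_markers:
                runs.append((markers[start], markers[i - 1]))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and len(markers) - start >= min_markers:
        runs.append((markers[start], markers[-1]))
    return runs

# Hypothetical genotypes for three affected relatives at eight ordered markers.
markers = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"]
affected = {
    "affected1": ["AB", "AA", "BB", "AA", "BB", "AB", "AA", "BB"],
    "affected2": ["AA", "AA", "BB", "AA", "BB", "BB", "AB", "BB"],
    "affected3": ["AB", "AA", "BB", "AA", "BB", "AB", "AA", "AA"],
}
regions = shared_homozygous_runs(markers, affected)
print(regions)
```

Here the only shared homozygous interval runs from m2 to m5, which would be the region taken forward for candidate gene analysis.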
Linkage of cystic fibrosis (CF) to chromosome 7 was found by testing nearly 50 white families with hundreds of DNA markers. The gene was mapped to a region of 500 kilobases (kb) between markers MET and D7S8 at chromosome band 7q31-32, when it became evident that the majority of CF chromosomes had a particular set of alleles for these markers (shared haplotype) that was found in only 25% of non-CF chromosomes. This finding is described as linkage disequilibrium and suggests a common mutation from a founder effect (p. 378). Extensive physical mapping studies eventually led to the identification of four genes within the genetic interval identified by linkage analysis, and in 1989 a 3-bp deletion was found within the cystic fibrosis transmembrane conductance regulator (CFTR) gene. This mutation (p.Phe508del) was present in approximately 70% of CF chromosomes and 2% to 3% of non-CF chromosomes, consistent with the carrier frequency of 1 in 25 in whites.
Chromosome Abnormalities
Occasionally, individuals are recognized with single-gene disorders who are also found to have structural chromosomal abnormalities. The first clue that the gene responsible for Duchenne muscular dystrophy (DMD) (p. 307) was located on the short arm of the X chromosome was the identification of a number of females with DMD who were also found to have a chromosomal rearrangement between an autosome and a specific region of the short arm of one of their X chromosomes. In one such female, isolation of DNA clones spanning the region of the X chromosome involved in the rearrangement led to more detailed gene-mapping information as well as to the eventual cloning of the DMD or dystrophin gene (p. 307).
The co-occurrence of a chromosome abnormality and a single-gene disorder is rare, but identification of such individuals is important because it has led to the cloning of several other important disease genes in humans, such as those for tuberous sclerosis (p. 316) and familial adenomatous polyposis (p. 221).
Candidate Genes
Searching databases for genes with a function likely to be involved in the pathogenesis of the inherited disorder can also suggest what are known as candidate genes. If a disease has been mapped to a particular chromosomal region, any gene mapping to that region is a positional candidate gene. Data on the pattern of expression, the timing, and the distribution across tissue and cell types may suggest that a certain positional candidate gene or genes is more likely to be responsible for the phenotypic features seen in persons affected with a particular single-gene disorder. Several computer programs have been developed that can search genomic DNA sequence databases for sequence homology to known genes, as well as for sequence features common to genes, such as the conserved intron–exon splice junctions, promoter sequences, polyadenylation sites and stretches of open reading frames (ORFs).
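One of the sequence features mentioned above, the open reading frame, is simple enough to illustrate directly. The Python sketch below scans the three forward reading frames of a DNA sequence for stretches that begin with ATG and end at an in-frame stop codon. It is a deliberately minimal illustration, not a description of any particular gene-prediction program: the function name and the test sequence are invented, the reverse strand is ignored, and no attempt is made to model introns, splice junctions, promoters or polyadenylation sites.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(sequence, min_codons=3):
    """Return (start, end, frame) for ATG-to-stop ORFs on the forward strand.

    start is the 0-based index of the A of ATG, end is the index just past
    the stop codon, and min_codons counts coding codons (ATG included,
    stop codon excluded).
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs

# Hypothetical 16-base sequence containing one short ORF: ATG GCT GCT TAA.
example = "CCATGGCTGCTTAAGG"
orfs = find_orfs(example)
print(orfs)
```

In practice a much larger minimum length would be used, because short ORFs of this kind occur frequently by chance in non-coding sequence.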
Confirmatory Testing that a Candidate Gene Is a Disease Gene
Mutations in candidate genes can be screened for by a variety of methods (p. 59) and confirmed by DNA sequencing (p. 61). Finding loss-of-function mutations or multiple different mutations that result in the same phenotype provides convincing evidence that a potential candidate gene is associated with a disorder. For example, in the absence of functional data to demonstrate the effect of the p.Phe508del mutation on the CFTR protein, confirmation that mutations in the CFTR gene caused cystic fibrosis was provided by the nonsense mutation p.Gly542X.
The Human Gene Map
The rate at which single-gene disorders and their genes are being mapped in humans is increasing exponentially (see Figure 1.6, p. 7). Many of the more common and clinically important monogenic disorders have been mapped to produce the ‘morbid anatomy of the human genome’ (Figure 5.3).