Mapping and Identifying Genes for Monogenic Disorders

Published on 16/03/2015 by admin

Last modified 22/04/2025

Print this page

This article have been viewed 4473 times

CHAPTER 5 Mapping and Identifying Genes for Monogenic Disorders

The identification of the gene associated with an inherited single gene (monogenic) disorder, as well as having immediate clinical diagnostic application, will enable an understanding of the developmental basis of the pathology with the prospect of possible therapeutic interventions. The molecular basis for more than 2700 disease phenotypes is now known.

The first human disease genes identified were those with a biochemical basis where it was possible to purify and sequence the gene product. The development of recombinant DNA techniques in the 1980s enabled physical mapping strategies and led to a new approach, positional cloning. This describes the identification of a gene purely on the basis of its location, without any prior knowledge of its function. Notable early successes were the identification of the dystrophin gene (mutated in Duchenne muscular dystrophy), the cystic fibrosis transmembrane regulatory gene, and the retinoblastoma gene. Patients with chromosome abnormalities or rearrangements have often provided important clues by highlighting the likely chromosomal region of a gene associated with disease.

In the 1990s a genome-wide set of microsatellites was constructed with approximately 1 marker per 10 centimorgans (cM). These 350 markers could be amplified by polymerase chain reaction (PCR) and facilitated genetic mapping studies that led to the identification of thousands of genes. This approach has been superseded by DNA microarrays or ‘single nucleotide polymorphism (SNP) chips’. Although SNPs (p. 67) are less informative than microsatellites, they can be scored automatically and microarrays are commercially available with several million SNPs distributed throughout the genome.

The common step for all approaches to identify human disease genes was the identification of a candidate gene (Figure 5.1). Candidate genes may be suggested from animal models of disease or by homology, either to a paralogous human gene (e.g., where multigene families exist) or to an orthologous gene in another species. With the sequencing of the human genome now complete, it is also possible to find new disease genes by searching through genetic databases (i.e., ‘in silico’).

FIGURE 5.1 Pathways toward human disease gene identification.

Recent developments in sequencing technology mean that exome sequencing (analysis of the coding regions of all known genes) and even whole genome sequencing are now feasible strategies for identifying disease genes by direct identification of the causal mutation in a family (or families) with multiple affected individuals. Consequently, the timescale for identifying human disease genes has decreased dramatically from a period of years (e.g., the search for the cystic fibrosis gene in the 1980s) to weeks or perhaps even days, now that the human genome sequence is available in public databases.

Position-Independent Identification of Human Disease Genes

Before genetic mapping techniques were developed, the first human disease genes were identified through knowledge of the protein product. For disorders with a biochemical basis, this was a particularly successful strategy.

Functional Cloning

Functional cloning describes the identification of a human disease gene through knowledge of its protein product. From the amino-acid sequence of a protein, oligonucleotide probes could be synthesized to act as probes for screening complementary DNA (cDNA) libraries (p. 56).

An alternative approach was to generate an antibody to the protein for screening of a cDNA expression library.

Use of Animal Models

The recognition of phenotypic features in a model organism, such as the mouse, which are similar to those seen in persons affected with an inherited disorder, allowed the possibility of cloning the gene in the model organism to lead to more rapid identification of the gene responsible in humans. An example of this approach was the mapping of the gene responsible for the inherited disorder of pigmentation and deafness known as Waardenburg syndrome (p. 91) to the long arm of human chromosome 2. This region of chromosome 2 shows extensive homology, or what is known as synteny, to the region of mouse chromosome 1 to which the gene for the murine pigmentary mutant known as Splotch had been assigned. The mapping of the murine Pax3 gene, which codes for a transcription factor expressed in the developing nervous system, to this region suggested it as a positional candidate gene for the disorder. It was suggested that the pigmentary abnormalities could arise on the basis that melanocytes, in which melanin synthesis takes place, are derived from the neural crest. Identification of mutations in PAX3, the human homolog, confirmed it as the gene responsible for Waardenburg syndrome.

Mapping Trinucleotide Repeat Disorders

A number of human diseases are attributable to expansions of trinucleotide repeats, and in particular CAG repeat expansions which cause extended polyglutamate tracts in Huntington disease and many forms of spinocerebellar ataxia. A method developed to seek novel trinucleotide repeat expansions in genomic DNA from affected patients led to the successful identification of a CTG repeat expansion in patients with spinocerebellar ataxia type 8.

Next-Generation ‘Clonal’ Sequencing

This new sequencing technology shows great promise for elucidating the remaining ~55% of single gene disorders where the genetic aetiology remains unknown (Figure 5.2). The first success was in the identification of mutations in the DHODH gene that cause Miller syndrome by ‘exome’ sequencing. Around 164,000 regions encompassing exons and their conserved splice sites (a total of 27 Mb) were sequenced in a pair of affected siblings and probands from two additional families. Non-synonymous variants, splice donor/acceptor, or coding insertion/deletion mutations were identified in nearly 5000 genes in each of the two affected siblings. Filtering these variants against public databases (dbSNP and HapMap) yielded novel variants in less than 500 genes. Analysis of pooled data from the four affected patients revealed just one gene, DHODH, which contained two mutated alleles in each of the four individuals.

FIGURE 5.2 A strategy for disease gene identification using exome sequencing.

Positional Cloning

Positional cloning describes the identification of a disease gene through its location in the human genome, without prior knowledge of its function. It is also described as reverse genetics as it involves an approach opposite to that of functional cloning, in which the protein is the starting point.

Linkage Analysis

Genetic mapping, or linkage analysis (p. 137), is based on genetic distances that are measured in centimorgans (cM). A genetic distance of 1 cM is the distance between two genes that show 1% recombination, that is, in 1% of meioses the genes will not be co-inherited and is equivalent to approximately 1 Mb (1 million bases). Linkage analysis is the first step in positional cloning that defines a genetic interval for further analysis.

Linkage analysis can be performed for a single, large family or for multiple families, although this assumes that there is no genetic heterogeneity (p. 378). The use of genetic markers located throughout the genome is described as a genome-wide scan. In the 1990s, genome-wide scans used microsatellite markers (a commercial set of 350 markers was popular), but microarrays with several million SNPs now provide greater statistical power.

Autozygosity mapping (also known as homozygosity mapping) is a powerful form of linkage analysis used to map autosomal recessive disorders in consanguineous pedigrees (p. 269). Autozygosity occurs when affected members of a family are homozygous at particular loci because they are identical by descent from a common ancestor.

Linkage of cystic fibrosis (CF) to chromosome 7 was found by testing nearly 50 white families with hundreds of DNA markers. The gene was mapped to a region of 500 kilobases (kb) between markers MET and D7S8 at chromosome band 7q31-32, when it became evident that the majority of CF chromosomes had a particular set of alleles for these markers (shared haplotype) that was found in only 25% of non-CF chromosomes. This finding is described as linkage disequilibrium and suggests a common mutation from a founder effect (p. 378). Extensive physical mapping studies eventually led to the identification of four genes within the genetic interval identified by linkage analysis, and in 1989 a 3-bp deletion was found within the cystic fibrosis transmembrane receptor (CFTR) gene. This mutation (p.Phe508del) was present in approximately 70% of CF chromosomes and 2% to 3% of non-CF chromosomes, consistent with the carrier frequency of 1 in 25 in whites.

Contig Analysis

The aim of linkage analysis is to reduce the region of linkage as far as possible to identify a candidate region. Before publication of the human genome sequence, the next step was to construct a contig. This contig would contain a series of overlapping fragments of cloned DNA representing the entire candidate region. These cloned fragments were then used to screen cDNA libraries, to search for CpG islands (which are usually located close to genes), for zoo blotting (selection based on evolutionary conservation) and exon trapping (to identify coding regions via functional splice sites). The requirement for cloning the region of interest led to the phrase ‘cloning the gene’ for a particular disease.

Chromosome Abnormalities

Occasionally, individuals are recognized with single-gene disorders who are also found to have structural chromosomal abnormalities. The first clue that the gene responsible for Duchenne muscular dystrophy (DMD) (p. 307) was located on the short arm of the X chromosome was the identification of a number of females with DMD who were also found to have a chromosomal rearrangement between an autosome and a specific region of the short arm of one of their X chromosomes. Isolation of DNA clones spanning the region of the X chromosome involved in the rearrangement led in one such female to more detailed gene-mapping information as well as to the eventual cloning of the DMD or dystrophin gene (p. 307).

At the same time as these observations, a male was reported with three X-linked disorders: DMD, chronic granulomatous disease, and retinitis pigmentosa. He also had an unusual X-linked red cell group known as the McLeod phenotype. It was suggested that he could have a deletion of a number of genes on the short arm of his X chromosome, including the DMD gene, or what is now termed a contiguous gene syndrome. Detailed prometaphase chromosome analysis revealed this to be the case. DNA from this individual was used in vast excess to hybridize in competitive reassociation, under special conditions, with DNA from persons with multiple X chromosomes to enrich for DNA sequences that he lacked, the so-called phenol enhanced reassociation technique, or pERT, which allowed isolation of DNA clones containing portions of the DMD gene.

The occurrence of a chromosome abnormality and a single-gene disorder is rare, but identification of such individuals is important as it has led to the cloning of several other important disease genes in humans, such as tuberous sclerosis (p. 316) and familial adenomatous polyposis (p. 221).

Candidate Genes

Searching databases for genes with a function likely to be involved in the pathogenesis of the inherited disorder can also suggest what are known as candidate genes. If a disease has been mapped to a particular chromosomal region, any gene mapping to that region is a positional candidate gene. Data on the pattern of expression, the timing, and the distribution of tissue and cells types may suggest that a certain positional candidate gene or genes is more likely to be responsible for the phenotypic features seen in persons affected with a particular single-gene disorder. Several computer programs have been developed that can search genomic DNA sequence databases for sequence homology to known genes, as well as DNA sequences specific to all genes, such as the conserved intron–exon splice junctions, promoter sequences, polyadenylation sites and stretches of open reading frames (ORFs).

Identification of a gene with homology to a known gene causing a recognized inherited disorder can suggest it as a possible candidate gene for other inherited disorders with a similar phenotype. For example, the identification of mutations in the connexin 26 gene, which codes for one of the proteins that constitute the gap junctions between cells causing sensorineural hearing impairment or deafness, has led to the identification of other connexins responsible for inherited hearing impairment or deafness.

Confirmatory Testing that a Candidate Gene Is a Disease Gene

Mutations in candidate genes can be screened for by a variety of methods (p. 59) and confirmed by DNA sequencing (p. 61). Finding loss-of-function mutations or multiple different mutations that result in the same phenotype provides convincing evidence that a potential candidate gene is associated with a disorder. For example, in the absence of functional data to demonstrate the effect of the p.Phe508del mutation on the CFTR protein, confirmation that mutations in the CFTR gene caused cystic fibrosis was provided by the nonsense mutation p.Gly542X.

Further support is provided by the observation that the candidate gene is expressed in the appropriate tissues and at the relevant stages of development. The production of a transgenic animal model by the targeted introduction of the mutation into the homologous gene in another species that is shown to exhibit phenotypic features similar to those seen in persons affected with the disorder, or restoration of the normal phenotype by transfection of the normal gene into a cell line, provides final proof that the candidate gene and the disease gene are one and the same.

The Human Gene Map

The rate at which single-gene disorders and their genes are being mapped in humans is increasing exponentially (see Figure 1.6, p. 7). Many of the more common and clinically important monogenic disorders have been mapped to produce the ‘morbid anatomy of the human genome’ (Figure 5.3).

FIGURE 5.3 A gene map of the human genome with examples of some of the more common or important single genes and disorders.

α1-AT	14q32	α₁-Antitrypsin deficiency
ABO	9q34	ABO blood group
ACTH	2p25	Adrenocorticotrophic hormone deficiency
ADA	20q13.11	Severe combined immunodeficiency, ADA deficiency
AHP	9q34	Acute hepatic porphyria
AIP	11q23.3	Acute intermittent porphyria
AKU	3q2	Alkaptonuria
ALD	Xq28	Adrenoleukodystrophy
APKD1	16p13	Adult polycystic kidney disease, locus 1
APKD2	4q21–23	Adult polycystic kidney disease, locus 2
APOB	2p24	Apolipoprotein B
APOE	19q.13.2	Apolipoprotein E
ARG1	6q23	Arginase deficiency, argininemia
ARSB	5q11–13	Mucopolysaccharidosis type VI, Maroteaux-Lamy syndrome
AS	15q11–13	Angelman syndrome
ATA	11q22.3	Ataxia telangiectasia
ATIII	1q23–25	Antithrombin III
ATRX	Xq13	α-Thalassemia mental retardation
AZF	Yq11	Azoospermia factor
BBS2	16q21	Bardet–Biedl syndrome
BLM	15q26.1	Bloom syndrome
BRCA1	17q21	Familial breast/ovarian cancer, locus 1
BRCA2	13q12.3	Familial breast/ovarian cancer, locus 2
BWS	11p15.4	Beckwith–Wiedemann syndrome
C3	19p13.2-13.3	Complement factor 3
C5	9q34.1	Complement factor 5
C6	5p13	Complement factor 6
C7	5p13	Complement factor 7
C9	5p13	Complement factor 9
CAH1	6p21.3	Congenital adrenal hyperplasia, 21-hydroxylase
CBS	21q22.3	Homocystinuria
CEP	10q25.2-26.3	Congenital erythropoietic porphyria
CFTR	7q31.2	Cystic fibrosis transmembrane conductance regulator
CKN2	10q11	Cockayne syndrome 2, late onset
CMH1	14q12	Hypertrophic obstructive cardiomyopathy type 1
CMH2	1q3	Hypertrophic obstructive cardiomyopathy type 2
CMH3	15q22	Hypertrophic obstructive cardiomyopathy type 3
CMT1A	17p11.2	Charcot–Marie–Tooth disease type 1A
CMT1B	1q22	Charcot–Marie–Tooth disease type 1B
CMT2	1p35–36	Charcot–Marie–Tooth disease type 2
COL1A1	17q21.31-22	Collagen type I, α₁ chain, osteogenesis imperfecta
COL1A2	7q22.1	Collagen type I, α₂ chain, osteogenesis imperfect
COL2A1	12q13.11-13.2	Collagen type II, Stickler syndrome
COL3A1	2q31	Collagen type III, α₁ chain, Ehlers-Danlos syndrome type IV
CYP11B1	8q21	Congenital adrenal hyperplasia, 11β-hydroxylase
DAZ	Yq11	Deleted in azoospermia
DFNB1/A3	13q12	Non-syndromic sensorineural deafness, first recessive, third dominant locus
DM	19q13.2-13.3	Myotonic dystrophy
DMD/BMD	Xp21.2	Dystrophin, Duchenne and Becker muscular dystrophy
DRPLA	12p13.1-12.3	Dentatorubropallidoluysian disease
EDSVI	1p36.2-36.3	Ehlers-Danlos syndrome type VI
EYA1	8q13.3	Brachio-otorenal syndrome
F5	1q23	Coagulation protein V
F7	13q34	Coagulation protein VII
F8	Xq28	Coagulation protein VIII, hemophilia A
F9	Xq27.1-27.2	Coagulation protein IX, Christmas disease, hemophilia B
F10	13q34	Coagulation protein X
F11	Xq27.1-27.2	Coagulation factor XI
F12	5q33-qter	Coagulation factor XII
FAP	5q21-22	Familial adenomatous polyposis, Gardner syndrome
FBN1	15q21.1	Fibrillin-1, Marfan syndrome
FBN2	5q23-31	Fibrillin-2, contractural arachnodactyly
FGFR1	8p11.1-11.2	Fibroblast growth factor receptor 1, Pfeiffer syndrome
FGFR2	10q26	Fibroblast growth factor receptor 2, Crouzon, Pfeiffer, Apert syndrome
FGFR3	4p16.3	Fibroblast growth factor receptor 3, achondroplasia, thanatophoric dysplasia
FH	19p13.1-13.2	Familial hypercholesterolemia
FRAXA (FMR1)	Xq27.3	Fragile X mental retardation
FRDA	9q13–21.1	Friedreich ataxia
FSHMD	4q35	Facioscapulohumeral muscular dystrophy
GAL	9p13	Galactosemia
GAP	9q31	Basal cell nevus syndrome, Gorlin syndrome
GLB1	3p21.33	GM1 gangliosidosis
G6PD	Xq28	Glucose-6-phosphate dehydrogenase
GUSB	7q21.11	Mucopolysaccharidosis type VII, Sly syndrome
HbB	11p15.5	β-Globin gene
HD	4p16.3	Huntington disease
HEXA	15q23–24	Hexosaminidase A, Tay-Sachs disease
HEXB	5q13	Hexosaminidase B, Sandhoff disease
HFE	6p21.3	Hemochromatosis
HGPRT	Xq26-27.2	Hypoxanthine guanine phosphoribosyl transferase, Lesch-Nyhan syndrome
HLA	6p21.3	Major histocompatibility locus
HPE3	7q36	Holoprosencephaly
IDUA	4p16.3	Mucopolysaccharidosis type I, Hurler syndrome
IGKC	2p12	Immunoglobulin κ light chain
IGLC1	22q11	Immunoglobulin λ light chains
INS	11p15.5	Insulin-dependent diabetes mellitus type 2
KRT5	12q11-13	Epidermolysis bullosa simplex, Koebner type
LGMD7	5q31	Limb-girdle muscular dystrophy
MCAD	1p31	Acyl coenzyme-A dehydrogenase, medium chain
MDS	17p13.3	Miller-Dieker lissencephaly syndrome
MEN1	11q13	Multiple endocrine neoplasia syndrome type 1
MHS	19q13.1	Malignant hyperpyrexia susceptibility, locus 1
MITF	3p14.1	Waardenburg syndrome type 2
MJD	14q24.3-31	Machado-Joseph disease, spinocerebellar ataxia type 3
MPS VI	5q11-13	Maroteaux-Lamy syndrome
MSH2	2p15-16	Hereditary non-polyposis colorectal cancer type 1
NCF2	1q25	Chronic granulomatous disease, neutrophil cytosolic factor-2 deficiency
NF1	17q11.2	Neurofibromatosis type I, von Recklinghausen disease
NF2	22q12.2	Neurofibromatosis type II, bilateral acoustic neuroma
NP	11p15.1-15.4	Niemann-Pick disease type A and B
NPC	18q11-12	Niemann-Pick disease type C
NPS	9q43	Nail-patella syndrome
OTC	Xp21.1	Ornithine transcarbamylase
p53	17p13.1	p53 protein, Li-Fraumeni syndrome
PKU	12q24.1	Phenylketonuria
PROC	2q13-14	Protein C, coagulopathy disorder
PROS	3p11.1-q11.2	Protein S, coagulopathy disorder
PRNP	20p12-pter	Prion disease protein
PWS	15q11	Prader-Willi syndrome
PXMP1	1p21–22	Zellweger syndrome type 2
RB	13q14.1-14.2	Retinoblastoma
RET	10q11.2	Familial medullary thyroid carcinoma, MEN 2A and 2B, familial Hirschsprung disease
RH	1p34–36.2	Rhesus null disease, Rhesus blood group
RP1	8p11-q21	Retinitis pigmentosa, locus 1
RP2	Xp11.3	Retinitis pigmentosa, locus 2
RP3	Xp21.1	Retinitis pigmentosa, locus 3
rRNA		Ribosomal RNA
SCA1	6p23	Spinocerebellar ataxia, locus 1
SCA2	12q24	Spinocerebellar ataxia, locus 2
SPH1	14q22-23.2	Spherocytosis type I
SMA	5q12.2-13.3	Spinal muscular atrophy
SOD1	21q22.1	Superoxide dismutase, familial motor neuron disease
SRY	Yp11.3	Sex-determining region Y, testis-determining factor
TBX5	12q21.3-22	Holt-Oram syndrome
TCOF1	5q32-33.1	Treacher-Collins syndrome
TRPS1	8q24.12	Trichorhinophalangeal syndrome
TSC1	9q34	Tuberous sclerosis, locus 1
TSC2	16p13.3	Tuberous sclerosis, locus 2
TYR	11q14-21	Oculocutaneous albinism
USH1A	14q32	Usher syndrome type IA
USH1B	11q13.5	Usher syndrome type IB
USH1C	11p15.1	Usher syndrome type IC
USH2	1q41	Usher syndrome type II
VWS	1q32	van der Woude syndrome
VHL	3p25–26	von Hippel-Lindau syndrome
VWF	12p13.3	von Willebrand disease
WD	13q14.3-21.1	Wilson disease
WRN	8p11.2-12	Werner syndrome
WS1	2q35	Waardenburg syndrome type 1
WT1	11p13	Wilms tumor 1 gene
ZWS1	7q11.23	Zellweger syndrome type 1

The Human Genome Project

Beginning and Organization of the Human Genome Project

The concept of a map of the human genome was proposed as long ago as 1969 by Victor McKusick (see Figure 1.5, p. 7), one of the founding fathers of medical genetics. Human gene mapping workshops were held regularly from 1973 to collate the mapping data. The idea of a dedicated human genome project came from a meeting organized by the US Department of Energy at Sante Fe, New Mexico, in 1986. The US Human Genome Project started in 1991 and is estimated to have cost around 2.7 billion US dollars. Other nations, notably France, the UK, and Japan, soon followed with their own major national human genome programs and were subsequently joined by a number of other countries. These individual national projects were all coordinated by the Human Genome Organization, which has three centers, one for the Americas based in Bethesda, Maryland, one for Europe located in London, and one for the Pacific in Tokyo.

Although the key objective of the Human Genome Project was to sequence all 3 × 10⁹ base pairs of the human genome, this was just one of the six main objectives/areas of work of the Human Genome Project.

Human Gene Maps and Mapping of Human Inherited Diseases

Designated genome mapping centers with ear-marked funding were involved in the coordination and production of genetic or recombination and physical maps of the human genome. The genetic maps initially involved the production of fairly low-level resolution index, skeleton or framework maps, which were based on polymorphic variable-number di-, tri-, and tetranucleotide tandem repeats (p. 17) spaced at approximately 10-cM intervals throughout the genome.

The mapping information from these genetic maps was integrated with high-resolution physical maps (Figure 5.4). Access to the detailed information from these high-resolution genetic and physical maps allowed individual research groups, often interested in a specific or particular inherited disease or group of diseases, rapidly and precisely to localize or map a disease gene to a specific region of a chromosome.

Figure 5.4 A summary map of human chromosome 3, estimated to be 210 Mb in size, which integrates physical mapping data covered by 24 YAC contigs and the Genethon genetic map with cumulative map distances.

(From Gemmill RM, Chumakov I, Scott P, et al 1995 A second-generation YAC contig map of human chromosome 3. Nature 377:299–319; with permission.)

The ability to introduce targeted mutations in specific genes, along with the production of transgenic animals (p. 102), for example in the mouse, allows the production of animal models to study the pathodevelopmental basis for inherited human disorders, as well as serve as a test system for the safety and efficacy of gene therapy and other treatment modalities (p. 350). Strategies using different model organisms in a complementary fashion, taking into account factors such as the ease or complexity of producing transgenic organisms and the generation times of different species, allow the possibility of relatively rapid analysis of gene expression, function and interactions in providing an understanding of the complex pathobiology of inherited diseases in humans.