Molecular Biology

The Genetic Basis of Lung Disease

Genetic factors play an important role in diseases that affect the airways (asthma, chronic obstructive pulmonary disease [COPD], cystic fibrosis, primary ciliary dyskinesia), parenchyma (pulmonary fibrosis, Birt-Hogg-Dubé syndrome, tuberous sclerosis), and vasculature (hereditary hemorrhagic telangiectasia) of the lung (Table 2-1). Such conditions include simple monogenic disorders such as Kartagener syndrome and α₁-antitrypsin deficiency, in which mutations of critical genes are sufficient to induce well-defined disease phenotypes. By contrast, many other disease processes affecting the lung are complex genetic traits in which inheritance subtly affects pathogenesis. This group of entities includes COPD, asthma, and idiopathic pulmonary fibrosis. Extending current understanding of the genetic basis of pulmonary conditions will be essential to provide new insights into their underlying pathophysiology, to make predictions about outcome, and to develop novel therapeutic strategies.

Table 2-1 Examples of Genetic Factors That Underlie Lung Disease

Identification of single-gene defects in families that show the same phenotype is now relatively straightforward, owing to completion of the human genome project and improvements in DNA sequencing. Consequently, the past 20 years have seen rapid progress in elucidation of the genetic basis of disease. This rate of progress can be appreciated by a consideration of the many years required to identify the gene associated with cystic fibrosis. Dorothy Hansine Andersen first defined the condition in 1938 when she described cystic fibrosis of the pancreas in association with lung and intestinal disease. Only later was it recognized to be a recessive condition. The sweat test that is used to diagnose the condition was developed after the detection of abnormal sweat electrolytes by Paul di Sant’Agnese in 1952. The search for the cystic fibrosis gene started in the early 1980s, and the gene was localized to chromosome 7 in 1985 through recognition of linkage with the highly polymorphic gene paraoxonase in many populations. This achievement was followed by the identification of additional markers more closely linked to the cystic fibrosis locus, MET and D7S8, allowing prenatal diagnosis of the disorder and eventually leading directly to the mapping of the causative gene in 1989 by teams headed by Lap-Chi Tsui, Francis Collins, and Jack Riordan. This gene was called the cystic fibrosis transmembrane conductance regulator (CFTR), and more than 1000 different mutations that cause cystic fibrosis have now been identified.

By contrast, today, what had once taken many groups a decade to complete can be undertaken in a single laboratory in days. For example, modern exome sequencing enables all 180,000 exons encoded by the human genome to be characterized in an individual patient or an entire kindred. Although the exome equates to only 1% of the genome, or about 30 megabases, it is thought to contain 85% of the mutations responsible for mendelian disorders. This technology, for example, was recently used to identify the causative gene of Miller syndrome, a rare disorder that manifests with cleft palate, absent digits, and ocular anomalies. The entire exomes of four persons so affected were sequenced, allowing mutations to be identified in the causative gene encoding dihydroorotate dehydrogenase (DHODH).

The major challenges now are therefore no longer the single-gene disorders but complex genetic diseases such as cancer, COPD, asthma, and interstitial lung disease. These diseases are the result of interactions between multiple genes and environmental factors. Consequently, the diseases cluster within families but do not show a clear pattern of inheritance.

Single-Gene Disorders and Respiratory Disease

Many single-gene disorders have been linked with respiratory disease (see Table 2-1). They are perhaps best typified by the autosomal recessive condition α₁-antitrypsin deficiency. This condition shows a clear genotype-phenotype correlation, and current understanding of its molecular basis has provided new insights into the pathogenesis of disease. α₁-Antitrypsin is the archetypal member of the serine proteinase inhibitor (“serpin”) superfamily. It is synthesized in the liver and secreted into the plasma, where it is the most abundant circulating proteinase inhibitor. Most people of North European descent carry the normal M allele, but 1 in 25 carries the Z variant (Glu342Lys), which results in plasma α₁-antitrypsin levels in the homozygote that are 10% to 15% of those seen with the normal M allele. The Z mutation causes the accumulation of α₁-antitrypsin in the rough endoplasmic reticulum of the liver, predisposing the homozygote to the development of juvenile hepatitis, cirrhosis, and hepatocellular carcinoma. The greatly reduced circulating levels of α₁-antitrypsin are unable to protect the lungs against proteolytic damage by neutrophil elastase, predisposing the Z homozygote to the development of early-onset emphysema.
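As a rough illustration of what a carrier frequency of 1 in 25 implies for the frequency of Z homozygotes, the following sketch applies the Hardy-Weinberg relationship. It assumes random mating, that the figure of 1 in 25 refers to heterozygous carriers, and that only the M and Z alleles are present; none of these assumptions comes from the studies discussed here, and the function names are illustrative only.

```python
import math


def allele_freq_from_carrier_freq(carrier_freq):
    """Return the minor allele frequency q that satisfies 2*q*(1 - q) = carrier_freq.

    Assumes Hardy-Weinberg proportions (an assumption, not a measured value).
    """
    discriminant = 1.0 - 2.0 * carrier_freq
    if carrier_freq < 0 or discriminant < 0:
        raise ValueError("carrier frequency must lie between 0 and 0.5")
    # Smaller root of 2q^2 - 2q + carrier_freq = 0
    return (1.0 - math.sqrt(discriminant)) / 2.0


def homozygote_freq(q):
    """Expected frequency of homozygotes for an allele of frequency q."""
    return q * q


z_allele_freq = allele_freq_from_carrier_freq(1 / 25)
zz_freq = homozygote_freq(z_allele_freq)
print(f"Z allele frequency: {z_allele_freq:.4f}")
print(f"Expected ZZ homozygotes: about 1 in {round(1 / zz_freq)}")
```

Under these assumptions the Z allele frequency is about 0.02 and homozygotes are expected at a frequency on the order of 1 in 2400, which is why the homozygous state is far less common than the carrier state.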

The structure of α₁-antitrypsin is based on a dominant β-pleated sheet A and nine α-helices (Figure 2-1). This scaffold supports an exposed mobile reactive loop that presents a peptide sequence as a pseudosubstrate for the target proteinase. After docking, the proteinase is inactivated by a mousetrap-type action that swings it from the top to the bottom of the serpin in association with the insertion of an extra strand into β-sheet A (see Figure 2-1). This six-stranded protein bound to its target enzyme is then recognized by hepatic receptors and cleared from the circulation. The structure of α₁-antitrypsin is central to its role as an effective antiproteinase but also renders it liable to undergo conformational change in association with disease. The Z mutation is at residue P₁₇ (17 residues proximal to the key P₁ amino acid that defines the inhibitory specificity of α₁-antitrypsin) at the head of a strand of β-sheet A and the base of the mobile reactive loop (see Figure 2-1). The mutation opens β-sheet A, thereby favoring the insertion of the reactive loop of a second α₁-antitrypsin molecule to form a dimer (see Figure 2-1). This dimer can then extend to form polymers that tangle in the endoplasmic reticulum of the liver to form the inclusion bodies resulting in liver disease. Support for this pathomechanism comes from the demonstration that Z α₁-antitrypsin formed chains of polymers when incubated under physiologic conditions. The rate was accelerated by raising the temperature to 41° C and could be blocked by peptides that compete with the loop for annealing to β-sheet A. The role of polymerization in vivo was clarified by the finding of α₁-antitrypsin polymers in inclusion bodies from the livers of Z α₁-antitrypsin homozygotes (see Figure 2-1).

Figure 2-1 The molecular basis of α₁-antitrypsin deficiency. α₁-Antitrypsin may be considered to act by a mousetrap mechanism. A, After docking (left), the target proteinase (gray) is inactivated by movement from the upper to the lower pole of the protein (right). This is associated with insertion of the reactive loop (red) as an extra strand into β-sheet A (green). The mousetrap mechanism may be triggered spontaneously by point mutations in association with disease. The Z mutation (Glu342Lys) of α₁-antitrypsin is at the head of a strand of β-sheet A (green) and the base of the reactive loop. B, Mutations in this region can destabilize β-sheet A to allow the insertion of a reactive loop of a second molecule (middle). This dimer then extends to form long chains of polymers (right). Each molecule of α₁-antitrypsin in the polymer is shown in a different color. It is these polymers that tangle in the endoplasmic reticulum to cause inclusions resulting in liver disease. C, An inclusion body (arrow) from the liver of a patient with α₁-antitrypsin deficiency (left). The inclusions are composed of chains of molecules of α₁-antitrypsin (right).

(Modified from Gooptu B, Lomas DA: Conformational pathology of the serpins—themes, variations and therapeutic strategies, Annu Rev Biochem 78:147–176, 2009.)

Although many α₁-antitrypsin deficiency variants have been described, only three other mutants of α₁-antitrypsin have similarly been associated with plasma deficiency and hepatic inclusions: α₁-antitrypsin Siiyama (Ser53Phe), α₁-antitrypsin Mmalton (Phe52 deleted), and α₁-antitrypsin King’s (His334Asp). All of these mutants lie in the shutter domain that controls opening of β-sheet A. They destabilize the molecule to allow the formation of loop-sheet polymers in vivo. Further investigations have shown that polymerization also underlies the mild plasma deficiency of the S (Glu264Val) and I (Arg39Cys) variants of α₁-antitrypsin. The point mutations that are responsible for these variants have less effect on β-sheet A than does the Z variant. Thus, the associated rate of polymer formation is much slower than that for Z α₁-antitrypsin, which results in less retention of protein within hepatocytes, milder plasma deficiency, and the lack of a clinical phenotype. However, if a mild, slowly polymerizing I or S variant of α₁-antitrypsin is inherited with a rapidly polymerizing Z variant, then the two can interact to form heteropolymers within hepatocytes. These polymers underlie the inclusions that cause cirrhosis.

Emphysema associated with α₁-antitrypsin deficiency results from lack of protection against proteolytic attack in the lungs associated with reduced levels of circulating proteinase inhibitor. This is particularly the case with individuals who smoke tobacco. The Z α₁-antitrypsin that does escape from the liver into the circulation is less efficient in protecting the tissues from enzyme damage and, like M α₁-antitrypsin, may be inactivated by oxidation of the P1 methionine residue. The demonstration that Z α₁-antitrypsin can undergo a spontaneous conformational transition in association with liver disease raised the possibility that this might also occur within the lung. Indeed, polymers have been detected in bronchoalveolar lavage fluid in patients with Z α₁-antitrypsin deficiency. This observation may have important implications for the pathogenesis of disease, because polymerization obscures the reactive loop of α₁-antitrypsin, rendering the protein inactive as an inhibitor of proteolytic enzymes. Thus, the spontaneous polymerization of α₁-antitrypsin within the lung will exacerbate the already reduced antiproteinase screen, thereby increasing the susceptibility of the tissues to proteolytic attack and increasing the rate of progression of emphysema. Finally, the α₁-antitrypsin polymers themselves are inflammatory for neutrophils, which will also increase the proteolytic load in the lung. Recent data suggest that cigarette smoke can induce the intrapulmonary polymerization of Z α₁-antitrypsin, thereby exacerbating the lung damage associated with smoking.

Gene Hunting for Complex Genetic Diseases and Its Pitfalls: COPD and Asthma

One approach to looking for genes associated with complex genetic disorders is the association study, which compares genetic variation between cases and controls (i.e., persons without disease) matched for various factors. The genetic variation commonly used in such studies is the single-nucleotide polymorphism (SNP), a DNA sequence variation found approximately every 300 base pairs across the genome. These studies have been undertaken in patients with COPD matched with control subjects who do not have COPD but who are the same age and have the same smoking history and the same ethnic background. The early studies typically were small (100 to 150 cases plus controls) and often were confounded by failure to match cases and controls carefully. To increase the likelihood of finding a disease-associated gene, such studies frequently included SNPs in multiple genes in the same cohort. However, such multiple comparisons can result in false-positive results. With study of sufficient numbers of genes, purely by chance a variant will arise that erroneously appears to be associated with the disease being studied. Careful statistical analysis is necessary to avoid this problem.
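The multiple-comparisons problem described above can be made concrete with a short calculation. The sketch below assumes that the tests are independent (SNPs in linkage disequilibrium are not, so this is a simplification) and uses the Bonferroni correction as one simple, conservative remedy; it is not presented as the method used in any study cited here, and the numbers are hypothetical.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise rate near alpha."""
    return alpha / n_tests


# Hypothetical example: 20 SNPs, each tested at the conventional 0.05 level.
n_snps = 20
print(f"Chance of at least one false positive: {family_wise_error_rate(0.05, n_snps):.2f}")
print(f"Bonferroni per-SNP threshold: {bonferroni_threshold(0.05, n_snps):.4f}")
```

With 20 independent SNPs tested at 0.05, the chance of at least one spurious association is about 64%. The same reasoning applied to a million SNPs is the basis for the very stringent thresholds used in genome-wide studies.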

The analysis was made more complex in COPD by the inherent complexity of the disease phenotype—a heterogeneous mix of airway disease and emphysema. Indeed, larger family-based studies have shown the independent clustering of the airway disease and emphysema components of COPD within families. This finding suggests that different genetic factors predispose to each of these components of the phenotype. The only way to overcome the inherent variation in COPD is to focus on groups of patients with well-characterized disease components or to undertake studies with large sample sizes and then to replicate any positive findings in other cohorts. This is now the case with candidate gene studies, and good evidence has emerged to show that heterozygosity for α₁-antitrypsin deficiency (phenotype PiMZ) and polymorphisms in genes involved in oxidative stress—those encoding microsomal epoxide hydrolase (EPHX1), glutathione S-transferase (GST-P1 and GST-M1), heme oxygenase (HMOX1), and superoxide dismutase 3 (SOD3)—are associated with an increased risk of COPD (Figure 2-2). More recently, a minor allele of an SNP in the matrix metalloprotease-12 gene (MMP12) has been shown to protect against COPD in adult smokers.

Figure 2-2 Genes implicated in chronic obstructive pulmonary disease (COPD). When smoke enters the alveolus, many of its constituent compounds are absorbed. Some of these are detoxified by an array of enzymes; those that escape detoxification cause local damage and inflammation. The influx and activation of inflammatory cells lead to the liberation of proteases that attack the extracellular matrix, primarily elastin. The effect of these proteases is attenuated by endogenous antiprotease activities, whereas growth factor signals are thought to modulate the repair and remodeling of the extracellular matrix.

The limitation of association studies using a candidate gene approach is that they are by definition restricted to pathways already recognized to be associated with the disease—in the case of COPD, the proteinase-antiproteinase balance, the oxidative stress pathway, and the integrity of the extracellular matrix (see later). Consequently, this approach lacks the capacity to identify unanticipated players and is thus restricted to hypothesis testing, rather than hypothesis generation.

In recent years, the collection of large cohorts of patients combined with technologic advances has allowed unbiased genome-wide association studies of many patients with lung disease. It is currently possible to use microarrays to assay up to a million different SNPs in the genome in the same patient. The variation in SNPs is then compared between cases and controls. The largest study was undertaken in a cohort from Bergen, Norway, and then replicated in the International COPD Genetics Network, the National Emphysema Treatment Trial with controls from the Normative Aging Study, and finally in the Boston Early Onset COPD cohort. Top hits from this analysis were SNPs in the α-nicotinic acetylcholine receptor CHRNA3/5 and the hedgehog-interacting protein HHIP. The first of these (rs8034191) in the α-nicotinic acetylcholine receptor also was identified in three genome-wide association studies of lung cancer and is thought to be important in peripheral vascular disease and nicotine addiction. It is possible that this SNP functions as a marker for an addiction gene. People who carry this SNP may require more cigarettes to satisfy nicotine addiction, may inhale more deeply, and may find it more difficult to withdraw from cigarette smoking. If this hypothesis is correct, the disease-associated allele of this gene would account for 12% of the population risk for COPD.

In interpreting any association study, it is important to consider two caveats. First, many genetic association studies report false-positive findings owing to a failure to appreciate the prior probability of an association and the power of the study to detect a meaningful effect. When the prior probability of an association is low, that is to say when there are few functional or epidemiological data to support an association, the number of subjects required to guard against a false-positive result increases. Consequently, the identification of a genetic association in a single study must always be treated with caution. Clearly, in the case of the α-nicotinic acetylcholine receptor, it is possible to construct very plausible models for its potential role in COPD, so the prior probability is not low. Moreover, those studies in which it was identified were well powered. The second caveat, however, relates to the phenomenon of linkage disequilibrium. The combination of more than one genetic variant or allele is called a haplotype. Some haplotypes occur in the population more often than one would expect by random association of alleles. This can be caused by, but is not restricted to, the inheritance of blocks of adjacent genes on a chromosome. Clearly, nearby genes are less likely to be separated by recombination during gametogenesis than are more distantly spaced genes. Each population of humans has its own characteristic set of common haplotypes. In this light, a disease-associated SNP can more accurately be viewed as a marker of the haplotype that is associated with the disease under study, and the causative gene must be identified within that group. Indeed, in the case of disease-associated SNPs in the α-nicotinic acetylcholine receptor, there does appear to be linkage disequilibrium with SNPs in the gene for iron-responsive element binding protein 2 (IREB2). This was identified from expression analysis in lung tissue from persons with COPD and then confirmed in three separate COPD cohorts. IREB2 is localized to the human epithelial cell surface and may play a role in protecting against epithelial damage from oxidative stress.
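The first caveat, the dependence of false-positive findings on prior probability and power, can be sketched numerically. The calculation below uses one common formulation, often called the false-positive report probability; it is offered as an illustration under assumed inputs rather than as the analysis performed in the studies discussed, and all numerical values are hypothetical.

```python
def false_positive_report_probability(alpha, power, prior):
    """Probability that a 'significant' association is in fact a false positive.

    alpha: significance threshold used to declare an association
    power: probability of detecting a true association of the assumed size
    prior: prior probability that the tested association is real
    """
    if not (0.0 <= prior <= 1.0 and 0.0 <= power <= 1.0 and 0.0 <= alpha <= 1.0):
        raise ValueError("alpha, power, and prior must all lie between 0 and 1")
    false_positives = alpha * (1.0 - prior)
    true_positives = power * prior
    if false_positives + true_positives == 0.0:
        raise ValueError("no positive findings are possible with these inputs")
    return false_positives / (false_positives + true_positives)


# Hypothetical comparison: a plausible candidate versus an arbitrary SNP.
for prior in (0.1, 0.001):
    fprp = false_positive_report_probability(alpha=0.05, power=0.8, prior=prior)
    print(f"prior = {prior}: false-positive report probability = {fprp:.2f}")
```

With these assumed inputs, a well-supported candidate (prior 0.1) still yields a false-positive probability of 0.36, whereas an association with little prior support (prior 0.001) is almost certainly spurious unless a far stricter threshold or a larger, better powered study is used.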

It is, of course, possible that a haplotype identified in large genome-wide association studies may contain multiple disease-associated genes, so each one needs individual validation. Indeed, many diseases appear to involve the interaction of multiple disease-associated alleles, each with relatively small contributions when studied individually. The chances of identifying an allele that imparts a small relative risk for developing a disease are improved both by increasing the numbers of cases studied (increased power) and by carefully selecting cases of the same phenotype. With diseases such as COPD that are likely to represent the final common pathway of many forms of lung damage, this consideration is particularly important.
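The non-random association of alleles that defines linkage disequilibrium is usually summarized by the coefficient D and by the normalized measures D′ and r². The sketch below computes all three for two biallelic loci from a haplotype frequency and the two allele frequencies. The frequencies are hypothetical and the function name is illustrative; D′ is returned here with its sign, although some sources report its absolute value.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic (0 < frequency < 1)")
    d = p_ab - p_a * p_b  # deviation from random association of alleles
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies: the AB haplotype is seen more often than the
# 0.4 * 0.5 = 0.2 expected if the alleles associated at random.
d, d_prime, r_squared = linkage_disequilibrium(p_ab=0.3, p_a=0.4, p_b=0.5)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r_squared:.3f}")
```

A high r² between a genotyped SNP and an untyped variant means that the typed SNP serves as a good marker for it, which is why a disease-associated SNP identifies a haplotype rather than necessarily the causative variant.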

The analysis of still larger numbers of patients with COPD has identified a disease-associated SNP in FAM13A. The role of this gene in disease is unclear, but expression has been associated with hypoxia. FAM13A also has been associated with lung function in a second independent study. A detailed analysis of these genes in well-characterized cohorts showed that SNPs in the α-nicotinic acetylcholine receptor are associated with smoking intensity, airflow obstruction, and emphysema, and SNPs in the hedgehog-interacting protein are associated with systemic features of COPD (low body mass index) and exacerbations, whereas SNPs in FAM13A are associated with airflow obstruction.

Familial clustering of asthma has also been recognized for many years, and comparisons between monozygotic and dizygotic twins suggest that 70% of asthma-related population variance is accounted for by genetic factors. Classical positional cloning using linkage analysis of large families has identified several candidates, including ADAM33, CHI3L1, DPP10, and HLA-G, and more recently, asthma-specific genome-wide association studies have identified further disease-associated loci (see Table 2-1). The first of these was in the long arm of chromosome 17 and found to contain two genes, ORMDL3 and GSDMB, whose expression levels are altered in asthmatic persons. The pathways implicated by such studies can now be tested to determine what role they play in disease pathogenesis.

Cell Biology

Intracellular Signals

Oxidative Stress

Cigarette smoke contains 10¹⁷ free radicals per puff, including superoxide ions, hydrogen peroxide, hydroxyl radicals, nitric oxides, peroxynitrite, and semiquinone. Migrating neutrophils also can release superoxide radicals in response to inflammatory stimuli, including pathogens and smoke. Alveolar macrophages from the lungs of smokers are more activated compared with controls and release more reactive oxygen species (ROS) in vitro. These toxic products can all modify proteins, lipids, and DNA during oxidative stress. Oxidized proteins can be found in lung tissue and their level increases with worsening lung disease. This damage leads directly to cell death and emphysema. When ROS react with phospholipids in the cell membrane (lipid peroxidation), they generate products such as F₂ isoprostanes and malondialdehyde that can trigger intracellular signaling pathways. For example, isoprostanes cause muscle constriction and induce cell growth by way of prostaglandin receptors. Other diffusible peroxides act as chemoattractants, thereby contributing to inflammation.

Normally, homeostatic mechanisms maintain the reducing environment of the cytoplasm. Glutathione is an abundant sulfhydryl compound that exists in the cytosol predominantly in its reduced form (GSH), with only 1% in the oxidized disulfide-bonded form (GSSG). The cell maintains the ratio of GSH to GSSG strongly in favor of the reduced form by reducing GSSG to GSH, or by excreting GSSG. However, during the adaptive response to oxidative stress, de novo synthesis of GSH also is important. Alterations in GSH metabolism have been shown to affect the sensitivity of cells to oxidative damage. For example, ROS can induce signaling by a number of stress pathways including c-Jun N-terminal kinase (JNK), extracellular signal-regulated kinase (ERK), and p38 kinase. These are linked to signaling cascades that ultimately regulate gene transcription.

Oxidative stress is an important activator of nuclear factor κB (NFκB). This proinflammatory transcription factor is held in the cytosol in unstressed cells through binding to its inhibitor, IκB. When cell surface receptors are activated, they can trigger an IκB kinase (IKK) that phosphorylates IκB, targeting it for degradation. NFκB is thus released to migrate to the nucleus, where it transactivates genes involved in many pathways, including the inflammatory response. The precise mechanism whereby NFκB is activated by oxidative stress is not fully understood but may involve the direct activation of IKK by ROS.

ROS also modulate gene transcription by modifying chromatin, so-called epigenetic regulation. Chromatin structure determines the access of transcription factors to target sequences within the promoters of genes and is subject to regulation. Posttranslational modification of histones can alter DNA coiling around them. For example, the relative activities of histone acetyltransferases (HATs) and histone deacetylases (HDACs) profoundly alter histone function and consequently gene transcription. Cigarette smoke and oxidative stress can enhance histone acetylation by impairing HDAC activity, resulting in altered gene expression.

A number of genes have been studied that might plausibly modify the cells’ responses to cigarette smoke and ROS. These include genes involved in detoxification of toxins and genes involved in neutralization of ROS. Many toxins in cigarette smoke are subject to first-pass metabolism in the liver, and one of the enzymes involved in this is microsomal epoxide hydrolase (encoded by EPHX1) localized to 1q42.1, which has been studied intensely in the context of COPD (see Figure 2-2). Several EPHX1 SNPs have been described that affect its activity. One of these leads to a 40% loss of in vitro activity (Tyr113His, the “slow” allele), whereas another increases activity by 25% (His139Arg, the “fast” allele). A recent systematic metaanalysis found homozygosity for the “slow” (Tyr113His) allele to be protective against COPD (odds ratio, 0.5). Analysis of the National Emphysema Treatment Trial (NETT) dataset has suggested a role for EPHX1 polymorphism in both severity of COPD and the distribution of emphysematous changes. In addition to EPHX1, glutathione S-transferase (GST) comprises a large family of enzymes capable of catalyzing the conjugation of GSH to noxious compounds. The GSTs are highly polymorphic, and SNPs in GSTP1 have been associated with COPD, the distribution of emphysema, and more rapid decline in lung function. The null mutation of GSTM1 (localized to 1p13.1) also has been associated with COPD.
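Associations of this kind are usually reported as odds ratios. The sketch below shows how an odds ratio and an approximate 95% confidence interval (Woolf's logit method) are obtained from a two-by-two table of genotype counts. The counts are hypothetical, chosen only so that the odds ratio equals 0.5; they are not taken from the metaanalysis or from any other study mentioned here.

```python
import math


def odds_ratio_with_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Odds ratio and approximate confidence interval (Woolf's logit method).

    All four counts must be greater than zero.
    """
    counts = (case_carriers, case_noncarriers, control_carriers, control_noncarriers)
    if min(counts) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers)
    # Standard error of log(odds ratio)
    se_log_or = math.sqrt(sum(1.0 / n for n in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts of genotype carriers among cases and controls.
odds_ratio, lower, upper = odds_ratio_with_ci(10, 90, 20, 90)
print(f"Odds ratio {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

In this hypothetical small sample the confidence interval spans 1 even though the point estimate is 0.5, which illustrates why pooled analyses and replication in independent cohorts are needed before an association is accepted.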

Several other proteins can loosely be considered as having antioxidant activity and thus as being protective against ROS. Heme oxygenase catalyzes the first step in heme degradation. Heme oxygenase 1 (encoded by HMOX1, localized to 22q13.1) is the inducible isoform that can be upregulated by a wide range of stresses. Bile pigments generated by heme cleavage are believed to have antioxidant properties; thus, HMOX-1 induction is protective during cellular oxidant injury, and overexpression of HMOX-1 in lung tissue protects against hyperoxia. The HMOX1 gene 5′-flanking region contains stretches of GT repeats that are highly polymorphic in length. A higher proportion of long repeats, which are associated with impaired promoter activity, has been observed in patients with COPD and with increased severity of disease. By contrast, superoxide dismutase (SOD) directly catalyzes the conversion of superoxide to oxygen and hydrogen peroxide. The extracellular isoform (encoded by SOD3, localized to 4p15) is abundant in lung parenchyma, and in the cross-sectional Copenhagen City Heart Study, the R213G allele that results in higher plasma levels was associated with significantly less severe COPD in smokers.

Oxidative stress also is important in the pathogenesis of emphysema associated with α₁-antitrypsin deficiency. Free radicals released from neutrophils or cigarettes can oxidize the key P1 methionine at residue 358, which is central to the inhibitory activity of α₁-antitrypsin. This change results in a 2000-fold reduction in the association rate constant with neutrophil elastase. A reduction in the intrapulmonary concentration of α₁-antitrypsin in persons with α₁-antitrypsin deficiency means that fewer free radicals are required to have a significant impact on the inactivation of α₁-antitrypsin. In addition to Met358, methionines at positions 226, 242, and 351 and the cysteine residue at 232 in α₁-antitrypsin are similarly available for oxidation. These molecules may be considered to function as a sump to “mop up” free radicals, thereby reducing their toxicity. However, persons with α₁-antitrypsin deficiency have fewer molecules to bind free radicals. Moreover, polymer formation masks two of the four methionines, thereby further reducing the capacity of α₁-antitrypsin to detoxify these toxic species. Thus, α₁-antitrypsin from persons with Z α₁-antitrypsin deficiency is more prone to oxidative damage and less able to protect the tissues from oxidative stress as a result of both local deficiency and polymer formation.

Endoplasmic Reticulum Stress

The early steps in the biogenesis of secreted and membrane proteins occur in the lumen of the endoplasmic reticulum, where resident proteins that make up the endoplasmic reticulum machinery assist in their folding, maturation, and complex assembly (Figure 2-3). Variation in the load of endoplasmic reticulum client proteins and in the function of its protein-folding machinery can lead to an imbalance between the two that is referred to as endoplasmic reticulum stress. This imbalance triggers a cellular response, mediated by signaling pathways, that restores balance in the protein-folding environment of the organelle by increasing the expression of genes that enhance most aspects of endoplasmic reticulum function and by transiently repressing the biosynthesis of new client proteins. This response has been termed the unfolded protein response (UPR) and is mediated by three signaling molecules, PERK, IRE1, and ATF6, located in the endoplasmic reticulum membrane (see Figure 2-3). It is now clear that the UPR plays a role in many human diseases, including many that affect the lung.

Figure 2-3 Unfolded protein response and endoplasmic reticulum (ER) overload response. Unfolded protein response (UPR): In the resting endoplasmic reticulum, the chaperone BiP holds the stress-signaling molecules PERK, IRE1, and ATF6 inactive. When proteins misfold in the endoplasmic reticulum, they sequester BiP (purple). This enables PERK, IRE1, and ATF6 to become active and signal to the cytosol. PERK phosphorylates a cytosolic translation initiation factor eIF2α, thereby halting most protein synthesis. In parallel, the translation of a subset of proteins increases. These include the transcription factor ATF4, which transactivates genes of the integrated stress response (ISR). When IRE1 is activated, it splices out an intron from the messenger RNA (mRNA) encoding the transcription factor XBP1. This causes a frameshift in the mRNA, accompanied by the translation of an activated form of XBP1, which can transactivate UPR genes. When ATF6 is released from BiP, it migrates initially to the Golgi apparatus, where it is cleaved by proteases to release a soluble fragment, ATF6c. This migrates to the nucleus, where it transactivates UPR genes. Endoplasmic reticulum overload response (EOR): When excess folded proteins accumulate in the endoplasmic reticulum, they cause calcium-dependent activation of nuclear factor κB (NFκB) through a poorly understood mechanism.

Cigarette smoke can directly induce endoplasmic reticulum stress in cells. When cultured airway epithelial cells are treated with cigarette smoke extracts, they activate the UPR. Similar responses have been observed in vivo in the lungs of cigarette smoke–exposed mice and even in the lungs of human smokers. Overexpression of the endoplasmic reticulum chaperone BiP in cultured bronchial epithelial cells protects them from smoke-induced apoptosis, supporting a role for endoplasmic reticulum stress in cigarette cytotoxicity. Precisely how smoke induces endoplasmic reticulum stress remains to be determined, but the protective effects of coadministered N-acetylcysteine or GSH suggest that oxidation of an unknown target is likely to be important.

The existence of IRE1β, a lung- and gut-specific IRE1 isoform, suggests that endoplasmic reticulum stress has important consequences for mucosal tissues. IRE1 signals through splicing the messenger RNA (mRNA) for the transcription factor XBP1, which transactivates many UPR genes, but it is not clear why airways and bowel require a tissue-specific IRE1 isoform. A clue may come from the XBP1-mutant mouse, which exhibits impaired mucosal defense against Listeria monocytogenes and has poorly bactericidal gut secretions. These observations suggest that the IRE1-XBP1 pathway may play an important role in host-pathogen interactions at epithelial surfaces. Endoplasmic reticulum stress can itself affect the acquired immune response. It is clear that the IRE1-XBP1 pathway is crucial for differentiation programs that require expansion of the endoplasmic reticulum, for example, during the differentiation of B lymphocytes into plasma cells. This requirement can be explained by the regulation of many lipid synthetic genes by XBP1. In addition, XBP1-dependent processes appear to be responsible for heightened inflammatory signaling in inflamed airway epithelium. When forced to express active XBP1, bronchial epithelial cells show elevated bradykinin-induced IL-8 release.

One of the disease-causing mutations of the surfactant protein C gene SFTPC is a deletion of exon 4. This change generates a protein that fails to exit the endoplasmic reticulum and induces endoplasmic reticulum stress. When expressed transiently in cultured cells, this mutant of SFTPC accumulates as large ubiquitinated inclusions and inhibits normal proteasome function, ultimately killing the cell. When cell lines that stably express this mutant are infected with respiratory syncytial virus, the cells accumulate high levels of mutant protein, activate the UPR, and show increased toxicity compared with wild-type SFTPC–expressing cells. Of interest, evidence of UPR activation has been seen in a majority of cases of interstitial lung disease, both with and without SFTPC mutations. This finding may suggest an even greater role for endoplasmic reticulum stress in idiopathic pulmonary fibrosis. Very recently, it was shown that endoplasmic reticulum stress caused by a variety of insults, including mutant SFTPC, can induce epithelial-to-mesenchymal transition (EMT). This is the process by which epithelia can transdifferentiate into cells of a more fibroblast-like phenotype. This mechanism has been suggested to contribute to pulmonary fibrosis and would explain why treatment strategies for idiopathic pulmonary fibrosis involving antiinflammatory drugs have been less than entirely successful. It may instead prove more beneficial to prevent EMT by ameliorating endoplasmic reticulum stress.

Cystic fibrosis is caused by CFTR mutations that impede protein folding. High levels of ΔF508 CFTR expression, but not of wild-type protein, induce a UPR in cultured cells. However, rather than CFTR expression affecting endoplasmic reticulum stress, the clinically relevant relationship may be the converse. Recent data have suggested that endoplasmic reticulum stress affects CFTR expression. In cells treated with agents that induce the UPR, levels of mature CFTR protein are markedly diminished. This involves a selective reduction of genomic CFTR expression, an effect that is not seen with recombinantly expressed CFTR or with endogenous control genes. Repression of the CFTR promoter is achieved both by selective recruitment of ATF6 and by epigenetic changes, including altered DNA methylation and histone deacetylation. This is especially unfortunate, because mucopurulent secretions from patients with cystic fibrosis are sufficient to induce endoplasmic reticulum stress in human bronchial epithelial cells, suggesting that chronic airway sepsis may actually contribute to further impairment of CFTR expression in those cases in which milder mutations allow some of the protein to reach the cell surface. This effect of endoplasmic reticulum stress on CFTR also may explain previous studies that have identified impaired CFTR expression and function in the upper airway of smokers who do not have the disease. This finding had been attributed to oxidant effects alone but now might be explained equally well as a response to smoke-induced endoplasmic reticulum stress. Whether this effect contributes to the pathologic lung changes associated with smoking is unclear, but if it does so, it would provide a novel therapeutic target.

Hypoxia is a far more common cause of protein misfolding than these single-gene disorders. The endoplasmic reticulum requires large amounts of energy to function, so it is one of the first organelles to malfunction when energy supplies are disrupted. This effect can follow nutrient deprivation or hypoxia and appears to play a role in tissue survival during ischemia. Cancers provide a good example of this mechanism. Tumors frequently outgrow their blood supply, so their cores become hypoxic. It has been found that tumors from animals with defective endoplasmic reticulum stress signaling fail to grow well, and most lung cancers show evidence of UPR activation. Consequently, modulation of endoplasmic reticulum stress may offer a target for treating thoracic malignancies. Attempts to identify antimesothelioma therapies found that proteasome inhibition with bortezomib could cause cell cycle arrest and death of cultured mesothelioma lines. The mechanism is not certain, but because bortezomib induces the UPR in several cancer models, endoplasmic reticulum stress may plausibly be involved. In some cancers, bortezomib appears to target hypoxic cells preferentially, perhaps because of their basal endoplasmic reticulum stress.

Endoplasmic Reticulum Overload

Remarkably, when Z α₁-antitrypsin polymerizes within the endoplasmic reticulum of its cell of synthesis, it fails to induce a strong UPR. Instead, the predominant signaling response appears to be activation of NFκB through a poorly understood mechanism that has variously been termed the endoplasmic reticulum overload response or the ordered polymer response (Figure 2-3). In the liver, this ultimately can lead to cirrhosis. Z α₁-antitrypsin appears also to be synthesized locally in the lung by some cell types, including bronchial and alveolar epithelial cells and macrophages. These too are likely to activate NFκB signaling cascades that would increase the production of inflammatory mediators and further amplify neutrophil recruitment and tissue damage. Chronic activation of NFκB would accelerate apoptosis within alveolar cells and thus contribute to the pathogenesis of emphysema. Because this effect would occur in all alveolar cells, it provides another explanation for the panlobular distribution of emphysema that characterizes α₁-antitrypsin deficiency. Although Z α₁-antitrypsin serves as an excellent model disease to study endoplasmic reticulum overload, it is likely that more common diseases such as some viral infections involve this form of signaling.

Maintenance of the Extracellular Matrix

As in all tissues, cells of the lung communicate with one another through direct contact and by way of released diffusible and matrix molecules. This communication network is important during the inflammatory response but also is required for the maintenance of normal lung architecture. The extracellular matrix comprises a complex network of scaffolding proteins, principally elastin and collagen. The elastin filaments form from tropoelastin monomers that self-assemble into aggregates and then fuse with microfibrils. Multiple covalent cross-links between the lysines in neighboring filaments provide stability. Cutis laxa is a family of autosomal dominant, X-linked, and recessive human diseases characterized by excessively slack connective tissues. Several families with the milder autosomal dominant form show early-onset pulmonary pathology including emphysema, particularly if inherited with the Z allele of α₁-antitrypsin. Mutations have been identified within the ELN (elastin) gene that cause mild cutis laxa and early-onset COPD (see Table 2-1). The ELN gene maps to 7q11.23 in humans, but because chromosome 7 has not been identified in linkage analysis as a site associated with COPD, it is likely that ELN mutations are a rare cause of this disease.

Elastin fibers bind other proteins, including fibulins, which in turn bind multiple extracellular matrix components and the basement membrane (Figure 2-4). The fibulins are a family of six proteins, at least two of which (those encoded by FBLN4, mapped to 11q13, and FBLN5, on 14q32.1) are mutated in severe autosomal recessive forms of cutis laxa, the phenotype of which often includes early-onset emphysema. Both pathogenic mutations are located within an epidermal growth factor–like domain of the respective protein, suggesting that these domains are critical for fibulins to maintain the integrity of the extracellular matrix within the lung. Of interest, analogous mutations in fibrillin, which bears homology to the fibulins, cause Marfan syndrome. Moreover, mutations of fibrillin (encoded by FBN1, localized to 15q21.1) have been described in neonatal Marfan syndrome with very-early-onset emphysema.

Figure 2-4 Extracellular matrix structure. The extracellular matrix of the lung is composed primarily of collagen and elastin fibers. Large fibrils of collagen assemble within the endoplasmic reticulum and traverse the Golgi apparatus, ultimately to be secreted. Elastin filaments form in association with microfibrils and integrin-anchored fibulin 5. AD, autosomal dominant; AR, autosomal recessive.

Menkes disease, characterized by abnormal hair and specific dysmorphic features, is caused by mutations in an intracellular copper transporter (encoded by ATP7A, localized to Xq13.3). The clinical features are due to defective connective tissue synthesis believed to be the result of dysfunction of lysyl oxidase. This copper-dependent enzyme is required for proper cross-linking of both collagen and elastin fibers. A recent case report described a child with Menkes disease and severe bilateral panlobular emphysema who died at only 14 months of age. Gene sequencing revealed a splice-site mutation in ATP7A, suggesting that proper extracellular matrix cross-linking is vital for stability of the lung parenchyma.

In contrast with the situation in animal models of COPD, collagen mutations that cause emphysema have not been identified in humans. This difference does not appear to be due to an incompatibility of mutated collagen with survival, because numerous collagen mutations have been described that cause other human diseases. Instead, it may reflect a more important role for elastin integrity in emphysema in humans than in mice.

Noxious stimuli such as cigarette smoke that cause lung inflammation help establish chemotactic gradients of interleukin-8 (IL-8) and leukotriene B₄ (LTB₄) that encourage macrophages and neutrophils to migrate from capillaries into the small airways and alveoli. Neutrophils initially are concentrated in the centrilobular regions of the lung parenchyma where they release serine and cathepsin proteinases. These enzymes damage and degrade elastin and other structural proteins, thereby causing disease. The degraded elastin fragments themselves act as chemoattractants that recruit additional inflammatory cells. A direct correlation has been noted between the numbers of neutrophils within the interstitium and the severity of emphysema.

Repair

The lung comprises more than 40 specialized cell types, each with its own individual functions and distribution. Of importance, only a subset of these possesses replication potential. Those that can divide must serve as a stem cell population for the other, terminally differentiated cell types. Regeneration of the lung is a source of much debate. It appears that in humans, the primary regenerative capacity of airway epithelia comes from resident precursor cells. Limited colonization of the airway by exogenous precursors has been described in humans. For example, male epithelial cells have been identified in the lungs of women who have just given birth to a male infant, and chimerism of the bronchial epithelium has been detected in bone marrow recipients. Conversely, limited numbers of recipient-derived cells have also been detected in engrafted transplant lungs. In the absence of lung damage, however, such engraftment of non–lung-derived cells appears to be a rare event.

Alveoli are composed of capillaries and lymphatics encased in a thin epithelial layer. More than 90% of the alveolar surface is composed of type I pneumocytes (Figure 2-5). These are terminally differentiated, large flat squamous epithelial cells that possess a relatively simple ultrastructure. Their function is to allow gaseous exchange between the alveolar gas and the bloodstream; consequently, they require little more than a nucleus and cell membrane with a few mitochondria and a limited secretory pathway. They are unable to replicate and are susceptible to noxious insults from inhaled toxins such as cigarette smoke. They must therefore be replenished by the major stem cell found within the alveolus, the type II pneumocyte. Type II pneumocytes are small cuboidal cells located predominantly at the alveolar septal junctions. Although contributing little surface area to the lung, they are abundant, making up half of the alveolar cells by number. Tight junctions separate their polarized apical and basolateral domains, enabling selective secretion toward their apical surface, which is readily identified by its many microvilli.

Figure 2-5 Pneumocytes. Alveoli are composed primarily of type I and type II pneumocytes. Type I pneumocytes are thin, terminally differentiated nonsecretory cells that make up 90% of the alveolar surface area. Type II pneumocytes contain lamellar bodies composed of surfactant proteins and lipids. After injury, these cells can divide and give rise to type I pneumocytes.

Type II pneumocytes have two primary functions: to secrete surfactant and to act as the sole stem cell of the alveolus. Surfactant is a complex mixture of phospholipids (mainly dipalmitoylphosphatidylcholine and phosphatidylglycerol) and surfactant proteins A to D (encoded by SFTPA, SFTPB, SFTPC, and SFTPD, respectively). Intracellular surfactant inclusions, or multilamellar bodies, give these cells their granular appearance. After exocytosis at the apical surface, these spheroid lamellar bodies form a membrane lattice called tubular myelin that plays an important role in reducing the surface tension of the alveolar lining fluid and in host defense. Surfactant proteins A and D are members of the hydrophilic collectin family that have a carboxyl-terminal (C-terminal) lectin domain able to bind and opsonize many inhaled pathogens. They also have direct toxicity against gram-negative organisms. By contrast, surfactant proteins B and C are both small hydrophobic proteins that interact with surfactant lipids.

Because type II cells first appear in significant numbers after 24 weeks of gestation, premature newborns are impaired in surfactant production, which can lead to the development of respiratory distress syndrome, the most common cause of respiratory death in infants in the Western world. Those children who survive are at increased risk for respiratory disability as a result of bronchopulmonary dysplasia. Fortunately, the deficit can now be corrected by instilling surfactant into the trachea. But despite a marked improvement in survival as a result of this therapy, a significant minority (5% to 25%) nevertheless develop long-term complications. Family studies have shown clustering of disease, which suggests a genetic susceptibility to respiratory distress in infancy. The most likely causes of this susceptibility are mutations in the surfactant protein genes and in the genes for some extrapulmonary products (such as granulocyte-macrophage colony-stimulating factor and its receptor). Mutations in the genes for the hydrophobic surfactant proteins, SFTPB and SFTPC, are clearly linked to neonatal respiratory disease. The substitution of GAA for C in codon 121 of the SFTPB gene underlies 60% of cases of hereditary deficiency of surfactant protein B. This net insertion of two bases shifts the reading frame and results in a truncated protein, a lack of protein on immunohistochemical staining of lung tissue, and a loss-of-function phenotype, in which the homozygote experiences onset of respiratory distress within the first 12 to 24 hours of life and death by 1 to 6 months of age. A total of 27 mutations have now been described in the SFTPB gene, some of which have a milder effect on the production of surfactant protein B, with a later onset of respiratory failure. One family also has been described with a splice site mutation in intron 4 of the SFTPC gene. Infants with this mutation do not suffer neonatal respiratory distress but instead develop interstitial lung disease in the first year of life with an autosomal dominant pattern of inheritance. Furthermore, mutations within the genes encoding granulocyte-macrophage colony-stimulating factor or its receptor cause pulmonary alveolar proteinosis in children, a condition characterized by the overproduction of surfactant.
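The effect of replacing a single base with three, as in the codon 121 substitution of SFTPB described above, can be illustrated with a short sketch. The sequence used here is an invented toy reading frame, not the real SFTPB sequence, and the helper names are hypothetical; the point is only that a net gain of two bases shifts every downstream codon and can bring a stop codon into frame early.

```python
# Toy illustration only: REFERENCE is an invented 27-base reading frame,
# not the real SFTPB coding sequence.
STOP_CODONS = {"TAA", "TAG", "TGA"}

REFERENCE = "ATGGCTCCACTGACCTGGATCCTGTAA"


def split_codons(seq):
    """Split a sequence into complete triplets, dropping any trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def first_stop_index(seq):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    for index, codon in enumerate(split_codons(seq)):
        if codon in STOP_CODONS:
            return index
    return None


def substitute(seq, position, old, new):
    """Replace the bases `old` found at 0-based `position` with `new`."""
    if seq[position:position + len(old)] != old:
        raise ValueError("reference does not contain the expected bases")
    return seq[:position] + new + seq[position + len(old):]


# Swap one C for GAA: a net insertion of two bases.
MUTANT = substitute(REFERENCE, 6, "C", "GAA")

if __name__ == "__main__":
    print("reference codons:", split_codons(REFERENCE))
    print("mutant codons:   ", split_codons(MUTANT))
    print("first stop (reference):", first_stop_index(REFERENCE))
    print("first stop (mutant):   ", first_stop_index(MUTANT))
```

Because the length change is not a multiple of three, the codons after the substitution no longer match the reference, and in this toy frame a stop codon appears well before the original end of the sequence.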

Idiopathic pulmonary fibrosis is a progressive interstitial lung disease characterized by inflammation and fibrosis of the alveolar walls. Although most cases of pulmonary fibrosis are sporadic, some rare familial cases have been described. In a number of early-onset familial cases of idiopathic pulmonary fibrosis, causative mutations have been identified in the SFTPC gene. This gene encodes a large transmembrane pro-protein that is proteolytically processed to the small mature surfactant protein C. As discussed earlier, one of its disease-causing mutants, characterized by a deletion of exon 4, misfolds and induces endoplasmic reticulum stress.

When type I cells are injured, type II cells can proliferate and differentiate into the larger type I cells. In bronchi, intermediate and basal cells form the pool of progenitor cells, whereas in terminal bronchioles, the Clara cell plays this role. These latter cells are highly metabolically active and serve to detoxify many inhaled chemicals by way of their cytochrome P-450 monooxygenase system in their smooth endoplasmic reticulum. They are able to multiply and differentiate into both ciliated and nonciliated bronchial epithelial cells. This response involves first dedifferentiation and then loss of their secretory granules and smooth endoplasmic reticulum, followed by reentry into the cell cycle. However, not all Clara cells behave identically. A variant that shows resistance to naphthalene, a toxin that targets the cytochrome P-450 isoenzyme found in Clara cells, may represent the main progenitor subtype and has been called either the toxin-resistant variant Clara cell or the bronchiolar stem cell.

Cells have been identified at the junction between terminal bronchioles and the alveoli that express markers of both Clara cells and type II pneumocytes. These have been named bronchoalveolar stem cells (BASCs), because when isolated and grown in vitro, they have been shown to be capable of self-renewal. At least in culture, they can differentiate into cells that express markers of bronchial epithelium, alveolar type I or type II cells. The identity of the progenitor cells of the interstitium, smooth muscle, and endothelium remains unclear. In adults, transdifferentiation of one cell type to another may be important. In disease, many examples of epithelial to mesenchymal transition have been described.

There is growing interest in the potential of stem cell technology in the treatment of lung disease. In young mice, the administration of retinoic acid derivatives can induce lung regeneration, presumably from endogenous stem cells; whether such regeneration can be induced in adult humans, however, is not clear. Instead, it may be possible to deliver exogenous stem cells to the lung. Among the potential sources for such cells, the most controversial are embryonic stem cells. When grown in vitro, these cells can be made to develop toward a type II pneumocyte–like phenotype, expressing surfactant proteins and containing lamellar bodies, or to form pseudoglandular structures. However, there are currently no clinical data to support their use. By contrast, mesenchymal stem cells derived from bone marrow or cord blood have been more extensively studied. In mouse models of acute lung injury and lung fibrosis, mesenchymal stem cells have yielded promising results. Moreover, in a clinical trial administering human mesenchymal stem cells to patients after acute myocardial infarction, an increase in forced expiratory volume in 1 second (FEV₁) was reported. Further trials designed specifically to test the effectiveness of such therapies in well-defined human lung disease are now required.

The major hurdle in administering exogenous cells is the potential for immune reaction and rejection. If the patient’s own cells can be used, this problem can be circumvented. In one recent case involving a 10-year-old boy with congenital tracheal stenosis, a segment of trachea was grown ex vivo on a collagen scaffold from stem cells derived from his bone marrow. This was subsequently implanted to correct the stenosis. Modern technologies now enable progenitor cells to be generated from adult peripheral tissues. Induced pluripotent stem cells (iPS cells) can be derived by expressing stem cell transcriptional regulators such as SOX-2 and Oct-3 in adult cells such as skin fibroblasts. The resulting dedifferentiated cells have the potential to differentiate into many other tissues—hence their pluripotency. This technology has been used to generate hepatocyte-like cells from the skin of patients with α₁-antitrypsin deficiency. Ultimately, when such iPS cells are reprogrammed to correct the genetic defect, it is hoped that they will provide a source of autologous tissue to replace damaged or malfunctioning organs.

Pitfalls and Controversies

Nature of the α₁-Antitrypsin Polymer

Although the “loop-sheet” model of polymerization described earlier has long been accepted as the mechanism of the retention of Z α₁-antitrypsin within cells, the field was recently revitalized by a new model for polymerization. This model was based on the finding that another serpin molecule, antithrombin, could be refolded in vitro to form dimers linked by the swapping of a large hairpin structure rather than by the single strand of the reactive center loop. This “domain swap” model initially garnered significant support and led to a revival of interest in the molecular mechanism of α₁-antitrypsin deficiency. An important consequence of this renewed interest was the development of a novel monoclonal antibody called 2C1, which specifically recognizes α₁-antitrypsin polymers generated by heating the purified protein. This development was significant because up to that point, heating rather than refolding after chemical denaturation had been the source of most serpin polymers studied in vitro. The crucial observation that 2C1 also recognizes polymers formed within the livers of affected patients, while showing no reactivity against polymers formed by refolding in vitro, strongly supports the original loop-sheet hypothesis. Nevertheless, this renewed bout of interest in serpin polymerization, although most likely prompted by an in vitro antithrombin artifact, has left researchers with an excellent new monoclonal antibody that will be of great use in the study (and perhaps future diagnosis) of this disease. This field remains ripe for future discoveries, and it is unlikely that the last word on serpin polymerization has yet been written.

Alveolar Regeneration in Response to Retinoic Acid

In 1997 it was shown that retinoic acid could induce alveolar regeneration in rats. This observation suggested a possible future therapy for emphysema in humans. However, subsequent studies that have attempted to replicate these findings have met with mixed results, some positive and others not. This inconsistency may reflect strain-specific differences, because it has been noted that the successful studies were restricted to certain strains of mice and rats. Recent observations have demonstrated that it is not responsiveness or lack thereof to retinoic acid per se that is strain-specific, but instead that each strain differs in its sensitivity to the agent. What is not clear is if the effects can be recapitulated in animals other than rodents. A single study in humans has so far failed to show clinical improvement in indices of emphysema, although that study has been criticized for methodologic flaws including lack of power. Larger-scale, well-controlled studies are necessary to settle this important controversy.

Stem Cell Biology

As discussed earlier, the potential for regeneration of lung tissue from a stem cell pool is a research subject of considerable interest, with ongoing controversy surrounding the origins of these cells. Indeed, opinion remains split on whether to call these “stem cells,” “progenitor cells,” or even “reparative cells.” Clearly, endogenous pulmonary cells are responsible for recovery of the lung from many insults, but it has not been established how these could be used therapeutically unless mechanisms can be found to augment their function. The use of iPS, bone marrow–derived, or even embryonic stem cells remains a controversial area but holds the potential to revolutionize therapy for emphysema.

Autophagy Versus Endoplasmic Reticulum–Associated Degradation in α₁-Antitrypsin Deficiency

In many diseases, the accumulation of aberrant proteins in the endoplasmic reticulum directly contributes to the pathogenesis of disease, be these misfolded surfactant proteins or polymers of α₁-antitrypsin. It is a goal of several laboratories to devise ways of helping the cell dispose of these retained proteins, but a useful therapy remains elusive. For some time, it was thought that polymers within the endoplasmic reticulum were too large to be degraded by the classical endoplasmic reticulum–associated degradation (ERAD) pathway and so an alternative disposal mechanism must be involved. Indeed, a number of reports described the activation of autophagy by retained serpin polymers. Autophagy is the process by which intracellular components are engulfed by double-membraned structures called autophagosomes that ultimately “digest” their contents in what translates literally as “self-eating.” However, it appears that the disposal of polymerogenic serpins shows significant cell type differences, and most in fact are degraded primarily by ERAD, with autophagy playing a secondary role. This area requires further study. An exciting recent report has described how the use of a commonly prescribed drug, carbamazepine, can stimulate both the ERAD and autophagy of polymerized α₁-antitrypsin and result in the resolution of liver disease in transgenic Z α₁-antitrypsin mice. The doses required currently are far higher than those that can safely be employed in humans, so further clinical trials are needed.

The Candidate Gene Approach

When hunting for a disease-associated mutation or polymorphism, it is essential to maintain a high degree of skepticism and statistical rigor. The involvement of a biologic statistician at an early stage of the study design is therefore essential. The attempt to link tumor necrosis factor (TNF)-α variants with the pathogenesis of COPD illustrates this pitfall well. The involvement of TNF-α was biologically plausible, because the levels of this multifunctional cytokine are elevated in bronchoalveolar lavage fluid, induced sputum samples, and lung biopsy specimens from patients with COPD. Moreover, well-studied promoter polymorphisms had been shown to alter expression levels and were linked with other inflammatory conditions. It was the observation of an association (with a staggering odds ratio of over 10) between a specific allele of TNF-α and “bronchitis” in Taiwanese men that ignited interest in this gene in COPD. That study was difficult to interpret, however, because a third of the men involved were “never smokers” and thus were unlikely to suffer from COPD. More than 10 subsequent studies have found little evidence that TNF-α polymorphisms are associated with, or modify the progression of, COPD. Tempting as it may be to restrict study to a “favorite gene,” it is far more fruitful to look first in an unbiased fashion for associations and only then to focus on individual genes or pathways.
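The need for statistical rigor described above can be made concrete with a minimal sketch of how an odds ratio and its confidence interval are computed from a case-control table. The counts below are invented for illustration and are not taken from the TNF-α studies; the function names are hypothetical, the interval uses the standard Woolf (log) approximation, and the Bonferroni correction is shown as one simple, conservative way of adjusting a significance threshold when many variants are tested.

```python
import math


def odds_ratio_with_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Return (odds ratio, lower bound, upper bound) using the Woolf log method.

    z = 1.96 gives an approximate 95% confidence interval.
    """
    counts = (case_carriers, case_noncarriers,
              control_carriers, control_noncarriers)
    if min(counts) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log = math.sqrt(sum(1.0 / n for n in counts))
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log),
            math.exp(log_or + z * se_log))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts: the same odds ratio in a small and a tenfold larger study.
small_study = odds_ratio_with_ci(10, 30, 2, 60)
large_study = odds_ratio_with_ci(100, 300, 20, 600)

if __name__ == "__main__":
    for label, (estimate, lower, upper) in (("small", small_study),
                                            ("large", large_study)):
        print(f"{label}: OR={estimate:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
    print("threshold for 10 tests:", bonferroni_threshold(0.05, 10))
```

Both hypothetical studies give the same point estimate, but the interval from the small study is far wider, which is why a striking odds ratio from a small or poorly defined cohort calls for replication before it is taken at face value.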

Future Directions

The first human genome took a global effort of 13 years to complete. Today, genomes can be sequenced in a single laboratory in less than a week. Before very long, sequencing studies will be affordable for most research groups and may even enter clinical practice, truly revolutionizing medicine. Such genomic elucidation will determine far more than whether a few disease-causing mutations are present; it will make possible individually tailored treatments as never before. Pharmacogenetics will identify the most effective therapies and help avoid the worst complications. In clinical practice, experienced prognostication and risk stratification will inform the hard decisions that both patient and physician need to make. In parallel, structural and cell biologists will continue to tease apart the components that make pneumocytes work and, more important, allow them to fail. These advances can be expected to lead to a new understanding of pathology and to provide signposts to new treatments. This century promises to be one in which all physicians can look forward to an exciting role as clinician scientists.

Molecular Biology

The Genetic Basis of Lung Disease

Genetic factors play an important role in diseases that affect the airways (asthma, chronic obstructive pulmonary disease [COPD], cystic fibrosis, primary ciliary dyskinesia), parenchyma (pulmonary fibrosis, Birt-Hogg-Dubé syndrome, tuberous sclerosis), and vasculature (hereditary hemorrhagic telangiectasia) of the lung (Table 2-1). Such conditions include simple monogenic disorders such as Kartagener syndrome and α₁-antitrypsin deficiency, in which mutations of critical genes are sufficient to induce well-defined disease phenotypes. By contrast, many other disease processes affecting the lung are complex genetic traits in which inheritance subtly affects pathogenesis. This group of entities includes COPD, asthma, and idiopathic pulmonary fibrosis. Extending current understanding of the genetic basis of pulmonary conditions will be essential to provide new insights into their underlying pathophysiology, to make predictions about outcome, and to develop novel therapeutic strategies.

Table 2-1 Examples of Genetic Factors That Underlie Lung Disease

Identification of single-gene defects in families that show the same phenotype is now relatively straightforward, owing to completion of the human genome project and improvements in DNA sequencing. Consequently, the past 20 years have seen rapid progress in elucidation of the genetic basis of disease. This rate of progress can be appreciated by a consideration of the many years required to identify the gene associated with cystic fibrosis. Dorothy Hansine Andersen first defined the condition in 1938 when she described cystic fibrosis of the pancreas in association with lung and intestinal disease. Only later was it recognized to be a recessive condition. The sweat test that is used to diagnose the condition was developed after the detection of abnormal sweat electrolytes by Paul di Sant’ Agnese in 1952. The search for the cystic fibrosis gene started in the early 1980s, and the gene was localized to chromosome 7 in 1985 through recognition of linkage with the highly polymorphic gene paraoxonase in many populations. This achievement was followed by the identification of additional markers more closely linked to the cystic fibrosis locus, MET and D7S8, allowing prenatal diagnosis of the disorder and eventually leading directly to the mapping of the causative gene in 1989 by teams headed by Lap-Chi Tsui, Francis Collins, and Jack Riordan. This gene was called the cystic fibrosis transmembrane conductance regulator (CFTR), and now more than 1000 different mutations have been identified that cause cystic fibrosis.

By contrast, today, what had once taken many groups a decade to complete can be undertaken in a single laboratory in days. For example, modern exome sequencing enables all 180,000 exons encoded by the human genome to be characterized in an individual patient or an entire kindred. Although the exome equates to only 1% of the genome, or about 30 megabases, it is thought to contain 85% of the mutations responsible for mendelian disorders. This technology, for example, was recently used to identify the causative gene of Miller syndrome, a rare disorder that manifests with cleft palate, absent digits, and ocular anomalies. The entire exomes of four persons so affected were sequenced, allowing mutations to be identified in the causative gene encoding dihydroorotate dehydrogenase (DHODH).

The major challenges now are therefore no longer the single-gene disorders but complex genetic diseases such as cancer, COPD, asthma, and interstitial lung disease. These diseases are the result of interactions between multiple genes and environmental factors. Consequently, the diseases cluster within families but do not show a clear pattern of inheritance.

Single-Gene Disorders and Respiratory Disease

Many single-gene disorders have been linked with respiratory disease (see Table 2-1). They are perhaps best typified by the autosomal recessive condition α₁-antitrypsin deficiency. This condition shows a clear genotype-phenotype correlation, and current understanding of its molecular basis has provided new insights into the pathogenesis of disease. α₁-Antitrypsin is the archetypal member of the serine proteinase inhibitor (“serpin”) superfamily. It is synthesized in the liver and secreted into the plasma, where it is the most abundant circulating proteinase inhibitor. Most people of North European descent carry the normal M allele, but 1 in 25 carries the Z variant (Glu342Lys), which results in plasma α₁-antitrypsin levels in the Z homozygote that are 10% to 15% of those in the M homozygote. The Z mutation causes the accumulation of α₁-antitrypsin in the rough endoplasmic reticulum of the liver, predisposing the homozygote to the development of juvenile hepatitis, cirrhosis, and hepatocellular carcinoma. The greatly reduced circulating levels of α₁-antitrypsin are unable to protect the lungs against proteolytic damage by neutrophil elastase, predisposing the Z homozygote to the development of early-onset emphysema.
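The carrier frequency of 1 in 25 can be translated into an approximate frequency of Z homozygotes. The Python sketch below is a rough illustration only, and it rests on two assumptions that the text does not state: that "carries" refers to heterozygotes, and that the population is in Hardy-Weinberg equilibrium, so that heterozygotes occur at frequency 2q(1 − q) and homozygotes at q², where q is the Z allele frequency. The function names are hypothetical.

```python
import math


def allele_frequency_from_carrier_rate(carrier_rate: float) -> float:
    """Solve 2q(1 - q) = carrier_rate for the smaller root q.

    Assumes Hardy-Weinberg equilibrium (an assumption of this sketch).
    """
    discriminant = 1.0 - 2.0 * carrier_rate
    if discriminant < 0:
        raise ValueError("carrier_rate cannot exceed 0.5 under Hardy-Weinberg")
    return (1.0 - math.sqrt(discriminant)) / 2.0


def homozygote_frequency(q: float) -> float:
    """Expected frequency of homozygotes for an allele of frequency q."""
    return q * q


CARRIER_RATE = 1 / 25  # "1 in 25 carries the Z variant"

q = allele_frequency_from_carrier_rate(CARRIER_RATE)
zz = homozygote_frequency(q)

print(f"Estimated Z allele frequency: {q:.4f}")
print(f"Estimated Z homozygotes: about 1 in {1 / zz:.0f}")
```

Under these assumptions the Z allele frequency is close to 0.02, and Z homozygotes are expected at a rate of roughly 1 in 2000 to 1 in 2500. Real populations can depart from this estimate because allele frequencies vary by region and the equilibrium assumptions may not hold.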

The structure of α₁-antitrypsin is based on a dominant β-pleated sheet A and nine α-helices (Figure 2-1). This scaffold supports an exposed mobile reactive loop that presents a peptide sequence as a pseudosubstrate for the target proteinase. After docking, the proteinase is inactivated by a mousetrap-type action that swings it from the top to the bottom of the serpin in association with the insertion of an extra strand into β-sheet A (see Figure 2-1). This six-stranded protein bound to its target enzyme is then recognized by hepatic receptors and cleared from the circulation. The structure of α₁-antitrypsin is central to its role as an effective antiproteinase but also renders it liable to undergo conformational change in association with disease. The Z mutation is at residue P₁₇ (17 residues proximal to the key P₁ amino acid that defines the inhibitory specificity of α₁-antitrypsin) at the head of a strand of β-sheet A and the base of the mobile reactive loop (see Figure 2-1). The mutation opens β-sheet A, thereby favoring the insertion of the reactive loop of a second α₁-antitrypsin molecule to form a dimer (see Figure 2-1). This dimer can then extend to form polymers that tangle in the endoplasmic reticulum of the liver to form the inclusion bodies resulting in liver disease. Support for this pathomechanism comes from the demonstration that Z α₁-antitrypsin formed chains of polymers when incubated under physiologic conditions. The rate was accelerated by raising the temperature to 41° C and could be blocked by peptides that compete with the loop for annealing to β-sheet A. The role of polymerization in vivo was clarified by the finding of α₁-antitrypsin polymers in inclusion bodies from the livers of Z α₁-antitrypsin homozygotes (see Figure 2-1).

Figure 2-1 The molecular basis of α₁-antitrypsin deficiency. α₁-Antitrypsin may be considered to act by a mousetrap mechanism. A, After docking (left), the target proteinase (gray) is inactivated by movement from the upper to the lower pole of the protein (right). This is associated with insertion of the reactive loop (red) as an extra strand into β-sheet A (green). The mousetrap mechanism may be triggered spontaneously by point mutations in association with disease. The Z mutation (Glu342Lys) of α₁-antitrypsin is at the head of a strand of β-sheet A (green) and the base of the reactive loop. B, Mutations in this region can destabilize β-sheet A to allow the insertion of a reactive loop of a second molecule (middle). This dimer then extends to form long chains of polymers (right). Each molecule of α₁-antitrypsin in the polymer is shown in a different color. It is these polymers that tangle in the endoplasmic reticulum to cause inclusions resulting in liver disease. C, An inclusion body (arrow) from the liver of a patient with α₁-antitrypsin deficiency (left). The inclusions are composed of chains of molecules of α₁-antitrypsin (right).

(Modified from Gooptu B, Lomas DA: Conformational pathology of the serpins—themes, variations and therapeutic strategies, Annu Rev Biochem 78:147–176, 2009.)

Although many α₁-antitrypsin deficiency variants have been described, only three other mutants of α₁-antitrypsin have similarly been associated with plasma deficiency and hepatic inclusions: α₁-antitrypsin Siiyama (Ser53Phe), α₁-antitrypsin Mmalton (Phe52 deleted), and α₁-antitrypsin King’s (His334Asp). All of these mutants lie in the shutter domain that controls opening of β-sheet A. They destabilize the molecule to allow the formation of loop-sheet polymers in vivo. Further investigations have shown that polymerization also underlies the mild plasma deficiency of the S (Glu264Val) and I (Arg39Cys) variants of α₁-antitrypsin. The point mutations that are responsible for these variants have less effect on β-sheet A than does the Z variant. Thus, the associated rate of polymer formation is much slower than that for Z α₁-antitrypsin, which results in less retention of protein within hepatocytes, milder plasma deficiency, and the lack of a clinical phenotype. However, if a mild, slowly polymerizing I or S variant of α₁-antitrypsin is inherited with a rapidly polymerizing Z variant, then the two can interact to form heteropolymers within hepatocytes. These polymers underlie the inclusions that cause cirrhosis.

Emphysema associated with α₁-antitrypsin deficiency results from lack of protection against proteolytic attack in the lungs associated with reduced levels of circulating proteinase inhibitor. This is particularly the case in individuals who smoke tobacco. The Z α₁-antitrypsin that does escape from the liver into the circulation is less efficient in protecting the tissues from enzyme damage and, like M α₁-antitrypsin, may be inactivated by oxidation of the P₁ methionine residue. The demonstration that Z α₁-antitrypsin can undergo a spontaneous conformational transition in association with liver disease raised the possibility that this might also occur within the lung. Indeed, polymers have been detected in bronchoalveolar lavage fluid in patients with Z α₁-antitrypsin deficiency. This observation may have important implications for the pathogenesis of disease, because polymerization obscures the reactive loop of α₁-antitrypsin, rendering the protein inactive as an inhibitor of proteolytic enzymes. Thus, the spontaneous polymerization of α₁-antitrypsin within the lung will exacerbate the already reduced antiproteinase screen, thereby increasing the susceptibility of the tissues to proteolytic attack and increasing the rate of progression of emphysema. Finally, the α₁-antitrypsin polymers themselves are inflammatory for neutrophils, which will also increase the proteolytic load in the lung. Recent data suggest that cigarette smoke can induce the intrapulmonary polymerization of Z α₁-antitrypsin, thereby exacerbating the lung damage associated with smoking.