Gene Expression

Published on 28/02/2015 by admin

Last modified 22/04/2025

CHAPTER 15 Gene Expression

Each organism, whether it has 600 genes (Mycoplasma), 6000 genes (budding yeast), or 25,000 genes (humans), depends on reliable mechanisms to turn these genes on and off. This is called regulation of gene expression. In simple organisms, such as bacteria and yeast, environmental signals, such as temperature or nutrient levels, control much of gene expression. In multicellular organisms, genetically programmed gene expression controls development from a fertilized egg. Within these organisms, cells send each other signals that control gene expression either through direct contact or via secreted molecules, such as growth factors and hormones.

Given the vast numbers of genes, even in simple organisms, regulation of gene expression is complicated. Control is exerted at multiple steps, including production of mRNA, translation, and protein turnover. This chapter focuses on the first of these regulatory steps: the transcription mechanisms that lead to the production of messenger RNA (mRNA) and other RNA transcripts. The past decade has seen the discovery of hundreds of key components in this process.

Proteins called transcription factors turn genes on or off by binding to particular DNA sequences adjacent to the sequences encoding the protein or RNA product of the gene. The paradigm of this level of regulation is the bacterial repressor that controls expression of genes required for lactose metabolism in Escherichia coli. In eukaryotes, transcription factors are numerous, representing approximately 6% of human genes. They are also quite diverse, binding to a wide range of DNA regulatory sites. Fortunately, they fall into a limited number of families with similar structures and binding mechanisms. Three types of eukaryotic DNA-dependent RNA polymerases respond to these regulatory proteins and copy DNA sequence into RNA. Regulation of transcription factors is achieved by variations in a limited number of mechanisms that control their synthesis, transport from the cytoplasm into the nucleus, and activity through posttranslational modifications or binding to small molecular ligands.

One key level of regulation is transcription initiation, the first step in production of RNA transcripts. This chapter examines the basic features of both prokaryotic and eukaryotic transcription units and the transcription machinery. Regulatory transcription factors that control the expression of several selected genes are discussed in the context of how external signals can reprogram patterns of gene expression. Finally, the chapter addresses the mechanisms by which mutation of transcription factor genes leads to human disease.

The Transcription Cycle

Synthesis of RNA by RNA polymerases is a cyclic process that can be broken down into three sets of events: initiation, elongation, and termination (Fig. 15-1). Each of these events consists of multiple individual steps. In the first step of the initiation process, RNA polymerase locates and binds to the chromosome near the beginning of the gene, forming a preinitiation complex at a sequence termed a promoter. This binding must be highly specific to distinguish promoter from nonpromoter DNA. Next, a conformational change in the polymerase-promoter complex results in formation of an open complex in which the DNA duplex is unpaired, allowing RNA polymerase access to nucleotide bases that are complementary to the start of the message. After formation of a phosphodiester bond between the first two complementary ribonucleotides, the polymerase translocates one base and repeats the process of phosphodiester bond formation, resulting in elongation of the nascent RNA. The elongation reaction cycle continues at an average rate of about 20 to 30 nucleotides per second until the complete gene has been transcribed. Elongation is not a uniform reaction, however, as RNA polymerase pauses at certain sequences. These pauses are important for regulation of transcription. The final step in the transcription cycle, termination, occurs when the polymerase reaches a signal on DNA that causes an extended pause in elongation. Given enough time and the appropriate sequence context, the nascent transcript dissociates from the elongating RNA polymerase, and the DNA template returns to a base-paired duplex conformation. Ultimately, RNA polymerase dissociates from the template and is free to begin a new search for a promoter.
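As a rough illustration of the elongation rate quoted above, the following sketch estimates how long one polymerase would need to traverse a transcription unit at 20 to 30 nucleotides per second. The function name and the 3,000-nucleotide example length are assumptions chosen for illustration, and the calculation ignores the pauses that the text describes.

```python
def elongation_time_seconds(gene_length_nt, rate_nt_per_s):
    """Time for one polymerase to transcribe a gene at a constant rate (pauses ignored)."""
    if gene_length_nt < 0 or rate_nt_per_s <= 0:
        raise ValueError("length must be non-negative and rate positive")
    return gene_length_nt / rate_nt_per_s


# Hypothetical 3,000-nucleotide transcription unit at both ends of the quoted range.
slow = elongation_time_seconds(3000, 20)
fast = elongation_time_seconds(3000, 30)
print(f"{fast:.0f} to {slow:.0f} seconds")  # 100 to 150 seconds
```

Because real elongation is interrupted by pauses, these figures are best read as lower bounds.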

Figure 15-1 The transcription cycle. The transcription reaction consists of three basic steps in which the RNA polymerase initiates transcription at the promoter, elongates the nascent RNA copy of one of the DNA strands, and terminates transcription on completion of the message.

Each of the steps in the transcription cycle can potentially serve as the target of regulatory molecules. The frequency of initiation varies among different promoters as dictated by the need for the gene product. The initiation reaction is most often regulated, presumably because this prevents synthesis of messages that encode unneeded products. Elongation and termination can also be regulated, as can splicing and further processing of mRNAs (see Chapter 16). In eukaryotes, the sum of these nuclear regulatory steps, together with cytoplasmic regulation of mRNA stability and translation efficiency, contributes to the wide variation seen in the abundance of different mRNAs and proteins in particular types of cells.

The Transcription Unit

Coding information in genomes is transcribed in increments corresponding to one or a few genes. Gene-coding and regulatory (cis-acting) DNA sequences that direct transcription initiation, elongation, and termination are collectively called a transcription unit. Prokaryotic transcription units, called operons, contain more than one gene, often encoding physiologically related proteins (Fig. 15-2A). Operons are flanked by sequences that direct the initiation and termination of transcription. Figure 15-2B shows a simple eukaryotic transcription unit encoding the human hemoglobin β-chain. Although only a small fraction of this region encodes the β-globin polypeptide, the adjacent regulatory sequences are crucial for proper expression of β-globin. Genetic defects resulting in decreased β-globin production are called β-thalassemias. Such mutations can occur either in the coding region, resulting in an unstable or truncated polypeptide, or in the adjacent control regions, leading to low levels of transcription or aberrant processing of the newly synthesized RNA (see Chapter 16). Thus, the transcription unit can be thought of as a linked series of modules, all of which must be functional for the gene to be transcribed at the correct level.

Figure 15-2 Prokaryotic and eukaryotic transcription units. A, The two transcription units required for regulation of lactose metabolism in E. coli. The I gene encodes the lac repressor, while the Z, Y, and A genes encode β-galactosidase, lactose permease, and thiogalactoside transacetylase. All three genes are required for the cell to grow on media containing lactose and are coregulated as the lac operon. B, The nucleotide sequence of one of the two DNA strands is transcribed into a complementary pre-mRNA copy. The pre-mRNA is processed by removing introns and splicing together the protein-coding exons (orange). The DNA sequences required for expression of a functional β-globin protein are indicated in different colors (see key). Mutations in any of these sequences can lead to decreased β-globin expression.

Biogenesis of RNA

A typical cell contains more RNA than genomic DNA. This RNA consists of molecules ranging typically from several hundred to several thousand nucleotides long. In prokaryotes, newly synthesized mRNA is immediately translated by ribosomes that initiate translation even before transcription has terminated. In eukaryotes, RNA is distributed between the nucleus, where RNA synthesis occurs, and the cytoplasm, where most RNA is used to synthesize proteins. Eukaryotic cells have four different types of RNA:

1. Ribosomal RNA (rRNA [see Fig. 16-9]), the most abundant type, making up about 75% of the total

2. Small, stable RNAs, such as transfer RNA (tRNA [see Fig. 17-3]), small nuclear RNAs (snRNA [see Chapter 16]) involved in splicing, and 5S rRNA, which makes up about 15% of the total

3. mRNA and its precursor heterogeneous nuclear RNA (hnRNA), which account for only 10%

4. Small noncoding RNAs (ncRNAs), including microRNAs (miRNAs), which are involved in a variety of regulatory processes

Transcription of eukaryotic DNA in the nucleus is linked to subsequent steps that process the nascent transcript in preparation for its eventual function (see Chapter 16 for a complete discussion of these steps). For mRNA precursors, this includes capping and methylation of the 5′ end of the nascent transcript. Most messages are also spliced to remove introns; the 3′ end of the message is then cleaved, and a stretch of adenosine residues is added. The mRNA is then transported to the cytoplasm, where it serves as the template for protein synthesis.

Eukaryotic ribosomal RNA is synthesized from a set of tandemly repeated genes as a single molecule, which is cleaved and modified to give the final 28S, 5.8S, and 18S RNAs (Fig. 15-3). These are assembled, together with 5S RNA and about 80 proteins, into ribosomes in the nucleolus. Transfer RNA is synthesized in the nucleus and transported to the cytoplasm, where it is charged with amino acids prior to participating in protein synthesis (see Chapter 17). snRNAs are synthesized and processed in the nucleus. From there, they migrate to the cytoplasm, where they acquire essential proteins, and then return to the nucleus, where they function in the enzymatic reactions of RNA processing (splicing; see Chapter 16). The postsynthetic processing pathway that a particular transcript follows is dictated, in part, by the transcription machinery that is used to initiate and elongate the transcript and by certain features of the nascent RNA.

Figure 15-3 Ribosomal RNA transcription unit. Ribosomal RNA is transcribed from a set of transcription units arrayed as tandem copies of the same transcription unit. A, Map showing the arrangement of sequences in a typical ribosomal DNA repeat. B, Electron micrograph showing two active rRNA transcription units. Note that each transcription unit is transcribed by multiple RNA polymerases. As the polymerases traverse the gene, the attached nascent RNA is extended, giving a tree-like appearance.

(B, Courtesy of Yvonne Osheim, University of Virginia, Charlottesville.)

RNA Polymerases

RNA polymerases synthesize a new strand of nucleic acid that is complementary to one of the chromosomal DNA strands. While the enzymatic reaction is similar to DNA replication (see Chapter 42), there are several important differences. First, RNA polymerases synthesize a strand of ribonucleotides. Second, unlike DNA polymerase, RNA polymerases can initiate transcription without a primer. Finally, unlike replication, the newly transcribed sequences do not remain base-paired with the template but are displaced after reaching a length of about 10 nucleotides. These properties are common to RNA polymerases in all cells; therefore, it is not surprising that all cellular RNA polymerases share common structural features.
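The complementarity between template and transcript can be sketched in a few lines. The example below is only an illustration of the base-pairing rule, not a model of the enzyme; the function name and the eight-base template are hypothetical.

```python
# Base-pairing rules for copying a DNA template into RNA (A pairs with U in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3):
    """Return the RNA (written 5'->3') complementary to a DNA template strand.

    The template is given 5'->3'; the polymerase reads it 3'->5',
    so the RNA is the reverse complement with U in place of T.
    """
    template = template_5_to_3.upper()
    return "".join(DNA_TO_RNA[base] for base in reversed(template))


template = "TTACGCAT"  # hypothetical template strand, written 5'->3'
rna = transcribe(template)
print(rna)  # AUGCGUAA
```

Note that the function needs no primer argument, which loosely mirrors the ability of RNA polymerases to start a chain de novo.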

Bacteria have a single RNA polymerase containing six polypeptides. Two copies of the α subunit and one each of the β, β′, and ω subunits form a five-subunit core enzyme that synthesizes RNA. The sixth subunit, σ, binds to the core enzyme to form a holoenzyme that is able to recognize promoter sequences and initiate transcription.

Most eukaryotes have three different RNA polymerases (some species of plants contain four). The largest subunits of the three eukaryotic RNA polymerases are closely related to the bacterial β and β′ subunits. RNA polymerases I, II, and III have up to 10 additional subunits, most of which are unique to each enzyme (Fig. 15-4A). The subunits of both prokaryotic and eukaryotic enzymes assemble into a structure that is roughly spherical, with a diameter of approximately 150 Å and a 25-Å-wide cleft, large enough to accommodate the DNA template (Fig. 15-4B). The site of nucleotide addition is located on the back wall of the cleft. The framework of this structure is provided by the two largest subunits, which make up the two lobes that clamp down on the template DNA.

Figure 15-4 Multiple RNA polymerases. A, Eukaryotic cells have three different polymerases that share three common subunits (numbers 5, 6, and 8) and have a number of other related, but distinct, subunits (indicated by related colors and distinct shading). B, A ribbon diagram of the structure of RNA polymerase II showing the arrangement of different subunits (colored as in part A). Metal ions are indicated as red balls. A prominent cleft, large enough to accommodate a DNA template, is formed between the two largest subunits. The model DNA fragment is shown for size comparison only. C, Conserved amino acid sequences are dispersed throughout the largest subunits. Red indicates sequences that are conserved among both prokaryotes and eukaryotes. Yellow represents sequences that are conserved among the three different eukaryotic RNA polymerases. H. halobium is Halobacterium halobium. D, Conserved residues are located on the inner surface of the RNA polymerase cleft.

(B, PDB file: 1I50. Reference: Cramer P, Bushnell DA, Kornberg RD: Structural basis of transcription: RNA polymerase II at 2.8 angstrom resolution. Science 292:1863–1876, 2001. D, From Zhang G, Campbell EA, Minakhin L, et al: Crystal structure of Thermus aquaticus core RNA polymerase at 3.3 Å resolution. Cell 98:811–824, 1999.)

The eukaryotic polymerases can be distinguished experimentally on the basis of their sensitivity to the fungal toxin α-amanitin, RNA polymerase II being the most sensitive and RNA polymerase I being the most resistant. RNA polymerase I localizes to the nucleolus, where it synthesizes rRNA. RNA polymerase II synthesizes mRNA and several snRNAs involved in RNA splicing in the nucleoplasm. RNA polymerase III synthesizes tRNA, 5S rRNA, and the 7S RNA of the signal recognition particle (see Fig. 20-5). The newly described RNA polymerase IV is present in plants, where it is involved in heterochromatin formation and gene silencing.

The multiple eukaryotic RNA polymerases apparently originated through duplication of primordial subunit genes, followed by evolution of specialized functions. For example, RNA polymerase I synthesizes one species, whereas RNA polymerase III synthesizes several hundred species of highly abundant transcripts. The pool of mRNAs is more complex, however. Human cells have approximately 20,000 different species of mRNA. The relative abundance of individual mRNAs can vary widely, often in response to external signals, from just a few copies to more than 10,000 copies per cell. Thus, RNA polymerase II must recognize thousands of different promoters and transcribe them with widely varying efficiencies. In contrast, RNA polymerases I and III are specialized for the high rates of transcription necessary to produce rRNAs (>100,000 copies per cell) and other abundant small, stable RNAs.

Specialization has been balanced, however, by the need to retain the structural elements required for RNA synthesis. In each eukaryotic RNA polymerase, the largest subunits are homologous to the bacterial β′ and β subunits that make up the catalytic core of prokaryotic RNA polymerases (Fig. 15-4C). The structure of a bacterial RNA polymerase reveals that the most conserved residues are located on the inner surfaces of the enzymes, where they are likely to be involved in the synthesis of RNA (Fig. 15-4D).

Transcription does not necessarily require such large enzymes. Bacteriophages have evolved structurally distinct, DNA-dependent RNA polymerases that are one fifth the size of the eukaryotic enzymes yet are able to carry out complete transcription cycles. The complexity of the eukaryotic enzymes is likely attributable to the need for regulation, with additional subunits acting as sites for interaction with regulatory proteins. Domains that differ among the three types of eukaryotic RNA polymerase are likely to interact with cofactors that are unique to a particular class of polymerase. One example of a class-specific domain is found in the largest subunit of RNA polymerase II, which has an unusual repetitive carboxyl-terminal domain (CTD) made up of tandem repeats of the consensus heptapeptide TyrSerProThrSerProSer. This domain has been implicated in the formation of an RNA polymerase II complex that contains many of the cofactors needed for initiation. The CTD is highly phosphorylated in vivo, and the timing of CTD phosphorylation suggests that this modification may be involved in the transition between the initiation and elongation steps of transcription. The CTD also binds to pre-mRNA processing factors, suggesting that it plays a role in coupling transcription and the subsequent processing of the nascent mRNA.

RNA Polymerase Promoters

Initiation of transcription requires RNA polymerase loading onto the chromosome at the promoter of a gene or operon. The promoter can be loosely defined as the sum of DNA sequences necessary for transcription initiation. This definition is not sufficient, however, as most genes are regulated (positively or negatively) at the transcription initiation level. In eukaryotic cells, packaging into chromatin represses most promoters, and activator proteins are required for recruiting RNA polymerase to the site of initiation. In prokaryotes, both activators and repressors modulate the frequency of initiation at promoters. Strong promoters drive the expression of genes whose products are required in abundance, whereas weaker promoters are selected for expression of rare proteins or RNAs. In multicellular organisms, a promoter may direct expression at an intermediate level in some cells, at an activated level in others, and at a repressed level in yet others.

Promoters in bacteria are recognized by direct interactions between specific DNA sequences and the RNA polymerase σ factor. The most common σ factor in E. coli (σ70) recognizes two conserved six-base sequences located 10 bases (-10) and 35 bases (-35) upstream of the transcription start site (Fig. 15-5A). Once initiation has occurred, σ is no longer required and can dissociate from the core enzyme. Bacterial cells have several distinct σ factors, each of which binds the core enzyme and directs RNA polymerase to a subset of promoters that contain different recognition sequences, thereby promoting transcription of genes with related functions.
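Recognition of paired six-base elements can be illustrated with a simple sequence scan. The sketch below uses the widely cited σ70 consensus hexamers TTGACA (-35) and TATAAT (-10) separated by 16 to 18 bp; these sequences, the spacer range, the mismatch threshold, and the test sequence are assumptions that are not stated in this chapter, and real promoter prediction is considerably more involved.

```python
MINUS_35 = "TTGACA"  # textbook sigma-70 consensus; not given in this chapter
MINUS_10 = "TATAAT"  # textbook sigma-70 consensus; not given in this chapter


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sigma70_promoters(seq, max_mismatches=2, spacers=(16, 17, 18)):
    """Return (minus35_index, minus10_index, total_mismatches) for candidate promoters."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        m35 = mismatches(seq[i:i + 6], MINUS_35)
        if m35 > max_mismatches:
            continue
        for gap in spacers:
            j = i + 6 + gap
            if j + 6 > len(seq):
                continue
            m10 = mismatches(seq[j:j + 6], MINUS_10)
            if m35 + m10 <= max_mismatches:
                hits.append((i, j, m35 + m10))
    return hits


# Hypothetical sequence with perfect hexamers separated by a 17-bp spacer.
example = "GG" + MINUS_35 + "C" * 17 + MINUS_10 + "GGGGGGA"
print(find_sigma70_promoters(example))  # [(2, 25, 0)]
```

Allowing a few mismatches reflects the point made above that promoters differ in how closely they match the sequences a given σ factor prefers.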

Figure 15-5 Prokaryotic and eukaryotic promoters. The prokaryotic (A) and three eukaryotic (B–E) RNA polymerases recognize different promoter sequences. Positions of promoter elements are indicated with respect to the start of transcription (+1). For the RNA polymerase II promoter elements, the consensus sequences are shown. Not all polymerase II promoters contain all of these elements.

Eukaryotic RNA polymerase I and II promoter sequences are also situated upstream of the transcription start site. In contrast, RNA polymerase III promoters contain key promoter elements within the transcribed sequences. RNA polymerase I recognizes a single type of promoter located upstream of each copy of the long tandem array of pre-rRNA coding sequences (Fig. 15-5B). The core element of this promoter overlaps the transcription start site, while an upstream control element located approximately 100 base pairs (bp) from the start site stimulates transcription. RNA polymerase I is not required in yeast cells that contain a pre-rRNA gene under control of an RNA polymerase II promoter. Therefore, if RNA polymerase I does recognize other promoters, these transcripts are not required for viability.

Comparison of the first eukaryotic protein-coding gene sequences revealed a conserved consensus sequence located approximately 30 bp upstream of the transcription start site of many RNA polymerase II–transcribed genes (Fig. 15-5C). This consensus sequence (TATAAAA), called a TATA box, shows some similarity to the bacterial -10 sequence. In addition to the TATA box, a less conserved promoter element, the initiator, is found in the vicinity of the transcription start site of many genes. RNA polymerase II–transcribed genes that do not contain TATA boxes often contain strong initiator elements. Together, these two elements account for the basal promoter activity of most protein-coding genes.
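The positional relationship between the TATA box and the start site can be sketched as a windowed search. The function below looks for the TATAAAA consensus in a region upstream of a known start site; the window limits (-40 to -20), the simplified coordinate scale, and the test sequence are assumptions made for illustration.

```python
TATA_CONSENSUS = "TATAAAA"  # consensus quoted in the text


def find_tata_box(seq, tss_index, window=(-40, -20)):
    """Return the position of a TATA box relative to the start site, or None.

    tss_index is the 0-based index of the transcription start site in seq.
    Positions are reported on a simplified scale in which the base just
    upstream of the start site is -1. The search window is an assumption.
    """
    seq = seq.upper()
    start = max(0, tss_index + window[0])
    end = max(0, tss_index + window[1])
    idx = seq.find(TATA_CONSENSUS, start, end + len(TATA_CONSENSUS))
    if idx == -1:
        return None
    return idx - tss_index


# Hypothetical promoter: TATA box beginning 30 bp upstream of the start site.
upstream = "G" * 20 + TATA_CONSENSUS + "G" * 23
promoter = upstream + "ATGCATGC"
print(find_tata_box(promoter, len(upstream)))  # -30
```

A promoter that returns None here would be a candidate TATA-less promoter, for which the initiator element described above becomes the more relevant feature.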

Both types of RNA polymerase III promoters have key elements within the transcribed sequences (Fig. 15-5D–E). tRNA genes contain two 11-bp elements, the A box and B box, centered about 15 bp from the 5′ and 3′ ends of the coding sequence, respectively. The 5S-rRNA gene contains a single internal element, the C box, located in the center of the coding region. Given the differences in classes of eukaryotic promoters, it is not surprising that different polymerases use different proteins to recognize the promoter sequences.

Transcription Initiation

The loading of RNA polymerase onto the double-stranded genomic DNA at a promoter sequence is best understood in prokaryotes and is therefore discussed first. Initiation takes place in a series of defined steps (Fig. 15-1). First, holoenzyme binds to the double-stranded promoter, forming what is called the closed complex. The specificity and strength of this interaction are dictated by sequence-specific contacts between the σ factor and the bases in the -10 and -35 elements of the promoter (Fig. 15-6). The second step in initiation is the formation of an open complex in which a 14-bp region around the transcription start site is unpaired, producing a transcription bubble. This unpairing is accompanied by a conformational change in the polymerase that positions the single-stranded DNA template in the active site and narrows the DNA-binding cleft, effectively closing the polymerase clamp. In the next step, the DNA template in the active site base-pairs with the first two ribonucleotides, and the first phosphodiester bond is catalyzed. This process is repeated until the nascent RNA reaches a length of eight to nine bases, at which point addition of bases to the growing RNA chain results in the unpairing of one base of the RNA-DNA hybrid, and the nascent RNA begins to exit through a channel on the surface of the polymerase. The resulting conformational change in polymerase leads to the release of σ factor and formation of a stable ternary (three-way) complex containing RNA polymerase, the DNA template, and the nascent RNA.

Figure 15-6 RNA polymerase initiation. A, While initiation of prokaryotic transcription is more completely understood, the conservation of RNA polymerase structure implies that the fundamental steps in initiation are conserved. B, In the closed complex, the double-stranded promoter DNA is recognized by σ factor domains on the surface of the holoenzyme. C, The open complex forms by unwinding DNA surrounding the transcription start site and positioning the single-stranded template in the active site of the polymerase. D, The initiation reaction in the context of the transcription cycle.

General Eukaryotic Transcription Factors

Purified eukaryotic RNA polymerase on its own cannot initiate transcription from promoters in vitro. Specific transcription can be obtained in vitro using extracts from nuclei, and fractionation of such extracts has led to the identification of additional factors necessary for specific transcription by purified RNA polymerase in vitro. Rather than a σ factor, eukaryotic RNA polymerases require multiple initiation factors. Most of these factors are unique to each RNA polymerase, and because they are required for transcription of most promoters (within each class), they are termed general transcription factors (GTFs). GTFs are remarkably conserved among different eukaryotes. Although most factors required for transcription by each class of polymerase are distinct, one of them, first identified as the TATA box–binding protein, participates in different protein complexes involved in each of the three polymerase systems. The next sections compare transcription by the three forms of eukaryotic RNA polymerase.

RNA Polymerase II Factors

The RNA polymerase II GTFs comprise more than 20 polypeptides with an aggregate molecular weight of more than 10⁶ D (Table 15-1). Before RNA polymerase II can initiate transcription in vitro, an ordered assembly of factors at the promoter must occur. Assembly of the RNA polymerase II preinitiation complex begins with the binding of TFIID, a large factor (~700 kD) consisting of TATA box–binding protein (TBP) and a set of TBP-associated factors called TAFIIs (Fig. 15-7A). TBP alone is sufficient for basal transcription, while TAFs apparently serve as targets for further activation of transcription (see subsequent sections). TBP is the first polypeptide in the basal transcription machinery to recognize a specific DNA sequence during the initiation process. DNA binding is provided by a highly conserved C-terminal 180-amino-acid domain, which forms a saddle-shaped monomer with an axis of dyad symmetry (Fig. 15-7B). The underside of the TBP “saddle” binds to the minor groove of the TATA sequence, which is splayed open in the process. A pronounced DNA bend is produced at each end of the TATAAA element by the intercalation of phenylalanine side chains (Fig. 15-7C).

Table 15-1 SUMMARY OF EUKARYOTIC RNA POLYMERASE II GENERAL TRANSCRIPTION FACTORS

Figure 15-7 RNA polymerase II preinitiation complex on the adenovirus-2 major late promoter DNA. A, The sequential assembly of general transcription factors leads to a preinitiation complex with the promoter region in the closed complex. Helicase activities present in TFIIH use the energy of ATP to unwind the promoter, leading to formation of an open complex. B, Binding of TBP leads to C, a pronounced bend in the DNA. D, TFIIB interacts both upstream and downstream of the TATA box and directs RNA polymerase to the transcription start site.

(B–D, PDB file: 1VOL. TBP + DNA coordinates courtesy of Stephen Burley, Rockefeller University, New York.)

The TFIID-TATA box complex serves as a binding site for additional positive and negative regulators. TFIIA binding stabilizes the TBP-DNA interaction and prevents the binding of repressors that arrest further initiation complex formation.

The next step in assembly of the initiation complex is binding of TFIIB, which binds to one side of TBP and makes contacts with DNA upstream and downstream of the TATA box (Fig. 15-7D). Mutations in the yeast gene that encodes TFIIB show altered mRNA start-site selection, indicating that TFIIB establishes the spacing between the TATA box and the transcription start site. TFIIB interacts directly with TBP and RNA polymerase II and is thus essential for the next steps in initiation complex assembly.

RNA polymerase II enters into the preinitiation complex (see Fig. 15-7A) in association with TFIIF. This factor is related to bacterial σ factor and acts to stabilize the interaction of RNA polymerase II with TFIIB and TBP. In addition, TFIIF binds to free polymerase and prevents interactions with nonpromoter DNA sites.

TFIIH and its stimulatory factor TFIIE are the final general factors to enter the preinitiation complex. Binding of these factors results in more stable protein-DNA contacts in the vicinity of the transcription start site. TFIIH contains eight polypeptides, several of which also have functions outside of transcription initiation. TFIIH-associated helicases use the energy from ATP hydrolysis to unwind a short stretch of promoter DNA at the transcription start site. This unpairing of DNA allows RNA polymerase II to recognize the template strand, bind the complementary nucleotides, and synthesize the first few phosphodiester bonds. RNA polymerase II initiation requires hydrolysis of the β-γ phosphate bond in ATP, a reaction that is also catalyzed by TFIIH.

TFIIH also contains a protein kinase that phosphorylates the CTD. This is Cdk-activating kinase, itself a Cdk-cyclin complex that phosphorylates and activates other cyclin-dependent kinases (see Fig. 40-14). In the initiation complex, phosphorylation of the CTD is thought to release it from interactions with GTFs and allow the transition to the transcription elongation phase. Other TFIIH subunits have been identified as components of the DNA repair machinery. Several genes encoding TFIIH subunits are mutated in the human DNA excision repair disease xeroderma pigmentosum, suggesting that TFIIH might serve to link transcription to DNA repair (see later section).
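The ordered, in vitro assembly pathway described in this section can be summarized as a simple sequence check. The sketch below is a bookkeeping illustration only: the grouping of factors into five steps follows the order of entry given in the text, while the function names and the strict all-or-nothing ordering are simplifying assumptions.

```python
# Order of entry into the preinitiation complex as described in the text.
ASSEMBLY_ORDER = ["TFIID", "TFIIA", "TFIIB", "Pol II + TFIIF", "TFIIE + TFIIH"]


def add_factor(bound, factor):
    """Return a new list with `factor` added, enforcing the ordered assembly."""
    expected = ASSEMBLY_ORDER[len(bound)] if len(bound) < len(ASSEMBLY_ORDER) else None
    if factor != expected:
        raise ValueError(f"cannot add {factor!r}; expected {expected!r}")
    return bound + [factor]


def is_preinitiation_complex(bound):
    """True once every factor has joined in order."""
    return bound == ASSEMBLY_ORDER


assembled = []
for factor in ASSEMBLY_ORDER:
    assembled = add_factor(assembled, factor)
print(is_preinitiation_complex(assembled))  # True
```

Attempting to add TFIIB to an empty promoter raises an error in this sketch, which corresponds to the dependence of TFIIB binding on a TBP-containing complex already being in place.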

Mediator and the Holoenzyme

In vivo, many of the steps described previously involve the assembly of large macromolecular complexes containing RNA polymerase II, several of the GTFs, other factors that alter chromatin structure, and various additional transcription factors. One of these complexes, the mediator, contains over 20 polypeptides (many with unknown function) but lacks RNA polymerase II and the GTFs. Mediator reversibly interacts with RNA polymerase II and other factors to form a “holoenzyme,” which requires additional factors to be competent for initiation (Fig. 15-8). RNA polymerase II holoenzyme responds to transcription activators (described in a subsequent section) in vitro, suggesting that one role for the multitude of proteins in this complex is to offer multiple interaction sites for recruitment of holoenzyme to the promoter. Alternatively, a mediator lacking RNA polymerase II can be recruited to the promoter, where it subsequently attracts the polymerase. Thus, the mediator links DNA-bound activators to the basal transcription machinery. In this sense, the mediator acts as a coactivator. Other coactivators present in the holoenzyme act as chromatin remodeling factors (see subsequent section) that control access of the transcription machinery to the DNA template.

Figure 15-8 RNA polymerase II holoenzyme. A, The three-dimensional structure of the yeast holoenzyme, reconstructed from electron micrographs of particles preserved in negative stain. B, The mediator complex assists RNA polymerase II in locating promoters through interactions with factors bound to promoter proximal and/or enhancer sequences. Interaction with TFIID, bound at the TATA box, is important in assembling a productive complex. TFIID is thought to remain bound to the TATA box and to facilitate subsequent rounds of initiation.

(A, Courtesy of Joshua Davis and Francisco Asturias, Scripps Research Institute, La Jolla, California.)

RNA Polymerase I Factors

Initiation at RNA polymerase I promoters can also proceed through an ordered assembly of transcription factors (Fig. 15-9). The upstream binding factor binds to the upstream control element and to part of the core element. This initial complex is stabilized by the SL1 complex of TBP with three RNA polymerase I–specific TAFs.

Figure 15-9 RNA polymerase I preinitiation complex. A, Ribosomal RNA promoters assemble a preinitiation complex. (UCE, upstream control element.) B, This complex consists of an upstream binding factor (UBF) and a multisubunit factor called SL1 (C) that contains TBP. D, Together, these factors recruit RNA polymerase I.

RNA Polymerase III Factors

The assembly of RNA polymerase III initiation complexes differs at various promoters (Fig. 15-10). Initiation at tRNA genes begins with the binding of TFIIIC to the A and B boxes. TFIIIB then binds upstream of the A box at a sequence determined both by an interaction with TFIIIC and through the DNA-binding capacity of TBP. Once the TFIIIC-TFIIIB complex has been assembled, RNA polymerase III can initiate transcription. Multiple rounds of initiation can occur on the stable tDNA-TFIIIC-TFIIIB complex.

Figure 15-10 RNA polymerase III preinitiation complexes. Initiation at RNA polymerase III promoters requires recognition of sequences within the transcribed sequences. These sequences differ for tRNA and 5S ribosomal genes. A, In the case of tRNA genes, only TFIIIC is required for specific binding. B, For 5S genes, the internal element is recognized by the specific DNA-binding factor TFIIIA. BRF, TFIIB-related factor.

Transcription of 5S rRNA genes requires an additional factor called TFIIIA. This protein was the first transcription factor and the first zinc finger protein to be identified. TFIIIA recognizes the C box located near the center of the 5S rRNA coding region. TFIIIC then binds by making contacts on each side of TFIIIA, in much the same way that the A and B boxes are contacted on tRNA genes. Finally, TFIIIB binds through interactions with TFIIIC and DNA, and the resulting preinitiation complex is recognized by RNA polymerase III.

Other Initiation Pathways

In addition to the three classical initiation pathways, transcription can be initiated in other ways. First, some RNA polymerase II promoters lack the TATA box element. In these cases, the initiator element provides the primary sequence target, and its recognition requires the function of one of several auxiliary factors that are thought to bind to the initiator. Despite the lack of a TATA box, these promoters still require TBP, presumably because it serves to stabilize the binding of required TAFs.

Another unusual set of promoters drives expression of the snRNA genes. These promoters contain binding sites for both RNA polymerase II and RNA polymerase III factors, and they can be transcribed by either polymerase. As with other eukaryotic promoters, TBP is required for transcription. Unlike the other systems, the snRNA promoters recruit a novel TBP complex, which contains a unique set of TAFs.

Summary of the Eukaryotic Basal Transcription Machinery

Despite the evolutionary divergence of the multiple eukaryotic RNA polymerases and the specialization of each polymerase for a unique set of promoters, the fundamental mechanisms of transcription have been conserved. This conservation is reflected not only in similar sequences of the subunits of the polymerases themselves but also in the presence of TBP and TFIIB homologs among the GTFs used by each class of polymerase. Indeed, Archaea, which have only a single RNA polymerase, contain both TBP and TFIIB. This observation suggests that initiation mechanisms employing GTFs evolved before the duplication of the RNA polymerases.

Why are so many factors required to make a transcript? Part of the complexity might be necessary to generate multiple sites for interaction with regulatory factors that could either activate or repress the assembly or function of the preinitiation complex. A second role for the complex set of factors could be to target polymerases to specific sites in the nucleus. Finally, some factors could help load elongation, splicing, or termination factors onto the RNA polymerases.

Transcription Elongation and Termination

The final stage of initiation leads to elongation and movement of the polymerase away from the promoter. This process of promoter clearance is associated with structural changes in the polymerase, which prepare the enzyme for efficient RNA synthesis and render it susceptible to the action of factors that regulate the elongation process. Such regulatory factors, together with structural features of the nascent transcript, influence elongation and can trigger the termination of transcription and the dissociation of the ternary elongation complex containing the DNA template, nascent RNA, and RNA polymerase. This termination reaction typically occurs at the 3′ end of the gene or operon and serves both to recycle RNA polymerase for additional initiation reactions and to ensure that adjacent genes are not inadvertently transcribed.

The Catalytic Cycle

The DNA-dependent RNA polymerases catalyze synthesis of an RNA polymer from ribonucleoside 5′-triphosphates (ATP, guanosine triphosphate [GTP], cytidine triphosphate [CTP], and uridine triphosphate [UTP]) according to the following reaction:

(NMP)_n + NTP → (NMP)_{n+1} + PPi

where (NMP)_n is the RNA polymer; NTP is ATP, UTP, CTP, or GTP; and PPi is pyrophosphate. Polymerase extends the RNA chain in the 5′ to 3′ direction by adding ribonucleotide units to the chain’s 3′ end. Selection of the incoming NTP is directed by the DNA template and takes place at the transcription bubble, an unpaired segment of the DNA template (Fig. 15-11). The 3′ hydroxyl group acts as a nucleophile, attacking the α-phosphate of the incoming NTP in a reaction similar to that seen in DNA replication (see Fig. 42-1). This reaction proceeds in vivo at a rate of 30 to 100 nucleotides per second.

Figure 15-11 Transcription elongation. A, Model of the transcription elongation complex consisting of RNA polymerase, template DNA, and nascent RNA transcript. RNA polymerases interact with the template upstream and downstream of the transcription bubble. B, The active site of RNA polymerase positions the growing end of the nascent transcript in the appropriate location for the addition of the next nucleoside triphosphate (NTP). After each single nucleotide addition, the polymerase may translocate forward and repeat the nucleotide addition (C), slide backward and pause for a variable time (D), or slide further backward, allowing removal of the transcript and termination of transcription (E).

The Transcription Elongation Complex

Efficient synthesis of RNA requires balancing two competing demands. First, the elongation complex must be very stable, because premature dissociation from DNA produces defective partial transcripts and requires the polymerase to restart transcription from the promoter. Second, the complex must be bound loosely enough that the polymerase can easily translocate along the DNA template. The structure of RNA polymerase has evolved to meet these needs. The cleft formed at the interface between the two largest subunits is open when the polymerase is in the initiation complex. Once the first few RNA phosphodiester bonds are formed, the polymerase undergoes a conformational change. Subunits at the outer edge of the cleft close like jaws to encircle the DNA template. In this structure, the front end of the transcription bubble is positioned at the back wall of the cleft, close to the catalytic center. This structure is highly efficient and can function continuously for the 17 hours that are required to transcribe the >2 million bp mammalian dystrophin gene.
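The 17-hour figure can be checked against the elongation rate of 30 to 100 nucleotides per second quoted in the Catalytic Cycle section. The following is a minimal sketch of that arithmetic, not a model of elongation: it assumes a constant rate with no pausing, and it uses 2,000,000 bp as the gene length because the text gives only a lower bound. The function and constant names are illustrative.

```python
def transcription_time_hours(gene_length_bp: int, rate_nt_per_s: float) -> float:
    """Hours needed to transcribe a gene once at a constant rate (pauses ignored)."""
    return gene_length_bp / rate_nt_per_s / 3600.0


# Lower bound taken from the text (">2 million bp"); the true length is larger.
DYSTROPHIN_BP = 2_000_000

# Transit time at the two ends of the quoted rate range.
for rate in (30, 100):
    hours = transcription_time_hours(DYSTROPHIN_BP, rate)
    print(f"{rate:3d} nt/s -> {hours:4.1f} h")

# Average rate implied by a 17-hour transit of a 2,000,000-bp template.
implied_rate = DYSTROPHIN_BP / (17 * 3600)
print(f"implied average rate: {implied_rate:.0f} nt/s")
```

Under these assumptions, a 17-hour transit corresponds to an average near the low end of the quoted range, which is consistent with a polymerase that spends part of its time paused.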

Pausing, Arrest, and Termination

Following the addition of each nucleotide, RNA polymerase may add an additional nucleotide, pause, move in reverse, or terminate (Fig. 15-11B). The relative probabilities of these alternative reactions depend on interactions between the transcription complex and the template, the nascent RNA transcript, and regulatory transcription factors.

RNA polymerase does not elongate at a constant rate but rather synthesizes RNA in short spurts between pauses. A pause of short duration can be caused by low NTP concentrations or alternatively by the transient unpairing of the 3′ end of the nascent transcript and template. Longer pauses are provoked by the formation, in the nascent RNA, of short (~20 base) self-complementary sequences that can fold to form a stem-loop or hairpin, or the presence of a weak RNA-DNA hybrid. The presence of an unstable RNA-DNA hybrid can arise from the misincorporation of an NTP leading to an unpaired base in the hybrid. In this case, the RNA polymerase can backtrack or slide backward on the template (Fig. 15-11C). This backward movement of the transcription bubble is accompanied by a zippering movement of the RNA-DNA hybrid in which the nascent RNA in the exit channel rehybridizes with upstream template sequences while the 3′ end of the transcript unpairs from the hybrid and is extruded through the same channel that NTPs use to enter the active site. The backtracked transcription complex is said to be arrested. Transcription elongation factors bind in the NTP channel of arrested complexes and activate the RNA polymerase to cleave the backtracked RNA. The new 3′ terminal residue is correctly positioned for incorporation of the next complementary NTP. This editing process increases the fidelity of transcription. Pausing also occurs following transcription of U-rich sequences, and this is often associated with transcription termination.

Termination

When elongating RNA polymerase reaches the end of a gene or operon, specific sequences in the RNA trigger the release of the transcript and dissociation of the RNA polymerase. Bacteria have two types of termination signals, called terminators. The first are called intrinsic (or rho-independent) terminators, because they function in the absence of any protein factors (Fig. 15-12A). Intrinsic terminators consist of two sequence elements: a stable GC-rich hairpin and a run of about eight consecutive U residues. As the first of these elements is synthesized, it forms a hairpin, causing polymerase to pause with unstable U:A base pairs (with only two H-bonds [see Fig. 3-14]) in the hybrid. The nascent transcript is released from this unstable transcription complex. The second type of prokaryotic termination requires a protein factor called rho (Fig. 15-12B). Rho is a hexameric protein that binds cytosine-rich sequences and uses ATP hydrolysis to translocate along the nascent transcript in the 5′ to 3′ direction, essentially chasing the RNA polymerase. When polymerase pauses, rho can catch up and use the energy derived from ATP hydrolysis to pull the RNA out of the transcription elongation complex.

Figure 15-12 Prokaryotic transcription termination. A, Rho-independent termination is directed by sequences in the nascent transcript that operate in the absence of any additional factors. B, The bacterial termination factor rho translocates along the nascent RNA and on reaching the RNA polymerase causes the disassembly of the elongation complex.
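The two-element structure of an intrinsic terminator (a GC-rich hairpin followed by a run of U residues) lends itself to a simple pattern search. The sketch below is an illustration only and not a published terminator-finding algorithm: the stem length, loop length, GC threshold, and the requirement for perfect Watson-Crick pairing are all assumed parameters, and the example sequence is invented.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(seq: str) -> str:
    """Return the sequence that would base-pair with seq in a hairpin stem."""
    return seq.translate(RNA_COMPLEMENT)[::-1]


def find_intrinsic_terminators(rna: str, stem_len: int = 6, loop_len: int = 4,
                               u_run: int = 8, min_gc: float = 0.7) -> list[int]:
    """Return start positions of stem-loop + U-run motifs (assumed geometry)."""
    hits = []
    span = 2 * stem_len + loop_len  # total length of the hairpin
    for i in range(len(rna) - span - u_run + 1):
        left = rna[i:i + stem_len]
        right = rna[i + stem_len + loop_len:i + span]
        tail = rna[i + span:i + span + u_run]
        gc_fraction = (left.count("G") + left.count("C")) / stem_len
        if (right == rna_reverse_complement(left)   # perfect stem
                and gc_fraction >= min_gc           # GC-rich stem
                and tail == "U" * u_run):           # run of U residues
            hits.append(i)
    return hits


# Invented transcript: leader, stem, loop, complementary stem, U run, trailer.
example = "AAUC" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUUUU" + "AG"
print(find_intrinsic_terminators(example))
```

Real terminators vary in stem and loop length and tolerate mismatches, so a practical search would need to scan a range of geometries rather than the single fixed one assumed here.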

Eukaryotic RNA polymerases have evolved distinct mechanisms for termination. RNA polymerase III requires no protein factors but terminates efficiently after transcribing four to six consecutive U residues, presumably owing to instability of the RNA-DNA hybrid in the enzyme active site. RNA polymerase I terminates in response to a protein factor that blocks further elongation by binding to a DNA sequence downstream of the termination site, leaving an inherently unstable U-rich RNA-DNA hybrid in the active site. The RNA polymerase II termination mechanism is more complex, requiring a large multiprotein complex that recognizes the poly(A) addition signal in the nascent transcript (see Fig. 16-3 for pre-mRNA processing). Deletion or mutation of the poly(A) signal results in a failure to terminate messages at the appropriate site, indicating that RNA polymerase II termination is coupled to 3′-end processing.

Gene-Specific Transcription

Transcription initiation is the critical first step in determining which genes are expressed in which cells and at what level. Depending mainly on the sequence of the promoter, expression can be constitutive or influenced by regulatory proteins. This section discusses transcription regulatory proteins that either positively or negatively regulate specific genes. The discussion starts with a prokaryotic example and then expands to include a variety of eukaryotic regulators. Although the details differ in prokaryotes and eukaryotes, many of the basic principles are the same.

Regulation of Transcription Initiation in Prokaryotes

Prokaryotes typically regulate gene expression in response to signals that reflect the internal metabolic state and environmental cues such as the presence of nutrients in the growth medium (see Fig. 27-11). These signals inside the organism are transmitted to the appropriate genes through transcription regulatory proteins that bind to specific sequences near the genes they control to either activate or repress transcription. Both of these regulatory mechanisms come into play in regulation of the E. coli lactose (lac) operon (Fig. 15-2A). The genes expressed from this operon are required for cells to metabolize lactose but are not expressed in the absence of lactose. Genetic studies in the 1960s showed that the gene upstream of the lac operon (I in Fig. 15-2A) encodes a repressor (lac repressor) that blocks expression of the lac operon in the absence of lactose (Fig. 15-13). The lac repressor binds to a site called an operator that overlaps the RNA polymerase binding site in the lac promoter. In the presence of lactose, the repressor undergoes a conformational change that eliminates DNA binding, allowing the recruitment of RNA polymerase to the promoter. Full expression of the lac operon requires the catabolite activator protein (CAP), which is also an allosteric DNA-binding protein that binds just upstream of the lac promoter. If cellular glucose levels diminish, the cAMP concentration rises, and CAP binds cAMP. This induces a structural alteration in CAP, allowing it to dimerize and bind specific DNA sequences. CAP bound to its site stabilizes the otherwise weak interaction of RNA polymerase with the promoter. The resulting activation allows maximum expression of the lac operon in the presence of lactose and the absence of glucose.

Figure 15-13 Regulation of the lac operon. A, RNA polymerase (green) binding to the lac promoter is regulated by the binding of repressor or activator (CAP). B, Binding sites for CAP and the repressor at the lac operon. The main repressor-binding site overlaps the promoter and blocks access of RNA polymerase. Additional lac repressor-binding sites are located upstream and downstream of the promoter. Lac repressor can form a tetramer and thus bind two operators, forming a loop in the lac operon DNA. Inducer binding dramatically alters the conformation of the lac repressor, diminishing its affinity for the operator. CAP binds just upstream of the promoter where it can stabilize the bound RNA polymerase.

In summary, control of lac gene expression by opposing repressor and activator function is an example of regulation at the first step in transcription initiation, binding of RNA polymerase to the promoter. Regulating access of RNA polymerase to promoters is a common form of transcription regulation in both prokaryotes and eukaryotes.
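The combined action of the repressor and CAP amounts to a two-input decision, which can be written out as a truth table. The sketch below is a deliberately coarse simplification: it treats lactose and glucose as simply present or absent, and the three output labels ("off", "low", "maximal") are illustrative names chosen here, not terms from the text.

```python
from itertools import product


def lac_operon_expression(lactose_present: bool, glucose_present: bool) -> str:
    """Qualitative lac operon output for one combination of sugars."""
    # The repressor occupies the operator unless lactose (the inducer) is present.
    repressor_on_operator = not lactose_present
    # Low glucose raises cAMP, so CAP binds its site only when glucose is absent.
    cap_bound = not glucose_present

    if repressor_on_operator:
        return "off"      # RNA polymerase cannot access the promoter
    # Without CAP, polymerase binds the promoter only weakly.
    return "maximal" if cap_bound else "low"


for lactose, glucose in product((False, True), repeat=2):
    level = lac_operon_expression(lactose, glucose)
    print(f"lactose={lactose!s:5} glucose={glucose!s:5} -> {level}")
```

Only one of the four combinations, lactose present and glucose absent, gives maximal expression, matching the condition stated for full activation of the operon.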

Eukaryotic Promoter Proximal and Enhancer Elements

In vivo techniques for analyzing eukaryotic promoter function led to the discovery of a number of regulatory elements in addition to the basal promoter elements. In these experiments, transgenes containing a promoter or its mutated derivative are introduced into eukaryotic cells by transfection or microinjection. Transcription directed by the cloned promoter is detected by various approaches that allow the transgene product to be identified from among the background of cellular transcripts. In one approach (Fig. 15-14A), the promoter drives expression of a bacterial reporter gene such as chloramphenicol acetyltransferase (CAT), β-galactosidase, or luciferase. Eukaryotes lack these enzymes, so their expression can be assayed in extracts of transfected cells with little or no background activity. This approach applies only to RNA polymerase II, which produces translatable mRNAs. A more direct analysis, applicable to transcription by all three RNA polymerases, makes use of specific RNA or DNA probes to quantify RNAs transcribed from the transgene.

Figure 15-14 RNA polymerase II promoter regulatory elements. A, In vivo assays are used to identify key regulatory sequences. In the example shown, a promoter is placed in front of a gene encoding chloramphenicol acetyltransferase (CAT), and the resulting plasmid is transfected into cultured cells. This bacterial enzyme is easily assayed in eukaryotic cells because there is no endogenous activity. Targeted clusters of mutations, strategically placed throughout the promoter region, are tested for their effect on expression of the reporter gene. Mutations that reduce expression define important regulatory elements. B, The region immediately upstream of the metallothionein gene contains binding sites for several transcription factors. The elements are named for the factor that binds there: GRE (glucocorticoid response element), MRE (metal response element), and AP1, AP2, and SP1 (which bind protein factors with the same names as the DNA elements).

Such transgene experiments demonstrated that basal promoter elements are insufficient for full expression of these reporter genes. Deletion or mutation of regions upstream of the transcription start site revealed the existence of additional promoter elements. For RNA polymerase II, these elements fall into two classes: the elements in the first class are located from 50 to 100 bp upstream of the start of transcription and have been termed promoter proximal elements, while those in the second class, enhancers, are located at distances up to 10 kilobases (kb) from the start of transcription. All of these elements are composed of multiple binding sites for transcription regulatory proteins.

Promoter proximal elements are short (~10 bp) sequences located within a few hundred bp upstream of the TATA box. One example of a promoter proximal element is the CCAAT box in the promoter of the herpes simplex virus thymidine kinase gene. This site was identified by a technique called linker-scanning, in which clustered mutations are introduced at regular intervals in the promoter (Fig. 15-14A). Mutations that result in a decrease in transcription define important sequences. In the case of the thymidine kinase promoter, the CCAAT and TATAAA sequences are required for full transcription. Thymidine kinase expression also requires the sequence GGCGCC, which serves as the binding site for SP1, a transcription factor involved in expression of a number of so-called housekeeping genes, whose products are involved in normal cellular functions. These promoter proximal elements are present in many different genes, where they are necessary for constitutive expression. Other promoter proximal elements are involved in regulated expression, for example, in response to cellular stress or exposure to heavy metals. Most promoters contain several different promoter proximal elements. This allows for combinatorial regulation of transcription levels by varying the relative abundance or activity of the various factors. The location of numerous regulatory elements directly upstream of the human metallothionein gene, whose product protects cells from the toxic effects of metals (Fig. 15-14B), suggests that a variety of different mechanisms regulate this gene.

Enhancers are clusters of regulatory elements in the DNA similar to promoter proximal elements, but they are considerably more complicated and have several distinguishing features. First, an enhancer increases the rate of initiation from a basal promoter even if it is located up to 10 kb away from the promoter. Second, enhancers work even if located internal to or downstream of the promoter. Finally, the enhancer element will work in either orientation relative to the promoter (Fig. 15-15A). Figure 15-15B shows an example of an enhancer sequence with a number of transcription factors (see the following section) bound, forming a complex called an enhanceosome. Enhancer elements are found in the vicinity of many but not all genes. In most cases, the enhancer works in a cell type–specific fashion. An example is a sequence in an intron of the immunoglobulin heavy chain gene that enhances transcription in lymphocytes but not in other cells. This regulation of enhancer function is likely to be accomplished by changes in the levels of various enhancer-binding factors in different tissues.

Figure 15-15 Enhancer elements. A, These condensed clusters of factor-binding sites can influence expression when located far from the promoter in either the upstream or downstream position. In addition, they work in either orientation with respect to transcription. B, Model enhancer showing the tight packing of several different DNA-binding proteins. These complexes fold into structures that have been called enhanceosomes.

Both enhancers and promoter proximal elements can be grafted onto different basal promoters and maintain their function. Even though an enhancer may be more than 1000 bp away from the start site of transcription, it is thought that proteins bound to the enhancer create a loop in the intervening DNA and therefore make direct physical contact with proteins that are bound near the transcription start site.

Gene-Specific Eukaryotic Transcription Factors

Eukaryotic transcription factors bind specific DNA sequences located near the genes they regulate. This binding leads to activation or repression of expression by mechanisms more varied than in prokaryotes. In the simplest cases, the transcription factor interacts directly with the basal machinery. In more complex cases, this interaction may involve a coactivator or corepressor. Transcription factors may also act on the chromatin template rather than the basal transcription machinery. The 1990s witnessed the identification and characterization of hundreds of eukaryotic gene-specific transcription factors. Current estimates indicate that approximately 6% of the coding capacity of the human genome is devoted to transcription factors that recognize specific DNA sequences. The following sections discuss the identification of transcription factors, the functional organization of these proteins, and regulation of the basal transcription machinery and the chromatin template by transcription factors.

Methods for Identifying, Isolating, and Localizing Transcription Factors

Identifying and characterizing transcription factors requires techniques to detect and characterize specific DNA-protein complexes. In one such technique, DNA footprinting, protein is mixed with DNA that is radioactively labeled at one end (Fig. 15-16A–B). The resulting DNA-protein complex is then lightly digested with deoxyribonuclease to give, on average, one random cut per DNA molecule. The population of cleaved DNA molecules thus produced is then stripped of protein and separated by gel electrophoresis. The area protected from cleavage by a specific DNA-binding protein appears as a blank area or “footprint” that results from the protein’s blocking access to the nuclease, thus leaving a gap in the family of digestion products of differing lengths. A less precise but more versatile method of visualizing protein-DNA complexes is the DNA mobility shift assay (Fig. 15-16C). The principle of this technique is that fragments of DNA with a bound protein move more slowly during gel electrophoresis than the same DNA fragments without bound protein.

Figure 15-16 Techniques for studying proteins that bind to specific DNA sequences. A–B, Footprinting assay. A fragment of DNA thought to contain a specific protein-binding site is radiolabeled at one end of one strand. The labeled probe is then split into two fractions, and the DNA-binding protein is added to one fraction. The two samples are then randomly cleaved with nuclease or chemical reagents in such a way as to cleave only one bond per DNA fragment. High-resolution electrophoresis is used to separate the cleaved fragments, and autoradiography reveals a ladder of fragments that differ in length by a single base. B, Protein bound to DNA protects a limited region of DNA (its footprint) from cleavage, as revealed by the absence of bands in the radioactive ladder. C, Electrophoretic mobility shift assay. A short (20- to 50-bp), double-stranded DNA fragment is radiolabeled and bound to a protein sample. The complex is electrophoresed in a nondenaturing gel. The large protein bound to the DNA retards its mobility in the gel compared with the free DNA. D, Chromatin immunoprecipitation. Proteins are covalently cross-linked to DNA with formaldehyde and then randomly sheared to yield chromatin fragments containing a few hundred bp of DNA. These chromatin fragments are then immunoprecipitated with antibodies to a DNA-binding protein and the enrichment of particular sequences is examined by quantitative PCR or by hybridization to a microarray.

(D, Based on data from Stephen Hartman and Michael Snyder, Yale University, New Haven, Connecticut.)

Both techniques allow detection of specific DNA-binding proteins in crude cellular extracts and thus can be used as assays for protein purification. Transcription factors can also be cloned directly by screening expression libraries with labeled DNA oligonucleotides corresponding to the sequence of the regulatory element and detecting proteins that bind to them. These approaches have been used to isolate hundreds of specific DNA-binding proteins that play specific roles in transcription regulation.

The DNA sites that bind known transcription factors in vivo can be determined by using a technique called chromatin immunoprecipitation (ChIP; Fig. 15-16D). By using this approach, a transcription factor can be localized to a specific promoter at a specific time. The combination of ChIP with microarray approaches allows the distribution of the factor across the genome to be determined.

DNA-Binding Domains

Binding of proteins to specific DNA sequences requires recognition of a pattern of bases along the monotonous double helix. The richest source of DNA sequence variation comes from the chemical groups exposed in the major groove. Most specific DNA-binding proteins probe the major groove of the double helix with a small structural element (usually, an α-helix) with a shape that is complementary to the surface topography of a particular DNA sequence. The correct DNA sequence is recognized through multiple interactions between amino acid side chains in the recognition helix and the chemical groups on the edges of DNA bases in the major groove. Single amino acid changes in the recognition helix can change the DNA sequence that is recognized. Protein-DNA complexes are stabilized by additional contacts between amino acid side chains and deoxyribose rings and phosphate groups or by bending of the DNA.

DNA recognition domains of specific transcription factors typically interact with only 3 to 6 bp of DNA. Given the size and complexity of the typical mammalian genome, a sequence must be approximately 16 bp long to occur by chance only once. How then can genes be specifically recognized among the very large number of close but nonidentical sequences? Two strategies increase the length of the specific sequence to be recognized. The recognition protein can either use several recognition elements or it can dimerize with itself or other DNA-binding proteins. Binding of protein dimers can lead to recognition of sequences with twofold rotational symmetry.
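The 16-bp estimate follows from counting: there are 4^n possible sequences of length n, so a site is expected to be unique once 4^n exceeds the genome size. The sketch below works through that arithmetic under simplifying assumptions that are not in the text: a genome of 3 × 10^9 bp, equal frequencies of the four bases, and a single strand scanned in one direction.

```python
def min_unique_site_length(genome_size_bp: int) -> int:
    """Smallest n for which 4**n is at least the genome size."""
    n = 0
    while 4 ** n < genome_size_bp:
        n += 1
    return n


def expected_occurrences(site_length: int, genome_size_bp: int) -> float:
    """Expected number of chance matches to one specific site of this length."""
    return genome_size_bp / 4 ** site_length


GENOME_BP = 3_000_000_000  # assumed size of a mammalian genome

print("minimum unique length:", min_unique_site_length(GENOME_BP), "bp")
for length in (3, 6, 12, 16):
    print(f"{length:2d}-bp site: ~{expected_occurrences(length, GENOME_BP):,.1f} chance matches")
```

Under these assumptions, a single 6-bp recognition element would match hundreds of thousands of positions by chance, which is why combining recognition elements or dimerizing to double the site length matters so much for specificity.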

DNA-binding proteins can be grouped into families based on the structure of the domains used for DNA sequence recognition (Fig. 15-17 and Table 15-2). These include the helix-turn-helix (HTH) proteins, homeodomains, zinc finger proteins, steroid receptors, leucine zipper proteins, and helix-loop-helix proteins. Although these families include most of the known transcription factors, there remain other, uncharacterized recognition domains. Within a given family, the recognition domain of each transcription factor has an amino acid sequence that targets the protein to a particular DNA sequence. Conversely, different families of transcription factors can recognize the same promoter element. The following sections discuss some of the more common eukaryotic DNA-binding domains.

Figure 15-17 Molecular structures of transcription factor DNA-binding domains. Recognition of specific DNA sequences requires interactions between amino acid side chains in the protein and chemical groups on the DNA bases. In each of the examples shown here, an α-helix interacts with specific bases through contacts in the major groove. A, The homeodomain α-helix recognizes a specific six-base sequence. B, A protein with three zinc fingers recognizes three consecutive three-base sequences. C, The glucocorticoid receptor forms a dimer that recognizes the same six-base sequence (a hormone response element) in opposite orientations spaced three bases apart. D, A leucine zipper factor dimerizes to recognize a pair of four-base sites with opposite orientation spaced one base apart.

Table 15-2 MUTATION OF TRANSCRIPTION FACTOR GENES CAUSES HUMAN DISEASE

Homeodomain

This 60-amino-acid motif was discovered in Drosophila proteins that regulate development and has been found in a wide range of eukaryotic transcription factors, including more than 150 in the human genome. Recognition is provided by a helix-turn-helix (HTH) motif composed of two helices, one of which sits in the major groove of the DNA-binding site contacting a recognition sequence of 6 bp (Fig. 15-17A). The HTH structure is not a stable domain on its own but exists as part of a larger DNA-binding domain, such as the homeodomain. Additional binding affinity is provided in the homeodomain by a flexible arm that interacts with the minor groove.

Zinc Finger Proteins

The zinc finger protein sequence motif (Fig. 15-17B), first identified in the RNA polymerase III basal factor TFIIIA, has since been found in a variety of different RNA polymerase II factors, including more than 600 human transcription factors. Each “finger” consists of a 30-residue sequence with conserved pairs of cysteines and histidines that bind a single zinc ion. The tip of the finger sticks into the DNA major groove, where it contacts three bases. Most zinc finger proteins contain multiple fingers, allowing longer sequences to be recognized to increase specificity. A related structure is present in the steroid hormone receptor family, although in this case, four cysteine residues coordinate the zinc ion and the finger is composed of two helices rather than one. Steroid hormone receptors also contain a dimerization domain, allowing recognition of sequences with dyad symmetry (Fig. 15-17C).

Leucine Zipper Proteins

Leucine zipper domains are made up of two motifs: a basic region that recognizes a specific DNA sequence and a series of repeated leucine residues (leucine zipper) that mediate dimerization. These motifs form a continuous α-helix that can dimerize through formation of a coiled-coil structure involving specific contacts between hydrophobic leucine zipper domains (Fig. 15-17D; also see Fig. 3-10). CCAAT/enhancer-binding protein, the factor that recognizes the CCAAT sequence, was the first member of this family to be discovered. Dimers of leucine zipper proteins recognize short, inverted, repeat sequences. The zipper family comprises many members, some of which can cross-dimerize and recognize asymmetrical sequences. Another family of factors comprises the helix-loop-helix proteins, which have the same type of basic region but differ in that they have two helical dimerization domains separated by a loop region.

Factor Interactions

An important aspect of transcription factor function is the ability to associate with other factors. Such associations can expand the repertoire of DNA sequences that can be specifically recognized. In the case of the leucine zipper proteins, formation of a heterodimer leads to recognition of a site that is different from either of the sites recognized by the two homodimers (Fig. 15-18). Thus, a diverse set of binding sites can be recognized by using a relatively small set of interacting factors. Such interactions need not be limited to related proteins, and small interaction surfaces involving only a few specific contacts often suffice.

Figure 15-18 Transcription factor dimers that recognize novel targets. A–B, The homodimers of transcription factors 1 and 2 recognize different sites containing inverted four-base recognition elements. C, Heterodimers formed by factors 1 and 2 recognize a novel class of asymmetrical sites consisting of two different half-sites.
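The way homodimers and heterodimers generate different recognition sites can be made concrete by composing half-sites. In the sketch below, each monomer contributes a four-base half-site, the partner's half-site appears in inverted orientation (as its reverse complement), and one spacer base separates the two, following the geometry described for leucine zipper factors. The two half-site sequences are hypothetical and are not taken from any specific pair of factors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence; non-ACGT characters are kept."""
    return seq.translate(COMPLEMENT)[::-1]


def dimer_site(half_site_a: str, half_site_b: str, spacer: str = "N") -> str:
    """Site bound by a dimer: half-site A, spacer, half-site B inverted."""
    return half_site_a + spacer + reverse_complement(half_site_b)


def is_symmetric(site: str) -> bool:
    """True if the site reads the same on both strands (dyad symmetry)."""
    return site == reverse_complement(site)


FACTOR_1_HALF = "TGAC"  # hypothetical half-site for factor 1
FACTOR_2_HALF = "ATGA"  # hypothetical half-site for factor 2

sites = {
    "homodimer 1-1": dimer_site(FACTOR_1_HALF, FACTOR_1_HALF),
    "homodimer 2-2": dimer_site(FACTOR_2_HALF, FACTOR_2_HALF),
    "heterodimer 1-2": dimer_site(FACTOR_1_HALF, FACTOR_2_HALF),
}
for name, site in sites.items():
    print(f"{name:16} {site}  symmetric={is_symmetric(site)}")
```

Two monomers yield three distinct sites, and the count grows quickly: with k mutually compatible factors there are k homodimers plus k(k-1)/2 heterodimers, which is the combinatorial expansion the text describes.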

Transcription Factors as Modular Proteins

In addition to interaction with specific DNA sequences, transcription factors may also interact with regulatory molecules and/or the basal transcription machinery. Functional domains in transcription factors have been mapped by testing chimeric proteins, assembled from domains of different factors, in vivo (Fig. 15-19). Such chimeric factors will stimulate transcription as long as two functions are maintained: specific DNA binding and transcription activation. The surprising result of this type of analysis is that many transcription factors are modular proteins with discrete functional domains that can be exchanged without impairing activity. For example, exchanging the DNA-binding domains of the glucocorticoid and estrogen receptors creates a hybrid factor that recognizes estrogen-responsive promoters but activates transcription in response to glucocorticoid hormone.

Figure 15-19 Transcription factors consist of discrete, functional modules. A, Domain characterization. Although the entire factor is required for activation, the bottom domain is sufficient for DNA binding. B, Domain swapping. The activation domain of one factor (activating gene 1) can be fused to the DNA-binding domain of a heterologous factor (activating gene 2). The resulting chimeric factor will activate only genes containing the recognition site for the DNA-binding domain (gene 2).
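
The logic of a domain swap can be sketched as a small data structure. This is a deliberately simplified model, not a description of any real protein: the `Receptor` class, its two fields, and the `activates` rule are assumptions introduced here to mirror the glucocorticoid and estrogen receptor example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Receptor:
    name: str
    dna_binding_domain: str  # class of promoter element the domain recognizes
    hormone: str             # ligand sensed by the rest of the protein


def graft_dna_binding_domain(host: Receptor, donor: Receptor) -> Receptor:
    """Return a chimera: the host receptor carrying the donor's DNA-binding domain."""
    return replace(
        host,
        name=f"{host.name} with {donor.name} DNA-binding domain",
        dna_binding_domain=donor.dna_binding_domain,
    )


def activates(receptor: Receptor, promoter_element: str, hormone: str) -> bool:
    """Activation requires both a matching promoter element and the right hormone."""
    return (
        receptor.dna_binding_domain == promoter_element
        and receptor.hormone == hormone
    )


glucocorticoid_receptor = Receptor(
    "glucocorticoid receptor", "glucocorticoid-responsive", "glucocorticoid"
)
estrogen_receptor = Receptor("estrogen receptor", "estrogen-responsive", "estrogen")
hybrid = graft_dna_binding_domain(glucocorticoid_receptor, estrogen_receptor)

print(activates(hybrid, "estrogen-responsive", "glucocorticoid"))        # True
print(activates(hybrid, "glucocorticoid-responsive", "glucocorticoid"))  # False
print(activates(hybrid, "estrogen-responsive", "estrogen"))              # False
```

The hybrid behaves as the text describes: promoter choice follows the DNA-binding domain, and hormone responsiveness follows the remainder of the protein.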

Transcriptional Activation

Binding of a transcription factor to DNA per se does not activate transcription (Fig. 15-19). A separate domain provides this function by interacting directly or indirectly with the basal transcription machinery to elevate the rate of transcription. The best-characterized activation domain is an acidic region derived from the herpesvirus VP16 protein. Acidic activation domains are generally disordered segments of polypeptide consisting of multiple acidic residues dispersed among a few key hydrophobic residues. Such domains activate transcription when experimentally grafted to a wide variety of different DNA-binding domains in a number of different cell types. Other types of activator domains have been characterized as being proline-rich or glutamine-rich.

The diverse activation domains use several mechanisms to promote transcription (Fig. 15-20). The most direct mechanism is recruitment of the basal transcription machinery. Recall that RNA polymerase II requires a number of additional factors for specific transcription. TFIID binds the TATA box and recruits the polymerase in a complex with the mediator. Interactions between activation domains and mediator or TFIID components in these complexes stabilize the preinitiation complex and produce higher rates of RNA polymerase II initiation. The chromatin immunoprecipitation technique (Fig. 15-16D) has been used to demonstrate recruitment of the transcription machinery to specific genes.

Figure 15-20 Transcription activation mechanisms. A, Contact between transcription activators and mediator or TAF subunits, or both, leads to stable preinitiation complexes and elevated transcription. In some cases, a coactivator acts as an intermediary in this process. B, Histone acetylases in a coactivator loosen chromatin in the vicinity of the promoter, allowing assembly of preinitiation complexes. C, Recruitment of histone deacetylases in a corepressor represses transcription by compacting the chromatin in the vicinity of the promoter.

Transcriptional Repressors

As in prokaryotes, some eukaryotic transcription factors repress transcription. Unlike the lac repressor, however, the eukaryotic repressors generally do not act by blocking binding of RNA polymerase. Some eukaryotic repressors act by competing with activators for the same DNA sequence. Often, these repressors are related to the activator they block but simply lack the activation domain. Another type of eukaryotic repressor binds near the activator and interacts with the activator to mask its activation domain. Some repressors bind to specific sites on DNA and interact with coactivators in a manner that blocks their function. Finally, some repressors bind corepressors that alter chromatin structure in a way that denies the transcription machinery access.

Chromatin and Transcription

DNA in eukaryotic cells associates with an equal mass of protein to form chromatin (see Chapter 13). Packaging DNA in arrays of nucleosomes compacts the DNA, and the most obvious influence of chromatin on transcription is the ability of nucleosomes to restrict access of transcription proteins to the DNA template. Thus, if histone synthesis is artificially shut off, there is an increase in the basal expression of many genes. Additional evidence of the repressive nature of chromatin is seen in the resistance to nuclease digestion of unexpressed genes and the localization of unexpressed genes in highly condensed heterochromatin.

Gene activation often involves disruption or displacement of nucleosomes located on specific genes. Before the discussion of specific mechanisms, it is useful to consider some aspects of nucleosome structure. The nucleosome consists of DNA wrapped in a left-handed helix around an octamer of histone subunits (see Fig. 13-1). The histone core makes numerous contacts with the DNA minor groove and phosphate backbone, leading to tight binding that is not sequence specific. This aspect of the nucleosome allows for a dynamic association with DNA because binding to one position along the DNA strand is as energetically favorable as another. The N-terminal histone “tails” are highly conserved and play multiple roles in chromatin structure and gene regulation. Histone tails are important sites of modifications that regulate chromatin structure and transcription. Both the nonspecific nature of interactions between histones and DNA and the ability to modify the histone tails are exploited to regulate the ability of nucleosomes to block access of the transcription machinery to the DNA template.

Nucleosome remodeling complexes use the energy of ATP hydrolysis to alter the location of nucleosomes on the DNA template. These multiprotein complexes destabilize interactions between histones and DNA and “remodel” the chromatin to allow increased access to the template. In addition to facilitating transcription initiation through coactivator function, some remodeling complexes are required for transcription elongation and termination.

SWI/SNF complexes (see Chapter 13) are recruited to a specific subset of genes through interactions with transcription activators. The resulting remodeling of nucleosomes in the vicinity of the promoter is required for stable preinitiation complex formation at SWI/SNF-regulated genes. Other remodeling complexes are thought to regulate distinct sets of genes in a similar manner. SWI/SNF can also repress transcription at some promoters, presumably by repositioning nucleosomes to restrict access to the promoter.

Gene activation by nucleosome disruption can also be counteracted by factors that stabilize chromatin. In one example, broad regions of chromatin are silenced by recruitment of histone deacetylases that maintain heterochromatin domains (see Fig. 13-9).

Histone Modification and Chromatin Accessibility

The pattern of modification of the histone N-terminal tails forms the basis of a “histone code” that is read by the gene expression machinery. Modification of the histone tails is carried out by enzymes that are specific both for a particular modifying group and for specific residues within the tail of particular histones. For example, the histone methyltransferase SET1 is specific for lysine 4 in the histone H3 tail. The modifying enzymes are generally part of large complexes (Table 15-3) that are recruited to chromatin through interactions with gene regulatory proteins and thus, like the mediator, are considered coactivators (or corepressors).

Table 15-3 NUCLEOSOME-MODIFYING COMPLEXES

Proteins containing bromodomains or chromodomains interact with acetylated or methylated tails, respectively. Many of the protein complexes involved in gene regulation contain one or more of these domains. For example, the SAGA histone acetyltransferase complex contains a bromodomain that anchors the complex to chromatin, facilitating further modification of regions that are already acetylated. A subunit of TFIID also contains a bromodomain that can facilitate the binding of TFIID to acetylated nucleosomes associated with active chromatin. Similarly, a number of histone methyltransferases contain chromodomains and are therefore targeted to their substrates by preexisting histone methylation.

Combinatorial Control

The complexity of eukaryotic regulatory systems allows for the integration of multiple regulatory signals at individual genes. Such combinatorial control is seen in a limited way in prokaryotes. For example, the E. coli lac genes are regulated by both lactose and glucose. Only when glucose is absent and lactose is present do the activator (CAP) and repressor (lac repressor) function to maximize lac expression. Regulation of transcription initiation in eukaryotes is based on similar principles with DNA-binding activators and repressors controlling individual genes. For each eukaryotic gene, however, there are often binding sites for many more factors. Integration of the individual binding events can take place in several ways. First, there is a degree of synergism to the binding of multiple factors. The enhanceosome is an example in which binding of proteins that bend the DNA can lead to more efficient binding of additional proteins. The key characteristic of the resulting complex is that the activation of transcription provided by the enhanceosome is greater than the stimulatory effect predicted from the sum of individual transcription factors. Synergy can also result from multiple interactions between activators bound to DNA at different upstream sites or different enhancers and targets in coactivators such as the mediator or nucleosome remodeling complexes. Many of the same mechanisms also can occur with repressors.
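
The two-signal logic of the lac genes can be written as a short truth table. This is a minimal sketch under simplifying assumptions: each sugar is treated as simply present or absent, and the three output levels ("off", "low", "high") are qualitative labels introduced here rather than measured rates.

```python
from itertools import product


def lac_expression(glucose: bool, lactose: bool) -> str:
    """Qualitative lac expression for a given combination of sugars."""
    cap_active = not glucose       # the activator CAP functions only when glucose is absent
    repressor_bound = not lactose  # the lac repressor stays on the DNA without lactose
    if repressor_bound:
        return "off"
    return "high" if cap_active else "low"


for glucose, lactose in product([False, True], repeat=2):
    print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> {lac_expression(glucose, lactose)}")
```

Only one of the four combinations, glucose absent and lactose present, gives maximal expression, which is the sense in which the gene integrates two signals.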

Combinatorial control also can result from the interplay between factors that alter chromatin structure. For example, modification of histone tails by a histone acetyltransferase tethered to a DNA-bound transcription factor can result in the loosening of chromatin at a particular site and creation of binding sites for additional factors. Subsequent binding of a nucleosome remodeling complex can render sequences accessible to the transcriptional machinery.

Modulation of Transcription Factor Activity

Regulation of transcription initiation is of fundamental importance in controlling gene expression. In many cases, the availability of factors that bind to specific sites in promoters is the switch that turns a gene on. Various strategies to control the binding of specific factors have been discovered (Fig. 15-21). One of the most straightforward is de novo synthesis of the specific factor (Fig. 15-21A). This requires an additional level of transcription regulation and translation of the mRNA that encodes the specific factor. All of these steps take some time; therefore, this regulatory scheme is not used in situations in which rapid responses are required. Instead, it is used more commonly in regulating developmental pathways.

Figure 15-21 Regulation of transcription factor activity. Many strategies have evolved to regulate transcription factors in response to specific signals. A, The availability of a factor may be controlled by expressing it, de novo, only when it is needed. B, Factors may be synthesized in an inactive state and depend on a small molecule (ligand) for activity. C, Transcription factors that are synthesized in an inactive state can be activated by postsynthetic modification, such as phosphorylation. D, Some factors require an appropriate partner for activity. E, Constitutively active factors can be held in check by associating with inhibitory subunits. F, Active factors can be sequestered in the cytoplasm by blocking their transport to the nucleus.

Several mechanisms are used for rapid regulation of the activity of existing transcription factors. One mechanism involves the formation of an active factor from two inactive subunits (Fig. 15-21D). This association can be regulated through synthesis or by modification of preexisting subunits, leading to their association. Binding of small-molecule ligands is another means of controlling transcription factor activity (Fig. 15-21B). In this case, the binding of the ligand induces a conformational change that leads to DNA binding and transcription activation. Interaction of transcription factors with inhibitory subunits is also used to regulate factor activity (Fig. 15-21E). The DNA binding or activation potential is held in check until the appropriate signal leads to dissociation of the inhibitory factor. Covalent modification—for example, by phosphorylation—is also used to convert inactive transcription factors to a functional form (Fig. 15-21C). Finally, the ability of transcription factors to bind DNA may be regulated by restricting their localization to the cytoplasm (Fig. 15-21F). These regulatory schemes are not mutually exclusive, and many regulatory pathways (see the examples that follow) employ several different levels of regulation.

Transcription Factors and Signal Transduction

One of the hallmarks of eukaryotic gene regulation is the ability of cells to respond to a wide range of external signals. Cells detect the presence of hormones, growth factors, cytokines, cell surface contacts, and many other signals. This information is then transmitted to the nucleus, where appropriate changes in expression of specific genes are executed. Transcription factors represent the final step in these signal transduction pathways; the following sections discuss several specific examples. Chapter 27 covers several other signaling pathways that regulate gene expression (see Fig. 27-4 for the three types of signaling pathways to the nucleus).

Steroid Hormone Receptors

Regulation of gene expression by steroid hormone receptors involves both ligand-binding and inhibitory subunits. This family of nuclear receptors includes transcription factors with a common sequence organization consisting of a specific DNA-binding domain, a ligand-binding domain that regulates DNA binding, and one or more transcription activation domains. The ligands that regulate these factors are small, lipid-soluble hormone molecules that diffuse through cell membranes and bind directly to the transcription factor (Fig. 15-22A). The steroid hormones, retinoids, thyroid hormone, and vitamin D bind to distinct members of the nuclear receptor family, enabling them to recognize sequences in the promoters of a range of target genes. The specific sites of action in promoter DNA, termed hormone response elements, are related to either AGAACA or AGGTCA (Fig. 15-17C). Specificity of the response is generated by the spacing and relative orientation of the binding sites. Steroid receptors usually bind to inverted repeats separated by three nucleotides, whereas some other related receptors prefer direct repeats of similar sites. The nuclear receptors can bind as homodimers, although recent evidence suggests that heterodimers actually prevail in vivo. In addition to heterodimerizing with other members of the nuclear receptor family, interactions with other types of transcription factors could serve to link the steroid response to other pathways that signal through cell surface receptors.
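
The way spacing and orientation define a hormone response element can be illustrated with a simple sequence scan. The sketch below assumes exact matches to the two half-sites named above and a fixed spacer of three nucleotides for both arrangements. Real elements tolerate mismatches, and the spacing preferred for direct repeats is not specified in the text, so the `spacer` parameter and the test sequence are illustrative assumptions.

```python
import re

HALF_SITES = ("AGAACA", "AGGTCA")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_response_elements(dna: str, spacer: int = 3) -> list:
    """Return sorted (position, half_site, arrangement) tuples for exact matches.

    'inverted' means half-site, spacer, reverse complement of the half-site;
    'direct' means half-site, spacer, the same half-site again.
    """
    dna = dna.upper()
    hits = []
    for half in HALF_SITES:
        patterns = {
            "inverted": f"{half}[ACGT]{{{spacer}}}{reverse_complement(half)}",
            "direct": f"{half}[ACGT]{{{spacer}}}{half}",
        }
        for arrangement, pattern in patterns.items():
            for match in re.finditer(pattern, dna):
                hits.append((match.start(), half, arrangement))
    return sorted(hits)


# Made-up promoter fragment containing one element of each arrangement.
promoter = "GGCT" + "AGAACA" + "CTG" + "TGTTCT" + "AAT" + "AGGTCA" + "TCA" + "AGGTCA" + "CC"
for position, half, arrangement in find_response_elements(promoter):
    print(position, half, arrangement)
```

The same half-site thus defines different elements depending on whether its partner is inverted or repeated directly, which is how a small set of sequences can specify distinct receptor responses.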

Figure 15-22 Transcription factors as targets of signal transduction pathways. External signals are transmitted by a variety of pathways that eventually impinge on transcription factors. A, Steroid hormones diffuse through the cell membrane and bind to the hormone receptor in the cytoplasm (estrogen) or, more commonly, the nucleus. Hormone binding induces a conformational change that renders the receptor competent to activate transcription. B, Ligands bound to the extracellular surface of seven-helix receptors initiate a pathway that leads to the activation of protein kinase A that moves to the nucleus, where it phosphorylates transcription factor CREB. (C, catalytic subunit of PKA. R, regulatory subunit of PKA that is dissociated from C by binding cAMP [R is shown smaller than actual size].) C, In a third strategy, constitutively active transcription factors are kept sequestered in the cytoplasm until a signaling pathway is activated. In this example, the transcription factor NF-κB is bound to an inhibitor called I-κB. Activation of the pathway leads to phosphorylation of I-κB, which targets the inhibitory subunit for destruction by the proteasome. The free NF-κB is transported to the nucleus, where it activates the transcription of target genes.

Inactive steroid hormone receptors are blocked from interacting with DNA by heat shock protein 90 (Hsp90; Fig. 15-22A). This protein is a molecular chaperone that keeps the receptor ligand-binding domain in a conformation ready to bind the ligand but unable to enter the nucleus. Hormone binding to the receptor dissociates Hsp90 and frees the receptor’s DNA-binding domain. The free ligand–bound receptor moves from the cytoplasm to the nucleus, where it binds its DNA target and activates transcription.

Cyclic Adenosine Monophosphate (cAMP) Signaling

Changes in gene expression often develop in response to the binding of signal molecules to cell surface receptors. Binding of ligand induces a structural change in the receptor that sets off a chain of events that leads to changes in transcription. Protein phosphorylation plays an important role in this process.

One of the best-understood examples of transcriptional regulation through cell surface receptor signaling is the adenyl cyclase system. The binding of ligand to some seven-helix receptors results in an increase in synthesis of cAMP, which, in turn, activates protein kinase A (see Fig. 27-3). The promoters of cAMP-regulated genes contain a conserved DNA sequence element, called a cAMP response element, that mediates the transcriptional response to cAMP. A transcription factor, termed cAMP response element–binding (CREB) protein, binds this sequence specifically. CREB protein is a member of the leucine zipper family and binds the DNA as a dimer. The DNA-binding domain of CREB protein can be exchanged with other DNA-binding domains without loss of cAMP responsiveness. This indicates that cAMP does not work by altering the DNA binding of CREB protein; rather, it suggests that cAMP alters the transcription activation function. Recent experiments have identified a site in the activation domain of CREB protein, serine 133, that is phosphorylated by protein kinase A. Mutation of serine 133 to alanine results in a CREB protein that cannot be phosphorylated and cannot activate transcription of target genes. Phosphorylation of serine 133 leads to a conformational change in CREB protein that allows it to interact with a protein adaptor that recruits the transcription machinery, leading to transcription of target genes. Thus, the signal generated by binding of a ligand to a cell surface receptor is transduced to a DNA-binding factor that activates transcription of genes containing the appropriate regulatory elements.

NF-κB Signaling

NF-κB proteins are a family of related transcription factors present in many cell types. These factors control a diverse set of cellular processes, including immune and inflammatory responses, development, cell growth, and apoptosis. The activity of NF-κB is normally tightly controlled, as persistently active NF-κB is associated with cancer, arthritis, asthma, and heart disease. In most cells, NF-κB is held in an inactive form in the cytoplasm through interaction with an inhibitor called I-κB (see Figs. 14-18 and 15-22C). When B lymphocytes (see Fig. 28-9) are stimulated to produce antibody, NF-κB binds to an enhancer in the immunoglobulin κ-chain gene and activates transcription. The stimulatory signal leading to NF-κB activity is transmitted through a protein kinase cascade that eventually phosphorylates I-κB, signaling its destruction by proteolysis. This event unmasks the NF-κB nuclear localization signal, leading to its transport to the nucleus, where it activates transcription of immunoglobulin genes.

Transcription Factors in Development

In the preceding sections, the discussion centered on how external signals can lead to changes in gene expression in the nucleus, which, in turn, lead to changes in cell function. A critical step in this genetic program is the regulation of one transcription factor by another. Such cascades of transcription factor activity are fundamental to gene regulation in development.

Early cell divisions in multicellular organisms create different types of daughter cells that express distinct sets of genes. In this case, two types of information govern the expression of a gene. First, the environment of the cell sends signals that are transduced to the nucleus and change the pattern of gene expression. Second, how the nucleus interprets the transduced signals depends on the set of transcription factors that preexist within it. Thus, in addition to external signals, the history of the cell dictates which genes will respond to which signals.

The exact program of transcription factor interaction during development is extremely complicated and is certainly beyond the scope of this chapter. The underlying principles of these pathways are worth considering, however. One important observation is that developmentally regulated transcription factors are often autoregulatory. For factors that activate their own expression, this form of regulation acts as a switch that leads to continued expression after the initial stimulus is gone. Another important property of developmentally regulated transcription factors is that they are, in turn, regulated by several different factors. This allows complicated combinatorial signals to dictate expression. For example, some transcription factors activate certain promoters while repressing others. The basis of this contradictory property is thought to be the ability of transcription factors to cooperate with each other when bound at the same promoter. This cooperation can be either positive or negative. This allows the expression of a target gene to be regulated both by external signals (e.g., proximity of an adjacent cell that expresses a signaling molecule) and by the preexistence of a given factor in the cell. In this way, only cells of a certain lineage that are located in a certain area of an embryonic segment are able to express the gene. As new transcription factors involved in development are discovered, the challenge will be to decipher the complicated combinatorial interactions among them.

Transcription Factors and Human Disease

Advances in genomics and human genetics have demonstrated that mutations within specific genes are responsible for the pathogenesis or clinical features of particular human diseases. Multicellular organisms devote a significant fraction of their genome to encoding the transcription apparatus and attendant regulatory factors. Therefore, it is not surprising that mutations in some of the thousands of genes involved in transcription result in clinical phenotypes. The following examples indicate that mutations in either gene-specific or general transcription factors can contribute to disease.

Androgen Receptor

A nuclear hormone receptor, androgen receptor, binds to testosterone and regulates expression of genes involved in the development of male secondary sexual characteristics. Like other transcription factors, the androgen receptor has DNA-binding and transcription activation domains. In addition, the androgen receptor has a ligand-binding domain that binds testosterone and regulates the DNA-binding properties of the factor. Because the androgen receptor gene is located on the X chromosome, recessive mutations of the gene have a phenotype in males (which have only one copy of the X).

Mutations that alter different parts of the androgen receptor cause different clinical phenotypes. The most severe mutations cause androgen insensitivity syndrome, a condition in which individuals with a 46,XY chromosome constitution develop female secondary sexual characteristics. In this syndrome, androgens are synthesized, but the receptor fails to respond. Single missense mutations in the ligand-binding domain can weaken or eliminate ligand binding. Alternatively, ligand binding may be normal, but the mutation may weaken or eliminate DNA binding. Some androgen insensitivity syndrome mutations are associated with male breast cancer.

Another type of mutation in the androgen receptor causes a neuromuscular disease called spinal and bulbar muscular atrophy (Kennedy’s syndrome). This X-linked disease involves wasting of the proximal limb muscles as well as changes in facial muscles. The molecular basis of the disease is an expansion of a series of repeated CAG (glutamine) codons in the amino-terminal transcription activation domain. Normally, this region encodes 11 to 31 consecutive glutamine residues in different individuals. The number of repeats in patients with Kennedy’s syndrome ranges from 40 to 52. The mechanism by which the expanded polyglutamine domain results in motor neuron damage has not been determined.
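
The repeat counts quoted above lend themselves to a simple worked example. The following sketch counts the longest uninterrupted run of CAG codons in a sequence and compares it with the two ranges given in the text. It is an illustration only, not a diagnostic method: the function names and the toy sequences are assumptions, and the scan does not check the reading frame.

```python
import re

NORMAL_RANGE = range(11, 32)   # 11 to 31 repeats, as stated in the text
DISEASE_RANGE = range(40, 53)  # 40 to 52 repeats, as stated in the text


def longest_cag_run(sequence: str) -> int:
    """Length, in codons, of the longest uninterrupted run of CAG."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify(repeat_count: int) -> str:
    """Place a repeat count relative to the ranges quoted in the text."""
    if repeat_count in DISEASE_RANGE:
        return "within the range reported for Kennedy's syndrome"
    if repeat_count in NORMAL_RANGE:
        return "within the normal range"
    return "outside both ranges given in the text"


# Toy sequences: a start codon, a CAG tract, and one following codon.
for tract_length in (21, 45):
    toy_sequence = "ATG" + "CAG" * tract_length + "GAA"
    count = longest_cag_run(toy_sequence)
    print(count, classify(count))
```

Counts between 32 and 39 fall outside both ranges quoted here, so the sketch reports them separately rather than assigning them to either group.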

TFIIH and Human Disease

As was discussed in a previous section, the general RNA polymerase II transcription factor TFIIH is a multisubunit factor that contains both RNA polymerase II CTD kinase and DNA helicase activities. In addition to its role as a transcription factor, TFIIH plays a role in DNA repair and might serve to direct DNA repair to transcriptionally active regions in the genome.

Mutations in TFIIH subunits are associated with a set of rare human disorders (xeroderma pigmentosum, Cockayne’s syndrome, and trichothiodystrophy), each linked to defects in nucleotide excision repair of DNA damaged by ultraviolet light or chemical mutagens (see Box 43-1). Mutations in these diseases map to the genes encoding two different TFIIH helicase activities. Presumably, the alterations in these activities cause changes in DNA unwinding, either in the transcription initiation reaction or in the process of nucleotide excision repair. Some mutations are more selective for the DNA repair function, whereas other TFIIH mutations cause little or no DNA repair phenotype but rather seem to affect the TFIIH transcription function. The latter mutations cause wide-ranging defects, as might be expected for a defect in a general transcription factor.