CHAPTER 16 Eukaryotic RNA Processing*
In all organisms, the genetic information is encoded in the sequence of the DNA. However, to be used, this information must be copied, or transcribed, into the related polymer, RNA. Eukaryotes synthesize many different types of RNA, but none of these RNAs is simply transcribed as a finished product. The mature, functional forms of all eukaryotic RNA species are generated by posttranscriptional processing, and these processing reactions are the major topic of this chapter.
Eukaryotic RNAs can be assigned to three major classes: (1) The cytoplasmic messenger RNAs (mRNAs) and their nuclear precursors (pre-mRNAs) carry the information that is used to specify the sequence, and therefore the structure, of all proteins in the cell. (2) Other RNAs do not encode protein but function directly, playing major roles in several metabolic pathways, including protein synthesis. These include the ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs), which are the key components of the protein synthesis machinery; the small nuclear RNAs (snRNAs), which form the core of the pre-mRNA splicing system; and the small nucleolar RNAs (snoRNAs), which are important factors in ribosome biogenesis. These RNAs are generally much longer-lived than the mRNAs and are therefore often referred to as stable or non-protein-coding RNAs (ncRNAs). (3) The third and most recently identified class of RNA comprises several structurally related groups of very small (21 to 25 nucleotides) RNA species that play important roles in regulating gene expression. Base pairing between endogenous micro-RNAs (miRNAs) and target mRNAs in the cytoplasm represses translation of those mRNAs into protein. The packaging of DNA into a nontranscribed form termed heterochromatin (see Fig. 13-9) is promoted by a class of nuclear, small heterochromatic RNAs (shRNAs). Finally, the introduction of small double-stranded RNAs into many cell types and organisms results in cleavage of the target mRNA and consequent silencing of gene expression. This phenomenon is described as RNA interference (RNAi), and the RNAs are referred to as small interfering RNAs (siRNAs).
Synthesis of mRNAs
An overview of mRNA synthesis and degradation is shown in Figure 16-1.
Figure 16-1 Synthesis and degradation of eukaryotic mRNAs. Nascent mRNA transcripts are transcribed by RNA polymerase II. Formation of the 5′ cap structure and cleavage and polyadenylation of the 3′ end of the mRNA both occur cotranscriptionally and involve factors that are recruited by the C-terminal domain (CTD) of the transcribing polymerase (see Fig. 15-4). The termination of transcription requires both the recognition of the site of polyadenylation and the activity of the 5′-exonuclease Rat1, which degrades the nascent RNA transcripts. Rat1 binds to the polymerase CTD via Rtt103. Pre-mRNA splicing can either be cotranscriptional or occur shortly after transcript release, and recruitment of splicing factors is not strongly dependent on the CTD. In human cells, the spliceosome deposits the exon-junction complex (EJC) around 24 nucleotides upstream of the site of splicing. Several steps in nuclear mRNA maturation also are subject to surveillance. In yeast, nuclear pre-mRNAs can be either 3′ degraded by the nuclear exosome complex or decapped and 5′ degraded by the exonuclease Rat1. Nuclear decapping requires the Lsm2–8 complex and is probably performed by the Dcp1/2 decapping complex. Once in the cytoplasm, the mRNA is translated into proteins and undergoes degradation. Several different mRNA degradation pathways have been identified. A, Nonsense-mediated decay (NMD). If the EJCs all lie within or very close to the ORF, they will be displaced by the translating ribosomes. However, if an EJC lies beyond the end of the ORF, it will remain on the translated mRNA. This is taken as evidence that translation has terminated prematurely and triggers the NMD pathway. Recognition of the EJC requires the Upf1/2/3 surveillance complex, which also interacts with the ribosomes as they terminate translation. In yeast, NMD triggers both rapid decapping and 5′ degradation, without prior deadenylation, and 3′ degradation by the exosome. B, General mRNA turnover. 
During translation, most mRNAs undergo progressive poly(A) tail shortening. Loss of the poly(A) tail leads to rapid degradation. As in the nucleus, cytoplasmic mRNAs can be degraded from either the 5′ or the 3′ end. 5′ degradation occurs largely in a specialized cytoplasmic region termed the P body in yeast or cytoplasmic foci in human cells. Here, the mRNAs are decapped by the Dcp1/2 heterodimer and then degraded by the cytoplasmic 5′-exonuclease Xrn1. Both activities are strongly stimulated by the cytoplasmic Lsm1–7 complex. Alternatively, deadenylated mRNAs can be 3′ degraded by the cytoplasmic exosome. C, ARE-mediated decay. In this pathway, specific A+U rich elements (AREs) are recognized by ARE-binding proteins (ARE-BP) in the nucleus. These are transported to the cytoplasm in association with the mRNA and recruit the cytoplasmic exosome to rapidly degrade the RNA. D, Nonstop decay. If the mRNA lacks a translation termination codon, the first translating ribosome will stall and be trapped at the 3′ end of the RNA. The Ski7 protein, which is associated with the cytoplasmic exosome complex, is believed to release the stalled ribosome and target the RNA for 3′ degradation by the exosome. Note that this legend provides detail beyond the text.
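The NMD decision rule in panel A of the legend can be written as a small positional check. The following Python is only a sketch under stated assumptions: the function names, the coordinate convention (nucleotide positions along the spliced mRNA), and the `tolerance` parameter are hypothetical. The legend says only that EJCs "within or very close to" the ORF are displaced; it gives no number, so `tolerance` defaults to 0 and is left for the reader to set.

```python
def ejc_positions(junctions, upstream=24):
    """Place one EJC about `upstream` nucleotides 5' of each exon-exon junction.

    `junctions` holds junction coordinates on the spliced mRNA. The default of
    24 nt follows the legend's figure for human cells.
    """
    return [j - upstream for j in junctions]


def triggers_nmd(orf_end, junctions, upstream=24, tolerance=0):
    """Return True if any EJC lies beyond the end of the ORF.

    Ribosomes displace EJCs inside the ORF; an EJC left downstream of the
    stop codon is read as a sign of premature termination.
    `tolerance` is a hypothetical stand-in for "very close to the ORF".
    """
    return any(pos > orf_end + tolerance
               for pos in ejc_positions(junctions, upstream))


# Junctions at 100 and 300 give EJCs at 76 and 276.
normal = triggers_nmd(orf_end=350, junctions=[100, 300])     # stop codon after both EJCs
premature = triggers_nmd(orf_end=150, junctions=[100, 300])  # stop codon before the second EJC
```

In this toy model a stop codon at position 350 leaves no EJC behind, whereas a premature stop at position 150 leaves the EJC at 276 on the mRNA and flags the transcript.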
mRNA Capping and Polyadenylation
The mRNA cap is an unusual structure. It consists of an inverted 7-methylguanosine residue, which is joined onto the body of the mRNA by a 5′-triphosphate-5′ linkage (Fig. 16-2). Cap addition involves three enzymatic activities: A 5′ RNA triphosphatase cleaves the 5′ triphosphate on the nascent transcript to a diphosphate; RNA guanylyltransferase forms a covalent enzyme–GMP complex and then caps the RNA by transferring this to the diphosphate; and RNA (guanine-7) methyltransferase covalently alters the guanosine base by methylation, generating m7G. In addition, the first encoded nucleotides are frequently modified by methylation of the 2′ hydroxyl position on the ribose group, but the functional significance of these internal modifications is currently unclear.
During 3′ processing, the nascent pre-mRNA is cleaved by an endonuclease, and a tail of adenosine residues is added by poly(A) polymerase. Around 200 to 250 A residues are added to mRNAs in human cells, and around 70 to 90 are added in yeast. Cleavage and polyadenylation are performed by a large complex containing approximately 20 proteins that recognizes sequences in the mRNA, of which the best defined is a highly conserved AAUAAA motif located upstream of the site of polyadenylation (Fig. 16-3).
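As an illustration of 3′ processing, the sketch below scans an RNA string for the AAUAAA motif, cuts downstream of it, and appends a run of adenosines. It is a simplified model, not a description of how the cleavage complex chooses its site: the function names are invented for this example, and `offset` (the distance from the motif to the cleavage site) is a hypothetical parameter, since the text states only that the motif lies upstream of the site.

```python
import re

POLYA_SIGNAL = "AAUAAA"  # conserved motif named in the text


def find_polya_signals(rna):
    """Return the 0-based start of every AAUAAA motif (overlaps included)."""
    return [m.start() for m in re.finditer(f"(?={POLYA_SIGNAL})", rna)]


def cleave_and_polyadenylate(rna, offset=20, tail_length=200):
    """Cut `offset` nt downstream of the first AAUAAA and add a poly(A) tail.

    `offset` is an assumed value for illustration. `tail_length` defaults to
    200, the lower end of the human range quoted in the text; a value of
    70 to 90 would model yeast.
    """
    sites = find_polya_signals(rna)
    if not sites:
        raise ValueError("no AAUAAA signal found")
    cut = min(sites[0] + len(POLYA_SIGNAL) + offset, len(rna))
    return rna[:cut] + "A" * tail_length


pre_mrna = "GGC" + "AAUAAA" + "C" * 30
processed = cleave_and_polyadenylate(pre_mrna, offset=10, tail_length=5)
```

With these toy inputs, the transcript is cut 10 nucleotides after the motif and the 20 downstream nucleotides are discarded before the tail is added.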
Links between mRNA Processing and Transcription
The processes of cap addition and 3′ cleavage and polyadenylation are both linked to transcription of the mRNA by RNA polymerase II and occur cotranscriptionally on the nascent RNA (Fig. 16-1). The C-terminal domain (CTD) of the largest subunit of RNA polymerase II (RNA pol II) consists of many copies of a seven-amino-acid repeat (YSPTSPS), which undergo reversible modification by phosphorylation (see Fig. 15-4). A pronounced change in the CTD phosphorylation pattern coincides with the release of the polymerase from initiation mode into processive elongation mode. Immediately following transcription initiation, the repeats are largely phosphorylated on the serine residue at position 5. This modification is lost, while serine 2 phosphorylation increases, as the polymerase moves along the transcript. Capping of the 5′ end of the mRNA occurs by the time the transcript is approximately 25 to 30 nucleotides long, and the capping enzyme interacts with the serine 5 phosphorylated CTD. This and other interactions with the polymerase result in strong allosteric activation of capping activity. In contrast, the cleavage and polyadenylation factors involved in 3′ end processing are recruited by interaction with the CTD phosphorylated at serine 2.
Pre-mRNA Splicing
Signals for Splicing
Three conserved sequences within introns play key roles in their accurate recognition by the splicing machinery (Fig. 16-4). These lie immediately adjacent to the 5′ splice site and the 3′ splice site and surrounding an internal region that will form the intron branch point during the splicing reaction. The U1 and U6 snRNAs have sequences that are complementary to the 5′ splice site, while U2 is complementary to the branch point region.
No sequences in the exons are strictly required for splicing, but there are important stimulatory elements termed exonic splicing enhancers (ESEs), which generally bind members of the SR-protein family. The ESEs have two major functions: They stimulate the use of the flanking 5′ and 3′ splice sites, promoting exon definition, and they prevent the exon in which they are located from being included in an intron. This latter function is particularly important in ensuring that all introns are spliced out without the splicing machinery skipping from the 5′ end of one intron to the 3′ end of a downstream intron.
The Pre-mRNA Splicing Reaction
The splicing reaction proceeds in two steps (Fig. 16-4). In the first, the 5′–3′ phosphate linkage that joins the 5′ exon to the first nucleotide of the intron, at the 5′ splice site, is attacked and broken. This reaction leaves the 5′ end of the intron attached to an adenosine residue within the intron via an unusual 5′–2′ phosphate linkage. Since this adenosine remains attached to the flanking nucleotides by conventional 5′ and 3′ phosphodiester bonds, the result is a circular molecule with a tail that includes the 3′ exon. This structure is termed the intron lariat, and the adenosine to which the 5′ end of the intron is attached is termed the branch point, because it has a branched structure. In the second step of splicing, the free 3′ hydroxyl on the 5′ exon is used to attack and break the linkage between the last nucleotide of the intron and the 3′ exon, at the 3′ splice site. This leaves the 5′ and 3′ exons joined by a conventional 5′–3′ linkage and releases the intron as a lariat. The lariat is linearized by the debranching enzyme and is probably rapidly degraded from both ends by exonucleases.
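The two steps can be traced on a string model of a pre-mRNA. This is a minimal bookkeeping sketch, not a model of the chemistry: the class and function names and the index conventions (0-based positions of the first intron nucleotide, the branch-point adenosine, and the first nucleotide of the 3′ exon) are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class SpliceProducts:
    mrna: str         # 5' exon joined directly to 3' exon
    lariat_loop: str  # intron from its 5' end through the branch-point A
    lariat_tail: str  # intron nucleotides 3' of the branch point


def splice(pre_mrna, five_ss, branch, three_ss):
    """Apply the two splicing steps to a single-intron pre-mRNA string.

    five_ss  : index of the first intron nucleotide
    branch   : index of the branch-point adenosine
    three_ss : index of the first nucleotide of the 3' exon
    """
    if not 0 < five_ss < branch < three_ss < len(pre_mrna):
        raise ValueError("sites must be ordered and lie inside the RNA")
    if pre_mrna[branch] != "A":
        raise ValueError("branch point must be an adenosine")

    # Step 1: cleavage at the 5' splice site frees the 5' exon; the intron's
    # 5' end is joined to the branch-point A, closing the lariat loop.
    five_exon = pre_mrna[:five_ss]
    loop = pre_mrna[five_ss:branch + 1]

    # Step 2: the 5' exon attacks the 3' splice site; the exons are joined
    # and the intron is released as a lariat (loop plus tail).
    tail = pre_mrna[branch + 1:three_ss]
    three_exon = pre_mrna[three_ss:]
    return SpliceProducts(five_exon + three_exon, loop, tail)


rna = "AUGGCC" + "GUAAGUCCUAACUUUCAG" + "GGAUAA"
products = splice(rna, five_ss=6, branch=16, three_ss=24)
```

Here `products.mrna` is the ligated exon sequence, while `lariat_loop` and `lariat_tail` together account for every intron nucleotide.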
The initial steps in splicing are the recognition of the 5′ splice site by the U1 snRNA and the binding of U2 snRNA to the branch-point region, assisted by SR-proteins (Fig. 16-5). Base pairing between U2 and the pre-mRNA leaves a single adenosine bulged out of a helix and available for interaction with the 5′ splice site. The U4 and U6 snRNAs then join the spliceosome as a base-paired duplex, within a large complex that also contains the U5 snRNA. The U4 and U6 base pairing is opened, and the liberated U6 sequences displace U1 at the 5′ splice site. They also bind to U2—bringing the 5′ splice site and branch point into close proximity. At this point, the first enzymatic step of splicing occurs. This reaction is believed to be directly catalyzed by the intricate structure of the snRNA/pre-mRNA interactions rather than by the protein components of the spliceosome. The 5′ splice site is attacked and broken by the ribose 2′ hydroxyl group of the adenosine residue that is bulged out of the U2-intron duplex. The U5 snRNA and its associated proteins are responsible for holding onto the now free 5′ exon and correctly aligning it with the 3′ exon for the second catalytic step of splicing.
AT-AC Introns
The large majority of human mRNA splice sites have a GU dinucleotide at the 5′ splice site and AG at the 3′ splice site (Fig. 16-4). However, a minor group of introns contain different consensus splicing signals and are termed AT-AC (pronounced “attack”) introns because of the identities of the nucleotides located at the 5′ and 3′ splice sites. The splicing of the AT-AC introns involves a distinct set of snRNAs (U11, U12, U4atac, and U6atac), which replace U1, U2, U4, and U6, respectively. Only U5 is common to both spliceosomes. However, the underlying splicing mechanism is believed to be the same for both classes of intron.
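The distinction drawn here can be sketched as a lookup on the terminal dinucleotides of an intron. The function name and return labels are invented for this example, and the check is deliberately crude: it uses only the two dinucleotides named in the text, whereas assigning a real intron to a spliceosome would require the longer consensus signals.

```python
# snRNA substitutions stated in the text; U5 is shared by both spliceosomes.
MINOR_SNRNA_FOR = {"U1": "U11", "U2": "U12", "U4": "U4atac", "U6": "U6atac"}


def classify_intron(intron):
    """Classify an intron by its first and last two nucleotides.

    Accepts DNA or RNA letters; T is converted to U before comparison.
    """
    rna = intron.upper().replace("T", "U")
    ends = (rna[:2], rna[-2:])
    if ends == ("GU", "AG"):
        return "GU-AG"
    if ends == ("AU", "AC"):
        return "AT-AC"
    return "other"


def snrnas_for(intron_class):
    """List the snRNAs involved, given a class from classify_intron."""
    major = ["U1", "U2", "U4", "U5", "U6"]
    if intron_class == "AT-AC":
        return [MINOR_SNRNA_FOR.get(name, name) for name in major]
    return major


major_class = classify_intron("GUAAGUCCUAACUUUCAG")
minor_class = classify_intron("ATATCCTTTTCCTTAACTCCAC")
```

Passing `minor_class` to `snrnas_for` swaps four of the five snRNAs and leaves U5 in place, mirroring the substitution described above.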
Alternative Splicing
A surprising finding from the human genome sequencing project was the relatively low number of predicted protein-coding genes, currently estimated at around 30,000. This result has increased interest in the phenomenon of alternative splicing, which allows the production of more than one mRNA, and therefore more than one protein product, from a single gene. Several general forms of alternative splicing are commonly found. Exons can be excluded from the mRNAs, or introns can be included. Some genes have arrays of multiple alternative exons, only one of which is included in each mRNA. In addition, the use of alternative splice sites can generate longer or shorter forms of individual exons (Fig. 16-6).
Figure 16-6 Alternative splicing can generate multiple different proteins from a single gene. Here are some of the possible mRNA and protein products of a gene whose pre-mRNA is subject to alternative splicing. Left, Examples show the effects of skipping one or more internal exons, which produces a set of related proteins with different combinations of “modules.” Right, Examples show the effects of alternative splice sites. In the case shown, the use of alternative 3′ splice sites redefines the 5′ end of the downstream exon. This can lead to the inclusion of additional amino acids in the protein product. Use of an alternative splice site can also cause the exon to be read in a different reading frame (green asterisk), changing the amino acid sequence. If the alternative reading frame contains a translation stop codon (red asterisk), a truncated protein will be produced, and the mRNA will generally be targeted for rapid degradation by the NMD pathway (Fig. 16-1).
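Exon skipping and its effect on the reading frame can be explored with a short sketch. The code below is illustrative only: the function names and the toy exon sequences are invented, it always retains the first and last exons, and it reads codons from position 0 rather than locating a start codon.

```python
from itertools import combinations

STOP_CODONS = {"UAA", "UAG", "UGA"}


def exon_skipping_isoforms(exons):
    """Yield every mRNA that keeps the first and last exon and any
    order-preserving subset of the internal exons (needs >= 2 exons)."""
    first, *internal, last = exons
    for k in range(len(internal), -1, -1):
        for kept in combinations(internal, k):
            yield first + "".join(kept) + last


def first_stop_codon(mrna, start=0):
    """Return the index of the first in-frame stop codon, or None."""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


# Toy gene: the internal exon is 4 nt long, so skipping it shifts the frame.
exons = ["AUGGCU", "GCAG", "CCUAAGGA"]
isoforms = list(exon_skipping_isoforms(exons))
stops = [first_stop_codon(m) for m in isoforms]
```

In this toy case the full-length isoform reads through to a UAA stop codon, while the exon-skipped isoform is read in a different frame and no longer meets that stop. A gene with n internal exons yields 2^n isoforms under this simple model, which shows how a modest number of genes can encode many more distinct mRNAs.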