Eukaryotic RNA Processing

Published on 28/02/2015 by admin

Filed under Basic Science

Last modified 28/02/2015


CHAPTER 16 Eukaryotic RNA Processing*

In all organisms, the genetic information is encoded in the sequence of the DNA. However, to be used, this information must be copied, or transcribed, into the related polymer, RNA. Eukaryotes synthesize many different types of RNA, but none of these RNAs is simply transcribed as a finished product. The mature, functional forms of all eukaryotic RNA species are generated by posttranscriptional processing, and these processing reactions are the major topic of this chapter.

Eukaryotic RNAs can be assigned to three major classes: (1) The cytoplasmic messenger RNAs (mRNAs) and their nuclear precursors (pre-mRNAs) carry the information that is used to specify the sequence, and therefore the structure, of all proteins in the cell. (2) Other RNAs do not encode protein but function directly, playing major roles in several metabolic pathways, including protein synthesis. These include the ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs), which are the key components of the protein synthesis machinery; the small nuclear RNAs (snRNAs), which form the core of the pre-mRNA splicing system; and the small nucleolar RNAs (snoRNAs), which are important factors in ribosome biogenesis. These RNAs are generally much longer-lived than the mRNAs and are therefore often referred to as stable RNAs or non-protein-coding RNAs (ncRNAs). (3) The third and most recently identified class of RNA comprises several structurally related groups of very small (21 to 25 nucleotides) RNA species that play important roles in regulating gene expression. Base pairing between endogenous microRNAs (miRNAs) and target mRNAs in the cytoplasm represses translation of those mRNAs into protein. The packaging of DNA into a nontranscribed form termed heterochromatin (see Fig. 13-9) is promoted by a class of nuclear, small heterochromatic RNAs (shRNAs). Finally, the introduction of small double-stranded RNAs into many cell types and organisms results in cleavage of the target mRNA and consequent silencing of gene expression. This phenomenon is described as RNA interference (RNAi), and the RNAs are referred to as small interfering RNAs (siRNAs).

Synthesis of mRNAs

An overview of mRNA synthesis and degradation is shown in Figure 16-1.


Figure 16-1 Synthesis and degradation of eukaryotic mRNAs. Nascent mRNA transcripts are transcribed by RNA polymerase II. Formation of the 5′ cap structure and cleavage and polyadenylation of the 3′ end of the mRNA both occur cotranscriptionally and involve factors that are recruited by the C-terminal domain (CTD) of the transcribing polymerase (see Fig. 15-4). The termination of transcription requires both the recognition of the site of polyadenylation and the activity of the 5′-exonuclease Rat1, which degrades the nascent RNA transcripts. Rat1 binds to the polymerase CTD via Rtt103. Pre-mRNA splicing can either be cotranscriptional or occur shortly after transcript release, and recruitment of splicing factors is not strongly dependent on the CTD. In human cells, the spliceosome deposits the exon-junction complex (EJC) around 24 nucleotides upstream of the site of splicing. Several steps in nuclear mRNA maturation also are subject to surveillance. In yeast, nuclear pre-mRNAs can be either 3′ degraded by the nuclear exosome complex or decapped and 5′ degraded by the exonuclease Rat1. Nuclear decapping requires the Lsm2–8 complex and is probably performed by the Dcp1/2 decapping complex. Once in the cytoplasm, the mRNA is translated into proteins and undergoes degradation. Several different mRNA degradation pathways have been identified.

A, Nonsense-mediated decay (NMD). If the EJCs all lie within or very close to the ORF, they will be displaced by the translating ribosomes. However, if an EJC lies beyond the end of the ORF, it will remain on the translated mRNA. This is taken as evidence that translation has terminated prematurely and triggers the NMD pathway. Recognition of the EJC requires the Upf1/2/3 surveillance complex, which also interacts with the ribosomes as they terminate translation. In yeast, NMD triggers both rapid decapping and 5′ degradation, without prior deadenylation, and 3′ degradation by the exosome.

B, General mRNA turnover. During translation, most mRNAs undergo progressive poly(A) tail shortening. Loss of the poly(A) tail leads to rapid degradation. As in the nucleus, cytoplasmic mRNAs can be degraded from either the 5′ or the 3′ end. 5′ degradation occurs largely in a specialized cytoplasmic region termed the P body in yeast or cytoplasmic foci in human cells. Here, the mRNAs are decapped by the Dcp1/2 heterodimer and then degraded by the cytoplasmic 5′-exonuclease Xrn1. Both activities are strongly stimulated by the cytoplasmic Lsm1–7 complex. Alternatively, deadenylated mRNAs can be 3′ degraded by the cytoplasmic exosome.

C, ARE-mediated decay. In this pathway, specific A+U rich elements (AREs) are recognized by ARE-binding proteins (ARE-BP) in the nucleus. These are transported to the cytoplasm in association with the mRNA and recruit the cytoplasmic exosome to rapidly degrade the RNA.

D, Nonstop decay. If the mRNA lacks a translation termination codon, the first translating ribosome will stall and be trapped at the 3′ end of the RNA. The Ski7 protein, which is associated with the cytoplasmic exosome complex, is believed to release the stalled ribosome and target the RNA for 3′ degradation by the exosome.

Note that this legend provides detail beyond the text.
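
The EJC rule described for nonsense-mediated decay in the legend can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not a predictor: the function names and the coordinate convention (0-based positions in the spliced mRNA) are hypothetical, the 24-nucleotide offset is the approximate figure quoted in the legend, and the "very close to the ORF" tolerance is ignored.

```python
EJC_OFFSET = 24  # approximate distance (nt) upstream of each junction, from the legend


def ejc_positions(junctions, offset=EJC_OFFSET):
    """Approximate EJC positions for exon-exon junctions given as
    0-based coordinates in the spliced mRNA (hypothetical convention)."""
    return [j - offset for j in junctions if j - offset >= 0]


def is_nmd_target(orf_end, junctions, offset=EJC_OFFSET):
    """True if any EJC lies beyond the end of the ORF, where translating
    ribosomes cannot displace it. Simplification: no tolerance is applied
    for EJCs that lie 'very close' to the ORF."""
    return any(pos > orf_end for pos in ejc_positions(junctions, offset))


# Normal stop codon in the last exon: every EJC is inside the ORF.
print(is_nmd_target(orf_end=900, junctions=[200, 450, 700]))  # False
# Premature stop codon at position 300: two EJCs remain downstream.
print(is_nmd_target(orf_end=300, junctions=[200, 450, 700]))  # True
```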

mRNA Capping and Polyadenylation

Two distinguishing features set mRNA apart from other RNAs: a 5′ cap structure and a 3′ poly(A) tail. Both of these elements help to protect the mRNA against degradation and act synergistically to promote translation in the cytoplasm.

The mRNA cap is an unusual structure. It consists of an inverted 7-methylguanosine residue, which is joined onto the body of the mRNA by a 5′-triphosphate-5′ linkage (Fig. 16-2). Cap addition involves three enzymatic activities: A 5′ RNA triphosphatase cleaves the 5′ triphosphate on the nascent transcript to a diphosphate; RNA guanylyltransferase forms a covalent enzyme–GMP complex and then caps the RNA by transferring this to the diphosphate; and RNA (guanine-7) methyltransferase covalently alters the guanosine base by methylation, generating m7G. In addition, the first encoded nucleotides are frequently modified by methylation of the 2′ hydroxyl position on the ribose group, but the functional significance of these internal modifications is currently unclear.

During 3′ processing, the nascent pre-mRNA is cleaved by an endonuclease, and a tail of adenosine residues is added by poly(A) polymerase. Around 200 to 250 A residues are added to mRNAs in human cells, and around 70 to 90 are added in yeast. Cleavage and polyadenylation are performed by a large complex containing approximately 20 proteins that recognizes sequences in the mRNA, of which the best defined is a highly conserved AAUAAA motif located upstream of the site of polyadenylation (Fig. 16-3).
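
As a rough illustration of 3′ processing, the sketch below scans a transcript for the AAUAAA motif and then mimics cleavage followed by tail addition. The function names are hypothetical, the cleavage position is supplied by the caller because the text does not specify how far downstream of the motif it lies, and the default tail length of 200 is simply the lower end of the human range quoted above.

```python
def find_polya_signals(rna, motif="AAUAAA"):
    """Return 0-based start positions of every occurrence of the motif.
    DNA-style input (T instead of U) is accepted and converted."""
    rna = rna.upper().replace("T", "U")
    positions = []
    start = rna.find(motif)
    while start != -1:
        positions.append(start)
        start = rna.find(motif, start + 1)  # step by 1 to allow overlaps
    return positions


def cleave_and_polyadenylate(rna, cleavage_site, tail_length=200):
    """Keep the nucleotides before cleavage_site (0-based index) and
    append a poly(A) tail of the requested length."""
    return rna[:cleavage_site] + "A" * tail_length


transcript = "GGCAAUAAAGCUUCG"
print(find_polya_signals(transcript))                        # [3]
print(cleave_and_polyadenylate(transcript, 12, tail_length=5))
```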

Links between mRNA Processing and Transcription

The processes of cap addition and 3′ cleavage and polyadenylation are both linked to transcription of the mRNA by RNA polymerase II and occur cotranscriptionally on the nascent RNA (Fig. 16-1). The C-terminal domain (CTD) of the largest subunit of RNA polymerase II (RNA pol II) consists of many copies of a seven-amino-acid repeat (YSPTSPS), which undergo reversible modification by phosphorylation (see Fig. 15-4). A pronounced change in the CTD phosphorylation pattern coincides with the release of the polymerase from initiation mode into processive elongation mode. Immediately following transcription initiation, the repeats are largely phosphorylated on the serine residue at position 5. This modification is lost, while serine 2 phosphorylation increases, as the polymerase moves along the transcript. Capping of the 5′ end of the mRNA occurs by the time the transcript is approximately 25 to 30 nucleotides long, and the capping enzyme interacts with the serine 5 phosphorylated CTD. This and other interactions with the polymerase result in strong allosteric activation of capping activity. In contrast, the cleavage and polyadenylation factors involved in 3′ end processing are recruited by interaction with the CTD phosphorylated at serine 2.
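
The relationship between the CTD phosphorylation state and factor recruitment can be summarized in a small lookup. This is only a schematic sketch: the function names are hypothetical, the CTD is idealized as perfect copies of the consensus heptad, and recruitment is reduced to a yes/no rule taken directly from the paragraph above.

```python
HEPTAD = "YSPTSPS"  # consensus CTD repeat quoted in the text


def ctd_serine_sites(n_repeats):
    """Return 1-based positions of Ser2 and Ser5 in a CTD built from
    n_repeats perfect copies of the heptad (an idealization)."""
    ctd = HEPTAD * n_repeats
    ser2 = [7 * i + 2 for i in range(n_repeats)]
    ser5 = [7 * i + 5 for i in range(n_repeats)]
    # Sanity check: every listed position really is a serine.
    assert all(ctd[p - 1] == "S" for p in ser2 + ser5)
    return {"Ser2": ser2, "Ser5": ser5}


def recruited_factors(ser2_phospho, ser5_phospho):
    """Processing factors recruited for a given phosphorylation state,
    following the simplified rule stated in the text."""
    factors = []
    if ser5_phospho:
        factors.append("capping enzyme")
    if ser2_phospho:
        factors.append("cleavage/polyadenylation factors")
    return factors


print(ctd_serine_sites(2))
print(recruited_factors(ser2_phospho=False, ser5_phospho=True))  # early elongation
print(recruited_factors(ser2_phospho=True, ser5_phospho=False))  # late elongation
```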

The termination of transcription by RNA polymerase II is dependent on RNA processing. Termination requires recognition of the poly(A) site by the cleavage and polyadenylation factors. These are carried with the transcribing polymerase, and their offloading might make the polymerase competent for termination. Cleavage of the nascent transcript also allows the entry of a 5′ exonuclease, an enzyme that degrades RNA from the 5′ end toward the 3′ end. This enzyme, which is called Rat1 in yeast and Xrn2 in humans, then chases after the transcribing polymerase, degrading the newly transcribed RNA strand as it goes. When the exonuclease catches the polymerase, it stimulates termination of transcription. This is referred to as the torpedo model for transcription termination.
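
The chase at the heart of this model reduces to simple arithmetic. If the polymerase is d nucleotides ahead when the exonuclease enters and the two move at constant rates, the exonuclease catches up after d / (exonuclease rate − polymerase rate). The sketch below encodes that calculation; the function name and the constant-rate assumption are illustrative, and the numbers in the example are made up rather than measured values.

```python
def torpedo_catch_up(head_start_nt, pol_rate, exo_rate):
    """Return (time, distance past the cleavage site) at which the
    exonuclease catches the polymerase, or None if it never does.
    Rates are in nucleotides per unit time and assumed constant."""
    if exo_rate <= pol_rate:
        return None  # the exonuclease cannot close the gap
    time = head_start_nt / (exo_rate - pol_rate)
    distance = head_start_nt + pol_rate * time
    return time, distance


# Hypothetical numbers: 100-nt head start, polymerase at 20 nt/s,
# exonuclease at 30 nt/s.
print(torpedo_catch_up(100, 20, 30))  # (10.0, 300.0)
```

In this toy picture, a slower polymerase or a faster exonuclease shortens the distance transcribed beyond the cleavage site before termination.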

Human β-globin mRNA precursors contain an additional cleavage site (termed the cotranscriptional cleavage site) downstream of the site of polyadenylation. The cotranscriptional cleavage site RNA sequence has intrinsic self-cleavage activity in the absence of proteins. Such an RNA is referred to as a self-cleaving ribozyme. This cleavage provides an entry site for the Xrn2 nuclease, allowing more efficient termination.

Pre-mRNA Splicing

Important experiments in the 1950s and 1960s established that genes were collinear with their protein products. It therefore came as a considerable surprise when, in the late 1970s, it emerged that genes in animals and plants frequently had numerous strikingly large inserts whose sequence was not included in the mature mRNA or the protein product. It turns out that most human pre-mRNAs undergo splicing reactions, in which specific regions are cut out and the remaining RNA is covalently rejoined. The regions that will form the mRNA are termed exons, and the bits that are cut out (and are normally degraded) are called introns. In unicellular eukaryotes, introns are generally a few hundred nucleotides in length or shorter. In metazoans, however, they are often several kilobases in length, and pre-mRNAs can contain many introns. It is therefore remarkable that all of the sites can be precisely identified and spliced.

Signals for Splicing

The signals in the pre-mRNA that identify the introns and exons are recognized by a combination of proteins and a group of small RNAs called the small nuclear RNAs (snRNAs). The snRNAs function in complexes with proteins in small nuclear ribonucleoprotein (snRNP) particles. Splicing occurs in a large complex termed the spliceosome, within which the pre-mRNA assembles together with five snRNAs (U1, U2, U4, U5, and U6) and around 100 different proteins. Particularly important splicing factors are the members of a large group of SR-proteins, so named because they contain domains rich in serine-arginine dipeptides.

Three conserved sequences within introns play key roles in their accurate recognition by the splicing machinery (Fig. 16-4). Two lie immediately adjacent to the 5′ splice site and the 3′ splice site, and the third surrounds an internal region that will form the intron branch point during the splicing reaction. The U1 and U6 snRNAs have sequences that are complementary to the 5′ splice site, while U2 is complementary to the branch point region.

While the spliceosome will finally bring together the sequences at each end of the intron, it is believed that the splicing machinery initially recognizes the exons in a reaction termed exon definition. This makes sense because mRNA exons are generally quite small—up to a few hundred nucleotides in length—whereas the introns can be many kilobases long.

No sequences in the exons are strictly required for splicing, but there are important stimulatory elements termed exonic splicing enhancers (ESEs), which generally bind members of the SR-protein family. The ESEs have two major functions: They stimulate the use of the flanking 5′ and 3′ splice sites, promoting exon definition, and they prevent the exon in which they are located from being included in an intron. This latter function is particularly important in ensuring that all introns are spliced out without the splicing machinery skipping from the 5′ end of one intron to the 3′ end of a downstream intron.

The Pre-mRNA Splicing Reaction

The splicing reaction proceeds in two steps (Fig. 16-4). In the first, the 5′–3′ phosphate linkage that joins the 5′ exon to the first nucleotide of the intron, at the 5′ splice site, is attacked and broken. This reaction leaves the 5′ end of the intron attached to an adenosine residue within the intron via an unusual 5′–2′ phosphate linkage. Since this adenosine remains attached to the flanking nucleotides by conventional 5′ and 3′ phosphodiester bonds, this creates a circular molecule with a tail that includes the 3′ exon. This structure is termed the intron lariat, and the adenosine to which the 5′ end of the intron is attached is termed the branch point, because it has a branched structure. In the second step of splicing, the free 3′ hydroxyl on the 5′ exon is used to attack and break the linkage between the last nucleotide of the intron and the 3′ exon, at the 3′ splice site. This leaves the 5′ and 3′ exons joined by a conventional 5′–3′ linkage and releases the intron as a lariat. The lariat is linearized by the debranching enzyme and is probably rapidly degraded from both ends by exonucleases.
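
The two steps can be mimicked on a sequence string. The sketch below is a bookkeeping model only: the function name, the 0-based coordinate convention, and the example sequence are all hypothetical, and the chemistry is represented simply by where the string is cut and rejoined.

```python
def splice(pre_mrna, five_ss, branch, three_ss):
    """Remove one intron from pre_mrna.

    five_ss  -- index of the first intron nucleotide (5' splice site)
    branch   -- index of the branch point adenosine inside the intron
    three_ss -- index of the first nucleotide of the 3' exon
    Returns the spliced mRNA and the lariat as loop and tail segments.
    """
    if not (five_ss < branch < three_ss <= len(pre_mrna)):
        raise ValueError("sites must be ordered: 5' ss < branch < 3' ss")
    if pre_mrna[branch] != "A":
        raise ValueError("branch point must be an adenosine")

    # Step 1: cut at the 5' splice site; the intron 5' end joins the
    # branch point adenosine, leaving a free 5' exon.
    five_exon = pre_mrna[:five_ss]

    # Step 2: the free 5' exon is joined to the 3' exon, releasing the
    # intron as a lariat.
    three_exon = pre_mrna[three_ss:]
    intron = pre_mrna[five_ss:three_ss]
    loop = intron[: branch - five_ss + 1]   # closed by the 5'-2' linkage
    tail = intron[branch - five_ss + 1:]    # linear 3' tail of the lariat
    return five_exon + three_exon, {"loop": loop, "tail": tail}


exon1, intron, exon2 = "AUGGCC", "GUAAGUCUAACUUUUCAG", "GGAUAA"
mrna, lariat = splice(exon1 + intron + exon2, five_ss=6, branch=15, three_ss=24)
print(mrna)    # AUGGCCGGAUAA
print(lariat)  # {'loop': 'GUAAGUCUAA', 'tail': 'CUUUUCAG'}
```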

The initial steps in splicing are the recognition of the 5′ splice site by the U1 snRNA and the binding of U2 snRNA to the branch-point region, assisted by SR-proteins (Fig. 16-5). Base pairing between U2 and the pre-mRNA leaves a single adenosine bulged out of a helix and available for interaction with the 5′ splice site. The U4 and U6 snRNAs then join the spliceosome as a base-paired duplex, within a large complex that also contains the U5 snRNA. The U4 and U6 base pairing is opened, and the liberated U6 sequences displace U1 at the 5′ splice site. They also bind to U2—bringing the 5′ splice site and branch point into close proximity. At this point, the first enzymatic step of splicing occurs. This reaction is believed to be directly catalyzed by the intricate structure of the snRNA/pre-mRNA interactions rather than by the protein components of the spliceosome. The 5′ splice site is attacked and broken by the ribose 2′ hydroxyl group of the adenosine residue that is bulged out of the U2-intron duplex. The U5 snRNA and its associated proteins are responsible for holding onto the now free 5′ exon and correctly aligning it with the 3′ exon for the second catalytic step of splicing.

Both catalytic steps in splicing are technically termed transesterification reactions, because nucleotides are linked by phosphodiester bonds, and the new bond is made at the same time as the old bond is broken. For this reason, the splicing reactions do not, in principle, require any input of energy. However, the assembly and subsequent disassembly of the spliceosome require numerous ATPases. Most of these belong to a family of proteins that are generally termed RNA helicases. These are believed to use the energy of ATP hydrolysis to catalyze structural rearrangements within the assembling and disassembling spliceosome.

AT-AC Introns

The large majority of human introns have a GU dinucleotide at the 5′ splice site and AG at the 3′ splice site (Fig. 16-4). However, a minor group of introns contain different consensus splicing signals and are termed AT-AC (pronounced “attack”) introns because of the identities of the nucleotides located at the 5′ and 3′ splice sites in the DNA sequence (AU and AC in the pre-mRNA). The splicing of the AT-AC introns involves a distinct set of snRNAs (U11, U12, U4atac, and U6atac), which replace U1, U2, U4, and U6, respectively. Only U5 is common to both spliceosomes. However, the underlying splicing mechanism is believed to be the same for both classes of intron.
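
The distinction between the two intron classes, as described here, comes down to the terminal dinucleotides. The sketch below classifies an intron on that basis alone and returns the corresponding snRNA set. The function names are hypothetical, and the rule is deliberately limited to the criterion given in the text; it does not examine branch point or other internal signals.

```python
# snRNA sets listed in the text; U4atac and U6atac are the AT-AC
# counterparts of U4 and U6, and only U5 is shared.
SNRNAS = {
    "GU-AG": ["U1", "U2", "U4", "U5", "U6"],
    "AT-AC": ["U11", "U12", "U4atac", "U5", "U6atac"],
}


def classify_intron(intron):
    """Classify an intron by its first and last two nucleotides.
    Accepts RNA or DNA letters; returns 'GU-AG', 'AT-AC', or 'other'."""
    rna = intron.upper().replace("T", "U")
    if len(rna) < 4:
        return "other"
    ends = (rna[:2], rna[-2:])
    if ends == ("GU", "AG"):
        return "GU-AG"
    if ends == ("AU", "AC"):
        return "AT-AC"
    return "other"


def snrnas_for(intron):
    """snRNAs expected to act on the intron, or [] if unclassified."""
    return SNRNAS.get(classify_intron(intron), [])


print(classify_intron("GUAAGUCUAACUUUUCAG"))  # GU-AG
print(snrnas_for("ATATCCTTTTAC"))             # the AT-AC set
```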

Alternative Splicing

A surprising finding from the human genome sequencing project was the relatively low number of predicted protein-coding genes, currently estimated at around 30,000. This result has caused increased interest in the phenomenon of alternative splicing, which allows the production of more than one mRNA, and therefore more than one protein product, from a single gene. Several general forms of alternative splicing are commonly found. Exons can be excluded from the mRNAs, or introns can be included. Some genes have arrays of multiple alternative exons, only one of which is included in each mRNA. In addition, the use of alternative splice sites can generate longer or shorter forms of individual exons (Fig. 16-6).

Current estimates for the proportion of human genes that are subject to alternative splicing range from 30% to 75%. In some cases, this could potentially give rise to a very large number of different protein isoforms.
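
To see how quickly the number of isoforms can grow, the following sketch enumerates the mRNAs that a hypothetical gene could produce. Each position in the gene is modeled as a list of options: a constitutive exon has one option, an exon that can be skipped has two (present or `None`), and an array of mutually exclusive exons offers one option per member. The gene layout and exon names are invented for illustration, and the model assumes every choice is independent.

```python
from itertools import product


def enumerate_isoforms(slots):
    """Return every exon combination as a tuple of exon names.
    Each slot is a list of alternatives; None means 'exon skipped'."""
    isoforms = []
    for choice in product(*slots):
        isoforms.append(tuple(exon for exon in choice if exon is not None))
    return isoforms


gene = [
    ["E1"],                 # constitutive exon
    ["E2", None],           # exon that can be included or skipped
    ["E3a", "E3b", "E3c"],  # mutually exclusive array
    ["E4"],                 # constitutive exon
]
isoforms = enumerate_isoforms(gene)
print(len(isoforms))  # 6
print(isoforms[0])    # ('E1', 'E2', 'E3a', 'E4')

# Ten independently skippable exons already give 2**10 combinations.
print(len(enumerate_isoforms([["x", None]] * 10)))  # 1024
```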