Molecules: Structures and Dynamics

Published on 28/02/2015 by admin

Last modified 22/04/2025

Print this page

This article have been viewed 3791 times

CHAPTER 3 Molecules: Structures and Dynamics

This chapter describes the properties of water, proteins, nucleic acids, and carbohydrates as they pertain to cell biology. Chapter 7 covers lipids in the context of biological membranes.

Water

Water is so familiar that its role in cell biology and its fascinating properties tend to be neglected. Water is the most abundant and important molecule in cells and tissues. Humans are about two thirds water. Water is not only the solvent for virtually all cellular compounds but also a reactant or product in thousands of biochemical reactions catalyzed by enzymes, including the synthesis and degradation of proteins and nucleic acids and the synthesis and hydrolysis of adenosine triphosphate (ATP), to name a few examples. Water is also an important determinant of biological structure, as lipid bilayers, folded proteins, and macromolecular assemblies are all stabilized by the hydrophobic effect derived from the exclusion of water from nonpolar surfaces (see Fig. 4-5). Additionally, water forms hydrogen bonds with polar groups of many cellular constituents ranging in size from small metabolites to large proteins. It also associates with small inorganic ions.

Physical chemists are still trying to understand water, one of the most complex liquids. The molecule is roughly tetrahedral in shape (Fig. 3-1A), with two hydrogen bond donors and two hydrogen bond acceptors. The electronegative oxygen withdraws the electrons from the O—H covalent bonds, leaving a partial positive charge on the hydrogens and a partial negative charge on the oxygen. Hydrogen bonds between water molecules are partly electrostatic because of the charge separation (induced dipole) but also have some covalent character, owing to overlap of the electron orbitals. The strength of hydrogen bonds depends on their orientation, being strongest along the lines of tetrahedral orbitals. One can think of oxygens of two water molecules sharing a hydrogen-bonded hydrogen. Given two hydrogen bond donors and acceptors, water can be fully hydrogen-bonded, as it is in ice (Fig. 3-1C). Crystalline water in ice has a well-defined structure with a complete set of tetragonal hydrogen bonds and a remarkable amount (35%) of unoccupied space (Fig. 3-1D).

Figure 3-1 water. A, Space-filling model and orientation of the tetrahedral electron orbitals that define the directions of the hydrogen bonds. B, Tetrahedral local order in liquid water revealed by a theoretical calculation of a three-dimensional map of regions around the central water molecule where the local density of oxygen is at least 40% higher than average. Two adjacent water oxygens are centered near the two hydrogen bond donors, and two other waters are positioned in an elongated cap so that their protons can hydrogen-bond with the central water oxygen. C, Stick figure of crystallized ice showing the tetrahedral network of hydrogen bonds.D, A space-filling model of crystalline ice showing the large amount of unoccupied space. E, Shell of water molecules around a potassium ion. Small ions, such as Li⁺, Na⁺, and F⁻, bind water more tightly than do larger ions, such as K⁺, Cl⁻, and I⁻.

(D–E, From www.nyu.edu/pages/mathmol/library/water, Project MathMol Scientific Visualization Lab, New York University. See “ice.pdb” and “waterbox.pdb.”)

Neither theoretical calculations nor physical observations of liquid water have revealed a consistent picture of its organization. When ice melts, the volume decreases by only about 10%, so liquid water has considerable empty space too. The heat required to melt ice is a small fraction (15%) of the heat required to convert ice to a gas, in which all the hydrogen bonds are lost. Because the heat of melting reflects the number of bonds broken, liquid water must retain most of the hydrogen bonds that stabilize ice. These hydrogen bonds create a continuous, three-dimensional network of water molecules connected at their tetrahedral vertices, allowing water to remain a liquid at a higher temperature than is the case for a similar molecule, ammonia. On the other hand, because liquid water does not have a well-defined, long-range structure, it must be very heterogeneous and dynamic, with rapidly fluctuating regions of local order and disorder. This incomplete picture of water structure limits our ability to understand macromolecular interactions in an aqueous environment.

The properties of water have profound effects on all other molecules in the cell. For example, ions organize shells of water around themselves that compete effectively with other ions with which they might interact electrostatically (Fig. 3-1E). This shell of water travels with the ions, governing the size of pores that they can penetrate. Similarly, hydrogen bonding with water strongly competes with the hydrogen bonding that occurs between solutes, including macromolecules. By contrast, water does not interact as favorably with nonpolar molecules as it does with itself, so the solubility of nonpolar molecules in water is low, and they tend to aggregate to reduce their surface area in contact with water. Such nonpolar interactions are energeti-cally favorable because they reduce unfavorable interactions of nonpolar groups with water and increase favorable interactions of water molecules with each other. This is called the hydrophobic effect (see Fig. 4-5). These interactions of water dominate the behavior of solute molecules in an aqueous environment, where they influence the assembly of proteins, lipids, and nucleic acids into the structures that they assume in the cell. On the other hand, strategically placed water molecules can bridge two macromolecules in functional assemblies.

Proteins

Proteins are major components of all cellular systems. This section presents some basic concepts about protein structure that help to explain how proteins function in cells. More extensive coverage of this topic is available in biochemistry books and specialized books on protein chemistry.

Proteins consist of one or more linear polymers called polypeptides, which consist of various combinations of 20 different amino acids (Figs. 3-2 and 3-3) linked together by peptide bonds (Fig. 3-4). When linked in polypeptides, amino acids are referred to as residues. The sequence of amino acids in each type of polypeptide is unique. It is specified by the gene encoding the protein and is read out precisely during protein synthesis (see Fig. 18-8). The polypeptides of proteins with more than one chain are usually synthesized separately. However, in some cases, a single chain is divided into pieces by cleavage after synthesis.

Figure 3-2 the 20 l-amino acids specified by the genetic code. Shown for each are the full name, the three-letter abbreviation, the single-letter abbreviation, a stick figure of the atoms, and a space-filling model of the atoms in which hydrogen is white, carbon is black, oxygen is red, nitrogen is blue, and sulfur is yellow. For all, the amino group is protonated and carries a +1 charge, whereas the carboxyl group is ionized and carries a −1 charge. The amino acids are grouped according to the side chains attached to the α-carbon. These side chains fall into three subgroups. Top, The aliphatic (G, A, V, L, I, C, M, P) and aromatic (Y, F, W) side chains partition into nonpolar environments, as they interact poorly with water. Middle, The uncharged side chains with polar hydrogen bond donors or acceptors (S, T, N, Q, Y) can hydrogen-bond with water. Bottom, At neutral pH, the basic amino acids K and R are fully protonated and carry a charge of +1, the acidic amino acids (D, E) are fully ionized and carry a charge of −1, and histidine (pK: ∼6.0) carries a partial positive charge. All the charged residues interact favorably with water, although the aliphatic chains of R and K also give them significant nonpolar character.

Figure 3-4 the polypeptide backbone. This perspective drawing shows four planar peptide bonds, the four participating α-carbons (labeled 1 to 4), the R groups represented by the β-carbons, amide protons, carbonyl oxygens, and the two rotatable backbone bonds (φ and ø). The dotted lines outline one amino acid.

(Adapted from Creighton TE: Proteins: Structure and Molecular Principles. New York, WH Freeman and Co, 1983.)

Polypeptides range widely in length. Small peptide hormones, such as oxytocin, consist of as few as nine residues, while the giant structural protein titin (see Fig. 39-7) has more than 25,000 residues. Most cellular proteins fall in the range of 100 to 1000 residues. Without stabilization by disulfide bonds or bound metal ions, about 40 residues are required for a polypeptide to adopt a stable three-dimensional structure in water.

The sequence of amino acids in a polypeptide can be determined chemically by removing one amino acid at a time from the amino terminus and identifying the product. This procedure, called Edman degradation, can be repeated about 50 times before declining yields limit progress. Longer polypeptides can be divided into fragments of fewer than 50 amino acids by chemical or enzymatic cleavage, after which they are purified and sequenced separately. Even easier, one can sequence the gene or a complementary DNA (cDNA) copy of the messenger RNA for the protein (Fig. 3-16) and use the genetic code to infer the amino acid sequence. This approach misses posttranslational modifications (Fig. 3-3). Analysis of protein fragments by mass spectrometry can be used to sequence even tiny quantities of proteins.

Figure 3-16 The sequence of a purified fragment of DNA is rapidly determined by in vitro synthesis (see Fig. 42-1) using the four deoxynucleoside triphosphates plus a small fraction of one dideoxynucleoside triphosphate. The random incorporation of the dideoxy residue terminates a few of the growing DNA molecules every time that base appears in the sequence. The reaction is run separately with each dideoxynucleotide, and fragments are separated according to size by gel electrophoresis (see Fig. 6-5), with the shortest fragments at the bottom. A radioactive label makes the fragments visible when exposed to an X-ray film. The sequence is read from the bottom as indicated. An automated method uses four different fluorescent dideoxynucleotides to mark the end of the fragments and electronic detectors to read the sequence.

(Based on original data from W-L. Lee, Salk Institute for Biological Studies, San Diego, California.)

Figure 3-3 modified amino acids. Protein kinases add a phosphate group to serine, threonine, tyrosine, histidine, and aspartic acid (not shown). Other enzymes add one or more methyl groups to lysine, arginine, or histidine (not shown); a hydroxyl group to proline; or an acetate to the N-terminus of many proteins. The reducing environment of the cytoplasm minimizes the formation of disulfide bonds, but under oxidizing conditions within the membrane compartments of the secretory pathway (see Chapter 21), intramolecular or intermolecular disulfide (S—S) bonds form between adjacent cysteine residues.

Properties of Amino Acids

Every student of cell biology should know the chemical structures of the amino acids used in proteins (Fig. 3-2). Without these structures in mind, reading the literature and this book is like spelling without knowledge of the alphabet. In addition to their full names, amino acids are frequently designated by three-letter or single-letter abbreviations.

All but one of the 20 amino acids commonly used in proteins consist of an amino group, bonded to the α-carbon, bonded to a carboxyl group. Proline is a variation on this theme with a cyclic side chain bonded back to the nitrogen to form an imino group. Both the amino group (pK > 9) and carboxyl group (pK = ∼4) are partially ionized under physiological conditions. With the exception of glycine, all amino acids have a β-carbon and a proton bonded to the α-carbon. (Glycine has a second proton instead.) This makes the α-carbon an asymmetrical center with two possible configurations. The l-isomers are used almost exclusively in living systems. Compared with natural proteins, proteins constructed artificially from d-amino acids have mirror-image structures and properties.

Each amino acid has a distinctive side chain, or R group, that determines its chemical and physical properties. Amino acids are conveniently grouped in small families according to their R groups. Side chains are distinguished by the presence of ionized groups, polar groups capable of forming hydrogen bonds and their apolar surface areas. Glycine and proline are special cases, owing to their unique effects on the polymer backbone (see later section).

Enzymes modify many amino acids after their incorporation into polypeptides. These posttranslational modifications have both structural and regulatory functions (Fig. 3-3). These modifications are referred to many times in this book, especially reversible phosphorylation of amino acid side chains, the most common regulatory reaction in biochemistry (see Fig. 25-1). Methylated and acetylated lysines are important for chromatin regulation in the nucleus (see Fig. 13-3). Whole proteins such as ubiquitin or SUMO can be attached through isopeptide bonds to lysine e-amino groups to act as signals for degradation (see Fig. 23-8) or endocytosis (see Fig. 22-16).

This repertoire of amino acids is sufficient to construct millions of different proteins, each with different capacities for interacting with other cellular constituents. This is possible because each protein has a unique three-dimensional structure (Fig. 3-5), each displaying the relatively modest variety of functional groups in a different way on its surface.

Figure 3-5 a gallery of molecules. Space-filling models of proteins compared with a lipid bilayer, transfer rna, and dna, all on the same scale.

(Modified from Goodsell D, Olsen AJ: Soluble proteins: Size, shape, and function. Trends Biochem Sci 18:65–68, 1993.)

Architecture of Proteins

Our knowledge of protein structure is based largely on X-ray diffraction studies of protein crystals or nuclear magnetic resonance (NMR) spectroscopy studies of small proteins in solution. These methods provide pictures showing the arrangement of the atoms in space. X-ray diffraction requires three-dimensional crystals of the protein and yields a three-dimensional contour map showing the density of electrons in the molecule (Fig. 3-6). In favorable cases, all the atoms except hydrogens are clearly resolved, along with water molecules occupying fixed positions in and around the protein. NMR requires concentrated solutions of protein and reveals distances between particular protons. Given enough distance constraints, it is possible to calculate the unique protein fold that is consistent with these spacings. In a few cases, electron microscopy of two-dimensional crystals has revealed atomic structures (see Figs. 7-8B and 34-5).

Figure 3-6 protein structure determination by x-ray crystallography. A small part of an electron density map at 1.5-Å resolution of the cytoplasmic T1 domain of the shaker potassium channel from Aplysia. The chicken-wire map shows the electron density. The stick figure shows the superimposed atomic model.

(Based on original data from M. Nanao and S. Choe, Salk Institute for Biological Studies, San Diego, California.)

Each amino acid residue contributes three atoms to the polypeptide backbone: the nitrogen from the amino group, the α-carbon, and the carbonyl carbon from the carboxyl group. The peptide bond linking the amino acids together is formed by dehydration synthesis (see Fig. 17-10), a common chemical reaction in biological systems. Water is removed in the form of a hydroxyl from the carboxyl group of one amino acid and a proton from the amino group of the next amino acid in the polymer. Ribosomes catalyze this reaction in cells. Chemical synthesis can achieve the same result in the laboratory. The peptide bond nitrogen has an (amide) proton, and the carbon has a double-bonded (carbonyl) oxygen. The amide proton is an excellent hydrogen bond donor, whereas the carbonyl oxygen is an excellent hydrogen bond acceptor.

The end of a polypeptide with the free amino group is called the amino terminus or N-terminus. The numbering of the residues in the polymer starts with the N-terminal amino acid, as the biosynthesis of the polymer begins there on ribosomes. The other end of a polypeptide has a free carboxyl group and is called the carboxyl terminus or C-terminus.

The peptide bond has some characteristics of a double bond, owing to resonance of the electrons, and is relatively rigid and planar. The bonds on either side of the α-carbon can rotate through 360 degrees, although a relatively narrow range of bond angles is highly favored. Steric hindrance between the β-carbon (on all the amino acids but glycine) and the α-carbon of the adjacent residue favors a trans configuration in which the side chains alternate from one side of the polymer to the other (Fig. 3-4). Folded proteins generally use a limited range of rotational angles to avoid steric collisions of atoms along the backbone. Glycine without a β-carbon is free to assume a wider range of configurations and is useful for making tight turns in folded proteins.

Folding of Polypeptides

The three-dimensional structure of a protein is determined solely by the sequence of amino acids in the polypeptide chain. This was established by reversibly unfolding and refolding proteins in a test tube. Many, but not all, proteins that are unfolded by harsh treatments (high concentrations of urea or extremes of pH) will refold to regain full activity when returned to physiological conditions. Although many proteins are flexible enough to undergo conformational changes (see later discussion), polypeptides rarely fold into more than one final stable structure. Exceptions with medical importance are prions and amyloid (Box 3-1).

BOX 3-1 Protein Misfolding in Amyloid Diseases

Misfolding of diverse proteins and peptides results in spontaneous assembly of insoluble amyloid fibrils. Such pathological misfolding is associated with Alzheimer’s disease, transmissible spongiform encephalopathies (such as “mad cow disease”), and polyglutamine expansion diseases (such as Huntington’s disease, in which genetic mutations encode abnormal stretches of the amino acid glutamine). Accumulation of amyloid fibrils in these diseases is associated with slow degeneration of the brain. Pathological misfolding also results in amyloid deposition in other organs such as the endocrine pancreas in Type II diabetes. The precursor of a given amyloid fiber may be the wild-type protein or a protein modified through mutation, proteolytic cleavage, posttranslational modification, or polyglutamine expansion. The pathology of amyloidosis is not well understood. Some, but not all, amyloids are intrinsically toxic to cells. Some amyloid precursors are more toxic than the fibrils themselves. In all cases, fibril initiation is very slow, but once formed, fibrils act as seeds to promote the assembly of additional protein into fibrils.

Given that many unrelated proteins and peptides form amyloid, it is remarkable that most of these twisted fibrils have similar structures: narrow sheets up to 10μm long consisting of thousands of short β-strands that run across the width of the fibril. The β-strands can be either parallel or antiparallel, depending on the particular protein or peptide. Some amyloid fibrils consist of multiple layers of β-strands. The structures of the various parent proteins have nothing in common with each other or with amyloid cross β-sheets, so these are rare examples of polypeptides with two stable folds. To form amyloid, the native protein must either be partially unfolded or cleaved into a fragment with a tendency to aggregate.

In the common form of dementia called Alzheimer’s disease, the peptide that forms amyloid is a proteolytic fragment of a transmembrane protein of unknown function called β-amyloid precursor protein. “Infectious proteins” called prions cause transmissible spongiform encephalopathies. Normally, these proteins do no harm, but once misfolded, the protein can act as a seed to induce other copies of the protein to form insoluble amyloid-like assemblies that are toxic to nerve cells. Such misfolding rarely occurs under normal circumstances, but the misfolded seeds can be acquired by ingesting infected tissues.

Other proteins, including the peptide hormone insulin, the actin-binding protein gelsolin, and the blood-clotting protein fibrinogen, form amyloid in certain diseases. An inherited point mutation makes the secreted form of gelsolin susceptible to cleavage by a peptide processing protease in the trans-Golgi network. Fragments of 53 or 71 residues form extracellular amyloid fibrils in several organs.

Given that amyloid fibrils form spontaneously and are exceptionally stable, it is not surprising that functional amyloids exist in organisms ranging from bacteria to humans. For example, formation of the pigment granules responsible for skin color depends on a proteolytic fragment of a lysosomal membrane protein that forms amyloid fibrils as a scaffold from melanin pigments. Budding yeast has a number of proteins that can either assume their “native” fold or assemble into amyloid fibrils. The native fold of the protein Sup35p serves as a translation termination factor that stops protein synthesis at the stop codon (see Fig. 17-8). Rarely, Sup35p misfolds and assembles into an amyloid fibril. These fibrils sequester all the Sup35p in fibrils, where it is inactive. The faulty translation termination that occurs in its absence has diverse consequences that are inherited like prions from one generation of yeast to the next.

Although proteins fold spontaneously into a unique structure, it is not yet possible to predict three-dimensional structures of proteins from their amino acid sequences unless one already knows the structure of an ortholog or paralog. Then one can use the known structure and the amino acid sequence of the unknown to build a homology model that is often accurate enough to make reliable inferences about function. Predicting protein structures from sequence alone would have profound practical consequences, since the number of protein sequences known from genome-sequencing projects far exceeds the number of established protein structures (about 10,000).

The following factors influence protein folding:

1. Hydrophobic side chains pack very tightly in the core of proteins to minimize their exposure to water. Little free space exists inside proteins, so the hydrophobic core resembles a hydrocarbon crystal more than an oil droplet (Fig. 3-7). Accordingly, the most conserved residues in families of proteins are found in the interior. Nevertheless, the internal packing is malleable enough to tolerate mutations that change the size of buried side chains, as the neighboring chains can rearrange without changing the overall shape of the protein. Interior charged or polar residues frequently form hydrogen bonds or salt bridges to neutralize their charge.

2. Most charged and polar side chains are exposed on the surface, where they interact favorably with water. Although many hydrophobic residues are inside, roughly half the residues that are exposed to solvent on the outer surface are also hydrophobic. Amino acid residues on the surface typically appear to play a minor role in protein folding. Experimentally, one can substitute many residues on the surface of a protein with any other residue without changing the stability or three-dimensional structure.

3. The polar amide protons and carbonyl oxygens of the polypeptide backbone maximize their potential to form hydrogen bonds with other backbone atoms, side chain atoms, or water. In the hydrophobic core of proteins, this is achieved by hydrogen bonds with other backbone atoms in two major types of secondary structures: α-helices and β-sheets (Fig. 3-8).

4. Elements of secondary structure usually extend completely across compact domains. Consequently, most loops connecting α-helices and β-strands are on the surface of proteins, not in the interior (Fig. 3-9). Exceptions are found in some integral membrane proteins (see Figs. 10-3, 10-13, 10-14, and 10-15), where α-helices can reverse in the interior of the protein.

Figure 3-7 Space-filling (A) and ribbon (B) models of a cross section of the bacterial chemotaxis protein CheY illustrate some of the factors that contribute to protein folding. α-Helices pack on both sides of the central, parallel β-sheet. Most of the polar and charged residues are on the surface. The tightly packed interior of largely apolar residues excludes water. The buried backbone amides and carbonyls are fully hydrogen-bonded to other backbone atoms in both the α-helices and β-sheet. (PDB file: 2CHF.)

Figure 3-8 models of secondary structures and turns of proteins. A, α-Helix. The stick figure (left) shows a right-handed α-helix with the N-terminus at the bottom and side chains R represented by the β-carbon. The backbone hydrogen bonds are indicated by blue lines. In this orientation, the carbonyl oxygens point upward, the amide protons point downward, and the R groups trail toward the N-terminus. Space-filling models (middle) show a polyalanine α-helix. The end-on views show how the backbone atoms fill the center of the helix. A space-filling model (right) of α-helix 5 from bacterial rhodopsin shows the side chains. Some key dimensions are 0.15nm rise per residue, 0.55nm per turn, and diameter of about 1.0nm. (PDB file: 1BAD.) B, Stick figure and space-filling models of an antiparallel β-sheet. The arrows indicate the polarity of each chain. With the polypeptide extended in this way, the amide protons and carbonyl oxygens lie in the plane of the sheet, where they make hydrogen bonds with the neighboring strands. The amino acid side chains alternate pointing upward and downward from the plane of the sheet. Some key dimensions are 0.35nm rise per residue in a β-strand and 0.45nm separation between strands. (PDB file: 1SLK.) C, Stick figure and space-filling models of a parallel β-sheet. All strands have the same orientation (arrows). The orientations of the hydrogen bonds are somewhat less favorable than that in an antiparallel sheet. D–E, Stick figures of two types of reverse turns found between strands of antiparallel β-sheets. (PDB file: 1IMM.) F, Stick figure of an omega loop. (PDB file: 1LNC.)

Figure 3-9 ribbon diagrams of protein backbones showing β-strands as flattened arrows, α-helices as coils, and other parts of the polypeptide chains as ropes. Left, The β-subunit of hemoglobin consists entirely of tightly packed α-helices. (PDB file: 1 MBA.) Middle, CheY is a mixed a/b structure, with a central parallel β-sheet flanked by α-helices. Note the right-handed twist of the sheet (defined by the sheet turning away from the viewer at the upper right) and right-handed pattern of helices (defined by the helices angled toward the upper right corner of the sheet) looping across the β-strands. (Compare the cross section in Figure 3-7). (PDB file: 2CHF.) Right, The immunoglobulin V_L domain consists of a sandwich of two antiparallel β-sheets. (PDB file: 2IMM.)

These factors tend to maximize the stability of folded proteins in one particular “native” conformation, but the native state of folded proteins is relatively unstable. The standard free energy difference (see Chapter 4) between a folded and globally unfolded protein is only about 40 kJ mol⁻¹, much less than that of a single covalent bond! Even the substitution of a single crucial amino acid can destabilize certain proteins, causing a loss of function. In other cases, misfolding results in noncovalent polymerization of a protein into amyloid fibrils associated with serious diseases (Box 3-1).

The amino acid sequence of each polypeptide contains all the information required to specify folding into the native protein structure, just one of a near infinity of possible conformations. Chapter 17 explains how many conformations of the unfolded polypeptide are rapidly sampled through trial and error to select stable intermediates leading to the native structure. Cells use molecular chaperones to guide and control the quality of folding.

Secondary Structure

Much of the polypeptide backbone of proteins folds into stereotyped elements of secondary structure, especially α-helices and β-sheets (Fig. 3-8). They are shown as spirals and polarized ribbons in “ribbon diagrams” of protein organization used throughout this book. Both α-helices and β-strands are linear, so globular proteins can be thought of as compact bundles of straight or gently curving rods, laced together by surface turns.

a-Helices allow polypeptides to maximize hydrogen bonding of backbone polar groups while using highly favored rotational angles around the α-carbons and tight packing of atoms in the core of the helix (Fig. 3-8). All of these features stabilize the α-helix. Viewed with the amino terminus at the bottom, the amide protons all point downward and the carbonyl oxygens all point upward. The side chains project radially around the helix, tilted toward its N-terminus. Given 3.6 residues in each turn of the right-handed helix, the carbonyl oxygen of residue 1 is positioned perfectly to form a linear hydrogen bond with the amide proton of residue 5. This n to n + 4 pattern of hydrogen bonds repeats along the whole α-helix.

The orientation of backbone hydrogen bonds in α-helices has two important consequences. First, a helix has an electrical dipole moment, negative at the C-terminus. Second, the ends of helices are less stable than the middle, as four potential hydrogen bonds are not completed by backbone interactions at each end. These unmet backbone hydrogen bonds can be completed by interaction with appropriate donors or acceptors on the side chains of the terminal residues. Interactions with serine and asparagine are favored as “caps” at the N-termini of helices because their side chains can complete the hydrogen bonds of the backbone amide nitrogens. Lysine, histidine, and glutamine are favored hydrogen bonding caps for the C-termini of helices.

All amino acids are found within naturally occurring α-helices. Proline is often found at the beginning of helices and glycine at the end, because they are favored in bends. Both are underrepresented within helices. When present, proline produces bends. Glycine is more common in transmembrane helices, where it contributes to helix-helix packing.

A second strategy used to stabilize the backbone structure of polypeptides is hydrogen bonding of β-strands laterally to form β-sheets (Figs. 3-8 and 3-9). In individual β-strands, the peptide chain is extended in a configuration close to all-trans with side chains alternating top and bottom and amide protons and carbonyl oxygens alternating right and left. β-Strands can form a complete set of hydrogen bonds, with neighboring strands running in the same or opposite directions in any combination. However, the orientation of hydrogen bond donors and acceptors is more favorable in a β-sheet with antiparallel strands than in sheets with parallel strands. Largely parallel β-sheets are usually extensive and completely buried in proteins. β-Sheets have a natural right-handed twist in the direction along the strands. Antiparallel β-sheets are stable even if the strands are short and extensively distorted by twisting. Antiparallel sheets can wrap around completely to form a β-barrel with as few as five strands, but the natural twist of the strands and the need to fill the core of the barrel with hydrophobic residues favors barrels with eight strands.

Up to 25% of the residues in globular proteins are present in bends at the surface (Fig. 3-8D-F). Residues constituting bends are generally hydrophilic. The presence of glycine or proline in a turn allows the backbone to deviate from the usual geometry in tight turns, but the composition of bends is highly variable and not a strong determinant of folding or stability. Turns between linear elements of secondary structure are called reverse turns, as they reverse the direction of the polypeptide. Those between β-strands have a few characteristic conformations and are called β-bends.

Many parts of polypeptide chains in proteins do not have a regular structure. At one extreme, small segments of polypeptide, frequently at the N- or C-terminus, are truly disordered in the sense that they are mobile. Many other irregular segments of polypeptide are tightly packed into the protein structure. Omega loops are compact structures consisting of 6 to 16 residues, generally on the protein surface, that connect adjacent elements of secondary structure (Fig. 3-8F). They lack regular structure but typically have the side chains packed in the middle of the loop. Some are mobile, but many are rigid. Omega loops form the antigen-binding sites of antibodies. In other proteins, they bind metal ions or participate in the active sites of enzymes.

Packing of Secondary Structure in Proteins

Elements of secondary structure can pack together in almost any way (Fig. 3-9), but a few themes are favored enough to be found in many proteins. For example, two β-sheets tend to pack face to face at an angle of about 40 degrees with nonpolar residues packed tightly, knobs into holes, in between. α-Helices tend to pack at an angle of about 30 degrees across β-sheets, always in a right-handed arrangement. Adjacent α-helices tend to pack together at an angle of either +20 degrees or −50 degrees, owing to packing of side chains from one helix into grooves between side chains on the other helix.

Coiled-coils are a common example of regular superstructure (Fig. 3-10). Two α-helices pair to form a fibrous structure that is widely used to create stable polypeptide dimers in transcription factors (see Fig. 15-18) and structural proteins (see Fig. 39-4). Typically, two identical α-helices wrap around each other in register in a left-handed super helix that is stabilized by hydrophobic interactions of leucines and valines at the interface of the two helices. Intermolecular ionic bonds between the side chains of the two polypeptides also stabilize coiled-coils. Given 3.6 residues per turn, the sequence of a coiled-coil has hydrophobic residues regularly spaced at positions 1 and 4 of a “heptad repeat.” This pattern allows one to predict the tendency of a polypeptide to form coiled-coils from its amino acid sequence.

Figure 3-10 coiled-coils. A, Comparison of a single α-helix, represented by spheres centered on the α-carbons, and a two-stranded, left-handed coiled-coil. Two identical α-helices make continuous contact along their lengths by the interaction of the first and fourth residue in every two turns (seven residues) of the helix. (PDB file: 2TMA.) B, Atomic structure of the GCN4 coiled-coil, viewed end-on. The coiled-coil holds together two identical peptides of this transcription factor dimer (see Fig. 15-17 for information on its function). Hydrophobic side chains fit together like knobs into holes along the interface between the two helices. (PDB file: GCN4.) C, Helical wheel representation of the GCN4 coiled-coil. Following the arrows around the backbone of the polypeptides, one can read the sequences from the single-letter code, starting with the boxed residues and proceeding to the most distal residue. Note that hydrophobic residues in the first (a) and fourth (d) positions of each two turns of the helices make hydrophobic contacts that hold the two chains together. Electrostatic interactions (dashed lines) between side chains at positions e and g stabilize the interaction. Other coiled-coils consist of two different polypeptides (see Fig. 15-18), and some are antiparallel (see Fig. 13-19).

(C, Redrawn from O’Shea E, Klemm JD, Kim PS, Alber T: X-ray structure of the GCN4 leucine zipper, a two-stranded, parallel coiled-coil. Science 254:539–544, 1991.)

b-Sheets can also form extended structures. One called a β-helix consists of a continuous polypeptide strand folded into a series of short β-sheets that form a three-sided helix. Fig. 24-4 shows end-on and side views of two β-helices of a growth factor receptor.

Interaction of Proteins with Solvent

The surface of proteins is almost entirely covered with protons (Fig. 3-11). Some protons are potential hydrogen bond donors, but many are inert, being bonded to backbone or side chain aliphatic carbons. Although most of the charged side chains are exposed on the surface, so are many nonpolar side chains. Many water molecules are ordered on the surface of proteins by virtue of hydrogen bonds to polar groups. These water molecules appear in electron density maps of crystalline proteins but exchange rapidly, on a picosecond (10⁻¹² second) time scale. Waters that are in contact with nonpolar atoms maximize hydrogen bonding with each other, forming a dynamic layer of water with reduced translational diffusion compared with bulk water. This lowers the entropy of the water by increasing its order and provides a thermodynamic impetus to protein folding pathways that minimize the number of hydrophobic atoms displayed on the surface (see Fig. 4-5).

Figure 3-11 water associated with the surface of a protein. A, Protein protons exposed to solvent (white) on the surface of a small protein, bovine pancreatic trypsin inhibitor. B, Water molecules observed on the surface of the protein in crystal structures. (PDB file: 5BTI.)

Protein Dynamics

Pictures of proteins tend to give the false impression that they are rigid and static. On the contrary, even when packed in crystals, the atoms of proteins vibrate around their mean positions on a picosecond time scale with amplitudes up to 0.2nm and velocities of 200m per second. This motion is an inevitable consequence of the kinetic energy of each atom, about 2.5 kJ mol⁻¹ at 25°C. This allows the protein as a whole to explore a variety of subtly different conformations on a fast time scale. Binding to a ligand or a change in conditions may favor one of these alternative conformations.

In addition to relatively small, local variations in structure, many proteins undergo large conformational changes (Fig. 3-12). These changes in structure often reflect a change of activity or physical properties. Conformational changes play roles in many biological processes ranging from opening and closing ion channels (see Fig. 10-5) to cell motility (see Fig. 36-5). Many conformational changes have been observed indirectly by spectroscopy or hydrodynamic methods or directly by crystallography or NMR. For example, when glucose binds the enzyme hexokinase, the two halves of the protein clamp around this substrate by rotating 12 degrees about a hinge consisting of two polypeptides. Guanosine triphosphate (GTP) binding to elongation factor EF-Tu causes a domain to rotate 90 degrees about two glycine residues (see Fig. 25-7)! Similarly, phosphorylation of glycogen phosphorylase causes a local rearrangement of the N-terminus that transmits a structural change over a distance of more than 2nm to the active site (see Fig. 27-3). The Ca²⁺ binding regulatory protein calmodulin undergoes a dramatic conformational change (Fig. 3-12) when wrapping tightly around a helical peptide of a target protein (also see Chapter 26).

Figure 3-12 conformational changes of proteins. A, The glycolytic enzyme hexokinase. The two domains of the protein hinge together to surround the substrate, glucose. (PDB files: 2YHX and 1HKG.) B, EF-Tu, a cofactor in protein synthesis (see Fig. 17-10), folds more compactly when it binds guanosine triphosphate. (PDB files: 1EFU and 1EFT.) C, Calmodulin (see Chapter 26) binds Ca²⁺ and wraps itself around an α-helix (red) in target proteins. Note the large change in position of the helix marked with an asterisk. (PDB files: CLN and 2BBM.)

Modular Domains in Proteins

Most polypeptides consist of linear arrays of multiple independently folded, globular regions, or domains, connected in a modular fashion (Fig. 3-13). Most domains consist of 40 to 100 residues, but kinase domains and motor domains (see Figs. 36-3 and 36-9) are much larger. Each of more than 1000 recognized families of domains is thought to have evolved from a different common ancestor. In this sense, the members of a family are said to be homologous. Through the processes of gene duplication, transposition, and divergent evolution, the most widely used domains (e.g., the immunoglobulin domain) have become incorporated into hundreds of different proteins, where they serve unique functions. Homologous domains in different proteins have similar folds but may differ significantly in amino acid sequences. Nevertheless, most related domains can be recognized from characteristic patterns of amino acids along their sequences. For example, cysteine residues of immunoglobulin G (Ig) domains are spaced in a pattern required to make intramolecular disulfide bonds (Fig. 3-3).

Figure 3-13 modular proteins constructed from evolutionarily homologous, independently folded domains. A, Examples of protein domains used in many proteins: fibronectin 1 (FN I), fibronectin 2 (FN II), fibronectin 3 (FN III), immunoglobulin (Ig), Src homology 2 (SH2), Src homology 3 (SH3), kinase. (PDB files: FN7, 1PDC, 1FNA, 1IG2, 1HCS, 1PRM, and 1CTP.) B, Immunoglobulin G (IgG), a protein composed of 12Ig domains on four polypeptide chains. Two identical heavy chains (H) consist of four Ig domains, and two identical light chains (L) consist of two Ig domains. The sequences of these six Ig domains differ, but all of the domains are folded similarly. The two antigen-binding sites are located at the ends of the two arms of the Y-shaped molecule composed of highly variable loops contributed by domains H1 and L1. (PDB file: 1IG2.) C, Examples of proteins constructed from the domains shown in A: fibronectin (see Fig. 29-15), CD4 (see Figs. 27-8 and 28-9), PDGF-receptor (see Fig. 24-4), Grb2 (see Fig. 27-6), Src (see Fig. 25-3 and Box 27-1), and twitchin (see Chapter 39). Each of the 31 FN3 domains in twitchin has a different sequence. F1 is FI, F2 is FII, and F3 is FIII.

Rarely, protein domains with related structures may have arisen independently and converged during evolution toward a particularly favorable conformation. This is the hypothesis to explain the similar folds of immunoglobulin and fibronectin-III domains, which have unrelated amino acid sequences.

Nucleic Acids

Nucleic acids, polymers of a few simple building blocks called nucleotides, store and transfer all genetic information. This is not the limit of their functions. RNA enzymes, ribozymes, catalyze some biochemical reactions. Other RNAs are receptors (riboswitches) or contribute to the structures and enzyme activities of major cellular components, such as ribosomes (see Fig. 17-7) and spliceosomes (see Fig. 16-5). In addition, nucleotides themselves transfer chemical energy between cellular systems and information in signal transduction pathways. Later chapters elaborate on each of these topics.

Building Blocks of Nucleic Acids

Nucleotides consist of three parts: (1) a base built of one or two cyclic rings of carbon and a few nitrogen atoms, (2) a five-carbon sugar, and (3) one or more phosphate groups (Fig. 3-14). DNA uses four main bases: the purines adenine (A) and guanosine (G) and the pyrimidines cytosine (C) and thymine (T). In RNA, uracil (U) is found in place of thymine. Some RNA bases are chemically modified after synthesis of the polymer. The sugar of RNA is ribose, which has the aldehyde oxygen of carbon 4 cyclized to carbon 1. The DNA sugar is deoxyribose, which is similar to ribose but lacks the hydroxyl on carbon 2. In both RNA and DNA, carbon 1 of the sugar is conjugated with nitrogen 1 of a pyrimidine base or with nitrogen 9 of a purine base. The hydroxyl of sugar carbon 5 can be esterified to a chain of one or more phosphates, forming nucleotides such as adenosine monophosphate (AMP), adenosine diphosphate (ADP), and ATP.

Figure 3-14 atp and nucleotide bases. A, Stick figure and space-filling model of ATP. B, Four bases used in DNA. Stick figures show the hydrogen bonds used to form base pairs between thymine (T) and adenine (A) and between cytosine (C) and guanine (G). C, Uracil replaces thymine in RNA. C′₁ refers to carbon 1 of ribose and deoxyribose.

Covalent Structure of Nucleic Acids

DNA and RNA are polymers of nucleotides joined by phosphodiester bonds (Fig. 3-15). The backbone links a chain of five atoms (two oxygens and three carbons) from one phosphorous to the next—a total of six backbone atoms per nucleotide. Unlike the backbone of proteins, in which the planar peptide bond greatly limits rotation, all six bonds along a polynucleotide backbone have some freedom to rotate, even that in the sugar ring. This feature gives nucleic acids much greater conformational flexibility than polypeptides, which have only two variable torsional angles per residue. The backbone phosphate group has a single negative charge at neutral pH. The N—C bond linking the base to the sugar is also free to rotate on a picosecond time scale, but rotation away from the backbone is strongly favored. The bases have a strong tendency to stack upon each other, owing to favorable van der Waals interactions (see Chapter 4) between these planar rings.

Figure 3-15 rotational freedom of the backbone of a polynucleotide, rna in this case. The stick figure of two residues shows that all six of the backbone bonds are rotatable, even the C_4′—C′ bond that is constrained by the ribose ring. This gives polynucleotides more conformational freedom than polypeptides. Note the phosphodiester bonds between the residues and the definition of the 3′ and 5′ ends. Space-filling and stick figures at the bottom show a uridine (U) and adenine (A) from part of Figure 3-17.

(Redrawn from Jaeger JA, SantaLucia J, Tinoco I: Determination of RNA structure and thermodynamics. Annu Rev Biochem 62:255–287, 1993.)

Each type of nucleic acid has a unique sequence of nucleotides. Simple laboratory procedures employing the enzymatic synthesis of DNA allow the sequence to be determined rapidly (Fig. 3-16). All DNA and RNA molecules are synthesized biologically in the same direction (see Figs. 15-11 and 42-1) by adding a nucleoside triphosphate to the 3′ sugar hydroxyl of the growing strand. Cleavage of the two terminal phosphates from the new subunit provides energy for extension of the polymer in the 5′ to 3′ direction. Newly synthesized DNA and RNA molecules have a phosphate at the 5′ end and a 3′ hydroxyl at the other end. In certain types of RNA (e.g., messenger RNA [mRNA]), the 5′ nucleotide is subsequently modified by the addition of a specialized cap structure (see Figs. 16-2 and 17-2).

Secondary Structure of DNA

A few viruses have chromosomes consisting of single-stranded DNA molecules, but most DNA molecules are paired with a complementary strand to form a right-handed double helix, as originally proposed by Watson and Crick (Fig. 3-17). Key features of the double helix are two strands running in opposite directions with the sugar-phosphate backbone on the outside and pairs of bases hydrogen-bonded to each other on the inside (Fig. 3-14). Pairs of bases are stacked 0.34nm apart, nearly perpendicular to the long axis of the polymer. This regular structure is referred to as B-form DNA, but real DNA is not completely regular. On average, in solution, β-form DNA has 10.5 base pairs per turn and a diameter of 1.9nm. Hydrogen bonds between adenine and thymine and between guanine and cytosine span nearly the same distance between the backbones, so the helix has a regular structure that, to a first approximation, is independent of the sequence of bases. One exception is a run of As that tends to bend adjoining parts of the helix. Because the bonds between the bases and the sugars are asymmetrical, the DNA helix is asymmetrical: The major groove on one side of the helix is broader than the other, minor groove. Most cellular DNA is approximately in the β-form conformation, but proteins that regulate gene expression can distort the DNA significantly (see Fig. 15-7).

Figure 3-17 models of β-form dna. The molecule consists of two complementary antiparallel strands arranged in a right-handed double helix with the backbone (Fig. 3-15) on the outside and stacked pairs of hydrogen-bonded bases (see Fig. 3-14) on the inside. Top, Space-filling model. Middle, Stick figures, with the lower figure rotated slightly to reveal the faces of the bases. Bottom, Ribbon representation.

(Idealized 24–base pair model built by Robert Tan, University of Alabama, Birmingham.)

Under some laboratory conditions, DNA forms stable helical structures that differ from classic β-form DNA. All these variants have the phosphate-sugar backbone on the outside, and most have the usual complementary base pairs on the inside. α-form DNA has 11 base pairs per turn and an average diameter of 2.3nm. DNA-RNA hybrids and double-stranded RNA also have α-form structure. Z-DNA is the most extreme variant, as it is a left-handed helix with 12 base pairs per turn. Circumstantial evidence supporting the existence of Z-DNA in cells remains controversial.

DNA molecules are either linear or circular. Human chromosomes are single linear DNA molecules (see Fig. 12-1). Many, but not all, viral and bacterial chromosomes are circular. Eukaryotic mitochondria and chloroplasts also have circular DNA molecules.

When circular DNAs or linear DNAs with both ends anchored (as in chromosomes; see Chapter 13) are twisted about their long axis, the strain is relieved by the development of long-range bends and twists called supercoils or superhelices (Fig. 3-18). Supercoiling can be either positive or negative depending on whether the DNA helix is wound more tightly or somewhat unwound. Supercoiling is biologically important, as it can influence the expression of genes. Under some circumstances, supercoiling favors unwinding of the double helix. This can promote access of proteins involved in the regulation of transcription from DNA (see Chapter 15).

Figure 3-18 dna supercoiling. Electron micrographs of a circular mitochondrial DNA molecule in a relaxed configuration (A) and a supercoiled configuration (B).

(Reproduced, with permission, from David Clayton, Stanford University, Stanford, California; originally in Stryer L: Biochemistry, 4th ed. New York, WH Freeman and Co, 1995.)

The degree of supercoiling is regulated locally by enzymes called topoisomerases. Type I topoisomerases nick one strand of the DNA and cause the molecule to unwind by rotation about a backbone bond. Type II topoisomerases cut both strands of the DNA and use an ATP-driven conformational change (called gating) to pass a DNA strand through the cut prior to rejoining the ends of the DNA. To avoid free DNA ends during this reaction, cleaved DNA ends are linked covalently to tyrosine residues of the enzyme. This also conserves chemical bond energy, so ATP is not required for religation of the DNA at the end of the reaction.

Secondary and Tertiary Structure of RNAs

RNAs range in size from micro-RNAs of 20 nucleotides (see Fig. 16-12) to messenger RNAs with more than 80,000 nucleotides. Because each nucleotide has about three times the mass of an amino acid, RNAs with a modest number of nucleotides are bigger than most proteins (see Fig. 1-4). The 16S RNA of the small ribosomal subunit of bacteria consists of 1542 nucleotides with a mass of about 460 kD, much larger than any of the 21 proteins with which it interacts (see Fig. 17-7).

Except for the RNA genomes of a few viruses, RNAs generally do not have a complementary strand to pair with each base. Instead they form specific structures by optimizing intramolecular base pairing (Figs. 3-19 and 3-20). Comparison of homologous RNA sequences provides much of what is known about this intramolecular base pairing. The approach is to identify pairs of nucleotides that vary together across the phylogenetic tree. For example, if an A and a U at discontinuous positions in one RNA are changed together to C and a G in homologous RNAs, it is inferred that they are hydrogen-bonded together. This covariant method works remarkably well, because hundreds to thousands of homologous sequences for the major classes of RNA are available from comparative genomics. Conclusions about base pairing from covariant analysis have been confirmed by experimental mutagenesis of RNAs and direct structure determination.

Figure 3-19 rna secondary structures. A, Base pairing of Escherichia coli 16S ribosomal RNA determined by covariant analysis of nucleotide sequences of many different 16S ribosomal RNAs. The line represents the sequence of nucleotides. Blue sections are base-paired strands; pink sections are bulges and turns; green sections are neither base-paired nor turns. B, An antiparallel base-paired stem forming a hairpin loop. C, A bulge loop. D, An internal loop. E, A multibranched junction.

(A, Redrawn from Huysmans E, DeWachter R: Compilation of small ribosomal subunit RNA sequences. Nucleic Acids Res 14(Suppl):73–118, 1987. B–E, Redrawn from Jaeger JA, SantaLucia J, Tinoco I: Determination of RNA structure and thermodynamics. Annu Rev Biochem 62:255–287, 1993.)

The simplest RNA secondary structure is an antiparallel double helix stabilized by hydrogen bonding of complementary bases (Figs. 3-20 and 3-21). Similarly to DNA, G pairs with C and U pairs with A. Unlike the case in DNA, G also frequently pairs with U in RNA. Helical base pairing occurs between both contiguous and discontiguous sequences. When contiguous sequences form a helix, the strand is often reversed by a tight turn, forming an antiparallel stem-loop structure. These hairpin turns frequently consist of just four bases. A few sequences are highly favored for turns, owing to their compact, stable structures. Bulges due to extra bases or noncomplementary bases frequently interrupt base-paired helices of RNA.

Figure 3-20 Atomic structure of phenylalanine transfer rna (phe-trna) determined by X-ray crystallography. A, An orange ribbon traces the RNA backbone through a stick figure (left) and space filling model (right). (PDB file: 6TNA.) B, Skeleton drawing. C, Two dimensional base-pairing scheme. Note that the base-paired segments are much less regular than is β-form DNA. (PDB file: 6TNA.)

(B, Redrawn from an original by Alex Rich, MIT, Cambridge, Massachusetts.)

Crystal structures of RNAs such as tRNAs (Fig. 3-20) and a hammerhead ribozyme (Fig. 3-21) established that RNAs have novel, specific, three-dimensional structures. Crystal structures of ribosomes (see Fig. 17-7) showed that larger RNAs fold into specific structures using similar principles. Crystallization of RNAs is challenging, and NMR provides much less information on RNA than on proteins of the same size, so much is yet to be learned about RNA structures.

Figure 3-21 Hammerhead ribozyme, a self-cleaving RNA sequence found in plant virus RNAs. A, Ribbon diagram. B, Space-filling model. The structure consists of an RNA strand of 34 nucleotides complexed to a DNA strand of 13 nucleotides (in vivo, this is a 13-nucleotide stretch of RNA, which would be cleaved by the ribozyme). The RNA forms a central stem-loop structure (stem II) and base pairs with the substrate DNA to form stems I and III. Interactions of the substrate strand with the sharp uridine turn distort the backbone and promote its cleavage. (PDB file: 1HMH.)

(A, Redrawn from Pley HW, Flaherty KM, McKay DB: Three-dimensional structure of a hammerhead ribozyme. Nature 372:68–74, 1994.)

As in proteins, many residues in RNAs are in conventional secondary structures, especially stems consisting of base-paired double helices; however, RNA backbones make sharp turns that allow unconventional hydrogen bonds between bases, ribose hydroxyls, and backbone phosphates. Generally, the phosphodiester backbone is on the surface with most of the hydrophobic bases stacked internally. Some bases are hydrogen-bonded together in triplets (Fig. 3-22) rather than in pairs. Four or five Mg²⁺ ions stabilize regions of tRNA with high densities of negative charge.

Figure 3-22 rna conformational changes. A–B, Molecular models of NMR structures of TAR, a stem-loop regulator of HIV mRNA. Binding of arginine (or a protein called TAT) causes a major conformational change: Two bases twist out of the helix into the solvent (top). U23 forms a base triplet with U38 and A27 (space-filling model), and the stem straightens. This conformational change promotes transcription of the rest of the mRNA. (A, PDB files: 1ANR and 1AKX.) C–E, Guanine-binding riboswitch from Bacillus subtillis. C, Diagram of the mRNA showing the location of the riboswitch just upstream of the genes for the enzymes required to synthesize guanine. At low guanine concentrations, the RNA is folded in a way that allows transcription of the genes. (PDB file: 1U8D.) D, High guanine concentrations (the analog hypoxantine, HX, is shown here) bind to the riboswitch, causing refolding into a terminator stem loop that prevents transcription of the mRNA. E, Ribbon drawing of the crystal structure with bound hypoxanthine.

(C, Reference: Batey RT, Gilbert SD, Montange RK: Structure of a natural guanine-responsive riboswitch complexed with the metabolite hypoxanthine. Nature 432:411–415, 2004. D, Reference: Mandal M, Boese B, Barrick JE, et al: Riboswitches control fundamental biochemical pathways in B. subtillis and other bacteria. Cell 113:577–586, 2003.)

Like proteins, RNAs can change conformation. The TAR RNA is a stem-loop structure with a bulge formed by three unpaired nucleotides (Fig. 3-22). TAR is located at the 5′ end of all RNA transcripts of the human immunodeficiency virus (HIV) that causes AIDS. Bind-ing of a regulatory protein called TAT changes the conformation of TAR and promotes elongation of the RNA. Binding arginine also changes the conformation of TAR.

Like proteins, RNAs can bind ligands. About 2% of the genes in the bacterium Bacillus subtillis are regulated by RNA sequences located in the mRNAs. For example, mRNAs for enzymes used to synthesize purines such as guanine have a guanine-sensitive riboswitch that controls translation (Fig. 3-22C-D). At low guanine levels, the conformation allows transcription. High concentrations of guanine bind the RNA, causing a massive reorganization that blocks transcription. This negative feedback loop optimizes the cellular concentration of guanine.

Carbohydrates

Carbohydrates are a large family of biologically essential molecules made up of one or more sugar molecules. Sugar polymers differ from proteins and nucleic acids by having branches. Compared with proteins, which are generally compact, hydrophilic sugar polymers tend to spread out in aqueous solutions to maximize hydrogen bonds with water. Carbohydrates may occupy 5 to 10 times the volume of a protein of the same mass. The terms glycoconjugate and complex carbohydrate are currently preferred for sugar polymers rather than polysaccharide.

Carbohydrates serve four main functions:

1. Covalent bonds of sugar molecules are a primary source of energy for cells.

2. The most abundant structural components on earth are sugar polymers: Cellulose forms cell walls of plants; chitin forms exoskeletons of insects; and glycosaminoglycans are space-filling molecules in connective tissues of animals.

3. Sugars form part of the backbone of nucleic acids, and nucleotides participate in many metabolic reactions (see earlier discussion).

4. Single sugars and groupings of sugars form side chains on lipids (see Fig. 7-3) and proteins (see Figs. 21-26 and 29-13). These modifications provide molecular diversity beyond that inherent in proteins and lipids themselves, changing their physical properties and vastly expanding the potential of these glycoproteins and glycolipids to interact with other cellular components in specific receptor-ligand interactions (see Fig. 30-12). Conversely, other glycoconjugates block inappropriate cellular interactions.

A modest number of simple sugars (Fig. 3-23) form the vast array of different complex carbohydrates found in nature. These sugars consist of three to seven carbons with one aldehyde or ketone group and multiple hydroxyl groups. In water, the common five-carbon (pentose) and six-carbon (hexose) sugars cyclize by reaction of the aldehyde or ketone group with one of the hydroxyl carbons. This forms a compact structure that is used in all the glycoconjugates considered in this book. Given several asymmetrical carbons in each sugar, a great many stereochemical isomers exist. For example, the hydroxyl on carbon 1 can either be above (b-isomer) or below (a-isomer) the plane of the ring. Proteins (enzymes, lectins, and receptors) that interact with sugars distinguish these stereoisomers.

Figure 3-23 A–C, Simple sugar molecules. Stick figures and space-filling model of d-glucose showing the highly favored condensation of the carbon 5 hydroxyl with carbon 1 to form a hemiacetal. The resulting hydroxyl group on carbon 1 is in a rapid equilibrium between the a (down) or b (up) configurations. The space-filling model of β-d-glucose illustrates the stereochemistry of the ring; the stick figures are drawn as unrealistic planar rings to simplify comparisons. Stick figures show three stereoisomers of the 6-carbon glucose (A), three modifications of glucose (B), a 6-carbon keto sugar condensed into a five-membered ring (C), and two 5-carbon riboses (D).

Sugars are coupled to other molecules by highly specific enzymes, using a modest repertoire of intermolecular bonds (Fig. 3-24). The common O-glycosidic (carbon-oxygen-carbon) bond is formed by removal of water from two hydroxyls—the hydroxyl of the carbon bonded to the ring oxygen of a sugar and a hydroxyl oxygen of another sugar or the amino acids serine and threonine. A similar reaction couples a sugar to an amine, as in the bond between a sugar and a nucleoside base. Sugar phosphates with one or more phosphates esterified to a sugar hydroxyl are components of nucleotides as well as of many intermediates in metabolic pathways.

Figure 3-24 glycosidic bonds. Stick figures show the formation of O- and N-glycosidic bonds and a common example of each: the disaccharide sucrose and the nucleoside cytidine. Enzymes catalyze the formation of glycosidic bonds in cells. The chemical name of sucrose [glucose-a(1→2)fructose] illustrates the convention for naming the bonds of glycoconjugates.

Glycoconjugates—polymers of one or more types of sugar molecules—are present in massive amounts in nature and are used as both energy stores and structural components (Fig. 3-25). Cellulose (unbranched β-1,4 polyglucose), which forms the cell walls of plants, and chitin (unbranched β-1,4 poly N-acetylglucosamine), which forms the exoskeletons of many invertebrates, are the first and second most abundant biological polymers found on the earth. In animals, giant complex carbohydrates are essential components of the extracellular matrix of cartilage and other connective tissues (see Figs. 29-13 and 34-3). Glycogen, a branched α-1,4 polymer of glucose, is the major energy store in animal cells. Starch-polymers of glucose with or without a modest level of branching-performs the same function for plants.

Figure 3-25 examples of simple glycoconjugates. A, Cellulose, an unbranched homopolymer of glucose used to construct plant cell walls. B, Glycogen, a branched homopolymer of glucose used by animal cells to store sugar. Many glycoconjugates consist of several different types of sugar subunits (see Figs. 21-26 and 29-13).

Glycoconjugates differ from proteins and nucleic acids in that they have a broader range of conformations owing to the flexible glycosidic linkages between the sugar subunits. Although sugar polymers may be stabilized by extensive intramolecular hydrogen bonds and some glycosidic linkages are relatively rigid, NMR studies have revealed that many glycosidic bonds rotate freely, allowing the polymer to change its conformation on a submillisecond time scale. This dynamic behavior limits efforts to determine glycoconjugate structures. They are reluctant to crystallize, and the multitude of conformations does not lend itself to NMR analysis. Structural details are best revealed by X-ray crystallography of a glycoconjugate bound to a protein, such as a lectin or a glycosidase (a degradative enzyme).

Sugars are linked to proteins in three different ways (Fig. 3-26) by specific enzymes that recognize unique protein conformations. Glycoprotein side chains vary in size from one sugar to polymers of hundreds of sugars. These sugar side chains can exceed the mass of the protein to which they are attached. Chapters 21 and 29 consider glycoprotein biosynthesis.