Molecular Biology Primer for Neurosurgeons

Published on 13/03/2015 by admin

Last modified 22/04/2025

Print this page

This article have been viewed 1426 times

CHAPTER 3 Molecular Biology Primer for Neurosurgeons

Commenting on the structure and connections of neurons during his acceptance speech for the Nobel Prize in Medicine, Ramon y Cajal proposed that

These fibres, ramifying several times, always proceed towards the neuronal body, or towards the protoplasmic expansions around which arise plexuses or very tightly bound and rich nerve nests [these] morphological structures, whose form varies according to the nerve centres being studied, confirm that the nerve elements possess reciprocal relationships in contiguity but not in continuity. It is confirmed also that those more or less intimate contacts are always established, not between the nerve arborizations alone, but between these ramifications on the one hand, and the body and protoplasmic processes on the other.

With these observations in support of the neuron doctrine in 1906, so began one of the most abiding lines of scientific pursuit in biology—understanding the principal mechanisms by which the fate of central neurons are specified during development at the correct time and place to facilitate the quadrillion synaptic connections that are established, maintained, and remodeled. These arrays of neural networks codify our perceptions and other cognitive functions. To do so, synapses bring together in apposition specialized morphologic structures of the presynaptic, usually axonal, and postsynaptic, typically dendritic, subcellular neuronal domains often ensheathed by the end-feet of astrocytes. The precision and strength of these connections rely on the pinpoint placement of gene products in each of these cellular compartments. Our understanding of neuroscience in these molecular terms has been one of the fundamental challenges in the past several decades but has been complicated by the varying intrinsic properties of neurons—including morphology, types of neurotransmitter release, projection targets, and basic input/output characteristics—that exist along a wide spectrum of neuronal phenotypes even within the same neuroanatomic region.

Over the past several decades, neuroscientists have embraced a rapidly evolving set of molecular biologic techniques to gain insight into understanding these dynamics of gene expression. These findings have been critical in understanding not only the mechanistic underpinnings of normal development but also the role that some genes play in neurological diseases from the developmental to the degenerative. Our purpose here is to introduce a series of core methods in the contemporary molecular neuroscientist’s toolbox. We do so in the broader context of the traditional candidate gene approach and the more recent rise of functional genomics.

The Candidate Gene Approach

Lacking any biochemical basis that would aid in purifying gene products associated with a disease state, the candidate gene approach was largely an outgrowth of the efforts of positional cloning strategies in the early 1980s. Using linkage analysis to look at differences in chromosome structure in diseased versus nondiseased individuals often in tandem with linkage disequilibrium mapping to define broad (i.e., <10 centimorgans) and much finer respective chromosomal intervals associated with a priori knowledge of their etiologic role in a disease state enabled molecular geneticists to narrow the number of candidate causative genes. Before completion of the Human Genome Project, identifying these loci was an incredibly arduous and time-consuming task, in large part because of the lack of high-resolution genetic and physical chromosomal maps. The unprecedented efforts to surmount these struggles were most famously recorded in the midst of the pioneering work of Louis Kunkel and Ronald Worton ¹ in identifying dystrophin as the gene responsible for Duchenne’s muscular dystrophy. In the past several decades, numerous genes with high relative risk for similar “simple” mendelian diseases have been successfully identified with these methods. More than 30% of mendelian disease has neurological manifestations.² By applying the tools of molecular biology to a gene or the small set of candidate genes identified within a quantitative trait loci it has become possible to facilitate a systematic approach to answering some very basic questions about the organization of a gene and the expression of its gene products.

Emerging from the sequence of methods conceptualized first by Sol Spiegelman ³ and most effectively by Edwin Southern,⁴ molecular biologists have exploited the now well-worn principle of molecular hybridization in which a single-stranded nucleic acid probe (or primer) forms a stable hybrid molecule, as a result of nucleotide complementarity, with a single-stranded target sequence immobilized on a solid support (i.e., nitrocellulose or nylon membranes) or in solution (Fig. 3-1). Under the appropriate experimental conditions, the stability and biochemical kinetics of the hybrid are directly proportional to the length and degree of nucleotide complementarity.

FIGURE 3-1 The fundamental tools of molecular biology are based on molecular hybridization. A cartoon schematic of molecular hybridization shows the basic principle in practice illustrated with a portion of genomic DNA. When genomic DNA is transferred to a solid-phase support such as nylon or nitrocellulose in a Southern blot, the DNA is denatured so that labeled probes may hybridize to the single-strand genomic DNA sequence. We highlight three representative areas of the genomic DNA, each having a different sequence. The length of these sequences, as well as the probes, is not drawn to scale and would contain many more base pairs than depicted. A probe, labeled here with biotin, that has nucleotide complementarity to the genomic sequence will bind with specificity. Using this principle in some form, a table of the basic tools of molecular biology is given to provide a cursory overview of the methods and their uses. The maximal sensitivity of the technique is not ordinarily the success that will be found with every experiment; it is the best observed sensitivity.

As first applied in Southern blotting whereby genomic DNA was separated according to size by agarose gel electrophoresis, transferred to nitrocellulose, and annealed to complementary DNA probes labeled with a detectable tag hybridized to target genomic DNA, the dosage or deletion analysis, or both, of candidate genes was affirmed. Indeed, Southern blot analysis of the dystrophin gene has identified duplications as well as mapped various exon deletion mutants. Recently, variation in the copy number of genes has received new consideration inasmuch as several new genomic disorders have been shown to be manifested in a gene dosage–dependent manner.⁵ Such disorders include dup7 (q11.23) syndrome, methyl-CPG-binding protein 2 (MECP2), and adult-onset autosomal dominant leukodystrophy. By reducing the stringency of hybridization conditions, however, differences in hybridization patterns may reveal the existence of certain fragments that are not able to hybridize to the probe under the most stringent hybridization conditions. These data are often the first clues that the gene is part of a larger multigene family that shares significant but not complete nucleotide sequence identity. By using a modification of the Southern blotting method to screen a complementary DNA library at moderate stringency, the dystrophin-like sequence utrophin was identified.⁶

Traditional Southern blotting has largely been supplanted by the advent of polymerase chain reaction (PCR) protocols. The ability to replicate short fragments of DNA with an enzymatic assay first described by Kleppe and colleagues ⁷ and transformed by Mullis and associates’ use of a thermostable DNA polymerase⁸^,⁹ has dramatically reduced lead times in comparison to conventional Southern blotting methods. As with Southern blotting, there must be some knowledge of the target DNA sequence for PCR. In practice, PCR typically requires two primers, one of which is complementary to the 5′ DNA region of interest and the other to the 3′ end of the DNA region. The DNA regions can be any part of a gene, exonic or intronic, although amplifications longer than 10 kilobases become increasingly difficult to isolate. Replication of the DNA fragment, or amplicon, occurs processively as part of a series of 20 to 40 repeated temperature cycles in which one cycle consists of denaturation of the DNA target, annealing of the primers to the single-stranded DNA, and enzymatically catalyzed elongation of complementary DNA by a thermostable DNA polymerase. One noteworthy improvement in the past 15 years has been the addition of thermostable DNA polymerases with separate 3′ to 5′ exonuclease proofreading activity, which has resulted in higher sequence fidelity of the amplified product. Among the multiple variants of the basic PCR methodology that have been developed, identification of allele-specific, single nucleotide polymorphisms (SNPs)¹⁰ and foci of methylated CpG islands¹¹ in genomic DNA are just two. Because of the dependability and efficiency of amplifying a specific locus within the genome, PCR and its basic multiplex versions, which can amplify several amplicons simultaneously,¹² continue to be workhorses for the preparation of genomic samples.

A parallel set of approaches for studying expression of the primary products of gene transcription that involve messenger RNA (mRNA) has enabled neuroscientists to assign critical spatiotemporal information to gene function. The first technique developed to do so, the Northern blot,¹³ differs from the Southern blot in that the target molecules separated according to size by agarose gel electrophoresis under denaturing conditions and transferred to a membrane support for hybridization consist of RNA, whether total RNA or poly A⁺ mRNA. At the organ or tissue level, it remains unmatched in its ability to provide an accurate size of the transcript and quantifiable patterns of gene expression during development or disease states, or both. Furthermore, differences in the size of the transcript within or among samples suggest any number of possibilities, including but not limited to alterative splicing, alternative start sites, or differences in polyadenylation. Typically, medium- and high-abundance messages are readily visualized with labeled probes. However, low-abundance messages can be difficult to detect.

The low-throughput methodology, long lead times, and limits of detection dictated by RNA immobilization on solid-phase supports have largely been overcome by the development of solution-based hybridization techniques that provide at least an order of magnitude more sensitive for detecting mRNA transcripts. Detection of mRNA in its native state by the ribonuclease protection assay adapts the Northern blot approach by hybridizing a labeled complementary RNA probe (usually 100 to 500 base pairs) with high specific activity to unlabeled cellular RNA that is freely suspended in solution.¹⁴ The resulting RNA duplexes, which are highly stable, are protected from digestion by single-strand–specific ribonucleases and separated by polyacrylamide gel electrophoresis (PAGE). Detection of signal by the isotopic or nonisotopic labels offer sensitive and reliable detection and quantitation of mRNA that is up to 50 times more sensitive than Northern blotting.¹⁵ A second solution-based alternative requires conversion of the native mRNA to complementary DNA by reverse transcriptase followed by PCR. Although gene-specific primers can be used for the reverse transcription, a polythymidine oligonucleotide (oligo-dT), random hexamers, or combination of them are most frequently used. Oligo-dT primers readily hybridize to the translationally important poly A⁺ tails present in most mRNA and thereby bias the start site of reverse transcription toward the 3′ end of the mRNA. Conversely, random hexamers are often used when attempting to obtain a target template closer to the 5′ end of long mRNA sequences. Detection of the target sequence can be achieved by standard PCR protocols, whereas multiplex PCR protocols allow the simultaneous detection of several amplicons.¹² However, when quantitation is required, quantitative real-time PCR (qRT-PCR) protocols are necessary.¹⁶^,¹⁷ qRT-PCR requires primers with specific design characteristics and amplicons with no more than 250 base pairs. Before quantitation, it is essential that the primers be exhaustively tested with the target template to ensure that the gene of interest is amplified and is the expected size. Two versions of qRT-PCR are now in use. The most simple qRT-PCR methodology uses a fluorescent dye that emits short wavelengths of ultraviolet spectrum light as a function of the number of amplicons created during each successive cycle when it intercalates with the double-stranded DNA formed during the thermocycling process. The generation of nonspecific, double-stranded PCR products, often referred to as primer-dimers, can interfere with or completely prevent quantitation. A more reliable but more expensive qRT-PCR route uses the addition of a separate sequence-specific RNA- or DNA-based probe modified with a fluorescence reporter at one end of the molecule and a quencher of fluorescence at the other end. In this version of qRT-PCR, the fluorescent reporter probe anneals to the target template somewhere in the target amplicon between the qRT-PCR primers. In intact reporter probes, the fluorescence is quenched. Detection and quantitation occur only as signal is emitted because the fluorescent reporter is physically cleaved from the quencher by the 5′-3′ exonuclease activity of the thermostable polymerase during the elongation step. To ensure accuracy with any of these solution-based mRNA quantitation assays, it is necessary to normalize expression of the target transcript with a stably expressed control gene. These methods are most favored when dealing with low-abundance transcripts and when the cellular RNA material is limited or, in a worst-case scenario, partially degraded.

In the past several years there has been a renaissance in Northern blotting methods for detecting a specific set of small RNA molecules, microRNAs (miRNAs). miRNAs are abundant, single-stranded, 21- to 23-nucleotide RNA molecules processed from endogenous 70-nucleotide pre-miRNAs by two enzymes, Drosha and Dicer.¹⁸^–²⁰ These small noncoding RNAs are recruited to unique ribonucleoprotein complexes, RNA-induced silencing complexes,²¹^,²² where they mediate the translational suppression or degradation of nascent mRNA transcripts bearing homologous antisense sequence in their 3′ untranslated regions.²³ miRNA function appears to have an essential role in neuronal development,²⁴^,²⁵ and recent indications suggest that dysregulation of miRNA networks contributes to neurodegenerative disease.²⁶^–²⁸ As an analytic tool, the popularity of the Northern blot lies in its continued accuracy in estimating the size of the mature miRNA molecule and the ability to detect the pre-miRNA product simultaneously. Enhanced detection advances consisting of cross-linking small RNA molecules (<100 nucleotides) to nylon membranes,²⁹ as well as the use of locked nucleic acids,³⁰^,³¹ to increase probe specificity are among the most recent contributions.

Gene expression observations at the organ or tissue level provide only an aggregate view of gene expression because the robust, cell-specific controls that create numerous subpopulations of individual neurons and nonneuronal cell types with considerable heterogeneity in mRNA expression phenotypes within the same anatomic region are lost. In addition, de novo expression, induction, and repression are rarely observed in the mature nervous system, so the dynamic range of expression is often modest. To determine the cellular resolution of gene expression, in situ hybridization is most often used. As a method for detecting and localizing specific mRNA transcripts in morphologically preserved tissue or cells, in situ hybridization uses a single-stranded complementary DNA or RNA probe that hybridizes to the target endogenous mRNA transcripts. Detecting mRNA transcripts within the cellular cytoarchitecture presents a unique balancing act. Because the mRNA population has not been diluted by mRNA from other cell types, the target mRNA of interest is present at its individual steady-state level. Many times, it is present at higher levels of abundance in specific subpopulations of neurons than one would detect within the whole tissue. However, mRNAs within the normal cellular matrix are but one constituent of a multi-megadalton ribonucleoprotein complex, and consequently their primary sequence is often masked or sterically inaccessible to a probe. Thus, one must fix the mRNA in place while balancing the ability to permeabilize the architecture so that the DNA probes, which are typically oligonucleotides of 20 to 50 bases, PCR-generated probes of several hundred bases, or complementary RNA probes (i.e., riboprobes) that can be routinely made with in vitro transcription kits up to one kilobase long, have access to the target mRNA sequences. However, care must be taken when using longer complementary riboprobes because cross-hybridization with similar sequences in other genes may occur. It is important to isolate gene-specific sequences to use as riboprobes. Typically, two or three separate gene-specific complementary DNA probes or riboprobes targeting the same transcript are used to cross-validate the subcellular distribution. There are two usual controls for specificity. The primary control is a competition control with excess unlabeled probe hybridization followed by labeled probe hybridization and subsequent detection. The DNA probes or riboprobes are complementary to the target mRNA transcripts and thus antisense. Sense controls are also used to show the degree of nonspecific background.

Originally, detection schemes for in situ hybridization used isotopic labels (³³P or ³⁵S) incorporated into the complementary probes. Quantitative methods for converting radioactive signal with silver grain density via photographic emulsion have been widely used since the mid-1990s.³² Nonisotopic detection methods have multiplied over the past 2 decades because they take a considerably shorter time, can have greater signal resolution, and allow the simultaneous detection of multiple different targets by combining various detection methods. Complementary probes labeled with biotin or digoxigenin allow a number of different detection options. Some use colorimetric substrates consisting of horseradish peroxidase or alkaline phosphatase conjugated to streptavidin beads or primary antidigoxigenin antibodies to facilitate detection. Data from the Allen Brain Atlas use just such a colorimetric detection strategy, which provides an increasingly comprehensive data set of expressed gene at the cellular level.³³ One continuing technical concern with enzyme-linked amplification schemes is diffusion of the colorimetric signal from the site of localization. Research laboratories have used fluorescence detection schemes within the past several decades that involve new generations of fluorophores with long-term photostability, such as Alexa dyes or quantum dot (Qdot) nanocrystals; these fluorophores have been key components in illuminating the trafficking dynamics of mRNA molecules within the subcellular compartments of dendrites³⁴ and axons³⁵^,³⁶ via conventional epifluorescent microscopy or single-photon confocal laser scanning microscopy. Quantitative data analysis of these fluorescent images requires the use of image acquisition and analysis software such as Metamorph or IP Lab. In a typical sample, the total fluorescence intensity for a region of interest, normalized against background noise and any differences in the area of the region of interest, is compared across experiments or among samples and subjected to statistical analysis. In contemporary molecular cytogenetics, fluorescence in situ hybridization (FISH)-based karyotyping and banding methods refer to a rubric of techniques used for both clinical genetics and tumor cytogenetics that can simultaneously characterize several chromosomes or chromosomal subregions (Box 3-1).

Box 3-1

Array of Fluorescence In Situ Hybridization–Based Methods in Molecular Cytogenetics

Over the past several decades, G-banding has served as one of the routine standards for chromosome banding. It relies on successful culture of the tissue of investigation, often fetal or tumor tissue, and preparation of metaphase cells. It should be noted that product-of-conception samples in particular suffer relatively high rates of failure (10% to 40%) during the tissue-culturing process ³⁷ and poor chromosome morphology.³⁸ Monochrome changes in the visible karyotype of morphologically optimal samples produced by Giemsa staining can provide low but sufficient chromosomal resolution to distinguish changes in chromosome number and large structural rearrangements whether translocations or macrodeletions. Chromosomes with small translocations, cryptic aberrations, microdeletions, and inversions or more complex karyotypes are often beyond the limits of conventional G-banding analysis.³⁹ As a result, complementary fluorescence in situ hybridization (FISH)-based karyotyping and banding methods have been developed to overcome the intrinsic morphologic and technical obstacles in karyotyping associated with G-banding protocols.

In its most basic form, FISH-based karyotyping uses DNA probes hybridizing to a specific gene or chromosome locus in metaphase or, less commonly, interphase chromosome preparations for the straightforward detection of deletion/duplication syndromes and gene fusions or rearrangements. In contrast to metaphase FISH, interphase FISH does not require the growth of viable cells for the preparation of a chromosomal spread. Interphase FISH makes use of preserved tissue and cellular material such as that found in paraffin-embedded biopsy samples. Perhaps the most significant advance in the past decade or so has been the development of multicolor FISH-based karyotyping and banding techniques (Fig. 3-2). The various iterations of multicolor FISH karyotyping methods are all based on the availability of chromosome-specific probe sets developed from degenerate oligonucleotide–primed polymerase chain reaction of flow-sorted chromosomal libraries⁴⁰^,⁴¹ labeled in tandem with four to seven spectrally separable fluorochrome labels⁴² that simultaneously “paint” and distinguish each whole chromosome.⁴³^,⁴⁴ Chromosome painting with chromosome-specific probes can be done combinatorially (e.g., multiplex FISH or spectral karyotyping) or by using an additional ratiometric approach⁴⁵ (i.e., combined binary ratio–FISH) and has proved to be a powerful adjunct to conventional cytogenetic techniques in the diagnostic analysis of interchromosomal events and characterization of the cytogenetic evolution of tumors, even in complex aberrations. The sensitivity (i.e., whether a translocation can be detected) and specificity (i.e., whether it can classified with assurance) of the analysis are vitally dependent on the fluorochrome combinations used in the target chromosomes.⁴⁶ The resolution of these whole chromosome–painting techniques is limited by the lack of any additional spatial delineation within a chromosome. For this reason, multicolor karyotyping alone is often not sensitive enough to precisely determine chromosomal breakpoints, subtle chromosome rearrangements, or intrachromosomal aberrations (e.g., inversions, duplications, terminal deletions). Differential hybridization signals obtained with other FISH-based karyotype techniques, such as comparative genomic hybridization, have similar resolving capabilities.⁴⁷ Efforts to improve the resolution of these assays have used chromosome arm–specific,⁴⁴^,⁴⁸ region-specific,⁴⁹^,⁵⁰ centromeric,⁵¹ and subtelomeric probes.⁵²^,⁵³ To address the latter issue of intrachromosomal aberrations, FISH-based banding patterns obtained by using differentially labeled and pooled subregional DNA probes were designed to produce high-resolution chromosomal karyotypes with identifiable fluorescent banding patterns within a single chromosome via spectral color banding,⁵⁴ cross-species color banding⁴⁹^,⁵⁵ or multicolor banding.⁵⁰^,⁵⁶

FIGURE 3-2 An idealized graphic comparison of a Giemsa-stained, whole chromosome–painted, and multibanded chromosome. FISH, fluorescence in situ hybridization.

As intermediaries in the continuum between genotype and phenotype, levels of mRNA should be taken as a surrogate for corresponding protein expression or their functional activity with caution. Early data sets attempting to establish a correlation between protein and mRNA levels found varying degrees of concordance. Although a significant positive correlation has been observed in human transitional cell carcinomas,⁵⁷ more marginal grading was observed in a comparative examination of 19 genes in the human liver.⁵⁸ Conversely, in a more limited study of three matrix metalloproteinase–related genes expressed in benign and neoplastic prostate tissue,⁵⁹ no correlative relationships were identified. When these data are placed in the context of the vast regulatory networks that monitor the various posttranscriptional events, this wide variability is not entirely unexpected. A nearly universal theme among the traditional tools for assaying protein expression is the use of affinity between an antibody and an epitope on the target protein to facilitate protein detection. The specificity of the primary antibody is the central determinant in the accuracy of protein recognition. Much like assays for detecting nucleic acids, methods to identify protein expression can be accomplished with the use of blots, free in solution or in situ.

Protein blots, more commonly referred to as Western blots,⁶⁰ immobilize size-separated protein lysates on membranes (e.g., nitrocellulose or polyvinylidene difluoride [PVDF]), where they are detected with a monoclonal or polyclonal antibody.⁶¹^,⁶² In short, cellular proteins solubilized in a lysis buffer containing one or more detergents, protease inhibitors, and more often than not, a strong reducing agent are separated by PAGE. Separation is most commonly accomplished by molecular weight. However, separation in this single dimension may be further refined by prior separation in an immobilized pH gradient gel, and these gels are often referred to as two-dimensional (2D) gels. Although protein lysates can be run under nondenaturing conditions, usually when separating by molecular weight, sodium dodecyl sulfate (SDS) is incorporated in the polyacrylamide gels and the running buffers to maintain the protein lysates in a denatured state. Strong reducing agents such as β-mercaptoethanol or dithiothreitol in the lysis or loading buffer (or in both) eliminate any secondary or tertiary folding that would interfere with protein migration within the polyacrylamide gel. Changes in the percentage of acrylamide within the gel determine how well low- or high-molecular-weight proteins separate within the gel. Gels with a higher percentage of acrylamide resolve lower-molecular-weight proteins better and vice versa. The gel-immobilized proteins are transferred in a glycine-based buffer to nitrocellulose or PVDF membranes. The addition of a primary antibody recognizing the target protein facilitates detection of the protein, most commonly in a multistep (i.e., indirect) detection scheme using fluorescently labeled secondary antibodies or enzyme-linked secondary antibodies and various colorimetric, chemifluorescent, or chemiluminescent substrates. Advances in antibody production offer not only a growing selection of primary immunoreagents but also novel innovations in antibody development that combine the most attractive properties of mouse monoclonal (e.g., uniformity, purity, indefinite availability) with rabbit polyclonal (higher affinity) antibodies.⁶³^,⁶⁴ The choice of primary antibody is not without a caveat. A single primary antibody will rarely share equal affinity for the target proteins in nondenaturing and denaturing conditions and often has to be determined empirically. Signal detection reveals the experimental molecular weight of the protein, which can be compared against the calculated molecular weight of the open reading frame. Although aberrant migration can occur as a result of the intrinsic properties of the protein sequence, significant differences in molecular weight are also suggestive of cotranslational or posttranslational protein modifications. The ability to obtain quantitative information from signal detection is quite limited regardless of the detection scheme, but with proper normalization for loading errors⁶⁵ and effort to increase the dynamic range of signal⁶⁶ by using cameras with charge-coupled devices, there has been considerable improvement. A far more sensitive technique for determining protein abundance is the sandwich enzyme-linked immunosorbent assay (ELISA).⁶⁷ Sandwich-ELISA methodology requires a capture antibody that recognizes the protein of interest. It is adsorbed to the surface of a microtiter plate and flooded with the presence of protein lysate. Any antigenic sequence within the protein of interest binds to the capture antibody. A detecting antibody that also recognizes the protein of interest, although at a different epitope not occluded by the capture antibody, is added to facilitate detection of the protein. Once an enzyme-linked secondary antibody is bound to the detecting antibody, an inert substrate of the enzyme is cleaved to create fluorescent or chemiluminescent signal that can be quantified to determine the concentration of the antigen.

Another widely used protein detection method is immunoprecipitation.⁶⁸ The principal methodology is very simple. An antibody recognizing the target protein is incubated with cell lysates in solution in the presence of protease inhibitors and allowed to form an immune complex, which can then be “precipitated” out of the solution by using protein A/G–coupled agarose beads. Proteins not captured by the primary antibody and sequestered within the immune complex are washed away. The immune complexes (antibody and antigen) are eluted from the protein A/G–coupled beads, separated by SDS-PAGE, and analyzed by Western blot to verify the identity and quantity of the target protein. One important variable is the composition of the immunoprecipitation lysis buffer. Ideally, the lysis buffer is able to balance the solubilization of proteins from the original tissue or cell matrix while leaving their native conformation intact.⁶⁹ By so doing, immunoprecipitation protocols seek to physically isolate the protein of interest from the rest of the cell lysate while retaining the protein-protein interactions of the target antigen. One criticism of this methodology has focused on the potential for identifying false-positive co-immunoprecipitation protein partners because of the lysis step, which allows all the solubilized cellular constituents to mix in solution. This mixing may produce nonphysiologic interactions because proteins normally compartmentalized within the cellular milieu or expressed in different cell types are now allowed to interact. As a result, validation of such protein-protein interactions by parallel methods will be required (Box 3-2).

Box 3-2

Protein Domains and Selected Parallel Methodologies for Assessing Protein-Protein Interactions

The three-dimensional structure of a protein is often scattered, with various protein modules that can fold and, in some cases, autonomously retain the function of the rest of the protein chain.⁷⁰ Numerous protein-protein interaction domains and their sequence boundaries have been identified by structural data from x-ray crystallography and nuclear magnetic resonance, but obtaining these data is time-consuming and labor-intensive. By extrapolating from these data, mathematical algorithms have been constructed to query the primary amino acid sequence for the presence of and boundaries for various types of protein-protein–interacting domains and interfaces. Meanwhile, curated databases have been assembled to identify and catalog many of the physical interactions among pairs or larger groups of proteins.⁷¹ Identification of direct interacting partners will require multiple ways of verifying any interaction. A selected set of protein-protein interaction methodologies that are, to varying degrees, complementary with immunoprecipitation are presented in the following sections.

Protein Affinity Chromatography/Pull-Down Assay

A protein is covalently or noncovalently coupled to a solid-phase matrix, such as cyanogen bromide (CNBr), and protein lysates are run through the column.⁷² Weakly retained and strongly retained protein can be eluted under differing salt conditions. These columns can be incredibly sensitive with detectable binding constants as weak as 10⁻⁵ M. However, retaining the native structure, activity, and proper orientation of the protein when directly immobilized to a CNBr matrix can be difficult and result in contradictory results when compared with other methods.⁷³ As an alternative, the target protein is expressed with a suitable affinity tag (e.g., poly-His, biotin, GST, FLAG, c-myc) and immobilized on agarose-Sepharose supports by its respective ligand (e.g., Ni²⁺, streptavidin, glutathione, or monoclonal antibodies specific to FLAG or c-myc). The interacting partner can be identified by Western blots, direct sequencing, or mass spectrometry.

Tandem Affinity Purification

This protein affinity assay requires placing two tags such as protein A and the calmodulin-binding protein on a bait protein separated by a TEV protease cleavage site. It reduces nonspecific pull-down of proteins through successive rounds of purification, but it does so at the expense of transient protein-protein interactions.⁷⁴^,⁷⁵ A particularly successful application of the tandem affinity purification system in combination with mass spectrometry was used for the systematic analysis of yeast protein complexes.⁷⁶

In Vivo Cross-Linking

False-positive results generated by the loss of spatial organization during the lysis step are mitigated by chemical and photo cross-linking to covalently “freeze” the protein-protein interactions in situ before isolation. Chemical cross-linking ⁷⁷^,⁷⁸ typically uses the exogenous introduction of a variety of homo- or hetero-bifunctional cross-linking reagents. In contrast, photo cross-linking uses modified versions of leucine (photo-leucine) and methionine (photo-methionine) containing a photoactivatable diazirine ring incorporated into growing peptide chains by the nascent translation apparatus.⁷⁹ The addition of either amino acid does not alter the cell’s metabolism, but when the cells are exposed to ultraviolet light, highly reactive intermediates enable identification of protein-protein interactions when used in combination with immunoprecipitation, Western blotting, or mass spectrometry.

Label Transfer

Label transfer is a variation of the cross-linking methodologies that is especially useful for detecting weak or transient protein-protein interactions. It involves coupling of an isotopic or nonisotopic label transfer reagent into the bait protein and incubating it with an unlabeled protein lysate. When exposed to ultraviolet light, any interacting proteins are cross-linked by the label transfer reagent. The actual label transfer occurs when the cross-linker is cleaved so that the label is left attached to the interacting protein, where it can be detected by separation via sodium dodecyl sulfate–polyacrylamide gel electrophoresis and Western analysis, sequence analysis, or mass spectrometry.⁸⁰^,⁸¹

Yeast Two-Hybrid Screening

Pioneered by Fields and Song,⁸² the yeast two-hybrid screening technique uses transcriptional activity to assess protein-protein interactions. The premise of the technique is activation of a downstream reporter gene (e.g., β-galactosidase, secreted alkaline phosphatase, luciferase) that is easily assayed by binding of the Gal4 binding domain (BD) and Gal4 activating domain (AD) proteins to an upstream activating sequence. The BD and AD do not need to directly bind but are brought together in apposition with suspected interacting proteins fused with the AD or BD. Although useful as a complementary tool, this technique has a notoriously high false-positive rate.

Fluorescence Resonance Energy Transfer

Intermolecular fluorescence resonance energy transfer (FRET) is a spectroscopic method that assesses the proximity and relative angular orientation of two different proteins—one having a donor fluorophore such as cyan fluorescent protein and the other having an acceptor fluorophore such as yellow fluorescent protein—by monitoring the emission spectra when laser excitation of the donor protein is applied.⁸³^–⁸⁵ When the fluorophores are separated, only the emission spectra of the donor protein is observed (Fig. 3-3). When in close enough proximity, laser excitation of the donor fluorophore transfers the excited energy state to the acceptor fluorophore and generates a peak in its emission spectra. The two principal reasons why this technique is proving to be an extremely valuable tool for probing protein-protein interactions are that (1) the efficiency of this transfer is extremely sensitive to the separation in distance between the two fluorophores and (2) the range over which the transfer in the excited energy state can occur is spatially delimited to approximately 10 nm.

FIGURE 3-3 Intermolecular fluorescence resonance energy transfer. Two hypothetical fusion proteins are depicted. One is fused in frame with blue fluorescent protein (BFP) and the other with green fluorescent protein (GFP). The binding pocket amino acid residues responsible for the protein-protein interaction are highlighted by white shading. On laser excitation at a wavelength of 370 nm, only the BFP would emit a signal in the range of 440 nm. Very little or no GFP emission in the 510-nm range would be observed. However, if the protein-protein interaction brings the fusion protein to within 10 nm of each other, laser excitation would excite the BFP excitation is not seen as emission spectra in the 440-nm range. Rather, the energy of excitation is transferred to the GFP, emission is observed at a wavelength of 510 nm.

Determining the subcellular distribution of protein expression from the results of Western blotting or immunoprecipitation is constrained by the type of lysate used. Differential centrifugation protocols are the simplest way to fractionate various organelles, membranes, and subcellular structures as starting material for these protein assays, but residual contaminants interfere with detection of the nuanced changes in subcellular localization and intercompartmental translocations that are thought to be crucial to cellular information-processing networks.⁸⁶ In situ visualization of protein expression in sections of frozen or paraffin-embedded fixed tissue (i.e., immunohistochemistry) and its cognate technique in fixed cells in culture (i.e., immunocytochemistry) are robust and dynamic methods for identifying and quantifying alterations in patterns of localization in the various compartments and subcellular structures intrinsic to nerve cells. They apply a common methodology based on the ability of a primary antibody to bind to endogenous proteins expressed within its native cytoarchitectural matrix. Primary antibodies can be directly conjugated to enzymes or fluorophores to facilitate detection. More often, because of reduced cost, labeled secondary antibodies are used with colorimetric or indirect immunofluorescence visualization schemes to provide quantifiable patterns of protein distribution. Indirect immunolabeling of multiple primary antibodies, which is most easily accomplished when the primary antibodies are raised in different species, can be used to correlate the colocalization of additional proteins when the emission spectra of the fluorophore-conjugated secondary antibody are separable. Because the maximal optical spatial resolution defined by Rayleigh scattering is approximately 200 nm for conventional wide-field fluorescence microscopic techniques, colocalization is suggestive of, although not formal proof of a protein-protein interaction.

The microinjection of a directly conjugated, high-quality antibody presents an opportunity to visualize protein expression dynamics in live cells. Recent improvements in neuronal transfection techniques frequently make heterologous expression of DNA constructs a more consistent option. A chimera of the target protein fused with a fluorescent protein (FP),⁸⁷^,⁸⁸ such as GFP, DsRed, or their variants, with increased brightness and folding efficiency⁸⁹ has the enviable characteristic of requiring no cofactors other than O₂. When directly imaged, changes in the subcellular distribution of the chimeric protein can be quantified in largely the same manner described for FISH-based mRNA trafficking dynamics in neuronal processes. There persist three experimental concerns, only one of which is technical, that limit the usefulness of this type of transfection approach. The technical limitation is the size of the chimera sequence itself. Ordinarily, mammalian expression constructs are only efficient at driving overexpression of DNA constructs that are less than about 10 kilobases in size. The additional concerns are biologic and twofold. First, does overexpression of the DNA construct lead to ectopic expression of the target chimera? Second, does expression of the FP tag interfere in any way with the functional activity of the target protein? Most autofluorescent fusion proteins have been constructed by placing the coding sequence of the target protein upstream or downstream of the FP sequence. When either of these locations has a negative impact on target protein function, the alternative is to insert the FP sequence within the open reading frame. Lacking any previous knowledge of where to place the FP within the open reading frame, finding a permissive site is guesswork at best. One strategy to overcome the pitfalls of performing sequential insertions to create a functional FP fusion uses the bacterial Tn5 transposon to create libraries of FPs that can then be individually isolated and tested.⁹⁰

When interfering with the activity of the target protein is the desired goal, there is no equivalent instrument in the molecular neurobiologist’s toolbox to genetic techniques that precisely and completely eliminate gene function.⁹¹ One now widely disseminated nucleic acid–based gene-specific silencing method to partially silence gene function uses the RNA interference (RNAi) pathway.⁹² Small interfering RNAs (siRNAs) are 20- to 25-nucleotide, double-stranded effector molecules of RNAi. They are integrated into the RNA-induced silencing complex in the same fashion as miRNA (Fig. 3-4),⁹³^–⁹⁵ where they become juxtaposed with endogenous mRNA, base-pair with perfect homology to the mRNA, and cause it to be cleaved by argonaute-2 (Ago-2), the catalytic component of the RNA-induced silencing complex (RISC). In primary neuronal or astrocyte culture, transient transfection with the siRNA duplexes or with plasmid-encoded short-hairpin RNA (shRNA), which expresses the siRNA duplex as a hairpin suitable for processing with the enzyme Dicer, is the most common course of action. In vivo delivery of siRNA has proved successful with plasmid- or viral vector–encoded shRNA via a number of local⁹⁶^–¹⁰⁰ or systemic^96,¹⁰¹^–¹⁰³ applications. Although the empirical rules for rational siRNA design and selection prediction algorithms improve specificity,¹⁰⁴ the efficacy of the knockdown can vary as a function of whether individual or pooled sets of siRNA are used and the potency of this siRNA. The most attractive potential of RNAi is the flexibility that it allows in controlling the spatial and temporal effects of inhibition. With the development of inducible siRNA whose expression is controlled by tetracycline- or doxycycline-regulated promoters,¹⁰⁵^–¹⁰⁹ photoactivated versions of “caged” siRNA,¹¹⁰ and focal transfection methods,¹¹¹ stepwise advances to this promise are being realized. Although the low to moderate concentrations of siRNA typically used to produce significant knockdown tend to evade interferon response–mediated changes in global gene expression,¹¹²^,¹¹³ a secondary effect has on occasion been noted.¹¹⁴^,¹¹⁵ A more acute concern is the possibility of an siRNA modulating the expression of a closely related sequence¹¹⁶^–¹¹⁸ and resulting in observable changes in phenotype.¹¹⁹ Chemical modifications of the siRNA can mitigate some of these off-target effects,¹²⁰ but the exact nature of siRNA specificity remains unclear (see Elbashir and colleagues¹²¹ and Miller and associates,¹²² but compare with Semizarov and coworkers¹²³). Until a better consensus of siRNA specificity is reached, current siRNA design suggests allowing for at least two nucleotide mismatches with all off-target genes. A recent editorial suggests that the ideal control is to rescue the siRNA phenotype by using an siRNA-resistant gene with a silent mutation in the 3′ nucleotide of a codon in the middle of the siRNA binding site.¹²⁴

FIGURE 3-4 Small interfering RNA (siRNA) and microRNA (miRNA) processing pathway using the RNA-inducing silencing complex (RISC). RISC mediates the functional activity of both siRNA and miRNA. Short-hairpin versions of siRNA or endogenous miRNA are recruited to the RISC, which in mammals is composed of three core proteins (e.g., Dicer, TRBP, and argonaute-2 [Ago-2]) and associate with each other in the absence of double-stranded RNA (dsRNA).⁹³^,⁹⁴ Synthetic siRNA duplexes, which are initially processed by an endogenous kinase, have a number of structural features that we highlight. The 21- to 23-nucleotide complexes have areas of high (highlighted by the green shaded box) and low (highlighted by the yellow shaded box) thermodynamic stability that dictates which strand is eventually incorporated into the RISC complex as the guide strand. These siRNA duplexes typically also incorporate a pair of thymidine base pairs as 3′ overhangs of each strand. After incorporation into the RISC, the dsRNA of both siRNA and miRNA is then unwound; only the guide strand is loaded onto Ago-2, whereas the passenger strand is released.⁹⁵ On binding of mRNA with complete complementarity (siRNA) or with nucleotide mismatches (miRNA or, not pictured, siRNA), this dsRNA is able to influence gene function. siRNA can bind anywhere in the messenger RNA (mRNA), whereas miRNA is thought to bind primarily in the 3′ untranslated region.

The Rise of Functional Genomics

Candidate gene studies take advantage of two lines of evidence that dovetail to increase success: the increased efficiency of association studies in selected population-based samples and an a priori understanding of the clinical phenotypes and how it might be affected by candidate gene function. However, this approach has met with mixed success when assessing complex diseases in which multiple genes, as well as their sequence and functional variants, probably initiate small individual contributions and relative risk for a cumulative phenotype that varies in the severity of symptoms and age at onset and evolves over time. Lacking the tools of scale to perform the simultaneous analyses required, continuing efforts toward miniaturization and scalability epitomize the new “omics” technologies that are transforming nervous system studies by allowing data-rich and detailed characterization of the molecular mechanisms underlying cell physiology. Ironically, it does so by using the very same methods of biochemistry, molecular biology, and cell biology worked out decades earlier. At its core, functional genomics aspires to integrate data from the study of different molecular strata—the genome, transcriptome, proteome, metabolome, and their regulatory mechanisms—into a systems-level model of cell biology. The ostensible goal is to obtain a richly detailed, global understanding of the nervous system’s emergent properties through the interactions among all its constituent elements.¹²⁵ In so doing, it promises to expand our insight into the root problems of complex diseases and transform the current predictive power of our diagnostic and therapeutic regimens.¹²⁶

Transcriptomics

Evolving technologic innovations fostering miniaturization, increased scalability and efficiency, and decreased cost born from the Human Genome Project now exploit this wealth of genomic data. Gene expression profiling ¹²⁷^,¹²⁸ is the most widely used functional genomics technology due in equal parts to its early development and the ease with which it can be performed. High-density DNA microarrays anchor cDNA¹²⁹ or oligonucleotides¹³⁰^,¹³¹ of different genes in massively parallel arrays of up to approximately 10,000 spots/cm² and greater than 1,000,000 spots/cm², respectively, on a glass surface. Using the principle of molecular hybridization, several micrograms of fluorescently labeled RNA or cDNA probes hybridize with the target DNA and are analyzed with high-resolution scanners that optically detect the strength of fluorescent signal from the bound probes. Raw data are normalized and processed through a series of statistical approaches to determine whether any gene is differentially expressed. The probes are constructed from abundant sources of high-integrity mRNA, which is reverse-transcribed in the presence of fluorophore-coupled deoxyribonucleoside triphosphates (dNTPs) or amino-allyl–labeled dNTPs that can be coupled to a fluorophore such as Cy3 or Cy5. In postmortem tissue, even with extended postmortem intervals of up to 30 hours, high-integrity mRNA can be isolated for microarray sample preparation. However, it is not the postmortem interval inasmuch as it is the pH of the tissue (great than 6.25) that determines mRNA integrity.¹³² Alternatively, when confronted with small amounts of starting material, it may be necessary to amplify the mRNA population so that there will be enough material to drive the hybridization reaction when labeled without skewing the complexity of the original mRNA population.¹⁶ Because exponential amplification techniques, such as PCR, do not offer this capability, a linear amplification technique, aRNA amplification, enzymatically converts the mRNA population into a single-stranded cDNA with reverse transcriptase by using a specialized oligo-dT primer containing a 3′ T7 RNA polymerase promoter sequence. When a complementary second strand is synthesized, it serves as a transcription template for T7 RNA polymerase to produce RNA oriented in the antisense direction, which can be labeled directly or converted into labeled cDNA as a probe. Amplification of mRNA by manual harvesting of individual live neurons^128,^133,¹³⁴ or dendrites,¹³⁵^,¹³⁶ as well as neurons in fixed tissue,¹³⁷^,¹³⁸ has been equally as successful as automated approaches such as laser capture microdissection.¹³⁹

Experimental artifacts can be introduced by sample preparation (e.g., differences in the integrity of mRNA or in the efficiency of labeling), the array (e.g., DNA spotting or printing errors), or processing (e.g., variable fluorescence scanner performance). Careful experimental design (e.g., checking the integrity of the mRNA before use, dye swaps and parallel processing of samples to control for labeling efficiency, quality control experiments for each lot of custom and commercial microarrays) can largely eliminate these issues. Reducing systemic biases in the results requires the optical data to be normalized at the global level to facilitate comparisons across microarray experiments and at the local level to account for individual variations in signal intensity that are unique to the surface of that microarray.¹⁴⁰^,¹⁴¹ Common approaches on how best to apply these normalization procedures to distribution of the data are available in the form of open source and open development software offered by the Bioconductor project. When analyzing data, the most conservative treatment of it uses a Bonferroni correction to reduce possible false-positive errors. However, this correction for multiple measurements can also lead to increases in the number of false-negative errors. Various data analysis software packages try to balance the discovery rates of these two types of errors. As with all high-throughput assays, these data should be verified with other techniques. Although the current generation of high-throughput, low-cost cDNA or oligonucleotide microarrays is the primary laboratory workhorse for quantifying the transcriptome, next-generation technology is already on the horizon (Box 3-3).

Box 3-3

Next-Generation Technology for mRNA Gene Expression

Sources of noise within the experiment can be controlled, in part, by the normalization techniques discussed earlier. However, there are several notable technical limitations of DNA microarrays at present: high background levels as the result of cross-hybridization by multiple targeting probes ¹⁴² and low concordance (≈30% to 40%) of transcript detection between platforms.¹⁴³^–¹⁴⁵ In the former circumstance, the probe will normally bind with high specificity to an arrayed target sequence. Off-target effects such as cross-hybridization occur when a flanking, unbound sequence of the same probe binds weakly to an adjacent arrayed sequence. The resulting background noise contributes to the relatively small dynamic range (≈2 orders of magnitude) of signal detection in microarrays, although in the latter circumstance the weak overlap in mRNA expression profiles across different microarray platforms is a consequence of unpredictable intramolecular folding events in some long probes¹⁴⁶ and hybridization differences driven by the use of different sequences for the same target gene on various platforms. The cross-platform differences can be improved by using the RefSeq database for gene matching¹⁴⁷ and still further when expression patterns are analyzed only when target sequences between platforms overlap.¹⁴⁸

Next-generation technologies determine the identity of the mRNA transcript with the use of highly paralleled, direct sequencing methods.¹⁴⁹ Although expensive at the moment, RNA sequencing (RNA-Seq) has several clear advantages over the current array-based methods, including low background noise, a large dynamic range (up to ≈3.5 orders of magnitude), and single–base pair resolution, which allows the ability to distinguish different isoforms (i.e., splice variants) and allelic expression (i.e., single nucleotide polymorphisms or structural variants, including insertion-deletions and copy number variations) of the same mRNA without subsequent need for any specialized normalization. Briefly, the total or poly A⁺ mRNA population is converted into a library of short, adaptor-modified cDNA (200 to 500 base pairs) that is compatible with deep-sequencing instrumentation. In the case of a population enriched in poly A⁺ mRNA, there are two general paths for processing. The poly A⁺ mRNA can be fragmented first, usually by hydrolysis or nebulization, and then ligated to adaptors and reverse-transcribed into cDNA. Conversely, the poly A⁺ mRNA can be ligated to the adaptor, reverse-transcribed into cDNA, and then fragmented with DNAse I or sonication. When fragmented at the RNA level, there is limited bias over the length of the transcript.¹⁵⁰ However, fragmentation at the cDNA level greatly biases the readable sequence toward the 3′ end.¹⁵¹ Depending on the amount of input mRNA, amplification of the population may or may not be necessary. These libraries can generate millions of short reads that typically vary from 30 to 250 base pairs by 454, Solexa, or SOLid sequencing and can be compared against the genomic sequence or the coding sequencing of a gene.

There are two principal areas where highly paralleled mRNA expression technology has proven utility. One is as a comparative expression profile or signature profile. This genome-wide molecular fingerprint provides a distinctive pattern of gene expression that can be used as a comprehensive framework for assessing differences in classes of neurons ¹³⁴ and astrocytes,¹⁵² during development in myoblasts,¹⁵³ and in genetic mutants in model systems.¹⁵⁴ It has also been used to evaluate the secondary effects of drug compounds on regulation of gene expression.¹⁵⁵ Signature profile comparisons of cells under different stimulation conditions¹⁴¹ or environmental influences¹⁵⁶ that expand the complexity of the mRNA populations, or “expression space,” have colloquially been referred to as “exercising the genome.”¹⁵⁷ The Human Genome Project estimated the total number of human genes expressed in the central nervous system to be approximately 25,000 to 30,000.¹⁵⁸ However, alternative splicing, which is thought to occur in 92% to 94% of genes¹⁵⁹ and is often subverted in disease,¹⁶⁰^,¹⁶¹ generates further heterogeneity in the mature mRNA population from a single pre-mRNA transcript.¹⁶² The patterns of expression of alternative splice forms are strongly correlated across different tissues, thus suggesting the presence of tissue-specific regulatory mechanisms.^159,^163,¹⁶⁴ Because classic microarray designs do not incorporate this additional level of mRNA complexity, filling this gap in signature profile data is a series of new alternative splicing arrays¹⁶⁵^–¹⁶⁷ with promising insight into transformation of Hodgkin’s lymphomas¹⁶⁸ and gliomagenesis.¹⁶⁹ Adaptations of DNA oligonucleotide microarray methods have also been made to signature-profile the expression of miRNA in low-density microarrays.¹⁷⁰^,¹⁷¹

Most highly paralleled gene expression studies make no a priori hypotheses about which individual genes are regulated among comparison sample sets. It is done within the framework of a systems biology approach in which it is assumed that transcription occurs with a finite set of resources. Thus, a change in the mRNA transcription of one gene will have collateral, sometimes seemingly stochastic influences on other mRNA. A second application of gene expression attempts to mine the comparative expression profiles for information on transcriptional regulatory networks. Data mining of signature profile results can identify clusters of mRNA to be transcriptionally active or silent. The genomic sequences of this mRNA are then analyzed for the presence of shared promoter elements that might contribute to the levels of expression. This sequence analysis is usually paired with direct analysis of promoter occupation by the suspected transcription factor via chromatin immunoprecipitation (ChIP).¹⁷²^,¹⁷³ In a typical ChIP assay, the DNA-protein interaction is cross-linked by formaldehyde in situ to fix the interaction, although this step can be omitted when analyzing histone-DNA interactions (referred to as a native-ChIP). The DNA is then fragmented into approximately 500–base pair stretches by sonication or enzymatic digestion. The cross-linked transcription factor is used as an epitope to immunoprecipitate the complex. Antibodies for this purpose are often prequalified by commercial suppliers because they must be of very high quality. The cross-linking in the isolated complex is reversed wherein the DNA sequence of the chromatin fragment is identified by direct sequencing (ChIP-seq), a PCR-based method (ChIP-display),¹⁷⁴ or most commonly, hybridization to a tiling array (ChIP-on-chip or ChIP-chip) for genome-wide detection. Tiling arrays are a relatively recent variation of microarrays with many design considerations that contain short (≈25 base pairs) oligonucleotides, or tiles, of nonrepetitive regions of genomic sequences that are arrayed linearly (i.e., contiguous sequence end to end or separated by five nucleotides) or with a fractional offset (i.e., overlapping genomic sequence tiles) for higher resolution studies.¹⁷⁵ One important adjunctive function of ChIP-chip studies is the ability to establish the presence of the possible epigenetic effects of histone modifications and, by extension, nongermline DNA methylation.¹⁷⁶^–¹⁷⁸ It is important to note that these descriptive studies of transcription factor occupancy alone do not indicate the efficacy of the interaction on transcription. However, when integrated with mRNA expression profiling, it is possible to identify functional regulatory network motifs, which is the aim of the ENCODE (Encyclopedia of DNA Elements) Project.¹⁷⁹

This type of analysis, often enhanced by expression profile comparisons with genetic methods (Box 3-4) that create overexpression or null mutation phenotypes of the transcription factors that bind to the promoter loci, are more likely to reveal the presence of multitiered regulatory networks. A good example of this is maintenance of a human embryonic stem (ES) cell phenotype. ES cells maintain their pluripotency and ability to self-renew by maintaining a feedforward transcriptional regulatory network that requires the OCT4-SOX2 complex to autoregulate its own expression, as well as initiate expression of NANOG.¹⁸⁷^,¹⁸⁸ These transcription factors interact to maintain ES cells in the undifferentiated state by repressing the activation of a host of other transcription factors, including key homeodomain proteins, while activating the transcription of another set of transcription factors, including REST, SKIL, and STAT3.¹⁸⁹

Box 3-4

Genetic Tools for Modifying Gene Function

RNA Interference

The processing of hairpin microRNA (miRNA) or plasmid-derived short-hairpin RNA by the enzymes Drosha and Dicer leads to the generation of small interfering RNA (siRNA)—short, double-stranded 21– to 23–base pair RNA with symmetrical 2- to 3-nucleotide 3′ overhangs. Synthetic siRNA gains functional activity by an endogenous kinase that modifies the 5′ hydroxyl groups to phosphate groups. These RNA duplexes are recruited into the RISC complex, where they are guided to endogenous mRNA. On base pairing, the transcript is translationally silenced or cleaved by the catalytic component of the RISC. Although commonly used in cell culture models to reduce gene expression, it has been applied in embryonic stem cells to inactivate genes in a heritable fashion, but without the complete functional reduction in gene expression.

Insertional Mutagenesis

In contrast to chemical mutagenesis, insertional mutagenesis is a transposon-based technique for generating gene disruptions by inserting a molecular tag randomly ¹⁸⁰ or, more recently, by using targeted methods that combine the transposon with a DNA-binding domain.¹⁸¹ In the randomized version, mapping the site of insertion is required to determine where it occurred and whether the insertion site will generate any dysfunction and, if so, the severity of dysfunction in protein activity.

Homologous Recombination

This most precise and elegant method for altering gene function requires a DNA construct to align with the targeted gene of interest, by mechanisms still poorly understood but probably similar to the alignment of homologous chromosomes during meiosis and mitosis (Fig. 3-5). The recombination event, which is most efficient in yeast and mice but much less common than random insertion events, takes place anywhere in the flanking homologous sequence. The DNA construct contains both a positive (i.e., neomycin) and a negative (i.e., thymidine kinase gene) selection marker to select for homologous recombination events and against nonhomologous recombination, respectively. The neomycin selectable marker by itself, in traditional knockout strategies, causes a significant disturbance in gene function when introduced into an intron. Over the past decade, site-specific recombinase (SSR) systems have allowed geneticists to conditionally express or silence targeted genes, which can be exogenously engineered transgenes encoding reporter, sensor, or effector molecules.⁹¹^,¹⁸² In approximately 15% of conventional transgenics, embryonic lethality is an issue. The basic concept of conditional transgenes evolved from this obstacle. The most commonly used SSRs are Cre (causes recombination of the bacteriophage P1 genome) and Flp (named for its ability in Caenorhabditis elegans to invert a gene). Three versions of Flp are currently in use (e.g., enhanced Flp, Flp-wt, and low-activity Flp) and have a dynamic range of activity across them of more than 1 order of magnitude. Each of the SSRs catalyze recombination events at specific DNA target motifs built into the DNA construct before homologous recombination. For Cre, that site is loxP. The cognate site for wild-type Flp or any of its variants is FRT. These SSRs possess a combination of fortuitous characteristics: neither of their DNA target motifs are found naturally in mice, and they catalyze the recombination between these target sequences with efficiency and reliability and do so without the need for any additional cofactors. In traditional gene-targeting deletions, the neomycin marker can interfere with the phenotype by influencing the expression of nearby genes.¹⁸³ Removal of the selectable marker is one of the most obvious applications of the SSR system and requires only inserting loxP sites flanking the neomycin cassette.¹⁸⁴^,¹⁸⁵ Conditional transgenesis can remove or repair the gene of interest. In the former, a single SSR strategy has loxP sites that remove both the neomycin selectable marker and the exon to be deleted. Partial Cre excision occurs by transient expression of the recombinase in recombinant cells after selection, thereby leaving a conditional null allele. The same situation can be accomplished by using both the Cre and Flp recombinases in tandem. For repairing gene functions, we show two hypothetical approaches. The relative strength of perturbing gene function with a neomycin cassette can be enhanced significantly when the cassette is oriented in the reverse direction of the target gene. Excision of the reverse-orientation neo^r marker is accomplished by flanking loxP sites. An alternative tactic uses a synthetic stop sequence and a positive selection marker placed between the 5′ untranslated region and the start codon¹⁸⁶ with dual SSRs.

FIGURE 3-5 Mechanism of homologous recombination and the utility of site-specific recombinase approaches. A, Homologous recombination uses the presence of both a positive (neomycin [neo^r]) and a negative (herpes simplex virus thymidine kinase gene [tk^HSV]) selection marker to identify recombinant cells. Recombinant cells in which one allele is disrupted are conferred resistance to G-418 as a result of the neo^r marker. Unlike its mouse counterpart, tk^HSV can convert the nucleotide analogue ganciclovir. Nonhomologous insertions will include the tk^HSV gene, thus making only these cells sensitive to ganciclovir. Untranslated regions are represented in orange boxes before the start codon (ATG) or after the termination codon (TAG). An intervening exon is denoted in red and targeted for replacement with the neo^r cassette. B, Various site-specific recombinase strategies are shown starting with the structure of the targeted allele after a homologous recombination event. These strategies entail removing the selection marker cassette used during normal knockout transgenesis, as well as genetic manipulation to allow the conditional disruption or repair of gene function. Arrowheads represent loxP sites, whereas stars represent FRT sites.

Although these types of analysis were most easily performed with microarray-based signature profiles in model systems such as yeasts early on,¹⁹⁰^–¹⁹² these methods have gained traction in mammalian models for identifying transcriptional regulatory networks in dopaminergic neurons of the midbrain¹⁹³ and in ES cells of neural¹⁹⁴^,¹⁹⁵ and hematopoietic¹⁹⁶ origin. Additionally, a recent publication has illustrated the power of next-generation RNA-Seq technology when applied to this analysis.¹⁹⁷

Some epigenetic modifications (i.e., genomic imprinting) or other genetic variations, such as SNPs, that are potential sources of variation in transcript abundance are not likely to be accounted for. Because sequencing results of the human genome estimate that SNPs are the most prevalent class of common genetic variations (i.e., variants with a minor allele frequency of >1%), they may require additional consideration. Although the vast majority of these genetic variants introduce silent mutations and neutral phenotypes, there has been much focus on determining the relative ratio of neutral, near-neutral,¹⁹⁸ and non-neutral SNPs within populations of different ancestry.¹⁹⁹^–²⁰¹ These genetic polymorphisms are naturally occurring, evolutionarily stable differences thought to confer a predisposition, susceptibility, or resistance to disease and influence individual responses to curative regimens, perhaps by altering the three-dimensional local DNA topography.²⁰²

There are a number of highly paralleled methods for assaying SNPs across the genome.²⁰³^,²⁰⁴ Both the mass spectrometry (MS)-based assay and fluorescence polarization–based assay are allele-specific primer extension methods in which the genomic region is amplified by PCR and used as a template for the annealing of an oligonucleotide primer immediately upstream of the polymorphism. A DNA polymerase then adds only a single nucleotide, because chain-terminating dideoxynucleotide triphosphates (ddNTPs) are used, as dictated by the target DNA sequence at the polymorphic site. Multiplex MS versions rely on the natural differences in molecular weight of the DNA for detection by matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) MS for efficiently assigning genotype.²⁰⁵ In the fluorescence polarization–based version,²⁰⁶ the ddNTPs are labeled with different fluorophores. For detection, the labeled ddNTP is incorporated into the primer, which causes it to rotate more slowly within the plane of laser polarization and thereby emit more signal than the unincorporated ddNTPs, which rotate more quickly. A more robust fluorescence polarization assay uses PCR with a universal fluorescence resonance energy transfer (FRET) reporter system for detection of SNPs.²⁰⁷ An alternative set of highly multiplexed SNP assays incorporate universal PCR. One example of this is the molecular inversion probe assay,²⁰⁸ in which a single oligonucleotide simultaneously binds a complementary genomic DNA sequence that flanks either side of the SNP. The single base being flanked is filled in with a DNA polymerase and ligated to circularize the oligonucleotide. The circularized molecular inversion probe is isolated from the linear, nonbinding probes and amplified. The highly paralleled, multiplexing capability comes from a unique bar code in each set of SNP-specific oligonucleotides that can be detected on a high-density array containing sequence complementary to the bar code tags. In contrast, the GoldenGate assay uses allele-specific primer extension through the SNP to which it can be ligated to a second oligonucleotide (specific only for the locus and not the polymorphism), amplified, and detected with bar code tags.²⁰⁹ Genome-wide studies of SNP analysis have also used SNP arrays, which provide relatively comprehensive coverage (≈80%) of the genome currently mapped. Apart from the previously discussed sources of limitations for any array-based platform, it was originally thought that SNP array artifacts may arise from an additional bias. Detailed position mapping of SNP sites suggest that they are enriched in the 250–base pair sequences upstream of transcription start sites. Because of the close proximity of SNPs to each other at this loci, it was estimated that approximately15% of the microarray probes for any given gene will overlap with SNPs that are polymorphic in the population study. However, in a large-scale human study, no such systemic artifacts arose.²¹⁰ Studies integrating the simultaneous effect of genome-wide DNA polymorphisms and global effects on gene expression suggest that transcript abundance can be a quantitative trait that can be mapped.²¹¹

Proteomics

Complementing global gene expression analysis with studies examining the intrinsic variations in global protein expression, state of posttranslational modification, and subcellular localization within the functionally important networks with which they interact is the focus of high-throughput proteomics techniques. By definition, the proteome is the expression of the entire complement of proteins in a cell, tissue, or organism produced from a particular genome at a single static point in time.

Highly paralleled analysis of protein expression and abundance is the most widely featured form of analysis (Table 3-1). Some forms approach the experimental design from a so-called top-down approach in which naturally occurring proteins can be analyzed. The most cost-effective form of these techniques is conventional 2D gel electrophoresis.²¹³^,²⁴⁵ Determining protein levels by 2D gel electrophoresis requires separating a protein lysate by isoelectric point in the first dimension followed by SDS-PAGE in the second. Comparison of Coomassie brilliant blue, Sypro Ruby, or silver-stained gels differentially exposes expressed spots that can be excised and enzymatically digested from the preparation after identification of the peptides by MS. Peptide mass mapping by MALDI-TOF MS and peptide sequencing by electron spray ionization MS are highly efficient at identifying gel-separated proteins. However, various technical issues, such as labor-intensive image analysis for gel matching, bias in protein representation, and small dynamic range of resolution (≈1 order of magnitude), are responsible for the high coefficient of variation (20% to 30%) that has limited its wider use.²⁴⁶ Using methods with sufficient dynamic range in proteomics is important because the rather modest differences observed in the changes in mRNA abundance in microarray studies are in sharp contrast to the range of protein expression (≈5 to 8 orders of magnitude).

TABLE 3-1 Proteomics Methodologies and Detection of Posttranslational Modifications*

A multiplexing fluorescent 2D difference in gel electrophoresis (2D-DIGE)²⁴⁷ method directly labels lysine groups of proteins with mass- and charge-matched cyanine (Cy) dyes before resolving them in the first isoelectric dimension. Up to three samples,²⁴⁸^,²⁴⁹ each with a spectrally separable Cy dye, can be run on the same 2D gel and optically overlayed, thereby providing a significant increase in confidence during spot matching among samples. Addition of the Cy dye adds approximately 500 Da to the labeled protein, but it is also the basis for the large dynamic range (≈4 orders of magnitude) of 2D-DIGE²⁵⁰^,²⁵¹ and its ability to detect relatively small changes in protein expression.²⁴⁸ Titrating the Cy dye labeling limits the tagging to one lysine on each protein and prevents precipitation of the protein because of its increased hydrophobicity and visualization artifacts associated with the added mass of each additional fluorophore. As with standard 2D gels, hydrophobic membrane proteins are underrepresented in the sample despite the use of various chaotropic agents.²¹²^,²⁵² Similarly, proteins with high molecular weight are difficult to resolve in the first dimension, low-abundance proteins (<1000 copies per cell) are not detected,²⁵³ and spots may contain more than one protein.

Current global biomarker discovery strategies attempt to generate differential maps of small peptide markers (<30 kD) that can correctly classify masked serum or cerebrospinal fluid (CSF) samples to a specific disease. To do so, most of these recent studies have relied on surface-enhanced laser desorption/ionization (SELDI) MS.²⁵⁴ A sample of serum or CSF is directly applied to the surface of a gold-plated chip with a modified chromatographic matrix (e.g., weak positive ion exchange [CM10], metal binding surface [IMAC30], or strong anion exchange [Q10]) that will bind only certain sets of proteins within the serum or CSF. With the use of TOF MS, patterns of peptide biomarkers in ovarian,²⁵⁵ breast,²¹⁴ and prostate cancer²⁵⁶ have shown promising correlative strength.

In contrast to this trio of techniques, bottom-up proteomic methods consistently use enzymatic digestion to generate peptide fragments as the primary input for mass measurements by MS coupled with efficient multidimensional liquid chromatographic separation. Isotope-coded affinity tags (ICAT) are the most popular method for quantifying the relative expression levels of individual proteins.²⁵⁷ By specifically labeling cysteine residues with a reagent that contains nine ¹²C (light) or nine ¹³C (heavy) atoms²⁵⁸ and a biotin tag, samples of control proteins derivatized with the [¹²C]-ICAT reagent or experimental proteins with the [¹³C]-ICAT reagent are combined, digested with trypsin, and separated on an avidin column. All the cysteine-containing peptides tagged with biotin are selectively separated on an avidin column, subjected to reverse-phase chromatography, and identified by liquid chromatography–MS or MS. The selective enrichment of cysteine-containing peptides, which are thought to constitute 10% to 20% of the peptides from a whole cell extract,²¹⁷^,²⁵⁹ does significantly decrease the complexity of the peptide mixture. Quantifying the ¹²C/¹³C ratio provides a relative, not absolute expression ratio of individual proteins within the sample set. A variation of ICAT uses a combination of four isobaric labels that can label multiple lysine-containing peptides per protein. This feature of the isobaric tag for relative and absolute quantitation (iTRAQ)²⁶⁰ has the effect of increasing the confidence interval of identification and quantitation. iTRAQ-labeled peptides are normally separated by reverse-phase chromatography, but OFFGEL fractionation has also been used as another option.²¹⁶ A recent comparison of ICAT and iTRAQ with 2D-DIGE supports the notion that the global-tagging approach of iTRAQ is more sensitive than either ICAT or 2D-DIGE,²¹⁵ although all are likely to have dynamic ranges of resolution of at least 4 orders of magnitude. A more important point elucidated in this study was the limited overlap of proteins characterized by each of the techniques, thus suggesting the complementary quality of their data sets.

A third MS-based assay, multidimensional protein identification technology (MudPit),²⁶¹ separates complex peptide mixtures in two dimensions, as opposed to 2D gels, which fractionate in two dimensions. Peptide mixtures are separated in the first dimension on the basis of electrostatic charge and eluted by using a step gradient of increasing salt concentrations directly onto a tandemly coupled column that separates on the basis of hydrophobicity. The advantage of this unbiased approach is that it allows integral membrane proteins to be identified among the complex peptide mixture. Typically, these proteins are lost by 2D gel separation.

Stable isotope labeling with amino acids in cell culture (SILAC)²⁶² is a simple approach for the in vivo labeling of proteins for MS-based detection and quantitation of protein expression and posttranslational modifications.²⁶³ Isotopically labeled amino acids, usually lysine and arginine, are added directly to the growth media, where cells directly incorporate the light (e.g., ¹²C or ¹⁴N) or heavy (e.g., ¹³C or ¹⁵N) label into newly synthesized protein chains that can be prepared for peptide identification and quantitation by MS. SILAC has been coupled with various enrichment techniques to monitor phosphorylation,²⁶⁴^–²⁶⁶ as well as methylation,²⁴⁴ in protein lysates.

The large data sets that are generated by these techniques do not directly address the state of posttranslational modification. Phosphorylation events are common and affect approximately 30% of all proteins ²⁶⁷ at one time, and they are well-known regulatory switches in neuronal development,²⁶⁸ synaptic plasticity,²⁶⁹^,²⁷⁰ and neuron-generated biologic rhythms.²⁷¹ The current tools for the study of phosphoproteomics place heavy emphasis on the direct use of phospho-specific reagents in array-based platforms or various enrichment techniques to isolate phosphoproteins for subsequent analysis with MS (see Table 3-1). Selective phosphopeptide enrichment can be achieved by chromatographic separation, such as with immobilized metal affinity chromatographic (IMAC) columns²²⁸^–²³⁰ and antiphosphotyrosine affinity columns²²²^–²²⁵ or phospho-dependent binding domains.²³⁹ As a result of the negative charge of the phosphopeptides and their hydrophilic nature, specialized MS protocols have been used in some circumstances for analysis of phosphorylation sites.²⁷²^,²⁷³ The one exception to the dependence on a phosphoprotein enrichment scheme is the fluorescent Pro-Q Diamond stain for direct use in isoelectric focusing gels, SDS-PAGE, and 2D gels for the detection of all types of phosphorylated amino acids,²²⁰ although greater proteome coverage can be achieved when Pro-Q Diamond staining is coupled with liquid chromatography–MS approaches.²⁷⁴ In ICAT, phosphorylated residues within a cysteine-containing peptide can be detected as a normal consequence of the experiment.²⁷⁵ Tools for detection or enrichment of other posttranslational modifications, such as glycosylation and nitration, have also been used (see Table 3-1).

The protein expression profiles generated by antibody array–based detection methods have matured considerably in surface designs of solid supports, coupling chemistries, on-chip probe stability, and fabrication.²⁷⁶ First-generation high-density arrays printing approximately 1000 unique antibodies with current fabrication platforms, with expectations of up to 10,000 in the near future, are beginning to provide an alternative to other proteomic methodologies for generating highly paralleled data in disease biomarker discovery signatures,²⁷⁷^,²⁷⁸ phosphoproteomics,²⁷⁹^,²⁸⁰ and oncoproteomics.²⁸¹^,²⁸² One distinct advantage of this approach is the minute amounts of sample consumed, usually less than 1 µL, with multispotting techniques potentially providing significant improvement.²⁸³^,²⁸⁴ In DNA microarrays, multiple housekeeping genes act as an internal control during the data normalization process. Lacking a comparable standard, a variety of data normalization approaches have been devised for antibody-based arrays.²⁸⁵ Other complementary microarray formats include binding domain–based arrays, such as phospho-dependent Src homology 2 domains,²³⁹ and reverse-phase protein microarrays,²⁸⁶ which are the inverse of antibody-based arrays (i.e., cells, serum, or tissue is spotted on a nitrocellulose slide and probed with a single antibody) and have been used to show disease progression in cancer.