Principles and new developments in molecular biology

Published on 09/03/2015 by admin

Filed under Obstetrics & Gynecology

Last modified 22/04/2025


CHAPTER 12 Principles and new developments in molecular biology

Raheela Khan

Introduction

Expedited by completion of the draft version of the Human Genome Project in 2001 (Lander et al 2001, Venter et al 2001), the last decade has witnessed unprecedented advances in experimental methods in an ambitious effort to link genomics with whole-organism physiology and pathophysiology. The genome amongst individuals is 99.9% homologous, with diversity of function generated through postgenomic regulation at the mRNA and protein level. Detailed analysis of the human genome shows it to consist of approximately 3.1 billion base pairs (Little 2005). Of the genes, approximately 24,000, far fewer than the number expected, encode proteins with phenotypic diversity generated by multiple splice variants. It is estimated that over 800 additional genes are transcribed into small microRNAs (miRNA), although significantly more encode non-coding RNA (Little 2005). Clearly, the draft sequence has provided tantalising insight into the mysterious organization and complexity of the human genome that will be gradually unravelled for years to come.

New developments and refinement of large-scale, high-throughput investigations, often referred to as ‘-omic technologies’, seem set to transform medicine. This is likely to be achieved through improved prediction, detection, diagnosis and treatment of disease, as well as better monitoring of the healthy state. There have also been crucial advances in the generation of new insight into our understanding of the regulation of global gene expression. Specifically, the central dogma which describes forward flow of information of DNA–RNA–protein, initially overturned by the discovery of the enzyme reverse transcriptase (which converts RNA to DNA), is matched by exciting developments in the field of RNA biology. The context of these developments and application of this new technology to gynaecology is presented herein with key examples from the literature.

Global Screening and Analysis

Traditionally, pursuit of scientific enquiry has been hypothesis-driven, which by its nature deconstructs complex biological systems into manageable ‘bite-sized’ studies with a very specific research question. In our wider quest to understand systems biology and networks, a reductionist approach is ill suited to investigating the integrative nature of whole-organism physiology, specifically the interactions between genes, protein and function. Given the immense volume of sequence data generated by the Human Genome Project, an escalation in research targeted towards population-based, high-throughput screening of gene expression has since been observed. The emphasis on networks, rapid advances in -omic technologies and high-dimensional biology has produced unparalleled insight into the relationships between the genome, transcriptome, proteome and, more recently, the metabolome. These innovative approaches have also necessitated unprecedented dependence on the discipline of bioinformatics. Table 12.1 lists key websites that are the backbone of such investigations requiring information on gene and protein sequences, gene annotation, sequence homology, microarray analysis etc. Many of these are freely available, with two of the most comprehensive and popular sources of sequence information being the National Center for Biotechnology Information (NCBI) and Ensembl.

Table 12.1 Commonly used web-based tools and databases for genome analysis

Function/ institute	Software	URL (http://…)
Nucleotide/protein sequence data	NCBI	www.ncbi.nlm.nih.gov
European Bioinformatics Institute	EMBL	www.ebi.ac.uk
Expert protein analysis system	Expasy	www.expasy.ch
Human and other genome sequences	Ensembl	www.ensembl.org
Human genome sequence data	NCBI	www.ncbi.nlm.nih.gov/genome
Basic local alignment search tool	BLAST	www.ncbi.nlm.nih.gov/blast
ESTs, full-length clones, libraries	IMAGE	www.geneservice.co.uk/products/image/index.jsp
Human gene expression	HuGE	www.HugeIndex.org
Online Mendelian inheritance in man	OMIM	www.ncbi.nlm.nih.gov/omim
Haplotype map project	HapMap	www.hapmap.org
Microarray gene expression data	MGED	www.mged.org
Stanford microarray database	SMD	smd.stanford.edu/
Microarray manufacture	Affymetrix	www.affymetrix.org, www.dnachip.org/
Primer design	Primer 3	primer3.sourceforge.net/

EST, expressed sequence tag.

Microarray technology

Gene sequence data in isolation provide little information on protein function. ‘Functional genomics’, the term used to describe the relationship between genes and physiological mechanisms on a global rather than individual basis, has evolved to probe these interactions. In particular, pioneering research using genechip technology, which enables simultaneous analysis of the expression of thousands of genes (Schena et al 1995, Brown and Botstein 1999), has been at the forefront of functional genomics, and is the mainstay of high-throughput assays. It is a powerful means of identifying subsets of genes that are either up- or downregulated in a particular research scenario (Duggan et al 1999, Hegde et al 2000). Specifically, gene expression may be compared: (1) in different tissues/cells, (2) under developmental regulation (fetal versus adult) and on ageing, (3) in normal and disease states, and (4) in response to, for example, drug treatments, environmental cues etc.

Practical Aspects and Analysis

Microarrays

The success of DNA microarray technology exploits the complementarity of Watson–Crick base pairing that underlies hybridization of sample cDNA to either short oligonucleotide or cDNA sequences immobilized in a grid-like fashion on a solid substrate (Duggan et al 1999, Hegde et al 2000). Typically, a glass slide, nylon membrane or silicon wafer is the preferred format, although bead-based arrays are also available. For cDNA arrays, probe sequences principally derive from the IMAGE (Integrated Molecular Analysis of Genomes and their Expression) clone library arising from the Human Genome Project. These arrays include known genes whose function is as yet unknown, and expressed sequence tags whose full sequence and function are yet to be determined. High-density oligonucleotide arrays, such as those provided by Affymetrix, may be fabricated in situ by solid-phase chemical synthesis combined with photolithography. The sequences on Affymetrix chips derive from ‘refseq’ (the definitive version of the gene sequence) contained within the NCBI suite of programs (Table 12.1). Currently, the latest arrays from Affymetrix offer unparalleled whole-genome analysis with over 700,000 probes representing over 28,000 human genes on a single array. Despite the diversity of microarrays available, some may not include the gene of interest to the specific research group, leading to the loss of key information in the study of gene pathways. An alternative strategy is to manufacture tailored arrays that include a smaller set of known genes of interest.

Once a research question has been formulated, the basic steps in a DNA microarray experiment, illustrated in Figure 12.1, commence with extraction and reverse transcription of total RNA or mRNA into cDNA from samples under investigation. Sample cDNA is then labelled either with radioactive or fluorescent probes and hybridized to the array. The main advantage of radioactive over fluorescent labelling is the enhanced sensitivity of the former. A currently popular approach involves differential labelling of test and control sample cDNA with the fluorophores Cy3 and Cy5. Following stringent washing to remove non-specific binding, image analysis of the emitted signal is performed either by quantitative phosphorimaging of radioactively labelled samples or by powerful laser scanning of fluorescence. These raw data are subsequently processed to generate scatter plots that offer a quick and easy means of surveying unaltered, up- or downregulated genes. A gene expression matrix, consisting of data organized in rows and columns reflecting the output of each individual gene in a particular sample, allows the data to be ranked and probabilities determined in order to identify individual fold changes in gene expression as a result of a biological effect. Typically, more than a two-fold change in transcript expression is taken to indicate biological relevance.
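The two-fold rule of thumb described above can be illustrated with a minimal sketch. The gene names and intensities are hypothetical, and the function names are introduced here purely for illustration; the cutoff of two is the conventional threshold mentioned in the text, not a statistical test.

```python
import math

def log2_fold_changes(test, control):
    """Return {gene: log2(test / control)} for genes measured in both samples."""
    return {g: math.log2(test[g] / control[g]) for g in test if g in control}

def flag_changed(log2fc, fold_cutoff=2.0):
    """Label each gene 'up', 'down' or 'unchanged' using a fold-change cutoff."""
    limit = math.log2(fold_cutoff)
    labels = {}
    for gene, value in log2fc.items():
        if value > limit:          # more than fold_cutoff higher in the test sample
            labels[gene] = "up"
        elif value < -limit:       # more than fold_cutoff lower in the test sample
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels

# Hypothetical background-corrected intensities (arbitrary units).
control = {"GENE_A": 200.0, "GENE_B": 800.0, "GENE_C": 500.0}
test = {"GENE_A": 900.0, "GENE_B": 150.0, "GENE_C": 550.0}

labels = flag_changed(log2_fold_changes(test, control))
print(labels)
```

Working on the log2 scale makes up- and downregulation symmetrical: a two-fold increase and a two-fold decrease become +1 and -1, respectively.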

Figure 12.1 (A) Schematic illustration showing the transition from genome to protein. The structure of a gene contains introns (In) and exons (Ex). Techniques used in investigating aspects of molecular function are highlighted in the context of the entity targeted. Thus, GWAS for genome-wide association studies, ChIP (chromatin immunoprecipitation) for studying gene–protein interactions through the promoter region, and qRT-PCR (quantitative reverse transcriptase polymerase chain reaction) for transcript analysis using cDNA reverse transcribed from RNA and RNA interference for post-transcriptional silencing. (B) The steps involved in carrying out a DNA microarray experiment.

Inherent in these methods is the technical variation arising from a number of sources including RNA quality, variable probe labelling, high background etc. Distinguishing between real experimental changes and those due to technical variation is achieved by some form of normalization designed to remove bias and to ensure that the results obtained are an accurate depiction of a true biological effect. Normalization generally requires subtraction of background intensity from the signal of each gene. Other forms of normalization include comparison of gene expression against a known set of reference (housekeeping) genes included on the array that demonstrate constant expression irrespective of tissue type and conditions. The Human Gene Expression database (Table 12.1) is a useful source of information of such genes expressed in a variety of tissues.
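As a rough sketch of the two normalization steps just described, the example below subtracts background from each signal and then scales every gene to the mean of a set of reference genes. The intensities are invented, the choice of ACTB and GAPDH as reference genes is an assumption made only for illustration, and flooring corrected values at 1.0 is one simple convention among several.

```python
from statistics import mean

def background_subtract(signal, background, floor=1.0):
    """Subtract local background from each spot; floor avoids zero or negative values."""
    return {g: max(signal[g] - background[g], floor) for g in signal}

def normalize_to_reference(corrected, reference_genes):
    """Express every gene relative to the mean intensity of the reference genes."""
    scale = mean(corrected[g] for g in reference_genes)
    return {g: value / scale for g, value in corrected.items()}

# Hypothetical raw intensities and local background for one array.
signal = {"ACTB": 1100.0, "GAPDH": 2100.0, "GENE_X": 600.0}
background = {"ACTB": 100.0, "GAPDH": 100.0, "GENE_X": 100.0}

corrected = background_subtract(signal, background)
normalized = normalize_to_reference(corrected, ["ACTB", "GAPDH"])
print(normalized["GENE_X"])
```

Applying the same procedure to every array places the arrays on a common scale, provided the reference genes really are stable under the conditions studied.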

Statistical analysis

Once microarray data have been collected, real changes in gene expression levels are distinguished from chance changes using parametric or non-parametric techniques. In the former case, the t-test uses a measure of within-treatment error, so it identifies only those genes whose change in expression is large relative to the within-treatment variance. This is usually resolved by undertaking a suitable number of experiments to reduce the variance, but this is not so easy with array data where experiments are expensive.

An alternative approach is to use fold change, in which the within-treatment variance is ignored. The problem with this is that variances do matter and vary between genes and treatments. Moreover, fold change may stem from an initially low level of expression, possibly reducing the biological significance of the change. A method of improving the power of the t-test is to use a Bayesian statistical approach that uses prior knowledge of within-treatment measurement (Baldi et al 1998). In the case of arrays, this is achieved by assuming that genes with similar expression levels have similar measurement errors (Baldi and Long 2001). In addition, the data are not viewed in isolation but in the context of known biological interactions.

Given the caveats that apply to the use of the t-test for statistical analysis of array data, an alternative method used to derive probability values that has been applied to microarray data is the use of permutation tests. Unlike the t-test, permutation tests do not require data to be normally distributed nor variances to be equal, and are therefore more likely to detect real changes in expression. Tusher et al (2001) have developed a method that uses permutation tests known as ‘Significance Analysis of Microarrays’ (SAM), where each gene is assigned a score taking into account its expression and standard deviation. The uptake of the SAM method has been greatly facilitated by the availability of an Excel plug-in. It also provides information on the false discovery rate, which corrects for false-positive results in order to eliminate random changes in gene expression.
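The principle of a permutation test can be sketched for a single gene as follows. This is not the SAM procedure itself, which uses its own gene-specific score and estimates a false discovery rate; it is a plain exact permutation test on the difference in group means, with invented expression values.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling the pooled values into two groups.
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical normalized expression of one gene in four control and four treated samples.
control = [1.0, 1.2, 0.9, 1.1]
treated = [2.0, 2.3, 1.9, 2.2]

p = permutation_p_value(control, treated)
print(p)
```

No distributional assumption is made: the p-value is simply the fraction of relabellings that give a difference at least as large as the one observed. With four samples per group there are only 70 relabellings, so the smallest attainable two-sided p-value is 2/70, which illustrates why small array experiments limit statistical power.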

Exploratory data analysis and data models

Since the aim of most microarray experiments is to identify differential gene expression compared with a control sample/condition, some means of organizing genes into meaningful subsets forms part of exploratory analysis. For example, a comparison between normal and endometriotic tissue may unveil changes in multiple genes involved in inflammation or cell adhesion. Similarly, treating a cell line with steroids may lead to a downregulation of genes encoding inflammatory cytokines. Exploratory analysis investigates such relationships using unsupervised or supervised classification to identify patterns of gene expression based on similarity measurements, yet providing little or no information on the statistical significance of the findings. Supervised classification is distinct from unsupervised methods since it involves making prior assumptions about the data sets but may introduce bias (Shipp et al 2002).

Broadly speaking, unsupervised methods that include cluster analysis (Kerr and Churchill 2007) or principal components analysis (Hilsenbeck et al 1999) identify groups of genes that change and cluster them into cognate groups. One of the shortfalls of this approach is that genes that do show changes in expression but have no perceived function may be overlooked. This matters all the more because several genes are likely to be implicated in any one physiological response. A further problem is in identifying the best model within which to place the individual gene expression changes. Genes may be clustered using hierarchical clustering (Eisen et al 1998), two-way clustering (Alon et al 1999), k-means clustering, principal components analysis (Tavazoie et al 1999) and self-organizing maps (Tamayo et al 1999). It is also preferable to determine the validity of applying particular clustering algorithms to one’s data by carrying out bootstrap or jackknife analysis (Reimers 2005). Detailed mathematical arguments relating to supervised and unsupervised methods are beyond the scope of this chapter; readers are referred to the preceding references and the following sources for further information: Reimers (2005), D’Haeseleer (2005), Thalamuthu et al (2006) and Kerr et al (2008).

Limitations of microarrays

Although the use of microarrays has dramatically altered the field of molecular biology, it is no easy task to compare data across groups and arrays due to the variation in arrays used, normalization protocols employed, and statistical analyses and exploratory models applied. In an attempt to reach a consensus on sharing and storing array data, Brazma et al (2001) have launched Minimum Information About A Microarray Experiment (MIAME) as a mechanism for recording detailed relevant information on the execution and analysis of array experiments made publicly available for use by independent researchers.

Despite their undoubted utility in the laboratory, the limitations of DNA microarray methods are listed below.

• They do not detect mRNA but cDNA. The relationship between expression of mRNA and cDNA is complex and governed by several factors. Thus, cDNA levels may not mirror mRNA expression precisely.

• There is currently no method that allows reliable measurement of RNA.

• Arrays provide information on changes in hybridization signals for thousands of genes that cannot be performed in large replicates.

• Sources of error may arise from poor sample, RNA quality, probe labelling or background noise.

• It is not practicable to assess the identity of the DNA on the array, and errors relating to gene annotation, although rare, do occur.

• Having performed such studies at significant cost and effort, comparing microarray data is difficult and some agreement on standardizing experimental design in order to make valid comparisons has been proposed.

DNA microarrays remain a popular experimental tool, such that the technology has now extended to array-based global analysis of miRNAs, DNA methylation, genome-wide scanning and proteins. A further modification of this method for analysing histone modifications and gene-promoter sequence binding is based on chromatin immunoprecipitation (ChIP), which utilizes formaldehyde fixation to provide a snapshot of protein–DNA molecular interactions in situ (Wathelet et al 1998). This method has advantages over traditional in-vitro methods used to study transcription factors (i.e. electrophoretic mobility shift assays) and gene reporter studies that employ synthetic segments of promoter DNA that do not form chromatin with a structure reflective of that in native chromosomal DNA. ChIP-on-chip is an array-based method that allows mass screening of gene promoters involved in transcription and repression of genes (Collas 2009), and is a valuable addition to the armoury employed in studying gene regulation.

Validation of Microarray Data

All biological processes are ultimately regulated by the repertoire of genes that are expressed in a cell at a given time. The cellular response to physiological and pathophysiological stimuli will also depend on changes in this profile. Array hybridization (and other techniques) is designed to identify the transcripts in a cell and to estimate their relative abundance. Thus, the array itself is a form of quantitative assessment. However, since array studies are often performed on relatively few specimens, additional quantitative methods are frequently required to investigate small numbers of genes in multiple samples. This is often termed ‘array verification’. As array technology and the associated data-processing methods become more robust, use of this verification will decrease. Nonetheless, some verification is prudent and is most readily achieved using a polymerase chain reaction (PCR)-based approach.

Quantitative polymerase chain reaction

The sensitivity of the PCR has been instrumental in the elucidation of gene expression profiles at a cell and tissue level. Most readers will be familiar with the basis of the PCR, which uses Taq polymerase, a thermostable enzyme isolated from the bacterium Thermus aquaticus, to amplify a target nucleic acid sequence exponentially. Taq DNA polymerase has intrinsic 5′–3′ exonuclease activity, is optimally active at approximately 72°C, and amplifies at a rate of 30–70 bases per second. The technique uses either template DNA or RNA from various sources (ex-vivo tissue biopsies, cell lines, blood, cloned DNA, microbial DNA) to compare expression of product (amplicon) under varied experimental conditions as described earlier. Quantitative PCR (qPCR) has rapidly supplanted conventional ‘endpoint’ PCR (Higuchi et al 1992, Wittwer et al 1997, Kubista et al 2006). It represents a major refinement where assay throughput has been greatly improved with 96- or 384-well plate formats, fluorescence detection and the commercial availability of ready-prepared ‘master mixes’ that only require the addition of cDNA to amplify the relevant gene. The process of amplification essentially involves continual cycling between denaturation of double-stranded DNA to generate two single-template strands that undergo annealing to target primers, followed by strand extension and a consequent increase in product copy number.
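The exponential nature of the amplification can be made concrete with a short calculation. The sketch assumes an idealized reaction in which the product increases by a constant factor of (1 + efficiency) per cycle, so that an efficiency of 1.0 corresponds to perfect doubling; real reactions plateau in later cycles, which this simple model ignores. The function name and the starting copy number are hypothetical.

```python
def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Idealized amplicon copy number after a given number of PCR cycles."""
    return initial_copies * (1.0 + efficiency) ** cycles

# 100 starting copies, 30 cycles, perfect doubling in every cycle.
ideal = copies_after_cycles(100, 30)
print(ideal)   # 107374182400.0, i.e. roughly 1e11 copies

# The same reaction at 90% efficiency yields markedly less product.
reduced = copies_after_cycles(100, 30, efficiency=0.9)
print(reduced / ideal)
```

Because the per-cycle factor is raised to the power of the cycle number, small differences in efficiency compound into large differences in final yield, which is why efficiency matters so much for quantification.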

Unlike DNA-PCR, which is relatively reliable and easily quantifiable, quantitative reverse transcriptase PCR (qRT-PCR), which requires the conversion of total RNA to cDNA with the enzyme reverse transcriptase (RT step), is a recognized error-prone reaction. RNA is significantly more labile than DNA, requiring careful handling. The crucial RT step is the source of much of the inaccuracy in quantification of gene expression due to inherent variability in, for example, the rate of reverse transcription and the choice of either oligo (dT)-18 or random hexamers for the priming of cDNA. The former produces a more accurate transcript profile which reflects the mRNA gene pool of the sample, with transcription requiring full-length, unfragmented, high-quality mRNA. Random hexamers prime cDNA from multiple points along the same transcript, thereby producing possibly more than one cDNA transcript per original target sequence, and also amplify ribosomal RNA. The choice of whether to use random hexamers, oligo-dT primers or a combination of both is a matter for the investigator. All RT reactions should be run in parallel with ‘no RT’ controls, that are duplicates of the original RNA reaction but with RT omitted in order to determine possible contamination with genomic DNA that would lead to false-positive results.

The heterogeneous nature of tissue samples is an additional complication, and it is advisable that, wherever possible, laser capture microdissection should be utilized in order to determine gene expression in specific cell types which will be more meaningful in interpreting association of gene expression with phenotype. Fortunately, this is relatively straightforward using techniques such as confocal immunofluorescence, immunohistochemistry and in-situ hybridization to determine a site of expression of the target protein or gene.

Primer design is a critical step in a PCR reaction. The current trend, however, is to design primers using software, in many cases provided by companies with expertise in qPCR. At a cost, these companies will also design primers on request, supplied as part of an assay kit where all reagents are optimized for the primers. The main disadvantage in ordering primers via this route is that the primer sequence is often not disclosed, making it difficult to know exactly where the primer is annealing. Many primer design tools are freely available on the internet (see Table 12.1).

Fluorescence chemistry in quantitative reverse transcriptase polymerase chain reaction

The two most widely used forms of fluorescence detection for qPCR are as follows.

• DNA-binding dyes: SYBR green was the first available fluorescent probe to be used in qPCR and qRT-PCR as it is easily incorporated into a normal reaction mix. It acts by intercalating with double-stranded DNA, thus generating approximately 1000-fold increased fluorescence intensity compared with unbound dye. Shortcomings of this method include its indiscriminate binding to any form of double-stranded DNA, including both the amplicon as well as primer-dimers. However, melt-curve analysis that enables distinction between specific and non-specific amplification based on knowledge of the desired amplicon is easily performed as part of the reaction. Figure 12.2 illustrates qRT-PCR undertaken using human endometrial cDNA and SYBR green chemistry.

• Hydrolysis probes using Taqman chemistry, the most popular of the fluorescence approaches, are based on the incorporation of an oligonucleotide probe which hybridizes to the DNA circumscribed by the two PCR primers. The probe is typically labelled with a fluorophore at its 5′ end and a quencher at the 3′ end. The 5′–3′ exonuclease activity of Taq polymerase as it progresses along the template digests a portion of the probe, sequentially removing each fluorescent dye molecule, finally achieving spatial separation from the dye at the other end of the probe. When these dyes are close together, their fluorescence is quenched [by fluorescence resonance energy transfer (FRET)]. Upon separation, this effect is lost and fluorescence is emitted. Since the product from one cycle is the template for the next, the amount of probe hybridized and therefore digested during each cycle increases. This leads to an increase in the emitted fluorescence which is proportional to the amount of initial template present. The Taqman probe method does not lend itself to melt-curve analysis. More sensitive FRET-based detection methods are available that do permit melt-curve analysis, but their use is limited by the expense in using this chemistry.

Figure 12.2 Quantitative reverse transcriptase polymerase chain reaction (PCR) of the adhesion molecule E-cadherin in human endometrium using the SYBR® green method. All three samples are characterized by a C_t of 17. Inset illustrates a melt curve showing a single peak produced during the amplification, indicating the formation of a specific amplicon. CF RFU, curve fit relative fluorescence units.

Source: Patel S, Shaw RW, Khan RN (unpublished data).

Data analysis and interpretation in quantitative polymerase chain reaction

Quantification of the qRT-PCR assay is most accurately determined during the exponential phase of the reaction when amplification products are being generated at a steady optimum rate, approximately 10–20 cycles into the reaction. The software creates an amplification plot with the measured fluorescence signal plotted against the cycle number. The single most important readout of the qRT-PCR is the threshold cycle (C_t), the cycle number at which the fluorescence signal first rises above the threshold limit for fluorescence detection. Differences in C_t values are used to calculate the relative abundance of template between samples, as the C_t value is inversely related to the logarithm of the amount of template at the start: the more template present, the fewer cycles are needed to reach the threshold. A crucial factor in qPCR is the amplification efficiency, calculated from the slope of the calibration curve (see below) which should be close to 100% (Higuchi et al 1992).
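How a C_t value might be read from an amplification plot is sketched below. Instrument software uses its own baseline correction and curve fitting; this example simply assumes one fluorescence reading per cycle and interpolates linearly between the two cycles that straddle the threshold. The readings and threshold are invented.

```python
def threshold_cycle(fluorescence, threshold):
    """Return the fractional cycle at which fluorescence first reaches the threshold.

    fluorescence[i] is the reading taken at cycle i + 1.
    Returns None if the threshold is never reached.
    """
    if fluorescence and fluorescence[0] >= threshold:
        return 1.0
    for i in range(1, len(fluorescence)):
        prev, curr = fluorescence[i - 1], fluorescence[i]
        if prev < threshold <= curr:
            # prev was read at cycle i, curr at cycle i + 1.
            fraction = (threshold - prev) / (curr - prev)
            return i + fraction
    return None

# Hypothetical baseline-corrected readings for cycles 1 to 9.
readings = [10, 10, 11, 12, 20, 40, 80, 160, 320]
ct = threshold_cycle(readings, threshold=60)
print(ct)   # 6.5
```

A sample containing more starting template crosses the same threshold at an earlier cycle and therefore has a lower C_t.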

Quantification in qPCR is carried out by one of two methods: absolute or relative. Both methods have advantages and disadvantages, and the decision regarding which one to use is one of personal choice.

Absolute quantification

This is a method by which known concentrations of nucleic acids are used to generate a standard curve from which unknown mRNA expression levels for target genes may be determined. Samples of known cDNA concentration or copy number are amplified, and the resulting C_t values are plotted against log concentration or log copy number to create the standard (calibration) curve. The data in Figure 12.2 illustrate an amplification curve showing E-cadherin expression in human endometrium.
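A minimal sketch of absolute quantification follows. It fits a straight line to C_t against log10 copy number by ordinary least squares, derives the amplification efficiency from the slope using the widely used relation E = 10^(-1/slope) - 1 (not stated in this chapter, so treat it as an assumption of the example), and reads an unknown sample off the curve. All standards and C_t values are invented.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def amplification_efficiency(slope):
    """Efficiency as a fraction (1.0 = 100%), from the standard-curve slope."""
    return 10 ** (-1.0 / slope) - 1.0

def copies_from_ct(ct, slope, intercept):
    """Interpolate the starting copy number of an unknown from its C_t."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical ten-fold dilution series of a standard and the C_t values measured.
standard_copies = [1e3, 1e4, 1e5, 1e6]
standard_ct = [28.04, 24.72, 21.40, 18.08]

slope, intercept = fit_line([math.log10(c) for c in standard_copies], standard_ct)
efficiency = amplification_efficiency(slope)
unknown_copies = copies_from_ct(23.06, slope, intercept)
print(slope, efficiency, unknown_copies)
```

A slope near -3.32 corresponds to an efficiency close to 100%, that is, a doubling of product in each cycle.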

Relative quantification

Also known as the delta delta C_t (ΔΔC_t) method, this uses a simpler approach in that there is no requirement for a standard curve and all quantification is based on C_t values expressed relative to a housekeeping or reference gene (Pfaffl 2001). However, this method is only effective if the reference gene of choice is constant in expression between samples or treatment. The amplification efficiencies of both the target and reference genes should be equal. Despite this, it remains a widely used method.
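The arithmetic of the ΔΔC_t method is short enough to show in full. The sketch assumes, as the text requires, that the reference gene is stably expressed and that target and reference genes amplify with equal efficiency; it further assumes that this efficiency is 100%, so that each cycle represents a doubling and the fold change is 2 raised to the power of -ΔΔC_t. The C_t values are invented.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of the target gene in the test sample relative to the control."""
    delta_test = ct_target_test - ct_ref_test    # ΔC_t in the test sample
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl    # ΔC_t in the control sample
    delta_delta = delta_test - delta_ctrl        # ΔΔC_t
    return 2.0 ** -delta_delta

# Hypothetical C_t values: the target crosses the threshold three cycles
# earlier in the test sample, while the reference gene is unchanged.
fold = relative_expression(ct_target_test=22.0, ct_ref_test=17.0,
                           ct_target_ctrl=25.0, ct_ref_ctrl=17.0)
print(fold)   # 8.0
```

A difference of three cycles corresponds to an eight-fold difference in starting template, which shows how sensitive the result is to small errors in C_t or to a reference gene that drifts between samples.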

Normalization

As with microarray data, normalization is an essential step in qRT-PCR and is typically performed in relation to a reference gene. As reference genes are known to vary in many different cell types under various conditions, it is good practice to probe samples for more than one single reference gene (Vandesompele et al 2002). A new standard for determining an appropriate reference gene is derived from the GeNorm programme, which takes actual C_t values from a sample for a number of tested reference genes that are stated in the paper and uses an algorithm to assign an M value to determine the most appropriate stable reference gene from those tested (Vandesompele et al 2002).
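The idea behind the M value can be sketched as follows. As described by Vandesompele et al (2002), M for a candidate reference gene is the average, over all other candidates, of the standard deviation of their log-transformed expression ratios across samples. The example below is a simplified illustration and not the GeNorm programme itself: it works directly on C_t values and assumes 100% amplification efficiency, so that a difference in C_t stands in for a log2 expression ratio. Gene names and C_t values are hypothetical.

```python
from statistics import stdev

def stability_m_values(ct_by_gene):
    """Return {gene: M}, where a lower M indicates more stable expression."""
    genes = list(ct_by_gene)
    m_values = {}
    for gene in genes:
        variations = []
        for other in genes:
            if other == gene:
                continue
            # Per-sample C_t difference approximates a log2 expression ratio.
            diffs = [a - b for a, b in zip(ct_by_gene[gene], ct_by_gene[other])]
            variations.append(stdev(diffs))
        m_values[gene] = sum(variations) / len(variations)
    return m_values

# Hypothetical C_t values for three candidate reference genes in four samples.
ct_by_gene = {
    "REF_A": [18.0, 18.5, 19.0, 18.2],
    "REF_B": [20.0, 20.5, 21.0, 20.2],
    "REF_C": [22.0, 21.0, 24.0, 22.5],
}

m_values = stability_m_values(ct_by_gene)
least_stable = max(m_values, key=m_values.get)
print(m_values, least_stable)
```

REF_A and REF_B rise and fall together across samples, so their ratio is constant and both receive low M values, whereas REF_C varies independently and is ranked least stable.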

It is essential that qPCR and RT-qPCR studies are reported accurately. The varied format currently adopted by researchers does not lend itself to critical appraisal and does not permit comparisons to be made easily. As with microarrays, guidelines on the minimum information for publication of quantitative real-time PCR experiments have been developed (Bustin et al 2009). Researchers undertaking qPCR/qRT-PCR are encouraged to comply with the recommendations made in this article.

Genome-Wide Association Studies

Mapping of single nucleotide polymorphisms and links to complex diseases

The abundance of sequence data produced by the Human Genome Project has made possible the use of microarrays in elucidating genomic variations that predict susceptibility to common diseases that include diabetes and cardiovascular disease (Wellcome Trust Case Control Consortium 2007). Thus, genome-wide association studies (GWAS) utilize large-scale, high-throughput methods to perform unbiased parallel genomic analysis of an unprecedented number of biological samples. The expectation is that this approach will be incorporated into personalized medicine or pharmacogenomics where responses to treatment will be predicated on an individual’s genotype.

GWAS is based on the principle that predisposition to common diseases is attributable to a small number of genetic variations, and that large-scale screening of populations or families will identify sequence patterns predictive of disease (Nica and Dermitzakis 2008). Most complex diseases are a consequence of a composite, non-Mendelian inheritance of multiple genes. The most common source of genetic variation within the human genome is the single nucleotide polymorphism (SNP), where the replacement of one nucleotide with another at a related locus potentially alters the downstream protein product encoded. SNPs, approximately 12 million of which reside within the human genome, are the preferred genetic markers due to their abundance, occurring approximately once every 1000 base pairs. Access to the HapMap database — a repository cataloguing the genetic variation in ethnically distinct populations and their link to health and disease (Frazer et al 2007) — has been instrumental in the launch of GWAS studies. Not all SNPs cause disease (Stranger and Dermitzakis 2006) and, interestingly, experimental evidence indicates that SNPs account for only a small proportion of the phenotypic variance, implying association with other non-genetic factors, particularly gene–environment interactions.

The availability of arrays that contain millions of cDNA fragments allows an individual’s genotype to be queried for up to 2 million known genetic variants. The design of GWAS may take the form of:

• a case–control study, in which individuals with disease are compared with a disease-free control group;

• a cohort design where a large number of individuals are recruited from a similar sample then categorized according to genetic variants; or

• the trio design, which attempts to survey an afflicted individual and his/her parents. This latter approach is best suited to Mendelian inheritance of disease (Pearson and Manolio 2008).

Pertinent to female health, the Women’s Genome Health Study, a full-cohort prospective GWAS, was commenced recently (Ridker et al 2008). The goal of this project is to produce a fully searchable database of SNPs for women in order to identify polymorphism patterns that predict disease in otherwise healthy women. The Women’s Genome Health Study will use the same study population of well-characterized healthy women initially recruited to the Women’s Health Study. These women have already undergone 12 years of monitoring for major health events that include, amongst several others, cardiovascular disease, cancer, diabetes and osteoporosis. Significantly, this study has collated an extensive amount of epidemiological information along with baseline blood samples for several disease biomarkers and DNA for genotyping, as well as dietary and behavioural data that will allow gene–environment and gene–gene interactions to be examined.

Post-Transcriptional Repression and Regulation

The role of microRNA and short, interfering RNA

Probably the single most important discovery in cell biology in recent years relates to that of small non-coding RNA molecules, specifically short, interfering RNA (siRNA) and miRNA (Figure 12.3). The landmark paper describing RNA interference (RNAi), where Fire et al (1998) used double-stranded RNA to silence gene expression in Caenorhabditis elegans, has caused a paradigm shift in our interpretation of post-transcriptional regulation. It is estimated that over half of the genes encoding proteins in the human are likely to be regulated by miRNAs, providing a compelling case for further research in this field.

Figure 12.3 Post-transcriptional regulation of microRNA. Pri-miRNA, produced from the action of RNA polymerase II on the miRNA gene, is cleaved by the enzyme Drosha into a pre-miRNA transcript ∼70 nucleotides long. Pre-miRNA exits the nucleus after associating with exportin 5, a nuclear transporter. In the cytosol, a ribonuclease III known as Dicer processes the pre-miRNA into a ∼22 nucleotide duplex. The mature strand of the duplex then preferentially enters the RNA-induced silencing complex (RISC) to attach to newly formed RNA to elicit silencing. The shaded part of the figure illustrates the conventional sequence whereby mRNA is translated into protein.

Both miRNA and siRNA are produced endogenously in the nucleus, although siRNA was initially thought to have a solely exogenous origin, deriving principally from viruses. siRNA has since been demonstrated in native cells where it has been termed ‘endo-siRNA’. Interestingly, biogenesis of both siRNA and miRNA proceeds by a similar mechanism, described below, which evokes silencing in association with the Argonaute (Ago) superfamily of proteins (Meister et al 2005, Tomari and Zamore 2005).

miRNAs, in their fully processed mature form, are short (18–24 nucleotides), single-stranded segments of RNA that bind to the 3′ untranslated region (UTR) of mRNA to induce post-transcriptional silencing (Carthew and Sontheimer 2009). They are produced in the nucleus from DNA by the action of RNA polymerase II to generate a long (hundreds to thousands of nucleotides) precursor containing double-stranded hairpin structures, known as primary miRNA or pri-miRNA. The latter is cleaved in the nucleus into pre-miRNA (approximately 70 nucleotides in length) by the enzyme ‘Drosha’ (Lee et al 2002), then exported to the cytoplasm by an accessory protein, exportin 5. Further processing of pre-miRNA by the enzyme ‘Dicer’ yields a miRNA duplex of approximately 22 nucleotides with a two-nucleotide overhang at the 3′ end (Yi et al 2003, Gregory et al 2005). On entry of this duplex into the RNA-induced silencing complex (RISC), the duplex is unwound and one strand (the guide strand) associates with Ago, whereupon post-transcriptional repression of target mRNA ensues (see Figure 12.3). The other ‘passenger’ strand is degraded. Interestingly, perfect complementary base pairing between miRNA and mRNA results in degradation of the target mRNA, while a slight mismatch of base pairing induces translational repression (Tomari and Zamore 2005). Whilst the specific mechanisms that determine whether target mRNA is degraded or repressed remain an active area of research, imperfect base pairing has evolved as an ingenious mechanism that explains the multiplicity of targets for miRNA clusters. It should be borne in mind, however, that some miRNA species have only one target.
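
The distinction between perfect and imperfect base pairing can be sketched in a few lines of code. The example below is a deliberately simplified illustration and not a target-prediction algorithm: the sequences are invented, the function names and the mismatch cut-off (`max_mismatches`) are hypothetical, and G:U wobble pairing and positional effects along the miRNA are ignored.

```python
# Watson-Crick partners for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def count_mismatches(mirna, target_site):
    """Count unpaired positions between a miRNA and an equal-length mRNA site.

    Both sequences are written 5'->3'. The strands pair antiparallel, so a
    perfectly paired site is the reverse complement of the miRNA.
    """
    if len(mirna) != len(target_site):
        raise ValueError("miRNA and target site must be the same length")
    expected = reverse_complement(mirna)
    return sum(1 for e, t in zip(expected, target_site.upper()) if e != t)


def predicted_outcome(mirna, target_site, max_mismatches=4):
    """Map pairing quality to the outcomes described in the text.

    The cut-off of four mismatches is an arbitrary assumption.
    """
    mismatches = count_mismatches(mirna, target_site)
    if mismatches == 0:
        return "degradation"
    if mismatches <= max_mismatches:
        return "translational repression"
    return "no predicted silencing"


# Invented 22-nucleotide guide strand, not a real miRNA.
guide = "AUGGCUAACGUUCGAUACGGAU"
perfect_site = reverse_complement(guide)
print(predicted_outcome(guide, perfect_site))
```

Because the rule tolerates a small number of mismatches, many different sites can satisfy it for a single guide strand, which is the property that allows one miRNA to have multiple targets.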

Apart from their roles in development and stem cell differentiation, miRNAs appear to play a key role in both oncogenesis and tumour suppression. miRNAs involved in oncogenesis include the miR-17-92 cluster implicated in a mouse model of lymphoma (He et al 2005). Intriguingly, circulating miRNAs have been detected in blood and serum of cancer patients (Lawrie 2008), thus presenting an opportunity to harness miRNAs for diagnostic, prognostic and therapeutic purposes in the management of a wide range of malignancies.

The emergence of RNAi (Fire et al 1998) and the early success of siRNA in achieving targeted protein knockdown have overcome several obstacles. In practical applications, siRNA has been replaced with short hairpin RNA (shRNA), where the two individual strands of shRNA are linked by a short loop sequence of approximately eight nucleotides. shRNA mimics the short hairpin loop found in most native RNA species, therefore replicating function more faithfully than siRNA. Although siRNA technology holds great promise, one problem with its use was the observation that binding of the guide strand to large numbers of genes altered the expression of a host of miRNA species, a phenomenon known as ‘off-targeting’. While this has been largely overcome through improved design of siRNA, another problem surfaced with the finding that shRNA-expressing vectors were dose-dependently linked to increased mortality in experimental animals, indicating overloading of the siRNA pathway. This work also challenged the notion that siRNA species lack immunogenicity, since an immune response involving Toll-like receptors, a family of pattern recognition receptors, could be elicited (Grimm 2009). Despite these concerns, the experimental and potential therapeutic advantages of siRNAs provide a compelling case for their translation to the clinical arena.

Applications to Reproductive Biology

Significant milestones in reproductive biology that have transformed lifestyles include the development of the oral contraceptive pill, hormone replacement therapy and in-vitro fertilization. The power of gene cloning, as exemplified by the case of ‘Dolly the sheep’, has helped to launch the field of regenerative medicine and stem cell biology. Science has made many other, less well-known breakthroughs that have contributed to improving female health. Despite this, effective pharmacological therapies for the treatment of endometriosis, infertility, recurrent miscarriage and polycystic ovary syndrome (PCOS) remain elusive. Indeed, our understanding of the main female reproductive disorders is limited due to a paucity of research funding compared with diseases such as diabetes, cardiovascular disease and cancer. Microarrays in conjunction with new developments in our understanding of transcriptional regulation may elucidate temporal, tissue- and cell-specific gene signatures. Moreover, given the similarities between PCOS and diabetes, a GWAS approach to this condition may identify common genetic variations that might inform or predict development of PCOS. High-throughput technologies will undoubtedly provide new approaches to knowledge, diagnosis and therapeutics in all areas of reproductive biology.

Implantation/fertility

One of the most intractable problems facing reproductive medicine is failed implantation, whether this arises in natural cycles or in assisted reproduction. Transcript profiling studies, summarized in a recent review (Sherwin et al 2006), have sought to define endometrial function and receptivity. More recently, functional genomics has provided new information on molecular phenotyping of normo-ovulatory women (Talbi et al 2006). The authors convincingly demonstrated that microarray methods are as effective as histological methods in dating the endometrium. By using k-means clustering, a unique molecular signature associated with early proliferative, early secretory, mid-secretory and late secretory phase endometrium was also identified. Whilst studies of miRNA in the female reproductive tract are in their infancy, deletion of ‘Dicer’, an enzyme involved in miRNA processing, causes multiple reproductive defects that include reduced ovulation rates, loss of embryo integrity and oviductal cysts (Luense et al 2009). Investigations in humans on the role of ‘Dicer’ may uncover a common function of this enzyme in fertility. Given the ubiquity of RNA silencing, it is likely that developments in understanding failed implantation and subfertility will involve further forays into the field of miRNAs.
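
For readers unfamiliar with k-means clustering, the sketch below shows the general idea on a toy data set. It is not the analysis pipeline of Talbi et al (2006): the expression values are invented purely for illustration, the function names are hypothetical, and the initial centroids are simply the first k profiles, a naive choice that keeps the example deterministic but can give poor clusters on real data.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_profile(profiles):
    """Gene-by-gene mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def k_means(profiles, k, max_iterations=100):
    """Cluster profiles into k groups; returns (labels, centroids)."""
    if not 0 < k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    # Naive, deterministic initialisation: the first k profiles.
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_profile(members)
    return labels, centroids


# Invented expression values for four samples across three genes.
samples = [
    [1.0, 0.9, 0.1],
    [5.0, 5.1, 4.9],
    [1.1, 1.0, 0.0],
    [4.9, 5.0, 5.2],
]
labels, centroids = k_means(samples, k=2)
print(labels)
```

In an endometrial dating study, each sample would be a biopsy, each value the expression level of one gene, and k would be set to the number of cycle phases sought.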

Endometriosis

The surgical and medical treatment of endometriosis remains a major area of clinical interest. Current medical strategies seek to induce a pseudomenopause or pseudopregnancy. Newer approaches include the use of aromatase inhibitors, gonadotrophin-releasing hormone antagonists, selective oestrogen receptor modulators and antiprogestins. Alternatively, it might be possible to find new response genes or miRNAs that are differentially expressed in endometriotic lesions compared with eutopic endometrium. A recent investigation showed differential expression of 22 miRNAs in ectopic versus eutopic endometrium (Ohlsson Teague et al 2009). The target pathways appear to involve the c-jun and protein kinase B signalling pathways. On this basis, new therapeutic strategies targeting these cascades or the miRNAs implicated offer a new direction for the future treatment of endometriosis.

Ovarian cancer

Ovarian epithelial tumours comprise several histological subtypes, broadly divided into serous and mucinous, and are further classified by the degree of differentiation. Downregulation of miR-21 and several of the let-7 family of miRNAs in ovarian tumours has been reported, while miR-221 expression was raised (Dahiya et al 2008). A recent investigation has also identified that the mRNA for key enzymes ‘Drosha’ and ‘Dicer’ that catalyse miRNA biogenesis is reduced by 51% and 60%, respectively, in invasive ovarian cancer. Furthermore, high levels of ‘Dicer’ and ‘Drosha’ correlated with increased patient survival (Merritt et al 2008). A study of six types of cancer, including ovarian cancer, demonstrated that miRNA profiling in tiny volumes (<1 ml) of serum could clearly distinguish between normal donors and cancer patients (Lodes et al 2009). These findings raise the prospect of using less-invasive serum-based assays to detect ovarian cancer and monitor its progression.