Methods in natural product chemistry

Published on 02/03/2015 by admin

Filed under Basic Science

Last modified 22/04/2025

Chapter 7 Methods in natural product chemistry

In Chapter 6 we looked at the initial process in the selection of biomass (plant or microbe), its extraction and screening in different formats (high- and low-throughput screening), the isolation of the active components (bioassay-guided isolation) and the evaluation of the drug lead in clinical trials to the final drug. In this chapter we deal with the isolation process in more detail and cover the techniques that are used to isolate and characterize an active compound using chromatographic and spectroscopic techniques.

Bioassay-guided isolation

Bioassay-guided isolation is the physical process used to isolate biologically active chemicals from a natural source. Many of the chemicals described in Chapter 6 are from plant sources, but microbes are also an exceptionally valuable source of chemical diversity, in particular the filamentous bacteria (the Actinomycetes), of which the antibiotic-producing genus Streptomyces is the most widely studied for bioactive compounds. The fungi are also important, and microbiologists spend time working in biota-rich environments such as the Amazon basin collecting, typing (identifying) and culturing samples for shipment back to the laboratory to be screened for bioactivity. As with plants, this process can be highly complicated, particularly in the identification of fungi, of which there may be millions of new species waiting to be described in remote locations. This exercise is extremely worthwhile, as it is highly likely that new species will contain new chemistry that may have interesting bioactivity when fully screened. This is particularly relevant for the Basidiomycetes, a large group of fungi that produce fruiting bodies such as mushrooms (basidiocarps) and are sometimes difficult to grow in solution fermentation.

Preparation and extraction

Whether samples are plants, microbes (fermented or solid phase), marine animals (corals, slugs, tunicates) or insects they are referred to as biomass. In the case of plants, following their identification and classification by a field botanist into a species and family, samples are collected from the aerial parts (leaves, stem and stem bark), the trunk bark and roots or, in the case of large trees, the heartwood (sometimes referred to as timber). These samples are then gently air-dried, although this can be problematic in highly humid environments such as rainforests and coastal regions. Better control is achieved in the laboratory using drying cabinets or lyophilizers (freeze-driers), although biomass must be dried quickly to avoid degradation of components by air or by microbes. Care must be taken with lyophilizers as they utilize a high vacuum, which can remove volatile components that may have interesting biological activities.

Once biomass has been dried, it is ground into small particles using either a blender or a mill. Plant material is milled twice, first using a coarse mill and then a fine mill to generate a fine powder. The grinding process is important as effective extraction depends on the size of the biomass particles; large particles will be poorly extracted, whereas small particles have a higher surface area and will therefore be extracted more efficiently.

Selection of the solvent extraction approach is very important. If a plant is under investigation from an ethnobotanical perspective, then the extraction should mimic the traditional use. For example, if an indigenous people use a specific extraction protocol such as a water extract, a cold/hot tea, alcohol or alcohol–water mixtures, then an identical or at least a very similar method should be used in the laboratory so that the same natural products are extracted. Failure to extract biomass properly may result in loss of access to active compounds. Additionally, using an inappropriate extraction method, such as strong heating of biomass with a solvent, may result in degradation of natural products and consequent loss of biological activity.

Numerous extraction methods are available, the simplest being cold extraction (in a large flask with agitation of the biomass using a stirrer) in which the ground dried material is extracted at room temperature sequentially with solvents of increasing polarity: first hexane (or petroleum ether), then chloroform (or dichloromethane), ethyl acetate, acetone, methanol and finally water. The major advantage of this protocol is that it is a soft extraction method as the extract is not heated and there is little potential degradation of natural products. The use of sequential solvents of increasing polarity enables division of natural products according to their solubility (and polarity) in the extraction solvents. This can greatly simplify an isolation process. Cold extraction allows most compounds to be extracted, although some may have limited solubility in the extracting solvent at room temperature.

In hot percolation, the biomass is added to a round-bottomed flask containing solvent and the mixture is heated gently under reflux. Typically, the plant material is ‘stewed’ using solvents such as ethanol or aqueous ethanol mixtures. The technique is sometimes referred to as total extraction and has the advantage that, with ethanol, the majority of lipophilic and polar compounds are extracted. An equilibrium between compounds in solution and in the biomass is established, resulting in moderate extraction of natural products. Heating the extracts for long periods may also degrade labile compounds; therefore a pilot experiment should first be attempted and extracts assessed for biological activity to ascertain whether this extraction method degrades the bioactive natural products. Care should be taken, as extraction is never truly total; for example, some highly lipophilic natural products (e.g. the monoterpenes) are insoluble in polar solvents.

Supercritical fluid extraction utilizes the fact that some gases behave as liquids when under pressure and have solvating properties. The most important example is carbon dioxide which can be used to extract biomass and has the advantage that, once the pressure has been removed, the gas boils off leaving a clean extract. Carbon dioxide is a non-polar solvent but the polarity of the supercritical fluid extraction solvent may be increased by addition of a modifying agent, which is usually another solvent (e.g. methanol or dichloromethane).

The most widely used method for extraction of plant natural products is Soxhlet extraction (Fig. 7.1). This technique uses continuous extraction by solvents of increasing polarity. The biomass is placed in a Soxhlet thimble constructed of filter paper, through which solvent is continuously refluxed. The Soxhlet apparatus will empty its contents into the round-bottomed flask once the solvent reaches a certain level. As fresh solvent enters the apparatus from a reflux condenser, extraction is very efficient and compounds are effectively drawn into the solvent from the biomass due to their low initial concentration in the solvent. The method suffers from the same drawbacks as other hot extraction methods (possible degradation of products), but it is the best extraction method for the recovery of large yields of extract. Moreover, provided that biological activity is not lost on heating, the technique can be used in drug lead discovery.

In general terms, regardless of the extraction method used, extracts are of two types: lipophilic (‘fat-loving’), resulting from extraction by non-polar solvents (e.g. petrol, ethyl acetate, chloroform, dichloromethane), and hydrophilic (‘water-loving’), produced by extracting biomass with polar solvents (e.g. acetone, methanol, water).

The value of using solvents of different polarities is that the chemical complexity of the biomass is simplified when taken into the extract, according to the solubility of the components. This can greatly simplify the isolation of an active compound from the extract. Additionally, certain classes of compounds may have high solubilities in a particular solvent (e.g. the monoterpenes in hexane), which again can simplify the chemical complexity of an extract and help with the isolation process.

Regardless of the extraction technique used, extracts are concentrated under vacuum using rotary evaporators for large volumes of solvent (> 5 ml) or ‘blown down’ under nitrogen for small volumes (1–5 ml), ensuring that volatile components are not lost. Removal of solvent should be carried out immediately after extraction, as natural products may be unstable in the solvent. Aqueous extracts are generally freeze-dried using a lyophilizer. Dried extracts should be stored at –20 °C prior to screening for biological activity as this will decrease the possibility of bioactive natural product degradation.
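The volume rule of thumb above can be written down as a small decision helper. This is only a sketch: the function name is hypothetical, and the handling of volumes below 1 ml (treated here as nitrogen blow-down) is an assumption that the text does not cover.

```python
def concentration_method(volume_ml: float, aqueous: bool = False) -> str:
    """Suggest how to remove solvent from an extract.

    Thresholds follow the text: rotary evaporation above 5 ml,
    nitrogen blow-down for small volumes, freeze-drying for aqueous extracts.
    """
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    if aqueous:
        return "freeze-dry (lyophilizer)"
    if volume_ml > 5:
        return "rotary evaporator (under vacuum)"
    # Assumption: volumes below 1 ml are also blown down under nitrogen.
    return "blow down under nitrogen"


for volume, is_aqueous in [(250, False), (3, False), (100, True)]:
    print(volume, "ml ->", concentration_method(volume, is_aqueous))
```

Whatever the method chosen, the dried extract would then be stored at –20 °C as described above.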

If it is known that certain classes of compounds, such as acids or bases, are present in the biomass, they can be extracted using a tailored protocol. The most common group of natural products that are extracted in this manner are the alkaloids (see Chapter 6 and below), which are often present in plant material as salts. A brief outline of how these basic compounds may be extracted is as follows:

This extraction method generates a mixture of alkaloids that are essentially free of neutral or acidic plant components and is specific for compounds that are basic (able to form free bases). However, care should be taken with alkaloid extractions as the acids and bases employed may destroy active natural products that have functional groups which are readily susceptible to degradation (e.g. glycosides, epoxides and esters). Additionally, the stereochemistry of a molecule may be affected by the presence of these strong reagents. The most important factor to consider is: is biological activity retained following the extraction protocol?

Isolation methods

Once an extract has been generated by a suitable extraction protocol and activity is demonstrated in a bioassay (e.g. an antibacterial test), the next step is to fractionate the extract using a separation method so that a purified biologically active component can be isolated.

Gel chromatography

Assuming that the extract is still active, the next step is chromatography. A procedure that is widely used as an initial clean-up is gel chromatography, also known as size exclusion chromatography. This technique employs a cross-linked dextran (sugar polymer) which, when added to a suitable solvent (e.g. chloroform or ethyl acetate), swells to form a gel matrix. The gel contains pores of a finite size that allow small molecules (< 500 Da) to be retained in the matrix; larger molecules (> 500 Da) are excluded and move quickly through the gel. This gel is loaded into a column and the extract is added to the top of the column. Large molecules are the first to elute, followed by molecules of a smaller size. This is an excellent method for separating out chlorophylls, fatty acids, glycerides and other large molecules that may interfere with the biological assay. Different sorts of gels are available which may be used in organic solvents (e.g. LH-20) or aqueous preparations such as salts and buffers (e.g. G-25). Therefore both non-polar and polar natural products can be fractionated using this technique. Additionally, compounds are not only fractionated according to size but there is also a small amount of adsorption chromatography occurring, as the dextran from which the gel is made contains hydroxyl groups that interact with natural products, facilitating some separation according to polarity.

This is a non-destructive ‘soft’ method with a high recovery (compounds are rarely strongly adsorbed) and a high quantity of extract (hundreds of milligrams to grams) may be separated. A further benefit of this technique is that many different gels are available with a variety of pore sizes that can be used to separate compounds from 500 to 250,000 Da. This is the method of choice for large molecules, in particular proteins, polypeptides, carbohydrates, tannins and glycosides, especially saponin and triterpene glycosides.
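The size-based elution order can be illustrated with a short sketch. It assumes an idealized gel in which molecular weight alone decides the order (real separations also depend on molecular shape and on the adsorption effects mentioned above). The compound list is illustrative, with approximate molecular weights, and the function names are hypothetical.

```python
# Approximate molecular weights in Da; the compound list is illustrative only.
EXTRACT = {
    "chlorophyll a": 893.5,
    "tripalmitin (a glyceride)": 807.3,
    "quercetin": 302.2,
    "thymol": 150.2,
}

CUTOFF_DA = 500  # pore cut-off quoted in the text


def elution_order(components):
    """Largest molecules first (idealized size exclusion)."""
    return sorted(components, key=components.get, reverse=True)


def split_by_cutoff(components, cutoff=CUTOFF_DA):
    """Separate components excluded from the pores from those retained."""
    excluded = [name for name, mw in components.items() if mw > cutoff]
    # Assumption: a component exactly at the cut-off is treated as retained.
    retained = [name for name, mw in components.items() if mw <= cutoff]
    return excluded, retained


print("Elution order:", elution_order(EXTRACT))
excluded, retained = split_by_cutoff(EXTRACT)
print("Excluded (elute early):", excluded)
print("Retained (elute later):", retained)
```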

Ion-exchange chromatography

The separation of small polar compounds, in particular ionic natural products, is often problematic. It is possible to separate these metabolites from larger molecules (using gels) but they are generally very strongly adsorbed on normal-phase sorbents such as silica or alumina, and, even with the use of polar solvents and modifiers (e.g. acid and base), efficient separations may not be achievable. Additionally, these compounds are not retained on reverse-phase sorbents such as C18 or C8. These natural products possess functional groups, such as -CO2H, -OH and -NH2, that contribute to the polarity of the molecule, and this may be used to develop a separation method using ion-exchange chromatography.

This technique is limited to natural products that can carry charge on their functional groups. The sorbent or stationary phase has charged groups and mobile counter ions which may exchange with ions of the functional groups present in the natural product as the mobile phase moves through the sorbent. Separation is achieved by differences in affinity between ionic components (polar natural products) and the stationary phase. These ion-exchange sorbents or resins are divided into two groups: cation exchangers, which have acidic groups (-CO2H, -SO3H) and are able to exchange their protons with cations of natural products, and anion exchangers, which have basic groups (-N+R3) that are incorporated into the resin and can exchange their anions with anions from the natural product. These ion-exchange resins may be used in open column chromatography or in closed columns in applications such as high performance liquid chromatography (HPLC).

An example of the technique is shown in Fig. 7.2. 2,5-Dihydroxymethyl-3,4-dihydroxypyrrolidine (DMDP) from Lonchocarpus sericeus (Fabaceae) is a nematocidal polyhydroxylated alkaloid (PHA), and also inhibits insect α- and β-glucosidases. Compounds of this type are bases and form cations in acidic solutions. When added to a cation exchanger [e.g. Amberlite CG-120, which has a sulfonic acid bound to the resin which can exchange its proton (cation)], the DMDP cations are retained (bound) by the cation exchanger and protons are displaced. If the cation exchanger is then eluted with a solution containing a stronger cation such as NH4+ (e.g. from 0.2 M NH4OH), then the DMDP cation is desorbed from the exchanger and is unbound and mobile. This affinity can be used to separate such alkaloids from acidic (anionic) or neutral components which would not be retained by the cation exchanger and may be washed from the resin by water.
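The bind, wash and elute logic of this example can be sketched as a simple sorting step. The model is deliberately idealized: each component is given a single charge state in acidic solution, and the neutral and acidic components named below are hypothetical placeholders, not constituents reported for the plant.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    charge_in_acid: int  # +1 cation, 0 neutral, -1 anion (idealized)


def cation_exchange(components):
    """Return (water_wash, ammonia_eluate) for a cation-exchange column.

    Cations bind to the resin and are released by the ammonium-containing
    eluent; neutral and anionic components pass through with the water wash.
    """
    water_wash = [c.name for c in components if c.charge_in_acid <= 0]
    ammonia_eluate = [c.name for c in components if c.charge_in_acid > 0]
    return water_wash, ammonia_eluate


extract = [
    Component("DMDP", +1),
    Component("sucrose (hypothetical neutral component)", 0),
    Component("citrate (hypothetical acidic component)", -1),
]
wash, eluate = cation_exchange(extract)
print("Water wash:", wash)
print("0.2 M NH4OH eluate:", eluate)
```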

Plant extracts that contain DMDP are used as a nematocide against infected crops (bananas) in Costa Rica and are licensed by the National Institute of Biodiversity. This is an example of a renewable resource as the extracts may be prepared from the seeds of the plant and DMDP is ecologically friendly as it is biodegradable.

Biotage™ flash chromatography

Biotage™ flash chromatography may be used for quick efficient separations. This employs pre-packed solvent-resistant plastic cartridges (Fig. 7.3), which contain the sorbent (silica, alumina, C18, HP-20, or ion exchange resin). These cartridges are introduced into a radial compression module (the metal cylinder in Fig. 7.3), which pressurizes the cartridge and sorbent radially. This results in a very homogeneous packed material (sorbent), reduces the possibility of solvent channelling when the system is run and minimizes void spaces on the column head.

Using this technique, milligrams to tens of grams can be separated. The bioactive extract can be dissolved in solvent and loaded onto the column directly; solvent is then pumped through the column and fractions are collected, resulting in a rapid separation of extract components. This is a rapid method; 10 g of extract can be fractionated into 12 fractions of increasing polarity in 30 min using a step gradient solvent system. There are a number of benefits to this, particularly that speed minimizes contact with reactive sorbents (e.g. silica) and that hazardous sorbents such as silica, which when free may cause silicosis, are contained in the cartridges. Additionally, the cartridges may be re-used, reducing the cost of the bioassay-guided process. The high flow-rates employed by this technique (20–250 ml/min) retain ‘band-like’ movement of the components through the column, resulting in a high resolution. Compounds eluting from the column may be detected by TLC (of fractions) or the eluant may be passed through a UV detector so that compounds that absorb UV light can be detected as they elute from the column. Some laboratories run several of these flash columns simultaneously, resulting in a high number of fractionated extracts having sufficient mass for further purification of the active components.
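A step gradient of the kind mentioned above (12 fractions in 30 min) might be planned as follows. This is a sketch under assumptions: the text does not give the solvent system, so the gradient is expressed simply as the percentage of the more polar solvent, rising in equal steps of equal duration, and the function name is hypothetical.

```python
def step_gradient(n_fractions=12, total_min=30.0, start_pct=0.0, end_pct=100.0):
    """Return (start_time_min, percent_polar_solvent) for each step.

    Assumes equal-length steps and equal increments in solvent composition.
    """
    if n_fractions < 2:
        raise ValueError("need at least two steps")
    step_time = total_min / n_fractions
    increment = (end_pct - start_pct) / (n_fractions - 1)
    return [
        (round(i * step_time, 2), round(start_pct + i * increment, 1))
        for i in range(n_fractions)
    ]


for start, pct in step_gradient():
    print(f"from {start:5.2f} min: {pct:5.1f}% polar solvent")
```

In practice the steps are chosen from TLC behaviour of the extract rather than spaced evenly, so the even spacing here is only a starting point.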

Thin-layer chromatography

Thin-layer chromatography (TLC) is one of the most widely used and easiest methods for purifying a small number (2–4) of components, typically following a Biotage flash separation. This method employs glass or aluminium plates that are pre-coated with sorbent (e.g. silica gel) of varying thickness dependent on the amount of material to be loaded onto the plates. The coating on analytical plates is generally of 0.2 mm thickness; preparative plates may have a coating 1–2 mm thick. The compound mixture is loaded at 1–2 cm from the bottom edge of the plate as either a spot or a continuous band. The plate is then lowered into a tank containing a predetermined solvent which will migrate up the plate and separate the compound mixture according to the polarity of the components.

In analytical use, micrograms of material may be separated using this technique and samples such as drugs of abuse (e.g. cannabis resin) may be compared with standards (e.g. tetrahydrocannabinol) for quick identification.

Sorbent-coated plates often incorporate a fluorescent indicator (F254) so that natural products that absorb short-wave UV light (254 nm) will appear as black spots on a green background. Under long-wave UV light, certain compounds may emit a brilliant blue or yellow fluorescence. Both UV absorbance and fluorescence properties may be used to monitor the separation of compounds on a TLC plate.

Preparative scale TLC has great use and loadings of 1–100 mg can readily produce enough purified material for biological assays and structure elucidation. It is rapid and cheap and has been the method of choice for separating lipophilic compounds. Preparative plates are available from suppliers as pre-coated plates of 1–2 mm thickness in silica, alumina or C18. However, home-made plates offer greater flexibility by allowing the incorporation of modifying agents into the sorbents (e.g. silver nitrate for separation of olefinic compounds – known as argentation TLC), use of other sorbents (ion exchange, polyamide, cellulose) and the addition of indicators and binders.

The scale-up from analytical to preparative mode is crucial, as an increase in the sample load may drastically change the separation of the components. Normally, the method developed on the analytical scale must be modified, generally with a reduction of solvent system polarity. Preparative TLC is used as a final clean-up procedure to separate 2–4 compounds. The sample is dissolved in a small volume of solvent and applied as a thin line 2 cm from the bottom of the plate and dried. The plate is then eluted in a suitable solvent and UV-active compounds are visualized at 254 or 366 nm. Natural products that are not UV-active will need development using a suitable spray reagent such as vanillin-sulphuric acid, Dragendorff’s reagent, phosphomolybdic acid or antimony trichloride. In this case, an edge of the plate is sprayed with the reagent (taking care that only a small area of the plate is covered) and separated compounds are visualized as coloured bands. The bands containing pure natural product are scraped off the plate and the natural product is desorbed from the sorbent. This desorption may be carried out by placing the compound-rich sorbent into a sintered glass funnel and washing with a suitable solvent followed by collection and concentration of the filtrate. The purified ‘band’ should then be assessed for purity by analytical TLC.

There are a number of advantages of this method for the analysis and isolation of biologically active natural products:

The major disadvantages of TLC are that:

High-performance liquid chromatography

The final separation technique discussed in this section is high-performance liquid chromatography (HPLC). This method is currently in vogue and is widely used for the analysis and isolation of bioactive natural products. The analytical sensitivity of the technique, particularly when coupled with UV detection such as photodiode array (PDA), enables the acquisition of UV spectra of eluting peaks from 190 nm to 800 nm. The flow-rates of this system are typically 0.5–2.0 ml/min and sample loading in the analytical mode allows the detection and separation of tens to hundreds of micrograms of material. With PDA UV detection, even compounds with poor UV characteristics can be detected. This is especially useful in the analysis of natural products such as terpenoids or polyketides which may have no unsaturation or chromophores that give rise to a characteristic UV signature.

HPLC is a highly sensitive technique when coupled to electronic library searching of compounds with a known UV spectrum. Modern software enables the UV spectra of eluting peaks to be compared with spectra stored electronically, thereby enabling early identification of known compounds or, usefully, the comparison of novel compounds with a similar UV spectrum, which may indicate structural similarity. It is also possible to increase the size of these electronic libraries and improve the searching power of the technique. HPLC is a powerful technique for fingerprinting biologically active extracts and comparisons can be drawn with chromatograms and UV spectra stored in an electronic library. This is currently very important for the quality control of herbal medicines for which appropriate standards in reproducibility of extract quality must be met.
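The idea of matching an eluting peak against an electronic UV library can be sketched with a simple similarity score. Commercial software uses its own matching algorithms; cosine similarity is substituted here as a plain illustration, and the spectra are synthetic Gaussian bands rather than real data. All names in the block are hypothetical.

```python
import math

WAVELENGTHS = range(200, 401, 5)  # nm


def gaussian_band(lambda_max, width=15.0):
    """Synthetic UV spectrum: one Gaussian band sampled on WAVELENGTHS."""
    return [math.exp(-((w - lambda_max) / width) ** 2) for w in WAVELENGTHS]


def cosine_similarity(a, b):
    """Similarity of two spectra sampled at the same wavelengths (1 = identical shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def best_match(query, library):
    """Return (name, score) of the most similar library spectrum."""
    scored = ((name, cosine_similarity(query, spec)) for name, spec in library.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical library entries, labelled only by their absorption maximum.
LIBRARY = {
    "entry with max at 254 nm": gaussian_band(254),
    "entry with max at 277 nm": gaussian_band(277),
    "entry with max at 340 nm": gaussian_band(340),
}

query = gaussian_band(279)  # unknown peak with a maximum near 277 nm
name, score = best_match(query, LIBRARY)
print(f"Best match: {name} (similarity {score:.3f})")
```

A high score only suggests a similar chromophore; as the text notes, it may indicate structural similarity but does not prove identity.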

HPLC can be run in fully automated mode and with carousel autosamplers it is possible to analyse tens to hundreds of samples. These HPLC systems are computer-driven and not only run samples, but may be programmed to process data and print out chromatograms and spectra automatically. Radial compressed column technology can also be used in HPLC and, as with Biotage flash chromatography, gives access to highly varied column technology, including standard sorbents such as normal phase (silica) and reverse-phase (C18 and C8) and more ‘exotic’ stationary phases such as phenyl, cyano, C4, chiral phases, gel size exclusion media and ion exchangers. This versatility of stationary phase has made HPLC a highly popular method for bioassay-guided isolation.

HPLC is a high-resolution technique, with efficient, fast separations. The most widely used stationary phase is C18 (reverse-phase) chromatography, generally employing water/acetonitrile or water/methanol mixtures as mobile phase. These mobile phases may be run in gradient elution mode, in which the concentration of a particular solvent is increased over a period of time, starting, for example, with 100% water and increasing to 100% acetonitrile over 30 min, or in isocratic elution mode, in which a constant composition (e.g. 70% acetonitrile in water) is maintained for a set period of time.
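The two elution modes can be expressed as simple functions of time. The sketch below assumes a linear gradient (the text says only that the concentration is increased over time) and uses the example figures given above; the function names are hypothetical.

```python
def gradient_percent_b(t_min, start_pct=0.0, end_pct=100.0, duration_min=30.0):
    """Percent of the stronger solvent (e.g. acetonitrile) at time t.

    Assumes a linear ramp from start_pct to end_pct over duration_min,
    then a hold at end_pct.
    """
    if t_min <= 0:
        return start_pct
    if t_min >= duration_min:
        return end_pct
    return start_pct + (end_pct - start_pct) * t_min / duration_min


def isocratic_percent_b(t_min, pct=70.0):
    """Isocratic mode: the composition does not change with time."""
    return pct


for t in (0, 10, 15, 30):
    print(f"t = {t:2d} min: gradient {gradient_percent_b(t):5.1f}% acetonitrile, "
          f"isocratic {isocratic_percent_b(t):.1f}%")
```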

HPLC is also used preparatively and, with the aid of computer-controlled pumping systems, very accurate mixing of solvents can be achieved leading to superb control of elution power. Many preparative columns employ radial compressed column technology as these columns have a long column life, few void volumes, homogeneous column packing with little solvent channelling and excellent ‘band flow’ of components as they flow through the column. As with TLC, an analytical HPLC method is developed for scale-up to preparative HPLC; flow-rates of 50–300 ml/min are common. Detection in preparative HPLC generally utilizes a UV detector that has been optimized to detect the natural product of interest (e.g. at 254 nm) by knowledge of the spectra acquired by analytical PDA HPLC. A large sample loading of tens of milligrams to grams of material can be achieved and rapid isolation can be facilitated by the use of intelligent fraction collectors that can ‘peak collect’ compounds as they elute from the column by receiving input from the UV detector.
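The ‘peak collect’ behaviour of a fraction collector can be approximated by a threshold on the UV signal. Real instruments typically use slope detection and other refinements, so the threshold rule below is a simplification, and the detector trace is synthetic.

```python
def collect_peaks(trace, threshold):
    """Return (start, end) time windows in which absorbance >= threshold.

    trace is a list of (time_min, absorbance) pairs in time order.
    """
    windows = []
    start = None
    last_time = None
    for time, absorbance in trace:
        if absorbance >= threshold and start is None:
            start = time  # signal rises above threshold: begin collecting
        elif absorbance < threshold and start is not None:
            windows.append((start, last_time))  # signal falls: stop collecting
            start = None
        last_time = time
    if start is not None:
        windows.append((start, last_time))  # trace ended while collecting
    return windows


# Synthetic detector trace at a single wavelength (e.g. 254 nm).
trace = [(0.0, 0.01), (0.5, 0.02), (1.0, 0.30), (1.5, 0.80), (2.0, 0.25),
         (2.5, 0.02), (3.0, 0.01), (3.5, 0.40), (4.0, 0.60), (4.5, 0.03)]
windows = collect_peaks(trace, threshold=0.10)
print("Collect fractions during:", windows)
```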

The technique can be used for the majority of natural products that are soluble in organic solvents and can be adapted to ion exchange for the isolation of highly polar compounds. HPLC is the method of choice for the pharmaceutical industry because of its excellent separating power, speed and reproducibility. The major disadvantage of this technique, however, is its expense, as analytical instrumentation may cost upwards of £20,000, and preparative HPLC may be £30,000. Consumables for this technique are also expensive, especially preparative columns (£2000), which may have a short life and at high solvent flow-rates there is a high cost for purchase and disposal of high-purity solvents.

Isolation strategy

How do the isolation methods described above fit together? Figure 7.4 gives a general isolation protocol starting with selection of biomass (e.g. plant or microbe), which is then extracted using a Soxhlet apparatus, cold or hot percolation or supercritical fluid extraction. Hydrophilic extracts will then typically undergo ion exchange chromatography with bioassay of generated fractions. A further ion exchange method of bioactive fractions would yield pure compounds, which could then be submitted for structure elucidation. Lipophilic extracts could initially be partitioned to generate a further hydrophilic fraction which could be dealt with by ion exchange chromatography as described above.

The lipophilic portion may then be either subjected to gel chromatography to remove or separate large components (e.g. chlorophylls, fatty acids or glycerides), or this step skipped, and then flash chromatography used to generate a series of fractions that would undergo bioassay. Active fractions can then be further purified by using either HPLC or TLC to give pure compounds, for submission to structure elucidation. This is only a general outline for isolation of bioactive natural products; there is no universal protocol and the physicochemical properties of natural products differ enormously, sometimes making the isolation of chemicals from nature a highly difficult and intellectually challenging problem. In very rare cases, organisms produce very simple extracts (2–4 components), which may be readily separated into pure compounds, but in the majority of cases extracts are highly complex and the active component may be present at a very low concentration or even be unstable, further contributing to the difficulty of the bioassay-guided isolation process. It is also possible that activity may even diminish during the separation process; this could be due to a synergistic effect occurring through several components working in concert in the bioassay. There is currently much interest in this area, and it is possible that the value and efficacy of certain herbal medicines (which contain tens to hundreds, or even thousands, of natural products) are due to several active components; thus the bioassay-guided isolation approach may not be appropriate for the study of these agents. The fact remains, however, that many useful pharmaceutical entities were developed using this approach and, because of the sheer number of organisms still to be investigated, it is certain that bioassay-guided isolation will yield many new medicines in the future.

Structure elucidation

Ideally, bioassay-guided methods should afford a pure natural product of at least 5 mg in weight. Current structure elucidation techniques are available that can determine the structures of compounds on micrograms of material, although larger quantities are of benefit for further biological assays and will save time and money in the pharmaceutical industry setting, so that valuable resources do not need to be used in the acquisition of more biomass and a potentially lengthy bioassay-guided isolation process.

Structure elucidation of natural products generally employs the classical spectroscopic techniques of mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy. The first steps, however, should be the recording of infrared (IR) and ultraviolet-visible (UV-Vis) spectra to determine the presence of certain functional groups and conjugation in the molecule. Rather than a theoretical approach to this subject, we will use a bioactive natural product to highlight the strengths of these techniques.

In a project to isolate and characterize antibiotics from plants, Thymus vulgaris (thyme), a member of the mint family (Lamiaceae), was extracted with hexane and ethyl acetate. Each of these extracts was highly active in a bioassay to discover compounds with activity against methicillin-resistant Staphylococcus aureus. Extracts were analysed by TLC and bulked due to similarity. Biotage flash chromatography, followed by preparative HPLC, led to the isolation of the pure active natural product, compound X, which was a pale yellow volatile oil with a pungent aroma. The UV spectrum showed a maximum at 277 nm, indicative of the presence of an aromatic ring. The IR spectrum showed absorptions attributable to aromatic and aliphatic C-H groups and a broad peak at 3600 cm−1 indicative of a hydroxyl functional group.

Mass spectrometry

This technique allows the measurement of the molecular weight of a compound and, once a molecular ion has been identified, it is possible to measure this ion accurately to ascertain the exact number of hydrogens, carbons, oxygens and other atoms that may be present in the molecule. This will give the molecular formula. A number of ionization techniques are available in MS, of which electron impact is widely used. This technique gives good fragmentation of the molecule and is useful for structure elucidation purposes as the fragments can be assigned to functional groups present in the compound. The disadvantage of this technique is that molecular ions are sometimes absent. Softer techniques such as chemical ionization (CI), electrospray ionization (ESI) and fast atom bombardment (FAB) mass spectrometry ionize the molecule with less energy; consequently, molecular ions are generally present, but with less fragmentation information for structure elucidation purposes.

Compound X was submitted to FAB-MS; the spectrum is shown in Fig. 7.5. The scale on the x-axis is the mass (m) to charge (z) ratio (m/z). As compound X readily forms single ions, m/z is in effect m/1 and therefore directly related to the weight of fragments and, in the case of the molecular ion, the molecular weight of the compound.

A molecular ion (M+) is seen at m/z 150. This is supported by additional peaks where the molecule picks up a hydrogen ion at m/z 151 [M+H]+ and loses a hydrogen ion at m/z 149 [M–H]+. The spectrum was run using FAB ionization and little fragmentation is evident. There are some useful fragments, however, in particular at m/z 135, which is 15 mass units less than the molecular ion and almost certainly corresponds to [M–Me]+, indicating that this molecule contains a methyl group (15 mass units), which is readily lost in the mass spectrometer.
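The reasoning above amounts to taking the difference between each peak and the molecular ion and looking the difference up. A minimal sketch, using only the m/z values and assignments quoted in the text (the lookup table and function name are hypothetical):

```python
MOLECULAR_ION = 150
PEAKS = [151, 150, 149, 135]  # m/z values quoted in the text

# Mass differences relative to the molecular ion; only those discussed in the text.
ASSIGNMENTS = {
    +1: "[M+H]+ (gain of hydrogen)",
    0: "M+ (molecular ion)",
    -1: "[M-H]+ (loss of hydrogen)",
    -15: "[M-Me]+ (loss of a methyl group)",
}


def assign(peaks, molecular_ion):
    """Map each m/z value to an assignment based on its offset from the molecular ion."""
    return {mz: ASSIGNMENTS.get(mz - molecular_ion, "unassigned") for mz in peaks}


for mz, label in assign(PEAKS, MOLECULAR_ION).items():
    print(f"m/z {mz}: {label}")
```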

Accurate mass measurement of the molecular ion at m/z 150 gave a figure of 150.104700. If a computer program is used to calculate the number of carbon, hydrogen and oxygen atoms that would be required to give this weight, a formula of C10H14O is produced. The theoretical mass of this formula is 150.104465, which is very close to the measured accurate mass. The theoretical mass takes into account the accurate masses of carbon, hydrogen and oxygen, and the nearest ‘fit’ to the measured mass gives the C10H14O formula. Interestingly, compound X has 10 carbon atoms and, as it is a volatile oil, it is likely to be a member of the monoterpene group of natural products.
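The kind of calculation such a computer program performs can be sketched as follows. This is a minimal brute-force search under stated assumptions: only C, H and O are considered, the electron mass is ignored (as in the figures quoted above), no valence rules are applied, and the function names, search limits and 5 ppm tolerance are illustrative choices rather than those of any particular software.

```python
from itertools import product

# Monoisotopic masses (u) of the most abundant isotopes.
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def formula_mass(c: int, h: int, o: int) -> float:
    """Theoretical monoisotopic mass of CcHhOo."""
    return c * MASS["C"] + h * MASS["H"] + o * MASS["O"]


def ppm_error(measured: float, theoretical: float) -> float:
    """Difference between measured and theoretical mass in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def candidate_formulas(measured, tol_ppm=5.0, max_c=15, max_h=30, max_o=5):
    """Return ((c, h, o), mass, ppm error) for every formula within tolerance."""
    hits = []
    for c, h, o in product(range(1, max_c + 1), range(max_h + 1), range(max_o + 1)):
        mass = formula_mass(c, h, o)
        err = ppm_error(measured, mass)
        if abs(err) <= tol_ppm:
            hits.append(((c, h, o), round(mass, 6), round(err, 2)))
    return sorted(hits, key=lambda hit: abs(hit[2]))  # best fit first


for counts, mass, err in candidate_formulas(150.104700):
    print(counts, mass, err)
```

With these limits the only candidate within 5 ppm of 150.104700 is (10, 14, 1), i.e. C10H14O, with a theoretical mass of 150.104465 and an error of roughly 1.6 ppm.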

At this stage it would be possible to perform a database search on this molecular formula and the producing organism (Thymus vulgaris) from sources such as SciFinder (Chemical Abstracts) or the Dictionary of Natural Products, although there are many natural products with this formula and therefore further structure elucidation is required.

NMR spectroscopy

1H NMR spectroscopy

The next step in this process is the recording of a 1H NMR spectrum (Fig. 7.6). This will indicate the number of hydrogen atoms associated with a particular group (integration) and how shielded or deshielded that group is. Shielding and deshielding occur due to the presence of groups that are either electron-withdrawing (deshielding) or electron-donating (shielding).

Inspection of the 1H NMR spectrum of compound X recorded in the solvent deuterochloroform (CDCl3), which has a peak at 7.27 ppm (Z in the spectrum), shows three deshielded peaks (A, B and C), each integrating for one proton (the figures under the x-axis are the integration and indicate how many protons are associated with each peak). These protons occur in the aromatic region (6.00–8.00 ppm) and have a particular coupling pattern. Additional signals include a broad peak (D), indicating that it is exchangeable and possibly an hydroxyl group, a multiplet at 2.84 ppm (E) integrating for one proton and a singlet at 2.23 ppm (peak F) integrating for three protons, which is due to a methyl group. The last peak in the spectrum (G) is a doublet (two lines) integrating for six protons. This is due to two methyl groups occurring in the same position of the spectrum as they are equivalent. This equivalence occurs because they are in the same ‘environment’. This signal appears as a doublet because these methyls are coupled to one proton (multiplicity = n + 1, where n is the number of nearest neighbouring protons), and the most likely candidate for this single proton is the multiplet at 2.84 ppm. This proton is a complex multiplet because it couples to all six of the protons of the two coincident methyl groups. This coupling system indicates that these two groups form an isopropyl group [(CH3)2CH-]. The one-proton multiplet at 2.84 ppm (peak E) and methyl group at 2.23 ppm (F) are slightly deshielded (higher ppm) with respect to the methyl groups at 1.23 ppm (G), indicating that they are attached to a group that causes electron withdrawal (possibly an aromatic ring, which is inferred by the presence of the aromatic protons).
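The n + 1 rule used above can be written out as a short Python sketch. It assumes simple first-order coupling to n equivalent neighbouring protons; the function name `multiplicity` and the table of names are illustrative.

```python
# Names for the first few first-order multiplets, keyed by number of lines.
NAMES = {1: "singlet", 2: "doublet", 3: "triplet", 4: "quartet",
         5: "quintet", 6: "sextet", 7: "septet"}


def multiplicity(n_neighbours: int) -> str:
    """Apply the n + 1 rule for n equivalent neighbouring protons."""
    lines = n_neighbours + 1
    return NAMES.get(lines, f"{lines}-line multiplet")


print(multiplicity(0))  # methyl F, no coupled neighbours: singlet
print(multiplicity(1))  # methyls G, coupled to the single proton E: doublet
print(multiplicity(6))  # proton E, coupled to six methyl protons: septet
```

The seven lines predicted for proton E are consistent with its appearance as a complex multiplet at 2.84 ppm.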

Expansion of the aromatic region 6.6–7.1 ppm (Fig. 7.7) shows the coupling pattern of the aromatic ring. Inspection of this area allows measurement of coupling constants, referred to as J values. Taking peak A, which is a doublet (two lines), as an example, this is done by subtracting the lower ppm value for this peak from the higher ppm value and multiplying the difference by the spectrometer frequency at which this experiment was measured (400 MHz in this case). This gives:

J (Hz) = (higher ppm value − lower ppm value) × 400 ≈ 8 Hz

The size of this coupling constant indicates that peak A is coupled to another proton which is ortho to itself (ortho coupling constants are of the order of 6–9 Hz). Peak B is a double doublet (and has four lines) with two couplings (8 and 1.6 Hz), indicating that this is the proton which is ortho to peak A (it has the same coupling constant of 8 Hz) and the smaller coupling constant (1.6 Hz) is indicative of a meta coupling to another proton (meta coupling constants are typically 1–2 Hz). Peak C at 6.67 ppm is a doublet (1.6 Hz) which has the same coupling constant as one of those for peak B, indicating that this is the proton that is meta to B. This part of the spectrum therefore tells us that we have an aromatic ring with three protons attached to it in the 1, 2 and 4 positions (Fig. 7.7).
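The measurement and interpretation of J values described above can be sketched in Python. The two line positions used for peak A are hypothetical values chosen to reproduce an 8 Hz coupling, since the exact figures are not given in the text; the ortho and meta ranges are those quoted above, and the function names are illustrative.

```python
def coupling_constant_hz(ppm_high: float, ppm_low: float,
                         spectrometer_mhz: float = 400.0) -> float:
    """J (Hz) = difference between two lines in ppm x spectrometer frequency in MHz."""
    return (ppm_high - ppm_low) * spectrometer_mhz


def classify_aromatic_coupling(j_hz: float) -> str:
    """Classify a coupling using the typical ranges quoted in the text."""
    if 6.0 <= j_hz <= 9.0:
        return "ortho"
    if 1.0 <= j_hz <= 2.0:
        return "meta"
    return "unclassified"


# Hypothetical positions of the two lines of doublet A (assumed, not measured).
j_a = coupling_constant_hz(7.06, 7.04)
print(round(j_a, 1), classify_aromatic_coupling(j_a))

# The smaller coupling shared by peaks B and C.
print(classify_aromatic_coupling(1.6))
```

The same function applies at any spectrometer frequency: a coupling constant in Hz is independent of the instrument, whereas the separation of the lines in ppm is not.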

Taking all of the fragments from the 1H spectrum into consideration, there are three aromatic protons, a broad exchangeable peak, a multiplet, a methyl singlet and a six-proton doublet corresponding to two coincident methyl groups. This total of 14 protons is identical to the number found in the molecular formula using MS.
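This consistency check between the integrations and the molecular formula is simple arithmetic, sketched here in Python. The formula parser is deliberately naive (an assumption for this example: it only handles simple formulae such as C10H14O and would misread elements such as He or Hg).

```python
import re

# Integrations read from the 1H spectrum, keyed by peak label.
INTEGRATIONS = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1, "F": 3, "G": 6}


def hydrogens_in_formula(formula: str) -> int:
    """Count hydrogen atoms in a simple formula string such as 'C10H14O'."""
    match = re.search(r"H(\d*)", formula)
    if match is None:
        return 0
    return int(match.group(1)) if match.group(1) else 1


total = sum(INTEGRATIONS.values())
print(total, hydrogens_in_formula("C10H14O"))  # 14 14
```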

13C NMR spectroscopy

The proton spectrum has revealed much about the number of protons present and their chemical environments – i.e. whether they are shielded or deshielded by electron-donating or electron-withdrawing groups, respectively. The next step is the acquisition of a 13C NMR spectrum which will give further information regarding the environment of the different groups and the number of carbons present. Two 13C NMR spectra of compound X are shown in Fig. 7.8.

The top spectrum is the broadband decoupled spectrum which shows all of the carbons present; the carbons appear as singlets due to proton decoupling. The top spectrum of compound X was recorded in CDCl3, which occurs as three lines at 77.0 ppm. There are only nine carbons evident, which might be confusing as we know from the mass spectrum that compound X should contain 10 carbons. However, as the 1H spectrum has two coincident methyl groups, it is possible that there are two carbons associated with the peak at 24 ppm. If both methyl groups were in the same environment (as they are in an isopropyl group), then they would occur at the same position in the spectrum. As with the proton spectrum, the carbon signals occur over a large range, which is again determined by whether the carbons are deshielded (high ppm value) or shielded (low ppm value). The lower 13C spectrum has been produced by a special experiment called DEPT-135, which lacks the solvent signals, but, more importantly, it only shows carbons that have protons attached to them (CH, CH2 and CH3). As compound X does not have any CH2 groups (there are no groups in the 1H spectrum which integrate for two protons), only CH and CH3 carbons are shown. This is useful as it allows quaternary carbons (carbons with no protons attached) to be identified, and it can be seen that there are three additional aromatic quaternaries in the top spectrum, at 153.6, 148.5 and 120.8 ppm. The range for aromatic carbons is 110–160 ppm. The carbon at 153.6 is highly deshielded and it is possible that this carbon is attached to an oxygen atom (from the OH group in compound X). The three peaks at 130.8, 118.8 and 113.0 are all carbons bearing one proton (CH or methine carbons); these correspond to proton peaks A, B and C in the 1H spectrum (Fig. 7.6). The remaining peaks at 33.7, 24.0 and 15.3 are carbons associated with the multiplet (peak E), two coincident methyl groups (peak G) and a methyl singlet (peak F).
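Identifying the quaternary carbons amounts to finding the peaks that appear in the broadband decoupled spectrum but not in the DEPT-135 spectrum. A minimal Python sketch, using the chemical shifts quoted above (the 0.1 ppm matching tolerance and the function name are assumptions for illustration):

```python
# 13C chemical shifts (ppm) of compound X, solvent peak excluded.
BROADBAND = [153.6, 148.5, 130.8, 120.8, 118.8, 113.0, 33.7, 24.0, 15.3]
# Peaks also present in the DEPT-135 spectrum (carbons bearing protons).
DEPT_135 = [130.8, 118.8, 113.0, 33.7, 24.0, 15.3]


def quaternary_carbons(all_peaks, protonated_peaks, tol=0.1):
    """Return peaks with no DEPT-135 counterpart within tol ppm."""
    return [
        peak for peak in all_peaks
        if not any(abs(peak - other) <= tol for other in protonated_peaks)
    ]


print(quaternary_carbons(BROADBAND, DEPT_135))  # [153.6, 148.5, 120.8]
```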

Homonuclear correlation spectroscopy

The next technique that can aid in the structure determination of compound X is COrrelation SpectroscopY (COSY), which reveals couplings between protons that are close (two, three or four bonds distant from each other). It is referred to as a homonuclear (same nuclei, both of which are 1H) two-dimensional technique because the data are displayed in a matrix format with two one-dimensional experiments (1H spectra) displayed on the x- and y-axes (Fig. 7.9). A diagonal series of peaks corresponds to the 1H spectrum signals. Peaks that are away from the diagonal (referred to as cross-peaks) indicate coupling between signals.

For example, inspection of Fig. 7.9 shows a cross-peak between the signal at 1.23 ppm (coincident methyl groups, G) and the multiplet signal at 2.84 ppm (group E), confirming that they are coupled to each other (implied by the couplings in the 1H spectrum) and that together E and G are an isopropyl group. Additionally, group F (a methyl singlet at 2.23 ppm) shows a coupling to proton A, indicating that the methyl group is ortho to this proton (Fig. 7.9).

Inspection of an expansion of the aromatic region (Fig. 7.10) provides further support for the coupling pattern already suggested by the 1H spectrum. HA has an ortho coupling to HB and, in addition to coupling to HA, HB has a meta coupling to HC and appears as a double doublet (four lines). This coupling pattern is indicative of a 1,2,4-protonated aromatic ring and confirms the data from the 1H spectrum.

The related technique of Nuclear Overhauser Effect SpectroscopY (NOESY) is also useful as it shows through-space correlations as well as through-bond couplings between protons. Once the through-bond correlations have been determined from a COSY spectrum, the remaining through-space correlations can be identified. This gives an indication of how close one proton is to another, which can be very useful in assigning the stereochemistry of a natural product.

Heteronuclear correlation spectroscopy

COSY spectra are referred to as homonuclear spectra as they are acquired by detecting only one type of nucleus (1H), but it is also possible to detect the interactions between two different nuclei such as 1H and 13C. This is known as heteronuclear correlation spectroscopy, of which two types will be discussed here: Heteronuclear Single Quantum Coherence (HSQC) and Heteronuclear MultiBond Coherence (HMBC).

HSQC shows which protons are attached to which carbons. Figure 7.11 shows an HSQC spectrum for compound X from which clear correlations can be seen for protons A–C and E–G with the carbons to which they are attached. Proton D is a proton attached to oxygen (an hydroxyl group), so there is no carbon to correlate to (and therefore no signal). The aromatic protons all correlate to carbons at higher ppm; the aliphatic protons correlate to lower ppm carbons.

HMBC shows correlations between protons and the carbon atoms that are two and three bonds distant; these couplings are referred to as 2J and 3J, respectively. The experiment is set to show correlations that occur where the coupling constant between protons and carbons is of the order of 7 Hz. Two-bond correlations are not always present in the spectrum as the coupling constant for 2J correlations may be less or greater than 7 Hz. Figure 7.12 shows the HMBC spectrum for compound X.

HMBC spectra are highly informative and allow partial structure fragments to be constructed which can enable the full structure elucidation of natural products. The correlations for each proton group of compound X are given in Fig. 7.13.

It is already known from the HSQC spectrum which protons are attached to which carbons and the HMBC spectrum allows the final structure of compound X to be pieced together. For peak A (1 H aromatic doublet proton at 7.05 ppm) there are three correlations: one to the carbon associated with peak F and two to quaternary carbons.

Peak B (1 H aromatic double doublet proton at 6.74 ppm) correlates to the carbon to which peak E is attached; this fixes the isopropyl group next to peak B on the aromatic ring (peak E is part of the isopropyl system with peak G). Further correlations for peak B include couplings to carbons that are directly attached to peak C and to a quaternary carbon.

Peak C (1 H aromatic doublet proton at 6.68 ppm) also couples to the carbon bearing the proton associated with peak E; this confirms the position of the isopropyl side-chain between protons B and C. Proton C also couples to the same quaternary carbon as B and to the carbon attached to proton B. There is also a small coupling to the most downfield carbon at 153.6 ppm.

Peak D (hydroxyl group, 4.68 ppm) is broad, and long-range correlations to carbons are absent. Peak E (1 H multiplet proton at 2.84 ppm) shows correlations to the carbons attached to peak G (these are the coincident methyl groups), to both carbons attached to peaks B and C and to a quaternary carbon, which is the carbon to which the isopropyl group is directly attached.

For peak F (3 H singlet protons at 2.23 ppm) there are two correlations that appear equidistant at 15.3 ppm in the carbon domain and are an artifact of the HMBC spectrum (they are in fact the unsuppressed direct correlation between the protons of peak F and the carbon to which they are directly attached; compare with the HSQC spectrum in Fig. 7.11). There are three couplings for peak F: to the carbon attached to proton A, and to two quaternary carbons, one of which is the most downfield carbon (153.6 ppm).

Finally, peak G (6 H doublet at 1.23 ppm) shows a correlation to the neighbouring methyl carbon of the isopropyl group (and an unsuppressed one-bond signal equidistant about the peak G signal), a correlation to the carbon directly attached to peak E and to an aromatic quaternary carbon.

Figure 7.13 shows all of the correlations for each peak. The position of the hydroxyl group has yet to be assigned, but, as there is only one position available on the aromatic ring, the hydroxyl must be placed ortho to the methyl group. This is supported by the fact that the carbon to which this hydroxyl group is attached is the most downfield aromatic quaternary carbon (153.6 ppm), and heteroatoms such as oxygen are known to deshield carbon nuclei (compare the ppm value of this quaternary carbon with other quaternary aromatic carbons in compound X).

A database search indicates that compound X is carvacrol (Fig. 7.13), which is a common component of volatile oils, especially those from plants belonging to the mint family (Lamiaceae) of which Thymus vulgaris is a member.

The industrial approach to natural product drug lead discovery

Industry is currently putting much emphasis on discovering drugs from synthetic rather than natural sources, which, given the rich history that natural products have in the development of new drugs, is foolish. Nevertheless, the companies involved in natural product drug discovery use a highly organized approach to reduce the time taken to find a biologically active compound and put it into drug development.

Programme structure

Assays are part of the remit of a specific programme. Examples include anti-infectives (assays for antibacterial, antifungal and antiviral agents), immune-inflammation (assays for disease states such as arthritis, eczema, asthma and psoriasis) and anticancer programmes (assays for cytotoxic agents and resistance reversing agents). These programmes are managed by a programme head who is responsible for several project teams working on various assays within the programme to discover new drug leads. The project teams are a truly multidisciplinary group with expertise in all aspects of the drug discovery process and include a project leader who runs the team, reports to the programme head and manages the logistics of the team, such as budgets and timelines for implementation of all aspects of the project. Many project teams use a microbiologist who is a specialist in the collection and taxonomy of organisms (sometimes working in the field). If organisms have already been collected and stored, the microbiologist will be involved in the fermentation of microbes for screening and in the optimization of fermentation for scale-up once a biologically active extract has been identified.

Some projects use the expertise of a field botanist, who is often based in a university botanic garden and has expertise in the identification and collection of plant species. The botanist will collect and dry samples and send them to the company for extraction and assay. Collaboration with such an expert can be extremely valuable as it will allow access to diverse biological samples from biota-rich tropical sources (e.g. Central America), or access to plants that are known to come from taxa that are adept at producing bioactive molecules.

An essential member of the team is the biochemist, who will assay extracts prepared by the microbiologist or chemist. This is a vital step in which thousands (sometimes tens of thousands) of extracts are assayed (screened for biological activity) in an HTS format. The biochemist then retests active extracts to ascertain if there is any cross-screening data on an extract – i.e. is it active in other assays? This is important, as some active compounds may be non-specifically active or show adverse activity (toxicity). Cross-screening can enable a decision to be made on the selection of an extract for further evaluation. The project chemist is responsible for the extraction of biomass (plants/microbes), the isolation of pure bioactive natural products and elucidation of their structure using spectroscopic techniques. A molecular pharmacologist carries out detailed evaluation of pure active compounds and performs preclinical evaluation.

Dereplication

After a screening process (the assay), a series of extracts will be active, but it is particularly important to establish that no replicates (i.e. extracts with the same chemistry) are present or that no compounds are present that are already known to be active in the assay. This is done by the process of dereplication, which is the science of avoiding known compounds or common molecules that interfere with assays (e.g. tannins which may bind in a non-specific manner to many proteins). This may be important if the assay uses a protein (e.g. an enzyme). To give a further example, if plant extracts are being screened for antitumour activity, it would be unhelpful to re-isolate taxol. The main reason for dereplication is cost; because projects and programmes are expensive to run and it can take weeks (sometimes months) to isolate active components, it is important that they are novel and are not known to be active in an assay. Novelty is an important factor, and the technology must be protected by patent so that the drug can be exploited commercially.

It is possible to avoid some known metabolites by accurate literature searching of the organisms under study. This is best achieved by online searching of databases such as Chemical Abstracts, the Dictionary of Natural Products and NAPRALERT, which offer a quick method to find chemical information on a species. There is no perfect database (they are all incomplete by their nature), but possibly the most comprehensive survey of natural chemistry can be found in Chemical Abstracts, which may be accessed online through a fee-paying service, although it is wise to use as many databases as possible.

The databases can lead to primary literature information on molecular weight, IR, UV and NMR data which can be used to recognize common metabolites at an early stage.

Most dereplication methods apply physical techniques to plant/microbial extracts before the assay. Possibly the most comprehensive of these is automated HPLC, which can give retention times and UV data on eluting peaks. Combined (hyphenated) techniques are more informative still. For example, HPLC-MS can provide retention times of peaks, their UV spectra (with a photodiode array detector) and molecular weight (and fragment) information, which is acquired as peaks elute from the UV detector into the mass spectrometer. All of these data can be built into a spectral library so that the retention times, UV spectra and mass spectra of new peaks can be compared with those already acquired. This process is very powerful and enables recognition of known peaks to be made rapidly, thereby reducing costs in the discovery process.
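The idea of searching a spectral library can be sketched in Python. Everything here is illustrative: the `Peak` record, the tolerances and the library entries are assumptions made for the example (the retention times in particular are invented, and only the UV maximum and m/z of carvacrol come from this chapter), not the format of any commercial system.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    name: str
    retention_min: float  # HPLC retention time (minutes)
    uv_max_nm: float      # UV absorption maximum (nm)
    mz: float             # molecular ion (m/z)


def matches(unknown: Peak, known: Peak,
            rt_tol: float = 0.2, uv_tol: float = 3.0, mz_tol: float = 0.5) -> bool:
    """A library hit requires agreement in retention time, UV maximum and m/z."""
    return (abs(unknown.retention_min - known.retention_min) <= rt_tol
            and abs(unknown.uv_max_nm - known.uv_max_nm) <= uv_tol
            and abs(unknown.mz - known.mz) <= mz_tol)


def dereplicate(unknown: Peak, library: list) -> list:
    """Return the names of library compounds that match the unknown peak."""
    return [known.name for known in library if matches(unknown, known)]


# Hypothetical library; retention times are invented for illustration.
LIBRARY = [
    Peak("carvacrol", 12.4, 277.0, 150.0),
    Peak("hypothetical known compound", 8.1, 254.0, 302.0),
]

unknown = Peak("extract peak 7", 12.5, 276.0, 150.1)
print(dereplicate(unknown, LIBRARY))  # ['carvacrol']
```

Requiring agreement on all three measurements at once is what makes the combined techniques more discriminating than retention time or UV data alone.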

For compounds that are lipophilic and volatile, a combination of gas chromatography and mass spectrometry (GC-MS) can be used to separate components to give retention times and eluting peaks which can be fed into a mass spectrometer to acquire characteristic molecular weight and fragmentation information. This technique can also be applied to polar water-soluble components such as the calystegines and other polyhydroxylated alkaloids if they are first derivatized with a suitable agent (e.g. trimethylsilyl chloride) to increase their volatility. Other more exotic dereplication processes for polar compounds include capillary electrophoresis–mass spectrometry (CE-MS), which can be used for compounds that are positively or negatively charged. In capillary electrophoresis, extracts are separated in a capillary filled with buffer, which has a potential difference (voltage) applied to it. Compounds of varying charge can be separated by this technique and, when coupled to a UV detector and mass spectrometer, the technique has great utility.

Natural product libraries

Traditionally, drug-lead discovery has used HTS of extracts in microtitre plates in high number to discover active extracts. Extracts are then produced in large amounts (scale-up) and compounds are isolated using the bioassay-guided route. This process is highly productive and there are many examples of drugs that have been discovered in this way. There are, however, a number of drawbacks with this process; it is expensive, time-consuming and may not lead to a drug candidate if there are problems with acquiring large amounts of the natural product or if the active component is a well-known compound with an uninteresting broad spectrum of activities. It is also true that this approach has come under threat from other competing methods of drug lead discovery, in particular combinatorial chemistry libraries in which high numbers of compounds are synthesized and dispensed into microtitre plates and screened for bioactivity.

Natural product chemistry has had to evolve to cope with this approach, and several companies now market natural product libraries. These libraries are banks of microtitre plates with pure natural products in individual wells at a known concentration and, in effect, the chemistry has already been done. Organisms are selected on the basis of unknown chemistry or chemotaxonomic information, extracted with solvents, and teams of chemists isolate hundreds of compounds from the extracts. This process can be highly automated, with flash chromatography and preparative HPLC being run in parallel to separate extracts into fractions and fractions into pure compounds. There are varying levels of structure elucidation information available on pure compounds (i.e. full NMR, MS, IR, UV); this is determined by the customer, who will be the company who buys (or leases) the library from the library producer.

There are a number of benefits to this procedure over conventional HTS and synthetic libraries:

The isolation chemistry in HTS has always been the bottleneck. With a library this work has already been done, which enables decisions about active natural products to be made rapidly.

All compounds have a defined concentration in the microtitre plate and so the potency of the compound in the assay is known immediately; this is not true with extract screening. The most active extract may contain one major component with only poor activity or, even worse, the least-active extract may contain a low concentration of a very potent compound and so be deprioritized.

Screening a pure compound will give a better assay response, as single components have a cleaner interaction with the biological target and the interfering compounds found in extracts are absent (e.g. tannins, which may mask the presence or absence of an active component).

The dereplication process is more successful with pure compounds and a large amount of data can be acquired.

The structure of an active compound may be fully established before screening and this can give useful information on any structure–activity relationships with other screened compounds.

The whole process is now much quicker; the limiting step is now assay technology, which is itself very rapid.

Large amounts of pure compounds can be isolated (5–10 mg), so retesting and further assays can be performed quickly without delay in further isolation.

Most importantly, the cost is less than HTS as the time for discovery of a natural lead is reduced.

Several companies have adopted this approach, notably Hypha Discovery Ltd (http://www.hyphadiscovery.co.uk) who market a fungal natural product library produced using a novel fermentation technique. The company sells (or licenses) the library for drug and agrochemical discovery to larger companies. For compounds that are of further interest to the customer, Hypha Discovery can then scale up the extraction and purification of the compound by large-scale fermentation at their facilities.

Alternative approaches in natural product drug lead discovery

Many fungi and bacteria do not produce natural products when fermented and it is possible that they require an external stimulus from another organism to do so (e.g. other microbes or molecules secreted into their environment by another organism). This presence of unproductive organisms results in a loss of access to chemical diversity. Additionally, only 5% of microbes may be culturable and fermentable and there is, therefore, great potential to discover new therapeutic agents if ways are discovered to tap into the genetic, and, therefore, chemical, capability of these organisms.

Some companies utilize the procedure of combinatorial biosynthesis, in which DNA is taken from uncultivable organisms or from soil to build a library of characterized DNA fragments. It is then possible to insert this DNA into a host such as Escherichia coli or a yeast, which are readily fermentable organisms. The importance of this technique is based on the assumption that the host may use this DNA in the biosynthesis of new natural products. The host is then fermented and screened for bioactivity; this process may also be used to generate compounds for natural product library production. This is an innovative approach and may lead to new classes of natural products and allow full exploitation of microbial chemistry. Unfortunately, it may need much research to exploit this opportunity, especially given that many kilobases of DNA are required to produce even simple natural products. Additionally, the host may not utilize the foreign DNA, or it is possible that the DNA inserted may not code for proteins that make natural products.

Why natural products as drugs?

In the previous sections we have briefly looked at some of the classes of compounds produced by nature and how they can be isolated and their structures elucidated; but why are natural products such an important source of drugs? There are many examples of drugs from nature and this is possibly a result of biological and chemical diversity.

In screening for new bioactive compounds, it is important to access as wide a range of diverse biological specimens as possible because it is thought that this high range of biological diversity mirrors or gives rise to a high degree of chemical diversity (i.e. a wide array of structurally unrelated molecules). Ideally, a drug discovery programme should have access to as many different species as possible from groups such as plants, fungi, filamentous bacteria, corals, sea animals and amphibia. The tropics hold a vast repository of numbers of such species, and there is, therefore, enormous potential to discover new bioactive entities in this region of the world. Using Costa Rica as an example, the Instituto Nacional de Biodiversidad (INBio) has been conducting a national inventory of plants, insects, microbes and animals, and has so far assessed almost 500,000 species. This incredible genetic resource could potentially generate millions of natural products to be assessed for biological activity. Individual species are also adept at producing not only different classes of natural product (e.g. flavonoids and monoterpenes simultaneously), but also analogues of the same natural product class (Fig. 7.14).

It is this richness in chemistry, coupled with the structural complexity of certain types of natural product (e.g. paclitaxel), that makes natural products such a valuable commodity for the discovery of new drugs. It has been estimated that 25% of all prescription medicines owe their origin to a natural source and that, in the field of anticancer drugs, almost 60% of agents are either natural products or are derived from a natural product source. Additionally, many natural products are produced by the organism as a chemical defence (e.g. as an antimicrobial or as an antifeedant substance) against another organism; thus, in many cases, there is already an inherent biological activity associated with natural products.

Large pharmaceutical companies have been scaling down their interest in natural products, preferring to access their chemical diversity through synthetically produced libraries of compounds. Unfortunately, many of these libraries lack the true chemical diversity, chirality, structural complexity and inherent biological activity of natural products. It is therefore a matter of time before tried and tested natural product sources once again become widely used in drug discovery.

The Convention on Biological Diversity is a treaty between 182 countries to recognize the authority that countries/states have over their genetic resources (see also Chapter 5, p54-55). The treaty recognizes that access to biodiversity, be it plant, microbial or animal, is governed by the sovereign authority of that particular state. This means that it is not possible to acquire biological specimens from an area without ‘prior informed consent’ on ‘mutually agreed terms’ and that, should any commercial benefit arise from collection of biota (e.g. in the discovery of a new drug), then ‘equitable sharing of benefits’ should occur. This is an exceptionally important concept for profit sharing and one that can be highlighted by the example of prostratin (Fig. 7.15) from Homalanthus nutans (Euphorbiaceae).

Paul Cox, an ethnobotanist working in Samoa, became intrigued by the local use of the inner bark of Homalanthus nutans to treat jaundice, which is a clinical manifestation of the viral disease hepatitis. He collected samples of this species and sent them for testing at the National Cancer Institute (NCI) for assessment of antiviral activity in anti-AIDS assays. The active component, prostratin, was isolated and the structure determined as a diterpene related to the phorbol ester group of natural products. Interest waned in this compound as the phorbol esters are known to be strongly tumour-promoting. However, Paul Cox was not deterred by this because of the local use of the plant and he urged the NCI to assess prostratin for tumour promotion. Interestingly, prostratin does not promote tumour growth. It appears to prolong the life of HIV-infected cells and stops infection of healthy cells by HIV. Whilst prostratin is still in development, the authorities of the village from which the discovery originated have negotiated an agreement signed by the prime minister of Samoa. The AIDS Research Alliance, which is developing prostratin, will ensure that 20% of commercial profits that come from prostratin will go back to Samoa. In this settlement, revenues will go back to the village and to the families of the traditional healers who gave the original information on the use of H. nutans. This example highlights the fact that it is not only important that financial recompense is made to the originators of ethnobotanical research, but also that this traditional knowledge has value which must be preserved.