10. Massage therapy research methods
Tiffany Field
Chapter contents
An evolving research process
Specific considerations for specific conditions
Statistical analyses
An evolving research process
Historical background
Massage therapy dates back to prehistoric times and was epitomized by Hippocrates in 400 BC as ‘medicine being the art of rubbing’. Research in the field of massage therapy also goes back many years. The first academic journal publications date to the 1930s, when massage therapy research on humans and animals was fairly popular. Many of those research projects focused on documenting the increased blood flow associated with massage therapy and its role in reducing muscle atrophy. Although many of the questions asked then are still being asked now, the research was limited by the measurement technology of the time, and the studies often featured either single cases or very small samples, typically self-selected clinical patients being treated for a single condition. Measurement was limited to physiological measures such as heart rate, blood pressure and temperature, and the underlying model was essentially biomechanical. The advent of biochemical assay technology has enabled more expansive models. Even in the last few years, the ability to assay neurohormonal activity using non-invasive procedures has significantly advanced the study of underlying mechanisms.
Another methodological problem was that the control groups were no-treatment groups that did not control for attention from the therapist. More recently, massage therapy has been compared with other treatments such as relaxation therapy. However, these group comparisons were confounded by compliance problems, because relaxation therapy is often viewed as requiring work and concentration. We have therefore been using light-pressure massage as a sham massage therapy condition to control for touch and attention effects, since recent studies have documented that moderate pressure is needed to achieve positive massage therapy effects (Diego et al. 2004).
The research question
Typically the research question for massage therapy studies is whether massage therapy is an effective and cost-effective treatment for a given condition. Much research is ‘me-search’: the questions often derive from the investigator’s personal interest, for example because someone close has experienced the condition, because the condition is seen in one’s practice or research setting, or because it is a recent funding priority and the research is intended to provide pilot data for seeking funding in that area. Determining how effective massage therapy is requires identifying meaningful variables, such as the gold standard variables for that particular condition, designing the most effective massage therapy technique for the condition, and selecting the most appropriate treatment comparison group and attention control group. Addressing these questions typically begins with a literature search.
Literature search
PubMed (freely accessible online, e.g. through a Google search) is the biggest source of current literature abstracts. Although the older literature is fascinating to read and often provides good ideas for replication studies using more sophisticated approaches, the typical published paper cites references from the last decade, so literature searches are typically confined to the last decade. In searching through the abstracts yielded by the computer literature search, researchers look to see whether the question has already been addressed, whether the condition has been treated with massage therapy and whether the literature suggests the next steps. Entering a term for the condition together with ‘massage therapy’ is likely to yield the most specific information. However, starting with a more global search using simply the terms ‘treatment’ or ‘therapy’ and the name of the condition will yield many more abstracts and more general information about:
• the condition
• the hypothesized underlying aetiology
• the gold standard and other measures that have been used in research on other treatments of the condition
• ideas for treatment comparisons that might serve as an attention control group.
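For researchers who prefer to script the search, the specific and global searches described above can be expressed as PubMed queries. The sketch below only builds query URLs for NCBI's E-utilities `esearch` service; the endpoint and its `db`, `term`, `datetype`, `mindate`, `maxdate`, `retmax` and `retmode` parameters follow the E-utilities documentation as we understand it, while the function name `build_pubmed_query`, the quoting of terms and the 10-year window are illustrative assumptions rather than a prescribed search strategy.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint (check the current NCBI documentation).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(condition: str, specific: bool = True,
                       years: int = 10, today: Optional[date] = None) -> str:
    """Return an esearch URL for PubMed limited to the last `years` years.

    specific=True pairs the condition with 'massage therapy'; specific=False
    runs the broader search on 'treatment' or 'therapy' for the condition.
    """
    today = today or date.today()
    if specific:
        term = f'"{condition}" AND "massage therapy"'
    else:
        term = f'"{condition}" AND (treatment OR therapy)'
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",                 # filter on publication date
        "mindate": str(today.year - years),
        "maxdate": str(today.year),
        "retmax": 200,                      # number of PubMed IDs to return
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Specific search first, then the broader one for background on the condition.
print(build_pubmed_query("asthma"))
print(build_pubmed_query("asthma", specific=False))
```

Opening one of these URLs (for example with `urllib.request.urlopen`) should return a JSON list of PubMed IDs whose abstracts can then be screened.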
The literature search can serve as the background for the first part of the paper, the introduction. To be sure one understands the problem being addressed and the methods used to study it, it is always a good idea to write the first half of the paper before starting the study. The literature search will provide background on the incidence of the problem, its symptoms, possibly its aetiology or hypothesized aetiology, and previous treatments (both allopathic and alternative) and their efficacy. Once the background and methods sections are drafted, the paper can serve as a proposal to send to potential collaborators who will facilitate the research.
Selection of collaborators
Selecting a clinic or hospital setting (or a school setting for non-clinical problems) is always advantageous because such settings provide participants for the research. Clinical settings are also often where potential collaborators work, such as allopathic physicians or physicians practising osteopathic medicine. Having a medical collaborator is important for keeping abreast of the most important clinical measures for the condition being studied, for having a referral source and someone who can administer the clinical measures, and for producing clinically relevant research that journals and grant reviewers will consider credible. Important scientific collaborators include a neuroscientist for assays of biochemical measures or for interpretation of physiological data, e.g. electroencephalogram (EEG) data, and a statistician or PhD researcher to assist with designing and conducting the statistical analyses. Massage therapy collaborators are also needed, particularly if the researcher is not a massage therapist, to design the massage therapy procedure and to help identify measures that can directly assess its effects. Another important consideration is locating a source of volunteer massage therapists to provide the actual treatments, or to demonstrate them if parents or significant others are going to be the therapists.
Selecting treatment and attention control comparison groups
Traditionally, the alternative treatment group has been compared with a standard treatment control group, with assessments made on the first and last days of treatment. However, the possibility of a placebo effect, or of an effect of the therapist simply giving attention to the subject, has highlighted the need for treatment comparison and/or attention control groups. In much of our early work we used relaxation therapy as a comparison treatment because it has been shown to be effective, particularly in alleviating stress and anxiety, which often exacerbate the medical conditions we study. We also considered it important to establish that massage therapy is more effective than relaxation therapy in order to justify its greater expense. The problem we found was that this may be a biased comparison, because people view relaxation therapy as hard work requiring significant concentration and self-discipline, so we may have been experiencing compliance problems when using relaxation therapy as a control. In addition, because relaxation therapy requires a certain amount of cognitive sophistication and a reasonable attention span, it may be too difficult for young children. Therefore, in massage therapy research with children we have used attention controls such as rocking the child, holding the child, playing with toys, or holding and reading to the child.
More recently, since we discovered the critical importance of stimulating pressure receptors for massage to be effective, we have elected to use a sham massage comparison group that receives exactly the same massage as the treatment group but with light pressure. This also keeps participants ‘naïve’ or ‘blind’ as to whether their particular treatment condition should have a unique effect: participants in both groups would expect some benefit from massage, whether it was moderate pressure (in the real treatment group) or light pressure (in the sham group). A form of double blinding is also possible, insofar as the physicians providing the standard treatment and the massage therapists providing the experimental treatment do not necessarily expect one massage style to be more effective than the other. This is the closest we can come to a double-blind design in massage therapy research, because neither the participants nor the therapists are biased towards one treatment.
Selection of sample parameters and random assignment to groups
Demographic variables, including age, gender, ethnicity and socioeconomic status, are the most basic sampling parameters that need to be equivalent across groups. Because of the location and demographics of the clinical setting, participants are generally somewhat homogeneous in age, ethnicity and socioeconomic status: they often come predominantly from one ethnic group, and socioeconomic status is limited in range. This helps prevent the design from being confounded by variability in ethnicity or socioeconomic status, and random assignment would be expected to distribute these characteristics roughly equally across groups. Clinics are also generally organized by age group (paediatric, adult and sometimes geriatric), by condition and by the physician’s specialty, so their patients are typically homogeneous in age and condition as well.
Another critical background variable, particularly in medical research, is the severity of the condition. Severity is more likely to vary widely within a sample and therefore needs careful matching or stratification. Typically, participants are randomly assigned to groups using a table of random numbers or by flipping a coin, with the intention that randomization will yield groups that are roughly equivalent on background variables. The most conservative way to ensure equivalent groups is to match subjects across groups. For example, in studies on premature babies, subjects are frequently matched on birthweight and gestational age and then randomly assigned to treatment and control groups. A less conservative approach is stratified randomization, in which cells are formed from the background variables. With two birthweight groups (low and very low birthweight) and two gestational age groups (short and very short gestation), there would be four cells: very short gestation and very low birthweight, very short gestation and low birthweight, short gestation and very low birthweight, and short gestation and low birthweight. Each subject is placed in the cell that matches his or her characteristics and then randomly assigned to the treatment or control group within that cell, so that by the end of the study each cell contributes a roughly equal number of subjects to each group.
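Stratified randomization of this kind is easy to run by computer instead of by a table of random numbers. The sketch below shows one common way of doing it, assuming permuted blocks of two within each cell (so the coin flip is made once per pair of subjects in a cell rather than for every subject); the class name, group labels and category labels are hypothetical.

```python
import random
from collections import defaultdict

GROUPS = ("massage", "control")  # hypothetical group labels


class StratifiedRandomizer:
    """Stratified randomization with permuted blocks of two.

    Each cell (birthweight category x gestational-age category) keeps its own
    shuffled block containing one slot per group, so within any cell the two
    group sizes never differ by more than one.
    """

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._blocks = defaultdict(list)   # cell -> remaining slots in its block
        self.assignments = []              # (subject_id, cell, group)

    def assign(self, subject_id, birthweight_cat, gestation_cat):
        cell = (birthweight_cat, gestation_cat)
        block = self._blocks[cell]
        if not block:                      # start a new block for this cell
            block.extend(GROUPS)
            self._rng.shuffle(block)       # plays the role of the coin flip
        group = block.pop()
        self.assignments.append((subject_id, cell, group))
        return group


randomizer = StratifiedRandomizer(seed=2024)
print(randomizer.assign("infant-01", "very low", "very short"))
print(randomizer.assign("infant-02", "low", "short"))
```

Blocks of two keep the cells almost perfectly balanced, but they make the second assignment in each block predictable to anyone who knows the first, which is another reason for keeping assessors blind to group assignment; larger or variable block sizes reduce this predictability.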
Selecting the sample size involves several considerations, including economic ones. The typical first step is a power analysis, which determines whether a given sample size will provide enough statistical power for the data analysis. The expected effect size is estimated by taking the difference between the two group means from a previous study and dividing it by the larger of the two standard deviations; together with the chosen significance level and the desired power (commonly .80), this effect size determines how many subjects are needed per group.
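The effect size and the resulting sample size can be computed in a few lines. A minimal sketch, assuming a two-sided comparison of two independent groups and the normal approximation to the required sample size; the function names and the pilot values in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def effect_size(mean_1, mean_2, sd_1, sd_2):
    """Mean difference divided by the larger SD (the conservative choice)."""
    return abs(mean_1 - mean_2) / max(sd_1, sd_2)


def n_per_group(d, alpha=0.05, power=0.80):
    """Subjects per group for a two-sided, two-group comparison.

    Uses the normal approximation n = 2 * ((z_(1-alpha/2) + z_power) / d) ** 2,
    rounded up to the next whole subject.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Hypothetical pilot data: means of 20 and 15, SDs of 8 and 10.
d = effect_size(20, 15, 8, 10)
print(d, n_per_group(d))
```

For these hypothetical pilot values the effect size is 0.5 and the approximation gives 63 subjects per group. The normal approximation slightly underestimates the requirement for a t-test (about 64 per group for an effect size of 0.5), so dedicated power software is preferable for the final figure.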
Whatever sample size the power analysis indicates, economic considerations push researchers to keep the sample as small as possible. One economical approach is to analyse the data once 10 subjects per group have been enrolled to determine whether the groups differ significantly on the key variables. If those variables show only a trend toward significance, the sample may still be too small and more subjects are needed; the analyses can then be repeated after every two additional subjects per group.
Selection of variables
The most important variable is the gold standard clinical variable, typically viewed as the criterion for clinical improvement in a given condition. For example, in diabetes the gold standard variable is typically the glucose level and in asthma it is typically peak air flow. The clinical gold standard measure can be designated by a collaborating physician or found in the literature. Often there is more than one, and if several variables would be redundant, a selection needs to be made. For statistical reasons, researchers try to maintain a ratio of at least five subjects per variable, so that the design does not become variable-heavy.
The second important set of variables are stress variables, because stress is thought to exacerbate any clinical condition. Because stress is subjective, it is good to have not only self-report measures such as the State Anxiety Inventory and the Profile of Mood States (described later) but also a converging physiological measure (e.g. vagal tone) or biochemical measure (e.g. salivary cortisol) to validate the subject’s self-reported stress.
Typically, treatment research involves assessing the immediate effects of the therapy session and the longer-term effects at the end of the treatment period. Occasionally, effects are also assessed at some interval after the end of therapy as a follow-up. The immediate effects of the session are often measured by self-reports of how the subject feels, anxiety level and mood state, and saliva samples are taken for assaying the stress hormone cortisol. Sometimes heart rate or blood pressure is also taken as a physiological index of stress. For a condition such as itching, as in healing burns, a thermometer-style gauge of itchiness would be taken immediately after the treatment. Similarly, if the pain of juvenile rheumatoid arthritis were being studied, the immediate effects might be measured with a dolorimeter, a rod-like pressure gauge that determines the threshold beyond which the subject can no longer tolerate its pressure. The longer-term measures are, of course, the gold standard criteria for the success of the therapy. These typically include a clinical index such as the number of back pain-free days or migraine-free days, glucose levels, or pulmonary measures in children with asthma. They might also include changes in depression and in urinary stress hormones (noradrenaline (norepinephrine) and adrenaline (epinephrine)).
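The two kinds of effect translate into two kinds of change score. A minimal sketch, assuming each measure is recorded before and after the session on the first and last days of treatment; the class and function names are hypothetical, and using pre-session values for the longer-term comparison is one reasonable choice rather than the only one.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One measure (e.g. state anxiety or salivary cortisol) around one session."""
    pre: float   # before the massage session
    post: float  # after the massage session


def immediate_effect(session: Session) -> float:
    """Within-session reduction (pre minus post); positive means the measure fell."""
    return session.pre - session.post


def longer_term_effect(first_day: Session, last_day: Session) -> float:
    """Reduction from the first to the last day, using pre-session values so
    that neither day's immediate session effect is included."""
    return first_day.pre - last_day.pre


# Hypothetical anxiety scores for one subject.
first, last = Session(pre=50, post=40), Session(pre=42, post=35)
print(immediate_effect(first), immediate_effect(last), longer_term_effect(first, last))
```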
The importance of having converging variables from several levels (behavioural, physiological and biochemical) cannot be overstated. Self-report measures are almost invariably taken on pencil-and-paper forms. Recently, manual physiological assessments such as heart rate and blood pressure have been popular. More sophisticated measures, such as vagal tone, which must be derived from the respiratory sinus arrhythmia in heart rate, are more difficult to collect because of the sophisticated equipment required and because the data reduction requires technical expertise and is labour-intensive. Serious consideration also needs to be given to biochemical assays and their significance for the research, because assaying salivary and urinary cortisol requires expensive assay kits or neuroscientists in an expensively equipped laboratory. For example, a salivary cortisol assay costs about $25. Because at least two samples (one before and one after the therapy session) need to be taken at each of the two assessment periods (first day and last day), the salivary cortisol protocol costs approximately $100 per subject.
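The cost arithmetic generalizes easily when planning a budget for a whole sample. A small sketch; the default values are the figures given above, while the function name and the 40-subject example are hypothetical.

```python
def cortisol_budget(n_subjects, cost_per_assay=25.0,
                    samples_per_session=2, assessment_days=2):
    """Return (cost per subject, cost for the whole sample) for salivary cortisol."""
    per_subject = cost_per_assay * samples_per_session * assessment_days
    return per_subject, per_subject * n_subjects


# A hypothetical study with 20 subjects in each of two groups.
print(cortisol_budget(n_subjects=40))
```

Changing `assessment_days` or `samples_per_session` shows how quickly a follow-up assessment or extra samples add to the assay budget.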
Other measures include observations of sleep/wake behaviour, which are valuable because they show how sleep and wake behaviour change across treatment as the clinical condition changes. Because sleep and wake behaviour are the best index of the subject’s functioning, these are important measures. They do, however, require training observers either to conduct live observations using time-sampling coding systems or laptop computers, or to code videotapes of the behaviour. This in turn requires assessing interobserver reliability, the degree to which observers agree on the behaviours they are observing and code them similarly. The standard for interobserver reliability is 90%; that is, the two observers agree on the behaviours in 90% of the time-sample intervals. Reaching this standard requires significant practice time for the observers as well as time for reliability assessments. Interobserver reliability should also be calculated using a kappa coefficient, a statistic that corrects for chance agreement.
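Percent agreement and Cohen's kappa can both be computed directly from the two observers' interval codes. A minimal sketch, assuming each observer assigns one category per time-sample interval; the function names, category labels and data are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Proportion of time-sample intervals on which the two observers agree."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("observers must code the same, non-empty set of intervals")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement if each observer coded independently at his or her own base rates.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes for 10 intervals of infant sleep/wake observation.
a = ["deep"] * 5 + ["active"] * 5
b = ["deep"] * 4 + ["active"] * 6
print(percent_agreement(a, b), round(cohens_kappa(a, b), 2))
```

Here the observers agree on 90% of intervals, meeting the standard above, yet kappa is 0.80, because with only two categories used at these base rates half of the intervals would be expected to agree by chance alone.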
Other important variables have already been mentioned, including the gold standard clinical measure that is typically performed by the physician. Sometimes when children are involved, it is important to tap measures of parental stress and mood state to determine whether their stress may be affecting the child’s clinical course and whether the child’s clinical course is, in turn, affecting them.
Procedures
The treatment and research procedures need considerable attention and careful thought prior to the beginning of the study. In a sense, it is good to have completed the first half of the paper (the introduction and the methods) before starting the study so that they can be critiqued by colleagues and collaborators and so that every person in the treatment and research process is ‘on the same page’.
One of the most important aspects of the research procedure is that the observers be blind to the hypotheses of the study and to the subject’s group assignment. Otherwise their treatment of the subject and their observations would be biased by knowing the intent of the study and the subject’s group assignment. Having multiple assessors and multiple observers often prevents this biasing process but training each of the individuals and then working to achieve interobserver reliability is a costly process.
Similarly, the treatment procedure requires careful thought. On the one hand, the procedures need to be extremely detailed and in most cases need to cover many muscle groups and different parts of the body to be effective. On the other hand, there are cost considerations, such as session length, both for the cost of the study and for the cost of delivering the treatment in practice once it has been shown to be effective. Most individuals cannot afford more than one half-hour session with a professional massage therapist per week, and even if their significant other is trained in the procedure, that person is unlikely to give massages more than a couple of times a week at 20–30 minutes per session. These are therefore important time and cost-effectiveness considerations. Volunteer massage therapists can often be found for research studies because they appreciate the research experience, particularly when it involves children, whom they rarely see otherwise, so the research can be fairly inexpensive. But once the study ends, subjects are less likely to continue the treatment if the procedure that has proven effective is too costly. Thus, in our studies we try to limit therapy sessions to once or twice a week at approximately 20 minutes per session, build in a period for training a significant other to continue the massage after the study ends, and provide that person with a video demonstration of the therapy procedure. In the case of children, parents are often the therapists in our massage studies, and the massage becomes part of their (typically bedtime) ritual, which helps not only the children but the parents as well. We have documented that the therapist benefits from giving the massage in the same way that the recipient benefits (Field et al. 1998a).
Massage therapy by parents, of course, is a very cost-effective procedure and one that not only helps the child’s clinical condition but also helps make the parent feel empowered as part of the treatment process and helps the relationship between parent and child.
In the next section, special considerations are reviewed for specific research protocols, including growth and development prenatally and in early infancy, attention deficit disorders, psychiatric conditions and addictions, pain syndromes, autoimmune conditions including asthma, diabetes and dermatitis, and immune conditions including human immunodeficiency virus (HIV) and breast cancer.
Specific considerations for specific conditions
Growth and development
Prenatal development
Elevated prenatal cortisol has been associated with several negative outcomes, including excessive fetal activity, delayed fetal growth and development, prematurity and low birthweight (Wadhwa 2005, Field et al. 2009). These data highlight the importance of stress- and depression-reducing interventions during pregnancy. We have documented that pregnancy massage given twice a week over the last trimester of pregnancy reduced perinatal complications, most importantly prematurity (Field et al. 1999). In addition, the mothers’ leg and back pain was reduced and, in turn, they slept better. In a more recent study we examined the effects of pregnancy massage specifically on depressed mothers, expecting that massage would not only reduce depression and anxiety but also improve newborn outcomes, including reduced neonatal stress hormones, fewer complications such as prematurity and higher birthweight (Field et al. 2004). Over the course of that study, the massage groups again experienced fewer symptoms and less prematurity. This study had the additional cost-effectiveness advantage of showing that significant others could serve as the massage therapists. We later replicated this study using the women’s partners and found similar results (Field et al. 2008a).
Labour massage
We also conducted a labour massage study using the significant others as the massage therapists (Field et al. 1998c). In that study we were able to reduce labour pain simply by the significant other being taught the labour massage and giving the massage for the first 15 minutes of every hour of labour. The labours were shorter, the need for labour medication was less and the mothers were hospitalized for a shorter period of time and experienced less postpartum depression.
Preterm growth and development
Unfortunately for some infants, preterm delivery is unavoidable, and these infants are hospitalized in neonatal intensive care units (NICUs), sometimes for 2–5 months. Prematurity-related stress is compounded by the iatrogenic stresses of the nursery, including loud sounds and bright lights. At some point, when the newborn is no longer in medical jeopardy, the only reason for remaining in the intensive care unit is to gain enough weight to be discharged. It is at this stage that we provide massage. Typically we have given massages in three 15-minute periods a day for 10 days, although in one study we were able to achieve a 47% greater weight gain (the same gain achieved in the 10-day studies) in a 5-day period (Dieter et al. 2001). We have therefore switched to the more cost-effective 5-day treatment period. We are also trying to teach parents to continue the massage after discharge.
In our early studies, the most important measure was weight gain. We also learned a great deal about the infants’ states by recording 45-minute sleep/wake sessions. We were able to document that indeterminate sleep (a disorganized sleep state that is very difficult to code because it looks like neither deep sleep nor active sleep) was a very important variable. In fact, the only neonatal variable found to relate to childhood IQ was the amount of indeterminate sleep, which was negatively related. The other critical measures were the number of days in hospital and the associated cost. We documented that $4.7 billion in hospital costs could be saved if 10 days of massage were provided to the approximately 470,000 infants born prematurely in the US each year (a saving of $10,000 in hospital costs per infant). NICU costs have increased significantly since then, so the savings would now be even greater. Another important measure was the Brazelton newborn assessment. Had we not known how much more responsive the babies were to social stimulation after massage therapy, we would never have been able to hypothesize why these infants go on to have a weight and developmental advantage at 8 months after discharge. Knowing that their newborn behaviour was more responsive, we argued that their interactions with their parents were better, so the infants were able to ‘pull’ better stimulation from their parents and eventually showed better growth and development (Field et al. 1987).
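The savings estimate is a simple product, which makes it easy to update as NICU costs rise. A trivial sketch; the defaults are the figures given above and the function name is hypothetical.

```python
def nicu_savings(infants_per_year=470_000, savings_per_infant=10_000):
    """Annual hospital savings (US dollars) if every preterm infant received massage."""
    return infants_per_year * savings_per_infant


print(f"${nicu_savings() / 1e9:.1f} billion")
```

Substituting a current per-infant saving for `savings_per_infant` gives the updated estimate.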
In a more recent study we added measures of insulin-like growth factor-1 (IGF-1), one of the more active growth hormones, and insulin (Field et al. 2008b). These variables are considered important for growth. In an earlier study we speculated that the mechanism underlying the weight gain of preterm babies following massage was increased vagal activity (activity of the 10th cranial nerve, the vagus), which increases gastric motility and leads to more efficient food absorption (Diego et al. 2005). Vagal activity would also stimulate the release of hormones involved in food absorption, since that is a function of the vegetative branch of the vagus nerve. In the later study, vagal activity was measured and shown to increase, and plasma levels of insulin and IGF-1 also increased (Field et al. 2008b). Having the additional measures of IGF-1 and insulin enabled us to further determine the underlying mechanism. Knowing the underlying mechanisms is the most effective way to ensure that the massage is put into practice in NICUs.
Attention and attention disorders
Two of the most reliable indicators of attention are vagal activity and EEG patterns. Stimulation of the 10th cranial nerve, the vagus, is critical for attention: attention is accompanied by slower heart rate and increased vagal activity. Vagal activity is indexed by respiratory sinus arrhythmia, the rhythmic variation in heart rate that accompanies breathing, so it can easily be derived from heart rate recordings by a computer program. The very expensive ($6000) vagal tone monitor is not necessary, although it yields vagal activity more readily than the computer package. EEG also requires sophisticated equipment and technical expertise for data reduction. EEG patterns that accompany attentiveness include decreased alpha, decreased beta and increased theta. In one study we documented this pattern of heightened alertness/attentiveness following 15-minute chair massages in the subjects’ offices (Field et al. 1997a). The EEG pattern was accompanied by improved performance on math computation tasks: subjects completed them in less time and with greater accuracy after the massage sessions. We have also documented enhanced performance by infants on habituation tasks (Cigales et al. 1997). These are primitive learning tasks in which the infant learns that a repeated stimulus, such as the sound of a bell used in the Brazelton scale, is irrelevant because it does not signal anything else happening, and so stops responding to it. Finally, in a study on preschoolers we showed that following a brief massage the children performed IQ tasks in less time and more accurately (Hart et al. 1998).
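As an illustration of deriving a vagal index from heart rate data by computer, the sketch below computes RMSSD from inter-beat intervals. Note that this swaps in a different technique from the one described above: RMSSD is a widely used time-domain heart-rate-variability measure that mainly reflects vagal influence, whereas vagal tone as computed by a vagal tone monitor isolates the respiratory component of heart-period variability by filtering. The function name and sample data are hypothetical, and artefact-free intervals are assumed.

```python
import math


def rmssd(ibi_ms):
    """Root mean square of successive differences of inter-beat intervals (ms).

    A common time-domain heart-rate-variability index of vagal influence;
    not the band-pass vagal tone method used by dedicated vagal tone monitors.
    """
    if len(ibi_ms) < 2:
        raise ValueError("need at least two inter-beat intervals")
    diffs = [later - earlier for earlier, later in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical inter-beat intervals (ms) from a short heart rate recording.
print(round(rmssd([800, 810, 790, 805]), 1))
```

In practice the intervals would first be cleaned of missed or extra beats, since a single artefact can dominate the successive differences.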
Children with attentional disorders
We have shown that children with autism (Field et al. 1997b, Escalona et al. 2001) and children with attention deficit hyperactivity disorder (ADHD) (Field et al. 1998b, Khilnani et al. 2004) are also able to perform better and stay on task longer following massage therapy sessions. In the study on children with autism, we recorded their classroom behaviour, including how much time they spent on task, how little attention they paid to irrelevant stimuli, how much stereotypic behaviour they showed and how much social relatedness they showed toward the teacher (Field et al. 1997b). Following massage therapy (two sessions per week by a massage therapist), these children stayed on task longer and related more to their teachers. In a subsequent study we used parents as the therapists and showed that the children not only spent more time attentive and on task in the classroom but also had fewer sleep problems (Escalona et al. 2001). Since sleep problems are prevalent in children with autism, sleep diaries recording sleep onset, sleep duration and number of night wakings were a critical measure, and one that surprisingly improved over this brief period.
Because parents of children with autism may be biased toward seeing any sign of improvement, it is particularly important to have converging measures. Since sleep behaviour is likely to affect classroom behaviour, classroom behaviour is worth observing. In addition, in some studies we have used time-lapse video equipment to record nighttime sleep simply by turning on a nightlight and running the time-lapse video camera, so that 8 hours of sleep can later be coded in 1 hour. The movements on the tape look like Charlie Chaplin moving about, so they are very easy to code if you are interested in a gross rating of activity. Gross activity can differentiate deep sleep from active sleep, and night wakings and getting out of bed are, of course, easy to code from the videotapes.
A measure somewhere between videotaping, which can be somewhat intrusive and requires the subject’s compliance, and the more subjective sleep diaries is a device called an actometer. The actometer is a Timex watch with the spring removed, so that the watch hands advance only when the subject moves his or her arm; the time elapsed on the watch between bedtime and morning therefore reflects the total amount of activity during nighttime sleep. Actometers are easy to use with all age groups.
Similar measures were used with children with ADHD (Field et al. 1998b) and, in a subsequent study, with adolescents with ADHD (Khilnani et al. 2004). Here again the most meaningful measures are those taken in the classroom, where the children express and perhaps experience their worst problems. In these studies we also used teacher and parent rating scales, choosing the Conners Scale because it is shorter than the Child Behavior Checklist, another frequently used measure of the same type completed by parents and teachers.
Psychiatric conditions
Most psychiatric conditions are accompanied by depression or at least depressed mood state and anxiety. Thus, in all psychiatric conditions we have studied, we have used the following self-report scales:
• the Profile of Mood States (McNair et al. 1971), which taps depressed mood state, anxiety, anger and confusion.
• the Center for Epidemiological Studies Depression (CESD) scale (Radloff 1977) or the Beck Depression Inventory (Beck et al. 1961), both of which measure depressive symptoms. We have found that the CESD is more sensitive (it detects more individuals with depressive symptoms) and is also simpler and more user-friendly, so that adolescents and less educated people are better able to complete it.
• the State Trait Anxiety Inventory (Spielberger et al. 1970), which assesses state anxiety (current, short-term anxiety) and trait anxiety (closer to being a personality trait) and also has a children's version called the STAIC. The authors of these scales have since created two further instruments that measure depression and anger. These 20-item Likert-type scales are extremely easy to complete and have good psychometric properties, including good test–retest reliability.
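For readers new to the CESD, its standard scoring can be sketched as follows. This is our summary of the published rules as we understand them, so please check it against the scale itself before use: each of the 20 items is rated 0 to 3 for the past week, the four positively worded items (4, 8, 12 and 16) are reverse-scored, totals range from 0 to 60, and a total of 16 or more is the conventional cutoff for elevated depressive symptoms. How many individuals a scale "detects" depends on this cutoff. The function names are hypothetical.

```python
REVERSED_ITEMS = {4, 8, 12, 16}  # positively worded items, 1-based numbering
CUTOFF = 16                      # conventional threshold for elevated symptoms


def score_cesd(responses: list[int]) -> int:
    """Sum 20 CESD item responses (each 0-3), reverse-scoring items 4, 8, 12, 16."""
    if len(responses) != 20:
        raise ValueError("the CESD has 20 items")
    total = 0
    for item_number, value in enumerate(responses, start=1):
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {item_number}: responses must be 0-3")
        total += 3 - value if item_number in REVERSED_ITEMS else value
    return total


def above_cutoff(total: int) -> bool:
    """True if the total reaches the conventional screening cutoff."""
    return total >= CUTOFF


total = score_cesd([1] * 20)
print(total, above_cutoff(total))  # 24 True
```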
In psychiatric conditions we also try to have a converging measure of behaviour, covering the symptoms reported on the self-report scales, such as depressed affect, behavioural agitation and angry behaviour. For behavioural observations we designed the Behavior Observation Scale, on which these and other behaviours are rated on a five-point scale following a brief observation. Like the mood scales, which the subject completes before and after the therapy session, the behavioural observations are typically made before and after the massage therapy session.
The third set of measures we have invariably collected in psychiatric conditions consists of saliva samples (assayed for the stress hormone cortisol before and after the first and last massage therapy sessions) and urine samples (assayed for the stress neurotransmitters noradrenaline and adrenaline and for the body's natural antidepressants, dopamine and serotonin). Salivary cortisol serves as an immediate index of stress reduction during the therapy session, and the urinary assays assess the longer-term effects of massage therapy over the course of the study.
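The two time scales in this design, within-session and across the study, can be made concrete with a small sketch. The numbers below are made-up illustrative values, not data from our studies, and the class and function names are hypothetical. Within-session change is simply the post-session value minus the pre-session value; the longer-term change compares the first-day and last-day urinary values.

```python
from dataclasses import dataclass


@dataclass
class SessionSample:
    """Hypothetical pre- and post-session salivary cortisol values for one session."""
    pre: float
    post: float

    def change(self) -> float:
        # Negative values indicate a decrease across the session.
        return self.post - self.pre


def percent_change(before: float, after: float) -> float:
    """Percentage change from a baseline value."""
    return 100.0 * (after - before) / before


first_session = SessionSample(pre=0.50, post=0.40)  # illustrative values only
last_session = SessionSample(pre=0.45, post=0.36)   # illustrative values only
urinary_first_day, urinary_last_day = 40.0, 30.0    # illustrative values only

print(first_session.change(), last_session.change())
print(percent_change(urinary_first_day, urinary_last_day))  # -25.0
```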
We have recorded the above self-report, behavioural observation and biochemical measures in all of our studies on depressed children and adolescents (Field et al. 1991) and depressed mothers (Jones & Field 1999), as well as in our studies on anorexia (Hart et al. 2001) and bulimia (Field et al. 1998d), to assess the longer-term effects of the massage therapy treatment. Because depression, eating disorders and addictions are all considered to have an underlying depressive component, it makes sense to use depression measures across these studies; of course, there are also measures that are unique to each condition. Measures that were unique to the different studies include the following:
• Time-lapse videotaping of sleep in the child and adolescent psychiatry study, as well as nurses' ratings of the children's and adolescents' behaviour on the psychiatric unit (Field et al. 1992).
• Additional measures for the eating disorder study included the Eating Disorders Inventory (Garner et al. 1983; Field et al. 1998d).
• For a smoking addiction study, the number of cravings and cigarettes smoked were recorded (Hernandez-Reif et al. 1999).
• For the depressed adult studies we have also recorded EEG. Depressed individuals typically show relatively greater right frontal EEG activation during the expression or reception of emotional expressions. The right frontal area of the brain is involved in processing negative emotions, and this asymmetric activation has been noted to shift toward symmetry following massage therapy (Jones & Field 1999). Thus, in the depression studies we have used EEG (see Appendix D for an elaboration of these methods).
• For the group of children with posttraumatic stress disorder (following Hurricane Andrew), we also used the children's self-drawings, made with magic markers, as an index of change in depression (Field et al. 1996). The drawings are simply scored on seven points: (1) small self-figure on page; (2) use of dark colours; (3) missing facial features; (4) sad face; (5) distorted figure; (6) displaced body parts; and (7) agitated lines. Typically, depressed children make drawings with very few facial features, distorted body parts and a small figure on the page. Self-drawings are a very reliable index of the children's mood state.
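The seven-point drawing score can be sketched as a simple checklist count. This assumes that each of the seven indicators listed above is scored 1 if present and 0 if absent, giving a total from 0 to 7 in which higher scores suggest more depressed mood; the function name is hypothetical.

```python
DRAWING_ITEMS = (
    "small self-figure on page",
    "use of dark colours",
    "missing facial features",
    "sad face",
    "distorted figure",
    "displaced body parts",
    "agitated lines",
)


def score_drawing(features_present: set[str]) -> int:
    """Count how many of the seven depression indicators appear in a drawing."""
    unknown = features_present - set(DRAWING_ITEMS)
    if unknown:
        raise ValueError(f"unrecognised items: {sorted(unknown)}")
    return sum(1 for item in DRAWING_ITEMS if item in features_present)


print(score_drawing({"sad face", "agitated lines"}))  # 2
```

Comparing a child's score on drawings made before and after the treatment period gives the change index.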
Pain syndromes
For pain alleviation, we have studied a number of pain syndromes, including migraine headaches, lower-back pain, premenstrual syndrome, pain from burns, fibromyalgia and juvenile rheumatoid arthritis. In these conditions we have used very similar self-report pain scales, including the McGill Pain Questionnaire, the Pain Intensity Scale and a visual analogue scale, which generally takes the form of a ruler or a thermometer with ratings along its length or, for children's ratings of pain, a series of sad to happy faces. Adults complete these self-report scales themselves; for children, we often also have a parent and the physician rate the amount of pain. For each syndrome we have also used measures unique to that syndrome to assess the individual's functioning. For example, for the burn patients, who we hoped would have higher pain thresholds following the massage therapy sessions, we assessed their affective reaction to debridement (skin brushing). For juvenile rheumatoid arthritis, we had a parent's assessment of the child's ability to continue activities of daily living (Field et al. 1997c). For lower-back pain we had functional assessments of range of motion and the ability to touch toes (Hernandez-Reif et al. 2001). For fibromyalgia we used a dolorimeter (a rod that exerts increasing pressure until the patient winces, which marks the pain threshold) (Sunshine et al. 1996) and, for migraine headaches, we recorded the number of headache-free days (Hernandez-Reif et al. 1998).
Because anxiety exacerbates pain syndromes, we also used an anxiety scale (the State Trait Anxiety Inventory) to assess anxiety levels before and after massage therapy sessions, with salivary cortisol as a secondary index of anxiety/stress levels. Because sleep is considered to be disturbed in most pain syndromes, either because of the pain or because the sleep disturbance contributes to the pain (the direction of effects is not certain here), we have also recorded sleep, typically with sleep diaries. More recently, having noticed that in all of the pain syndrome studies sleep improved following massage therapy and pain in turn was reduced, we are trying to obtain better measures of sleep, including actometer readings during sleep. One current theory about the origins of pain syndromes is that insufficient quiet or restorative sleep leads to increased levels of substance P, which causes pain. Because substance P can be measured in salivary samples, we are assaying substance P at the beginning and end of the massage therapy treatment periods.
Autoimmune disorders
Once again, because stress, and particularly stress hormones such as cortisol, are known to affect autoimmune and immune conditions, we measure cortisol in saliva before and after the massage therapy sessions and in urine at the beginning and end of the study period. In all of the autoimmune diseases we have studied to date (asthma, diabetes and dermatitis) the subjects were children, and we used the parents as the massage therapists. The parents are also likely to benefit from giving the therapy, because grandparent volunteers who massaged infants became less depressed and had lower cortisol levels (Field et al. 1998a).
For asthma, the gold-standard clinical measure is the peak airflow value recorded on a peak flow monitor by the child and parent (Field et al. 1998e). These readings are typically taken daily and recorded in a diary. The pulmonologist also records standard pulmonary function measures at the beginning and end of the study.
In the case of diabetes, the parents completed a self-report measure on the child's glucose regulation, insulin and food regulation and exercise (Field et al. 1997d). In addition, the children and their parents measured glucose levels using a calibrated glucometer that we provided. Here it was important for the parents to take the readings at the same time of day.
Immune disorders
One of the immune measures that seems to improve invariably following massage therapy across all of our immune studies, including HIV in adults (Ironson et al. 1996), HIV in adolescents (Diego et al. 2001) and breast cancer (Hernandez-Reif et al. 2004, 2005), is an increase in natural killer cells. Natural killer cells are considered the front line of the immune system, warding off virus-infected and cancer cells. In more immune-compromised conditions, as in the study of men with HIV, the CD4 cell count was so low that it was not possible to raise it, whereas in the less immune-compromised adolescents with HIV we were able not only to increase natural killer cells but also to increase CD4 cell numbers and improve the CD4:CD8 ratio (a marker of HIV disease).
Statistical analyses
The statistical analyses should be left to a statistician or a PhD-trained researcher on the project. Although using statistical analysis software is like following a recipe, it is important to know which statistics are appropriate and how to interpret the results. A brief description of the basics is given here.
In any group treatment comparison, the scores or values of each variable are averaged across the group to obtain a mean score, rating or value. The spread of individuals around the mean is called variability. A typical distribution has the mean in the middle, with most individuals forming a hill around it; individuals who depart further from the group value lie farther out on the downward slopes of the hill, to the left or right of the peak. The standard deviation is the usual measure of this variability. For example, the mean IQ score for the population at large is 100, but many individuals score higher and many score lower. On a test scaled with a standard deviation of 16 points, one standard deviation above the mean is a score of 116 and one standard deviation below is a score of 84. Two standard deviations (2 × 16 = 32 points) above and below the mean are scores of 132 and 68, and very few people (about 5% in a normal distribution) fall beyond two standard deviations.
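The mean, the standard deviation and the share of a normal distribution lying within one or two standard deviations can be checked with Python's standard `statistics` module. The six scores below are made up for illustration, and the normal model uses the IQ example's mean of 100 and standard deviation of 16.

```python
from statistics import NormalDist, mean, stdev

scores = [84, 92, 100, 100, 108, 116]  # made-up example scores
print(mean(scores), stdev(scores))     # sample mean and standard deviation

iq = NormalDist(mu=100, sigma=16)
within_1_sd = iq.cdf(116) - iq.cdf(84)  # about 0.68 of the population
within_2_sd = iq.cdf(132) - iq.cdf(68)  # about 0.95 of the population
print(round(within_1_sd, 3), round(within_2_sd, 3))
```

`stdev` divides by n − 1 (the sample standard deviation), which is what most statistics packages report for study data.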
A simple statistical comparison between two groups can be made with a t-test, in which the group means and standard deviations are combined to arrive at a t-value. The t-value is considered significant if a difference that large would be expected by chance fewer than five times in 100 (the P < 0.05 level). With moderate to large samples, t-values greater than about 2.00 are typically significant. After computing a t-test by hand, with a calculator or by computer, the t-value can be looked up in a table of t-values for the appropriate degrees of freedom to check the P-value, or significance level. The significance level indicates only whether the result is statistically significant, that is, whether the groups differed on that variable by more than would be expected by chance; it does not indicate how large or clinically important the difference is.
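The t-test described above can be sketched as Student's independent-samples t with a pooled variance. This is a minimal illustration with made-up data, assuming roughly normal scores and similar variances in the two groups; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a: list[float], group_b: list[float]) -> tuple[float, int]:
    """Student's t for two independent groups using the pooled variance.

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, n_a + n_b - 2


t, df = independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])  # made-up scores
print(round(t, 2), df)  # -2.0 8
```

This example shows why the "greater than about 2.00" rule depends on sample size: with only 8 degrees of freedom, the two-tailed critical value at P < 0.05 is about 2.31, so t = −2.0 would not be significant. If SciPy is available, `2 * scipy.stats.t.sf(abs(t), df)` gives the two-tailed P-value without a printed table.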
Another group comparison test is the F-test, which is basically an extension of the t-test used when more than two groups are compared (with only two groups, F equals t squared). The F-value is obtained from an analysis of variance (ANOVA). Once again, the F-value is checked in a statistics table to determine the P-level (significance level); depending on the degrees of freedom, F-values greater than about 4.00 are typically significant. More complex analyses can be performed when there are several groups and more than one dependent (outcome) variable. These are called MANOVAs, an abbreviation for multivariate analyses of variance. Whenever there are multiple variables and multiple groups, a MANOVA is performed on the set of variables, followed by ANOVAs on each of the variables. The MANOVA indicates whether the groups are significantly different on the set of variables as a whole; the ANOVAs then determine whether the groups differ on each individual variable. If there is more than one independent variable describing the groups, for example age and gender, and the MANOVAs and ANOVAs yield significant differences, it is then necessary to conduct post hoc t-tests to test all the possible comparisons.
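A one-way ANOVA for a single dependent variable can be sketched in plain Python: F is the between-groups mean square divided by the within-groups mean square. This is a minimal illustration with made-up data and a hypothetical function name; MANOVA involves matrix computations and is not shown.

```python
from statistics import mean


def one_way_anova(*groups: list[float]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Spread of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Spread of individuals around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```

Running it on two groups, for example `one_way_anova([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])`, gives F = 4.0, the square of the t-value of −2.0 for the same data.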
Another way of looking at the data is to determine the relationships between variables. For example, is gender related to anxiety scores, such that males have higher anxiety scores than females? The entire set of variables can be entered into a correlation analysis to determine the relationships between them. The computer program prints out a matrix of correlation coefficients, which range from −1.00 to +1.00: 0 indicates no relationship, values near ±1.00 indicate a strong relationship and the sign indicates its direction. A correlation of 0.83 between gender and anxiety, for example, would be extremely high, indicating a strong relationship. If higher values reflect higher anxiety and males are coded as 1 and females as 2, then males having higher anxiety would appear as a negative correlation (−0.83). If females had higher anxiety, the correlation coefficient would be a positive 0.83. Again, a statistics table is checked for the P-level (if the computer output does not provide it).
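The Pearson correlation coefficient can be computed directly from its definition. The scores below are made up, and gender uses the coding from the example above (1 = male, 2 = female). When one variable is a two-category code like this, the Pearson r is also known as the point-biserial correlation. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient, ranging from -1 to +1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


gender = [1, 1, 1, 2, 2, 2]   # 1 = male, 2 = female
anxiety = [9, 8, 7, 3, 2, 1]  # made-up scores; higher = more anxious
print(round(pearson_r(gender, anxiety), 2))  # -0.96
```

Because the males in this made-up sample have the higher anxiety scores, the coefficient is negative, as described in the text.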
Further analyses can be conducted, for example stepwise regression analyses, which determine the relative importance of the predictor (independent) variables. Again, if we are discussing gender and anxiety, we would enter the anxiety score into the stepwise regression analysis as the dependent (outcome) measure, and gender, along with age, as the predictor variables. If gender has a high correlation with anxiety (0.83, as noted above), the computer program will enter gender into the equation as a predictor variable. Multiplying the correlation coefficient R (0.83) by itself gives R squared (about 0.69), which indicates how much the variable gender contributes to the outcome variable anxiety: it explains about 69% of the outcome variance, or variability. If age then came into the equation at the second step and the multiple correlation coefficient rose to 0.91, R squared would be about 0.83, or 83%, so age would have added about 14% to the variance explained (69% plus 14% equalling 83%).
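The variance-explained arithmetic at each step can be sketched as follows. This takes the multiple R reported by the regression software at each step as given; it does not perform the stepwise selection itself. The function name is hypothetical.

```python
def variance_explained(multiple_r_by_step: list[float]) -> list[tuple[float, float]]:
    """For each step, return (R squared, increase in R squared over the previous step)."""
    results = []
    previous = 0.0
    for r in multiple_r_by_step:
        r_squared = r * r
        results.append((r_squared, r_squared - previous))
        previous = r_squared
    return results


# R = 0.83 after gender enters, R = 0.91 after age enters.
for step, (r2, delta) in enumerate(variance_explained([0.83, 0.91]), start=1):
    print(f"step {step}: R^2 = {r2:.2f}, added {delta:.2f}")
# step 1: R^2 = 0.69, added 0.69
# step 2: R^2 = 0.83, added 0.14
```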
These, then, are some of the simplest analyses performed in treatment research. Many things are considered in selecting the type of data analysis, including, for example, whether the data are normally distributed or skewed. It may be necessary to use non-parametric statistics (instead of the parametric statistics just described) because the data fail to meet the assumptions required for the parametric analyses. These considerations are complex, and understanding them and applying them appropriately requires considerable coursework in statistics.
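As one illustration of a non-parametric approach, the Mann–Whitney U statistic, a common rank-based alternative to the independent-samples t-test, can be computed by comparing every score in one group with every score in the other. The choice of this test is ours, for illustration only; the text does not name a specific non-parametric test. The function name is hypothetical.

```python
def mann_whitney_u(group_a: list[float], group_b: list[float]) -> tuple[float, float]:
    """Return (U_a, U_b).

    U_a counts the pairs in which the group_a score exceeds the group_b
    score, with ties counted as 0.5; U_a + U_b equals len(a) * len(b).
    """
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # (0.0, 9.0)
```

Because only the ordering of the scores matters, skewed values do not distort the result the way they can distort a mean. The smaller U is then checked against a table of critical values, or a package routine such as SciPy's `scipy.stats.mannwhitneyu` can supply the P-value.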
Acknowledgements
We would like to thank the subjects who participated in our research and the researchers who collected and analyzed data. This research was supported by a National Institute of Mental Health (NIMH) Senior Research Scientist Award (MH#00331) and a National Center for Complementary and Alternative Medicine Senior Research Scientist Award (AT#001585) and by a merit award from NIMH to Tiffany Field (MH#46586) and by funding from Johnson & Johnson Pediatric Institute.
References
Beck, A.; Ward, C.; Mendelson, M.; et al., An inventory for measuring depression, Arch. Gen. Psychiatry 4 (1961) 561–571.
Cigales, M.; Field, T.; Lundy, B.; et al., Massage enhances recovery from habituation in normal infants, Infant Behav. Dev. 20 (1997) 29–34.
Diego, M.A.; Hernandez-Reif, M.; Field, T., Massage therapy effects on immune function in adolescents with HIV, Int. J. Neurosci. 106 (2001) 35–45.
Diego, M.A.; Field, T.; Sanders, C.; et al., Massage therapy of moderate and light pressure and vibrator effects on EEG and heart rate, Int. J. Neurosci. 114 (2004) 31–35.
Diego, M.A.; Field, T.; Hernandez-Reif, M., Vagal activity, gastric motility, and weight gain in massaged preterm neonates, J. Pediatr. 147 (2005) 50–55.
Dieter, J.; Field, T.; Hernandez-Reif, M.; et al., Maternal depression and increased fetal activity, J. Obstet. Gynaecol. 21 (2001) 468–473.
Escalona, A.; Field, T.; Singer-Strunk, R.; et al., Brief report: Improvements in the behavior of children with autism following massage therapy, J. Autism Dev. Disord. 31 (2001) 513–516.
Field, T.; Scafidi, F.; Schanberg, S., Massage of preterm newborns to improve growth and development, Pediatr. Nurs. 13 (6) (1987) 385–387.
Field, T.; Morrow, C.; Valdeon, C.; et al., Massage reduces anxiety in child and adolescent psychiatric patients, J. Am. Acad. Child Adolesc. Psychiatry 31 (1992) 125–131.
Field, T.; Seligman, S.; Scafidi, F.; et al., Alleviating posttraumatic stress in children following Hurricane Andrew, J. Appl. Dev. Psychol. 17 (1996) 37–50.
Field, T.; Quintino, O.; Henteleff, T.; et al., Job stress reduction therapies, Altern. Ther. 3 (1997) 54–56.
Field, T.; Lasko, D.; Mundy, P.; et al., Brief report: autistic children’s attentiveness and responsivity improved after touch therapy, J. Autism Dev. Disord. 27 (3) (1997) 333–338.
Field, T.; Hernandez-Reif, M.; Seligman, S.; et al., Juvenile rheumatoid arthritis: Benefits from massage therapy, J. Pediatr. Psychol. 22 (1997) 607–617.
Field, T.; Hernandez-Reif, M.; La Greca, A.; et al., Massage therapy lowers blood glucose levels in children with diabetes mellitus, Diabetes Spectrum 10 (1997) 237–239.
Field, T.; Hernandez-Reif, M.; Quintino, O.; et al., Elder retired volunteers benefit from giving massage therapy to infants, J. Appl. Gerontol. 17 (1998) 229–239.
Field, T.; Quintino, O.; Hernandez-Reif, M.; et al., Adolescents with attention deficit hyperactivity disorder benefit from massage therapy, Adolescence 33 (1998) 103–108.
Field, T.; Hernandez-Reif, M.; Taylor, S.; et al., Labor pain is reduced by massage therapy, J. Psychosom. Obstet. Gynecol. 18 (1998) 286–291.
Field, T.; Schanberg, S.; Kuhn, C.; et al., Bulimic adolescents benefit from massage therapy, Adolescence 33 (1998) 555–563.
Field, T.; Henteleff, T.; Hernandez-Reif, M.; et al., Children with asthma have improved pulmonary function after massage therapy, J. Pediatr. 132 (1998) 854–858.
Field, T.; Hernandez-Reif, M.; Hart, S.; et al., Pregnant women benefit from massage therapy, J. Psychosom. Obstet. Gynecol. 20 (1999) 31–38.
Field, T.; Diego, M.A.; Hernandez-Reif, M.; Schanberg, S.; Kuhn, C., Massage therapy effects on depressed pregnant women, J. Psychosom. Obstet. Gynecol. 25 (2004) 115–122.
Field, T.; Figueiredo, B.; Hernandez-Reif, M.; et al., Massage therapy reduces pain in pregnant women, alleviates prenatal depression in both parents and improves their relationships, J. Bodyw. Mov. Ther. 12 (2008) 146–150.
Field, T.; Diego, M.; Hernandez-Reif, M.; et al., Insulin and insulin-like growth factor I (IGF-1) increase in preterm infants following massage therapy, J. Dev. Behav. Pediatr. 29 (2008) 463–466.
Field, T.; Diego, M.; Hernandez-Reif, M.; et al., Pregnancy massage reduces prematurity, low birthweight and postpartum depression, Infant Behav. Dev. 32 (2009) 454–460.
Garner, D.M.; Olmsted, M.P.; Polivy, J., The Eating Disorders Inventory: a measure of cognitive behavioral dimensions of anorexia nervosa and bulimia, In: (Editors: Darby, P.L.; Garfinkel, P.E.; Garner, D.M.; et al.) Anorexia nervosa: recent developments in research (1983) Alan R. Liss, New York, pp. 173–184.
Hart, S.; Field, T.; Hernandez-Reif, M.; et al., Preschoolers' cognitive performance improves following massage, Early Child Dev. Care 143 (1998) 59–64.
Hart, S.; Field, T.; Hernandez-Reif, M.; et al., Anorexia nervosa symptoms are reduced by massage therapy, Eat. Disord. 9 (2001) 289–299.
Hernandez-Reif, M.; Dieter, J.; Field, T.; et al., Migraine headaches are reduced by massage therapy, Int. J. Neurosci. 96 (1998) 1–11.
Hernandez-Reif, M.; Field, T.; Hart, S., Smoking cravings are reduced by self-massage, Prev. Med. 28 (1999) 28–32.
Hernandez-Reif, M.; Ironson, G.; Field, T.; et al., Immunological responses of breast cancer patients to massage therapy, Int. J. Neurosci. 115 (2005) 495–510.
Hernandez-Reif, M.; Field, T.; Krasnegor, J.; et al., Lower back pain is reduced and range of motion increased after massage therapy, Int. J. Neurosci. 106 (2001) 131–145.
Hernandez-Reif, M.; Ironson, G.; Field, T.; et al., Breast cancer patients have improved immune and neuroendocrine functions following massage therapy, J. Psychosom. Res. 1 (2004) 1–8.
Ironson, G.; Field, T.; Scafidi, F.; et al., Massage therapy is associated with enhancement of the immune system’s cytotoxic capacity, Int. J. Neurosci. 84 (1996) 205–218.
Jones, N.A.; Field, T., Massage and music therapies attenuate frontal EEG asymmetry in depressed adolescents, Adolescence 34 (1999) 529–534.
Khilnani, S.; Field, T.; Hernandez-Reif, M.; et al., Massage therapy improves mood and behavior of students with attention deficit/hyperactivity disorder, Adolescence 152 (2004) 623–638.
McNair, D.M.; Lorr, M.; Droppleman, L.F., POMS – profile of mood states. (1971) Educational and Industrial Testing Services, San Diego, CA.
Radloff, L.S., The CES-D Scale: a self-report depression scale for research in the general population, Appl. Psychol. Meas. 1 (1977) 385–401.
Spielberger, C.D.; Gorsuch, R.L.; Lushene, R.E., The State Trait Anxiety Inventory. (1970) Consulting Psychologists Press, Palo Alto, CA.
Sunshine, W.; Field, T.; Schanberg, S.; et al., Massage therapy and transcutaneous electrical stimulation effects on fibromyalgia, J. Clin. Rheumatol. 2 (1996) 18–22.
Wadhwa, P.D., Psychoneuroendocrine processes in human pregnancy influence fetal development and health, Psychoneuroendocrinology 30 (8) (2005) 724–743.