10. Massage therapy research methods
Tiffany Field
Chapter contents
An evolving research process
Specific considerations for specific conditions
Statistical analyses
An evolving research process
Historical background
Massage therapy dates back to before recorded history and was epitomized by Hippocrates in 400 bc as ‘medicine being the art of rubbing’. Research in the field of massage therapy also goes back many years. The first academic journal publications date back to the 1930s, when massage therapy research on humans and animals was fairly popular. Many of those research projects focused on documenting the increased blood flow associated with massage therapy and the reduction of muscle atrophy. Although many of the questions asked then are the same questions asked now, the research was limited by the measurement technology of the time, and the studies often featured either single cases or very small samples, typically self-selected clinical patients being treated for a single condition. Measurement, for example, was limited to physiological measures such as heart rate, blood pressure and temperature, and the underlying model was basically biomechanical. The advent of biochemical assay technology has enabled more expansive models. Even in the last few years, the ability to assay neurohormonal activity with non-invasive procedures has significantly advanced the study of underlying mechanisms.
Another methodological problem was that the control groups were non-treatment groups that did not control for attention from the therapist. More recently, massage therapy has been compared to other treatments such as relaxation therapy. However, these group comparisons were confounded by compliance problems, as relaxation therapy is often viewed as requiring work and concentration. We have therefore been using a light pressure massage as a sham massage therapy group to control for touch and attention effects, since recent studies have documented that moderate pressure is needed to achieve positive massage therapy effects (Diego et al. 2004).
The research question
Typically, the research question for massage therapy studies is whether massage therapy is an effective and cost-effective treatment for a given condition. Much research is ‘me-search’: the questions are often derived from the investigator’s personal interest, for example because somebody close has experienced the condition, because the condition is seen in the investigator’s practice or research setting, or because the condition is a recent funding priority and the research is intended to provide pilot data for seeking funding in that area. Determining how effective massage therapy is means choosing meaningful variables, such as the gold standard variables for that particular condition, designing the most effective massage therapy technique for that condition, and selecting the most appropriate treatment comparison group and attention control group. Addressing these questions typically begins with a literature search.
Literature search
PubMed (accessed through Google) is the biggest source of current literature abstracts. Although it is fascinating to read the older literature, which often serves as a source of good ideas for replication studies using more sophisticated approaches to problems, the typical published paper features references from the last decade. Thus, literature searches are typically confined to the last decade. In searching through the abstracts yielded by the computer literature search, researchers look to see if the question has already been addressed, if the condition has been treated by massage therapy and if the literature suggests the next steps. Entering a term for the condition along with massage therapy is likely to yield the most specific information needed. However, starting with a more global approach using simply the terms ‘treatment’ or ‘therapy’ and the name of the condition would yield significantly more abstracts and more general information about:
• the condition
• the hypothesized underlying aetiology
• the gold standard and other measures that have been used in research on other treatments of the condition
• ideas for treatment comparisons that might serve as an attention control group.
The literature search can serve as the background for the first part of the paper, the introduction. To be sure that one understands the problem being addressed and the methods used to study it, it is always a good idea to write the first half of the paper before starting the study. The literature search will provide background on the incidence of the problem, its symptoms, possibly its aetiology or hypothesized aetiology, and previous treatments, both allopathic and alternative, along with their efficacy. Once the background and methods sections are drafted, the paper can serve as a proposal to send to potential collaborators who will facilitate the research.
Selection of collaborators
Selecting a clinic or hospital setting, or a school setting for non-clinical problems, is always advantageous because these settings provide participants for the research. Another advantage is that clinical settings are often where potential collaborators work, such as allopathic physicians or alternative practitioners, for example physicians in osteopathic medicine. A medical collaborator is important for several reasons: keeping abreast of the most important clinical measures for the condition being studied, providing a referral source and someone who can administer the clinical measures, and making the research clinically relevant and credible to journals considering it for publication and to reviewers of grant applications. Important scientific collaborators include a neuroscientist for assays of biochemical measures or for interpretation of physiological data, e.g. electroencephalogram (EEG) data, and a statistician or PhD researcher to assist with designing and conducting the statistical analyses for the project. Massage therapy collaborators are also needed, particularly if the researcher is not a massage therapist, to design the massage therapy procedure and to help identify measures that can directly assess its effects. Another important consideration is locating a source of volunteer massage therapists to deliver the actual treatments, or to demonstrate them if parents or significant others are going to be the therapists.
Selecting treatment and attention control comparison groups
Traditionally, the alternative treatment group has been compared to a standard treatment control group, with assessments made on the first and last days of treatment. However, the possibility of a placebo effect, or of an effect of the therapist simply giving attention to the subject, has highlighted the need for treatment comparison and/or attention control groups. In much of our early work we used relaxation therapy as a comparison treatment because it has been shown to be effective, particularly in alleviating stress and anxiety, which often exacerbate the medical conditions we are studying. We also considered it important to establish that massage therapy is more effective than relaxation therapy in order to justify the greater expense of massage therapy treatment. The problem we found was that this may be a biased comparison, inasmuch as people view relaxation therapy as hard work requiring significant concentration and self-discipline. Thus, we may encounter compliance problems when we use relaxation therapy as a control. In addition, because relaxation therapy requires a certain amount of cognitive sophistication along with a reasonable attention span, it may be too difficult for young children. Therefore, in massage therapy research with children we have used attention controls such as rocking the child, holding the child, playing with toys, or holding and reading to the child.
More recently, since we discovered the critical importance of stimulating pressure receptors for massage to be effective, we have elected to use a sham massage comparison group that receives exactly the same massage as the treatment group, but with light pressure. This also keeps participants ‘naïve’ or ‘blind’, in that they do not expect a unique effect of their particular treatment condition: participants in each group would expect to receive some benefit from massage, whether it was deep pressure (in the real treatment group) or light pressure (in the sham group). Double blinding is also possible insofar as the physicians providing the standard treatment and the massage therapists providing the experimental treatment do not necessarily expect one massage style to be more effective than the other. This is the closest we can come to a double-blinded design in massage therapy research, as neither the participants nor the therapists are biased towards the treatment.
Selection of sample parameters and random assignment to groups
Demographic variables including age, gender, ethnicity and socioeconomic status are considered the most basic sampling parameters that need to be equivalent across groups. Generally, because of the location and demographics of the clinical setting, participants are somewhat homogeneous in age, ethnicity and socioeconomic status: they often belong predominantly to one ethnic group, and socioeconomic status is limited in range. This helps prevent the design from being confounded by variability in ethnicity or socioeconomic status, and random assignment to groups would be expected to yield a roughly equivalent distribution in each group. Clinics are also generally separated into paediatric, adult and sometimes geriatric services, as well as by condition and by the physician’s specialty, so their patients are also typically homogeneous in age and condition.
Another critical background variable, particularly in medical research, is the severity of the condition. This variable is more likely to be heterogeneous and therefore needs careful matching or stratification. Typically, participants are randomly assigned to groups using a table of random numbers or by flipping a coin, with the intention that randomization will yield roughly equivalent groups on background variables. The most conservative way to ensure equivalent groups is to match subjects across groups. For example, in studies of premature babies, subjects are frequently matched on birthweight and gestational age and then randomly assigned to treatment and control groups. A less conservative way is stratified randomization, in which cells are formed from the background variables. With two birthweight groups (low birthweight and very low birthweight) and two gestational age groups (short gestation and very short gestation), there would be four cells (very short gestation and very low birthweight; very short gestation and low birthweight; short gestation and very low birthweight; and short gestation and low birthweight). Subjects within each cell are then randomly assigned to groups, so that by the end of the study each cell contains a roughly equal number of subjects from each group.
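The stratified procedure can be sketched in Python. This is a minimal illustration under stated assumptions, not the chapter's own protocol: the class name `StratifiedRandomizer`, the group labels and the use of blocks of two (one slot per group, shuffled) are hypothetical choices, and a seeded pseudo-random generator stands in for the table of random numbers or coin flip.

```python
import random
from collections import defaultdict


class StratifiedRandomizer:
    """Hypothetical helper: randomizes subjects to groups separately within
    each stratum (cell), using randomly permuted blocks that contain every
    group once, so each cell stays balanced as enrolment proceeds."""

    def __init__(self, groups=("massage", "control"), seed=None):
        self.groups = tuple(groups)
        self.rng = random.Random(seed)
        # stratum -> group labels still unused in that stratum's current block
        self._open_blocks = defaultdict(list)

    def assign(self, birthweight, gestation):
        stratum = (birthweight, gestation)
        block = self._open_blocks[stratum]
        if not block:
            # Start a new block: each group once, in random order.
            block.extend(self.groups)
            self.rng.shuffle(block)
        return block.pop()


# Usage with a hypothetical enrolment order of (birthweight, gestation) cells.
randomizer = StratifiedRandomizer(seed=2024)
enrolled = [
    ("very low", "very short"), ("low", "short"), ("very low", "very short"),
    ("low", "very short"), ("low", "short"), ("very low", "short"),
]
for bw, ga in enrolled:
    print(bw, ga, "->", randomizer.assign(bw, ga))
```

Because every block holds one slot per group, each cell is exactly balanced after every second subject enrolled in it, which is the "roughly equal number of subjects from each group" the procedure aims for.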
The selection of sample size involves several considerations, including economic ones. The typical first step is a power analysis to determine whether a given sample size will provide enough statistical power for the data analysis. The power analysis requires an estimate of the expected effect size, which can be obtained by taking the difference between the two group means from a previous study and dividing it by the larger of the two standard deviations for those means. The required sample size then follows from this effect size together with the chosen significance level and the desired power.
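A minimal sketch of this calculation, assuming a two-sided two-sample comparison and using the normal approximation n = 2((z₁₋α/₂ + z₁₋β)/d)² per group; an exact t-test calculation gives a slightly larger n. The function names and the pilot means and SDs are hypothetical. Dividing by the larger SD, as the text describes, gives a smaller effect size than a pooled SD would, and therefore a more conservative (larger) sample.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def effect_size(mean_a, mean_b, sd_a, sd_b):
    """Difference between group means divided by the larger SD."""
    return abs(mean_a - mean_b) / max(sd_a, sd_b)


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided two-sample comparison
    (normal approximation; an exact t-based calculation gives slightly more)."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_beta = _Z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


def achieved_power(d, n, alpha=0.05):
    """Approximate power with n subjects per group (ignores the negligible
    probability of rejecting in the wrong direction)."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(d * math.sqrt(n / 2) - z_alpha)


# Hypothetical pilot values: means 10 and 7, SDs 4 and 6.
d = effect_size(10, 7, 4, 6)      # 3 / 6 = 0.5
n = n_per_group(d)                # 63 per group
print(d, n, round(achieved_power(d, n), 3))
```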
Whatever sample size the power analysis indicates, economic considerations push the sample size towards a minimum. One economical approach is to analyse the data at intervals of 10 subjects per group to determine whether the groups differ significantly on the key variables. If there is only a trend towards statistical significance on those variables, the absence of significance may mean that the sample is still too small and more subjects are needed. These additional subjects can then be added and analysed at intervals of two subjects per group.
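The interim-analysis scheme can be sketched as a simple decision rule. The steps of 10 and 2 subjects per group follow the text; the trend threshold of p < 0.10, the optional budget cap and the choice to keep adding 10 per group when there is no trend are assumptions, and `interim_look` is a hypothetical name. Note that looking at the data repeatedly inflates the chance of a false-positive result; this sketch applies no correction, whereas a formal group-sequential design (for example, O'Brien-Fleming stopping boundaries) would.

```python
def interim_look(n_per_group, p_value, alpha=0.05, trend=0.10,
                 coarse_step=10, fine_step=2, max_per_group=None):
    """Hypothetical decision rule for one interim analysis.

    Returns (decision, next_n_per_group); next_n_per_group is None when
    recruitment stops. No adjustment is made for repeated looks.
    """
    if p_value < alpha:
        return "significant", None
    # A trend suggests the sample is still too small: add subjects slowly.
    is_trend = p_value < trend
    step = fine_step if is_trend else coarse_step
    decision = "trend" if is_trend else "not significant"
    next_n = n_per_group + step
    if max_per_group is not None and next_n > max_per_group:
        return decision + " (budget reached)", None
    return decision, next_n


# Hypothetical sequence of p-values from successive looks.
n = 10
for p in [0.30, 0.08, 0.06, 0.04]:
    decision, next_n = interim_look(n, p, max_per_group=63)
    print(f"n={n} per group, p={p}: {decision}")
    if next_n is None:
        break
    n = next_n
```

With these hypothetical p-values the looks fall at 10, 20, 22 and 24 subjects per group, stopping at the fourth look.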
Selection of variables
The most important variable is the gold standard clinical variable that is typically viewed as a criterion for clinical improvement in any condition. For example, in diabetes the gold standard variable is typically the glucose level and in asthma it is typically the peak air flow measure. The clinical gold standard measure can be designated by a collaborating physician or can be found in the literature. Often there is more than one clinical gold standard measure. If there are multiple variables that would be considered redundant, some selection needs to be made. For statistical analysis reasons, researchers try to keep in mind a five-subjects-to-one-variable ratio, attempting not to be variable-heavy.
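The five-subjects-to-one-variable rule of thumb reduces to simple integer arithmetic. In this sketch the ratio is applied to the total sample across groups; whether to count subjects per group instead is a judgment call the text does not settle, and the function names and study size are hypothetical.

```python
def max_variables(n_subjects, subjects_per_variable=5):
    """Largest number of variables the 5:1 rule of thumb allows."""
    return n_subjects // subjects_per_variable


def within_ratio(n_subjects, n_variables, subjects_per_variable=5):
    """True if the planned variables respect the subjects-per-variable ratio."""
    return n_variables <= max_variables(n_subjects, subjects_per_variable)


# Hypothetical study: 20 subjects per group, two groups (40 in total).
print(max_variables(40))        # 8
print(within_ratio(40, 10))     # False: too variable-heavy
```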
The second important set of variables is stress variables, as stress is thought to exacerbate any clinical condition. Because stress is subjective, it is good to have not only self-report measures such as the State Anxiety Inventory and the Profile of Mood States (to be elaborated later) but also a converging physiological measure, e.g. vagal tone, or a biochemical measure, e.g. salivary cortisol, to validate the subject’s self-reported stress.
Typically, treatment research assesses both the immediate effects of the therapy session and the longer-term effects at the end of the treatment period. Occasionally, effects are also assessed at a follow-up some time after the end of therapy. The immediate effects of the session are often measured by self-reports of how the subject feels, anxiety level and mood state, and saliva samples are taken for assaying the stress hormone cortisol. Sometimes heart rate or blood pressure is also measured as a physiological index of stress. For a clinical condition such as itching, as in burns during healing, some kind of temperature gauge of the itchiness would be taken immediately after the treatment. Similarly, if the pain of juvenile rheumatoid arthritis was being studied, the immediate effects might be measured with a dolorimeter, a rod-like pressure gauge that determines the threshold beyond which the subject can no longer tolerate the pressure. Longer-term measures are, of course, the gold standard criteria for the success of the therapy. They typically include a clinical index such as the number of back pain-free days or migraine-free days, glucose levels, or pulmonary measures in children with asthma. They might also include changes in the level of depression and in urinary stress hormones (noradrenaline/norepinephrine and adrenaline/epinephrine).
The importance of converging variables from several levels (behavioural, physiological and biochemical) cannot be overstated. Almost invariably, self-report measures are taken on pencil-and-paper forms. Recently, manual physiological assessments such as heart rate and blood pressure have been popular. More sophisticated measures, such as vagal tone, which must be derived from the respiratory sinus arrhythmia of heart rate, are more difficult to collect because of the sophisticated equipment required and because the data reduction not only requires technical expertise but is also labour-intensive. Serious consideration needs to be given to biochemical assays and their significance to the research, because assaying salivary and urinary cortisol levels requires expensive assay kits or neuroscientists in an expensively equipped laboratory. For example, a salivary cortisol assay costs $25. Since at least two samples (one before and one after the therapy session) need to be taken at each of the two assessment periods (first day and last day), the salivary cortisol protocol would cost approximately $100 per subject.
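The cost arithmetic above generalizes to a whole study by multiplying through by the number of subjects. This sketch uses the $25 per assay and the two-samples-by-two-periods protocol from the text; the function name and the 40-subject study size are hypothetical, and it ignores extras such as duplicate assays or kit overheads.

```python
def cortisol_cost(cost_per_assay=25.0, samples_per_assessment=2,
                  assessment_periods=2, n_subjects=1):
    """Assay cost: pre/post samples at each assessment period, per subject."""
    return cost_per_assay * samples_per_assessment * assessment_periods * n_subjects


print(cortisol_cost())                 # 100.0 per subject
print(cortisol_cost(n_subjects=40))    # 4000.0 for a hypothetical 40-subject study
```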
Other measures include sleep/wake behaviour observations, which are valuable because they show how sleep and wake behaviour change across treatment and as the clinical condition changes. Because sleep and wake behaviour are the best index of the subject’s functioning, these are important measures. They do, however, require training observers either to conduct live observations using time-sample unit coding systems or laptop computers, or to code videotapes of the behaviour. This in turn requires assessing interobserver reliability, the extent to which observers agree on the behaviours they are observing and code them similarly. The standard for interobserver reliability is 90%; that is, the two observers agree on the behaviours being observed in 90% of the time-sample intervals. Reaching this standard requires significant practice time for the observers and time for interobserver reliability assessment. Interobserver reliability should be calculated using a kappa coefficient, a statistic that corrects for chance agreement.
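The difference between raw percent agreement and a chance-corrected kappa can be shown with a short sketch. It assumes two observers and uses Cohen's kappa, κ = (p_o − p_e)/(1 − p_e), where p_e is the agreement expected if each observer coded independently at their own base rates; the text does not name a specific kappa variant, and the sleep/wake codes below are hypothetical.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Proportion of time-sample intervals coded identically by two observers."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty code sequences of equal length")
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_observed = percent_agreement(codes_a, codes_b)
    n = len(codes_a)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Chance agreement: probability both observers pick the same category
    # if each coded independently at their own base rates.
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        # Both observers used one identical category throughout; kappa is
        # undefined, so treat it as perfect agreement by convention.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes for ten intervals from two observers.
observer_1 = ["sleep"] * 5 + ["awake"] * 5
observer_2 = ["sleep"] * 4 + ["awake"] * 6
print(percent_agreement(observer_1, observer_2))        # 0.9
print(round(cohens_kappa(observer_1, observer_2), 2))   # 0.8
```

Here the observers agree on 90% of intervals, meeting the 90% standard, yet kappa is only 0.8 because 50% agreement would be expected by chance alone given each observer's base rates.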
Other important variables have already been mentioned, including the gold standard clinical measure that is typically performed by the physician. Sometimes when children are involved, it is important to tap measures of parental stress and mood state to determine whether their stress may be affecting the child’s clinical course and whether the child’s clinical course is, in turn, affecting them.
Procedures
The treatment and research procedures need considerable attention and careful thought prior to the beginning of the study. In a sense, it is good to have completed the first half of the paper (the introduction and the methods) before starting the study so that they can be critiqued by colleagues and collaborators and so that every person in the treatment and research process is ‘on the same page’.
One of the most important aspects of the research procedure is that the observers be blind to the hypotheses of the study and to the subject’s group assignment. Otherwise their treatment of the subject and their observations would be biased by knowing the intent of the study and the subject’s group assignment. Having multiple assessors and multiple observers often prevents this biasing process but training each of the individuals and then working to achieve interobserver reliability is a costly process.