HEALTH EDUCATION RESEARCH
Theory & Practice Vol.3 no.4 1988
Pages 381-386

POINT OF VIEW

The choice of a measure in a health-promotion study

David J. Weiss¹,², Donna L. Walker and David Hill³

Abstract

Two general principles are proposed as a basis for the construction of a health behavior index. The goal is to suggest a way to choose a dependent variable for studies designed to compare interventions that promote medically beneficial habits. The first principle is that because it is behavior that is being modified, it is behavior that must be measured. The second principle concerns the choice among behaviors to be measured. The advocated resolution is that the behavior with the greatest medical relevance should form the basis of the index. Illustrations are given for reduction of alcohol consumption and weight reduction.

Introduction

There are many kinds of outcome measures used in studies of health promotion. These may range from vague self-report ('How well did you follow the instructions?') to precise biochemical intrusion (blood alcohol level or drug screens). The researcher's selection will depend upon philosophy concerning measurement, state of substantive and statistical knowledge, technical and financial resources available, and the goal of the research. Because of the myriad possibilities open to the investigator, it is often difficult to integrate findings across studies.

It is clear that there is confusion among researchers. In some fields, such as smoking cessation, it is customary to look at behavior, i.e. smoking itself, and to base a measure on the act. In other areas, such as obesity reduction, it is usual to focus on the physical outcome. Measures based on weight loss, rather than on the act of eating, are the norm. One avenue to consensus is the adoption of broad principles that could be used for any health promotion study. In the present work we propose two such principles. A specific context for their usage might be an experiment to assess the differential efficacy of a set of interventions designed to promote compliance with medical recommendations. Our concern is with what to measure; we will not deal with the practical issues of how to carry out the studies of health promotion.

Validity is the key property of a measuring instrument. A participant who scores high must be adhering more accurately to the health-promoting regimen than one who scores lower. This representation seems obvious, but it is frequently disregarded in studies that use physiological consequences as indices of compliance.

Principles of selection

The first premise on which a measure should be based is that the index must be directly related to the treatment recommendations. It is behavior that is the object of manipulation, and so it is behavior that should be measured. Adherence to this premise would perhaps have avoided the debacle in weight reduction research when it became known that long-term weight loss was not a monotone function of caloric intake (Stunkard and Penick, 1979). While the ultimate aim of a health promotion program is to help the patient to control a physiological variable, it is presumptuous to evaluate compliance with the program by direct physiological measurement. The somatic outcome is likely to be related in a complex way to the behavior. The effectiveness of a medical regimen is a separate question from whether a patient follows it, and the physiological index confounds the two.

The second principle is that the measured behavior should be the one that has the most medical relevance. This principle stems from the ultimate purpose of the intervention, to improve the health of the patient. The principle invokes a basic tenet of functional measurement (Anderson, 1981), that theory and measure are linked. This linkage has also been discussed in the health care context by Leventhal (1985). Here, the theory is the practitioner's best guess about the relation between the behavior being altered and its health consequences. This theory determines the measure. If no such theory of health promotion is currently available to underlie the investigation, as might be the case for programs designed to curtail the use of illicit but not necessarily harmful substances, then the principle of medical relevance cannot be invoked. No recommendation for an index to be used in comparing such programs is offered.

It bears emphasis that the patient's behavior be the primary object of measurement. If the medical theory is incorrect, then the programs will not be helpful. But that determination can be made only if knowledge of the patient's compliance with the recommendations is available. To make an assessment of a medical theory, an assessment which of course does require evaluation of physiological consequences, one must establish adherence to the regimen in which that theory is embedded.

The use of these principles may be seen in the choice of a measure designed to assess the relative effectiveness of a pair of smoking reduction programs. The practitioner may urge the patient to stop smoking, but the enunciation of this recommendation is not sufficient to determine an index of how closely it is followed. An obvious measure to employ is the number of cigarettes smoked by each participant during the evaluation period. This measure hardly requires justification, in that the treatment is designed to reduce smoking. Often, though, it is the number of abstainers at a particular time which is the criterion. The justification for this choice, for counting people rather than cigarettes, is that reduced-rate smoking for formerly heavy smokers is likely to be temporary and is therefore not a proper goal of treatment (Hill et al., 1988). In addition, recent evidence (Benowitz et al., 1986) has shown that as smokers restrict the number of cigarettes consumed, the intake of toxins per cigarette increases dramatically. A third possibility is to record the amount of nicotine ingested by the members of the respective groups. This choice of measure is supported by Schachter's (1980) elucidation of the role of nicotine titration in voluntary consumption of cigarettes. If it is nicotine, rather than tar or carbon monoxide, which has the more dire medical consequences among the components of tobacco, then the case for this index would be very strong. Obviously these three measures are linked but not perfectly correlated. So long as variables are not linearly related, they will yield different F-ratios. This means that substantive inferences resulting from statistical analyses employing competing measures may differ.
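The point about F-ratios can be illustrated with a minimal sketch. The data below are invented for illustration and are not drawn from any study; the helper name f_ratio is likewise hypothetical. The sketch computes a one-way analysis-of-variance F-ratio for the same two hypothetical programs under three codings: cigarettes per day, a linear rescaling of that count (packs), and a nonlinear recoding (still smoking versus abstinent).

```python
from statistics import mean


def f_ratio(groups):
    """One-way ANOVA F-ratio for a list of groups (each a list of numbers)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of participants
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical cigarettes per day at follow-up (invented numbers).
program_a = [0, 0, 10, 12, 20]
program_b = [5, 6, 7, 8, 9]
cigarettes = [program_a, program_b]

# Linear rescaling: packs of 20 instead of single cigarettes.
packs = [[c / 20 for c in g] for g in cigarettes]

# Nonlinear recoding: 1 if still smoking, 0 if abstinent.
smoking_status = [[1 if c > 0 else 0 for c in g] for g in cigarettes]

f_cigarettes = f_ratio(cigarettes)
f_packs = f_ratio(packs)
f_status = f_ratio(smoking_status)

print(f_cigarettes, f_packs, f_status)
```

In this invented example the linear rescaling leaves the F-ratio unchanged, whereas the abstainer coding gives a different F-ratio and even reverses which program looks better: program A has the higher mean consumption but also the only abstainers.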

An implication of this view is that the practitioner might well choose to define the goal of the program in terms of the specific measure. This definition of successful treatment could be intended primarily for research purposes, but might also be spelled out for the patient.

What is specifically not recommended here is a multivariate analysis employing all of the proposed indices as dependent variables. Such an analysis would provide equal opportunity to the proposed measures, weighting most heavily those which show the most variation across patients. This fairness is an abdication of responsibility by the researcher, who should know which variable is most important. To the extent the measures are not correlated, the selection affects the assessment of the cessation intervention. The choice should be made on a substantive rather than statistical basis.

If disparate behaviors are to be integrated to assess adherence with a recommended multicomponent regimen, then a weighted sum of the individual components is a possibility. Such a summative measure would be appropriate for a study manipulating compliance with an anti-hypertensive regimen, in which taking medications, altering the diet, reducing alcohol and tobacco consumption, and getting checkups are all important, though not equally so. The summation loses the individual contributions of the components, to be sure; but it allows for an overall assessment of the compliance manipulation. A common scaling, such as percentage compliance (Goldsmith, 1979), would be applied to each assigned behavior. The medically judged importance of each component would determine its weight in the overall assessment of the compliance intervention.
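A minimal sketch of such a summative index follows. The component names, the weights and the patient's percentages are hypothetical placeholders; in practice the weights would come from medical judgment. Dividing by the total weight, so that the index stays on the 0-100 percentage scale, is an assumption of this sketch rather than part of the proposal.

```python
def weighted_compliance(percent_by_component, weights):
    """Weighted mean of percentage-compliance scores (each 0-100)."""
    total_weight = sum(weights[name] for name in percent_by_component)
    weighted_sum = sum(
        percent_by_component[name] * weights[name]
        for name in percent_by_component
    )
    # Normalizing by the total weight keeps the result on the 0-100 scale.
    return weighted_sum / total_weight


# Hypothetical importance weights for an anti-hypertensive regimen.
weights = {"medication": 5, "diet": 3, "alcohol_tobacco": 2, "checkups": 1}

# Hypothetical percentage compliance for one patient.
patient = {"medication": 90, "diet": 60, "alcohol_tobacco": 50, "checkups": 100}

score = weighted_compliance(patient, weights)
print(round(score, 1))
```

Because the same weights are applied to every participant, group means of this index can be compared across compliance interventions.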

In addition to the major dependent variable in a study, there are sometimes other, minor aspects of interest. Missed appointments or poor record-keeping are examples of behaviors that the researcher may consider as indicative of poor compliance. A simple way to append such aspects is to incorporate a minor penalty into the scoring system (Lange et al., 1986). A subject's primary score is altered in proportion to the judged medical dangers of the noncompliance.

The latter anti-tubercular study also had the feature that the focal behavior, pill-taking, could be scored as either positive or negative. Normally, taking prescribed pills was a positive act. However, if pills were taken in the presence of adverse symptoms that had previously been described to the patient, the individual's score was reduced.
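The two scoring features just described, a focal behavior that can count for or against the patient and minor penalties appended to the primary score, can be sketched as follows. All point values and names are hypothetical placeholders; they are not the values used by Lange et al. (1986).

```python
# Hypothetical point values; real values would reflect judged medical danger.
PILL_TAKEN = 1.0
PILL_TAKEN_DESPITE_SYMPTOMS = -2.0
MINOR_PENALTIES = {"missed_appointment": 0.5, "incomplete_record": 0.25}


def primary_score(daily_records):
    """Score pill-taking; daily_records holds (pill_taken, adverse_symptoms) pairs."""
    score = 0.0
    for pill_taken, adverse_symptoms in daily_records:
        if pill_taken and adverse_symptoms:
            # Taking a pill despite described adverse symptoms lowers the score.
            score += PILL_TAKEN_DESPITE_SYMPTOMS
        elif pill_taken:
            score += PILL_TAKEN
        # A missed pill adds nothing.
    return score


def adjusted_score(daily_records, infractions):
    """Primary score minus minor penalties; infractions maps kind -> count."""
    penalty = sum(MINOR_PENALTIES[kind] * count for kind, count in infractions.items())
    return primary_score(daily_records) - penalty


records = [(True, False), (True, False), (False, False), (True, True)]
example = adjusted_score(records, {"missed_appointment": 1})
print(example)
```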

Integration of the various components of compliance is not the only option. The alternative is to evaluate separately the behaviors that comprise compliance, noting that a given intervention favors some and another intervention favors others. Preserving the component outcomes may be of value, especially for a multi-faceted intervention. This tactic may be especially useful in the preliminary stages of an investigation, when the intervention is under construction. In the evaluation phase, though, this choice has the pragmatic drawback that it is often necessary to implement one intervention program rather than another, and an overall assessment simplifies the decision.

Illustrations

We illustrate the thought processes that go into the choice of a measure by examining in detail two important domains, reduction of alcohol consumption and weight reduction. We consider some of the difficulties and illustrate the kind of evidence needed to resolve them.

Reduction of alcohol consumption

The repercussions of alcohol abuse have led to confusions that lie within the domain of the present proposal. For example, psychotherapeutic approaches to treatment of heavy drinkers (Davies, 1980) have often seen a need to cure the underlying causes of drinking, the hypothesized personality disorders, as well as to control the behavior. Thus, a treatment could not be evaluated solely on the basis of consumption reduction. Because alcohol has tremendous indirect health consequences in societies in which drunk driving is prevalent, whether drinkers can be induced to stay beneath the legal limit becomes a criterion for the success of treatment (Lovibond, 1975).

In keeping with the limited aims of the present proposal, we suggest that the outcome measure for reduction of alcohol consumption should be based upon the direct medical effects of the drug. If it is the case that the more a person drinks, the more he or she is at risk, then it is sensible to construct a measure based on the amount by which consumption is reduced. On the other hand, there may be threshold effects. For example, below a certain level, consumption may have no hazardous effects, as is suggested by Alden (1980). Alternatively, a person consuming at a high level might not decrease health risks unless the reduction were great enough. These possibilities would suggest that the scoring be done on the basis of how many people were above threshold at the time of measurement.
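The two candidate scoring rules can be contrasted in a short sketch. The consumption figures and the threshold are invented for illustration, and the function names are hypothetical; the sketch only shows that a dose-response rule (mean reduction) and a threshold rule (number of people above threshold) can rank the same two treatment groups differently.

```python
def mean_reduction(baseline, follow_up):
    """Average drop in daily consumption, baseline minus follow-up."""
    return sum(b - f for b, f in zip(baseline, follow_up)) / len(baseline)


def count_above_threshold(follow_up, threshold):
    """Number of participants still consuming above the threshold."""
    return sum(1 for f in follow_up if f > threshold)


# Hypothetical ounces of alcohol per day (invented numbers).
baseline = [6.0, 6.0, 6.0, 6.0]
group_a = [1.0, 1.0, 5.0, 5.0]   # two large reductions, two small ones
group_b = [2.5, 2.5, 2.5, 2.5]   # four moderate reductions
threshold = 2.0                   # hypothetical hazard threshold

reduction_a = mean_reduction(baseline, group_a)
reduction_b = mean_reduction(baseline, group_b)
above_a = count_above_threshold(group_a, threshold)
above_b = count_above_threshold(group_b, threshold)

print(reduction_a, reduction_b, above_a, above_b)
```

Here group B shows the larger mean reduction, yet group A leaves fewer people above the threshold, so the medical theory about dose and risk determines which treatment appears more successful.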

Obviously it is not possible to have unequivocal evidence about the devastating consequences of varying the magnitude and time distribution of doses over meaningfully long observation intervals; the proper experiments would be dangerous and require unethical random assignment. Short-term effects on health are not of great interest. Therefore it is necessary to rely on epidemiological data. Although we have not done so here, it might be of value to incorporate results from research with animals as well.

Epidemiological evidence suggests that the more alcohol consumed, the greater the damage (Nicholson, 1980). This suggests that any consistent reduction in consumption is worthwhile. Abstinence is not a level that must be attained before a treatment can be deemed worthwhile. Emrick (1975) has argued similarly that certain treatments could be considered as improvements even though they did not lead to better abstinence rates, so long as the patients reduced their amounts of drinking. Furthermore, abstinence is not a state that is easily maintained (Marlatt and Gordon, 1980). The controversial controlled-drinking studies summarized by Heather and Robertson (1983) provide evidence that heavy drinkers for whom abstinence was a goal are more likely to regress severely than those for whom moderate consumption was the goal of treatment. Nor is zero necessarily the optimum level of consumption for everyone from a medical perspective; Kaplan (1981) has observed that moderate (1-2 ounces/day) alcohol intake raises the cardioprotective high density lipoprotein levels. The physiologically safe upper limit on daily consumption has not been firmly established, but it does not appear to be zero (Popham and Schmidt, 1978).

The temporal distribution of alcohol intake has obvious medical relevance. Even if binge drinking were not directly harmful physically, its relationship to environmental trauma is sufficiently great to warrant consideration in the measure.

Invoking the criteria proposed as general determinants of a measure, we tentatively propose that a treatment program be evaluated in terms of the square of the number of ounces of alcohol drunk above a daily target level determined for each patient. The target would be determined by the practitioner in consultation with the patient, and it would likely not exceed 2 ounces/day. This scoring system uses the medical information to determine an appropriate objective, and it penalizes binge drinking by virtue of the squaring (other transformations could achieve similar effects, and squaring should be considered as a preliminary, ad hoc suggestion). A patient consuming up to the target level would be considered perfectly compliant according to this measure. Consumption above this amount would reflect adversely on the success of the treatment condition to which the patient was assigned. It should be noted that a change to metric units will not affect intergroup comparisons using the squared measure.
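A minimal sketch of this squared-excess score follows. Summing the squared daily excess over the days of the evaluation period is an assumption of the sketch, as are the function name, the invented drinking records, and the use of fluid ounces with the conversion of roughly 29.57 ml per US fluid ounce.

```python
def squared_excess(daily_amounts, daily_target):
    """Sum over days of the squared amount drunk above the daily target."""
    return sum(max(0.0, amount - daily_target) ** 2 for amount in daily_amounts)


target_oz = 2.0

# Two hypothetical patients who each drink 12 ounces over four days.
steady_oz = [3.0, 3.0, 3.0, 3.0]    # 1 ounce over target every day
binge_oz = [0.0, 0.0, 0.0, 12.0]    # 10 ounces over target on one day

steady_score = squared_excess(steady_oz, target_oz)
binge_score = squared_excess(binge_oz, target_oz)

# Same records expressed in milliliters (approximate conversion factor).
ML_PER_FL_OZ = 29.5735
steady_ml = squared_excess([a * ML_PER_FL_OZ for a in steady_oz], target_oz * ML_PER_FL_OZ)
binge_ml = squared_excess([a * ML_PER_FL_OZ for a in binge_oz], target_oz * ML_PER_FL_OZ)

print(steady_score, binge_score)
print(binge_score / steady_score, binge_ml / steady_ml)
```

The binge pattern scores 25 times worse than the steady pattern although total consumption is identical. Converting to metric units multiplies every score by the same constant (the square of the conversion factor), so ratios between scores, and hence intergroup comparisons, are unchanged.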

Weight reduction

Long-term weight reduction is very difficult to achieve (Stunkard and McLaren-Hume, 1977). Many patients report great difficulty in previous attempts (Dubbert and Wilson, 1983) to control weight [although it should be mentioned that Schachter's (1982) sample of people who had not sought treatment furnished self-reports suggesting widespread success in self-regulation].

A primary difficulty in treatment is that the patient cannot be advised to give up the behavior that produces the condition, namely eating. Rather a fine-grained adjustment is required. The stated goal of treatment is usually the maintenance of a proper diet. It is presumed that attaining this goal will produce the outcome that the patient seeks, the long-term establishment of a proper weight.

Although the patient's goal, weight loss, and the practitioner's goal, maintenance of a proper diet and exercise program, certainly are linked, they are not identical. In recent years it has become increasingly clear that people defend a given weight against dietary manipulation. Nisbett (1972) called this maintained value a set point, and hypothesized that it is determined early in life. Weight loss should be seen more as an incentive for the participant to maintain the regimen (Clausen et al., 1980) than as a formal way to evaluate the behavioral treatment. The failure to distinguish the goals of participant and of therapist is perhaps the reason that so many fad diets appear in the marketplace. The obese person is seduced by the prospect of fast reduction. It is relatively easy to find a diet that can bring about significant immediate weight loss (Dwyer, 1985), but much of that loss will be lean tissue (Yang and Van Itallie, 1984) and will not be maintained in the long term.

It is maintenance of the prescribed dietary regimen that should be evaluated. Tools for scrutinizing the input have been suggested. Participants have been asked to keep nutrient and exercise logs (Epstein et al., 1985). More objectively, Rogers and Blundell (1979) have monitored meals by television. Perhaps even more accurate are the chewing and swallowing recorders developed by Bellisle and LeMagnen (1981) and by Stellar and Shrager (1985). Such intrusive instruments may, however, distort the behavior simply by their presence.

The focus on behavior directs our attention to the patient's diet. The measure should indicate the extent of deviation from the assigned program. It is perhaps simplest to consider excesses in terms of calories per day, but it may be advisable to use a more complex measure that weights for deviations in particular nutrient categories. Unless compliance with an assigned regimen is measured independently of the weight loss, it is impossible to assess the effectiveness of the program.
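Both versions of the dietary measure can be sketched briefly. The intake figures, the assigned plan, the nutrient categories and their weights are all hypothetical, as are the function names; counting only intake above the allowance, and using absolute deviations for nutrients, are assumptions of this sketch.

```python
def mean_excess_calories(daily_intake, daily_allowance):
    """Average calories per day consumed above the assigned allowance."""
    excesses = [max(0, intake - daily_allowance) for intake in daily_intake]
    return sum(excesses) / len(excesses)


def weighted_nutrient_deviation(intake, plan, weights):
    """Weighted sum of absolute deviations from the plan, by nutrient category."""
    return sum(weights[n] * abs(intake[n] - plan[n]) for n in plan)


# Hypothetical three-day calorie log against a 2000-calorie allowance.
calorie_log = [1800, 2200, 2500]
calorie_score = mean_excess_calories(calorie_log, 2000)

# Hypothetical daily plan, reported intake (grams) and medically judged weights.
plan = {"fat_g": 60, "carbohydrate_g": 250, "protein_g": 70}
intake = {"fat_g": 80, "carbohydrate_g": 240, "protein_g": 70}
weights = {"fat_g": 2, "carbohydrate_g": 1, "protein_g": 1}
nutrient_score = weighted_nutrient_deviation(intake, plan, weights)

print(round(calorie_score, 1), nutrient_score)
```

Neither score refers to body weight, so compliance with the regimen is recorded independently of the physiological outcome.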

Conclusion

The complex quantitative issues in measuring success in treatment programs have not escaped attention. A theory may fail to gain empirical confirmation either because it is incorrect or because an invalid response measure has been employed in its evaluation. Feinstein (1959) and Bellack and Rozensky (1975) have provided sophisticated discussions of the difficulty in choosing the index for weight reduction; shall it be pounds, or percentage change, or proportion of goal attained? Similarly, Sobell and Sobell (1975) and Greenfield (1986) discuss problems in evaluating programs to reduce alcohol consumption. While these discussions are valuable in clarifying the defects in proposed measures, it is unlikely that they are sufficiently broadly based to lead to consensus. The two principles proposed here, (i) focus on the behavior and (ii) choose the behavioral index most closely related to improved health, may be used to determine a measure whenever the patient's active participation in the regimen is to be evaluated.

An attractive feature of the medical relevance criterion is that it requires cooperation between the medical expert and the psychologist to determine the proper response measure. As new medical information becomes available, the optimal measure may be changed. This ephemeral character of the scoring system suggests that researchers should give detailed, rich accounts of the behaviors they measure in the hope that archival material will retain relevance.

References

Alden, L. (1980). Preventive strategies in the treatment of alcohol abuse: a review and a proposal. In Davidson, P. O. and Davidson, S. M. (eds.), Behavioral Medicine: Changing Health Lifestyles. Brunner/Mazel, New York, pp. 256-278.

Anderson, N. H. (1981). Foundations of Information Integration Theory. Academic Press, New York.

Bellack, A. S. and Rozensky, R. H. (1975). The selection of dependent variables for weight reduction studies. Journal of Behavior Therapy and Experimental Psychiatry, 6, 83-84.

Bellisle, F. and LeMagnen, J. (1981). The structure of meals in humans: eating and drinking patterns in lean and obese subjects. Physiology and Behavior, 27, 649-658.

Benowitz, N. L., Jacob, P., Kozlowski, L. T. and Yu, L. (1986). Influence of smoking fewer cigarettes on exposure to tar, nicotine, and carbon monoxide. The New England Journal of Medicine, 315, 1310-1313.

Clausen, J. D., Silfen, M., Coombs, J., Ayers, W. and Altschul, A. (1980). Relationship of dietary regimens to success, efficiency, and cost of weight loss. Journal of the American Dietetic Association, 77, 249-256.

Davies, D. L. (1980). The treatment of alcohol dependence. In Clark, P. M. S. and Kricka, L. J. (eds.), Medical Consequences of Alcohol Abuse. Ellis Horwood, Chichester, pp. 261-276.

Dubbert, P. and Wilson, G. T. (1983). Treatment failures in behavior therapy for obesity: causes, correlates, and consequences. In Foa, E. and Emmelkamp, P. M. G. (eds.), Treatment Failure in Behavior Therapy, 3rd ed. Wiley, New York, pp. 263-288.

Dwyer, J. (1985). Classifying current popular and fad diets. In Hirsch, J. and Van Itallie, T. B. (eds.), Recent Advances in Obesity Research. John Libbey, London, Vol. IV, pp. 179-191.

Emrick, C. D. (1975). A review of psychologically orientated treatment of alcoholism. II. The relative effectiveness of different treatment approaches and the effectiveness of treatment versus no treatment. Journal of Studies on Alcohol, 36, 88-108.

Epstein, L. H., Wing, R. R., Koeske, R. R., Valoski, A., and Hyde, S. (1985). The effect of parental weight on weight, social withdrawal, blood pressure and nutrient intake of obese children. In Hirsch, J. and Van Itallie, T. B. (eds.), Recent Advances in Obesity Research. John Libbey, London, Vol. IV, pp. 327-346.

Feinstein, A. R. (1959). The measurement of success in weight reduction. Journal of Chronic Diseases, 10, 439-456.

Goldsmith, C. H. (1979). The effect of compliance distributions on therapeutic trials. In Haynes, R. B., Taylor, D. W. and Sackett, D. L. (eds.), Compliance in Health Care. Johns Hopkins University Press, Baltimore, pp. 297-308.

Greenfield, T. K. (1986). Quantity per occasion and consequences of drinking: a reconsideration and recommendation. The International Journal of the Addictions, 21, 1059-1079.

Hill, D., Weiss, D. J., Walker, D. L. and Jolley, D. (1988). Long-term evaluation of controlled smoking as a treatment outcome. British Journal of Addiction, 83, 203-207.

Kaplan, N. M. (1981). Management strategies in hypertension. In Brenner, B. M., and Stein, J. H. (eds.), Hypertension. Churchill Livingstone, New York, pp. 339-366.

Lange, R. A., Ulmer, R. A., and Weiss, D. J. (1986). An intervention to improve compliance to year-long isoniazid (INH) therapy for tuberculosis. Journal of Compliance in Health Care, 1, 47-54.

Leventhal, H. (1985). The role of theory in the study of adherence to treatment and doctor-patient interactions. Medical Care, 23, 556-563.

Lovibond, S. H. (1975). Use of behavior modification in the reduction of alcohol-related road accidents. In Thompson, T. and Dockens, W. S. (eds.), Application of Behavior Modification. Academic Press, New York, pp. 399-406.

Marlatt, G. A., and Gordon, J. R. (1980). Determinants of relapse: implications for the maintenance of behavior change. In Davidson, P. O., and Davidson, S. M. (eds.), Behavioral Medicine: Changing Health Lifestyles. Brunner/Mazel, New York, pp. 410-452.

Nicholson, G. (1980). Alcoholic liver disease. In Clark, P. M. S., and Kricka, L. J. (eds.), Medical Consequences of Alcohol Abuse. Ellis Horwood, Chichester, pp. 51-86.

Nisbett, R. E. (1972). Hunger, obesity and the hypothalamus. Psychological Review, 79, 433-453.

Popham, R. E., and Schmidt, W. (1978). The biomedical definition of safe alcohol consumption: a crucial issue for the researcher and the drinker. British Journal of Addiction, 73, 233-235.

Rogers, P. J., and Blundell, J. E. (1979). Effect of anorexic drugs on food intake and the microstructure of eating in human subjects. Psychopharmacology, 66, 159-165.

Schachter, S. (1980). Urinary pH and the psychology of nicotine addiction. In Davidson, P. O. and Davidson, S. M. (eds.), Behavioral Medicine: Changing Health Lifestyles. Brunner/Mazel, New York, pp. 70-93.

Schachter, S. (1982). Recidivism and self-cure of smoking and obesity. American Psychologist, 37, 436-444.

Sobell, M. B., and Sobell, L. C. (1975). The need for realism, relevance, and operational assumptions in the study of substance dependence. In Cappell, H. and LeBlanc, A. E. (eds.), Biological and Behavioral Approaches to Drug Dependence. Addiction Research Foundation, Toronto, pp. 133-167.

Stellar, E. and Shrager, E. E. (1985). Chews and swallows and the microstructure of eating. American Journal of Clinical Nutrition, 42, 973-982.

Stunkard, A., and McLaren-Hume, M. (1977). The results of treatment for obesity. In Foreyt, J. P. (ed.), Behavioral Treatments of Obesity. Pergamon Press, Oxford, pp. 35-44.

Stunkard, A. J., and Penick, S. B. (1979). Behavior modification in the treatment of obesity: the problem of maintaining weight loss. Archives of General Psychiatry, 36, 801-806.

Yang, M. U., and Van Itallie, T. B. (1984). Variability in body protein loss during protracted, severe calorie restriction: role of triiodothyronine and other possible determinants. American Journal of Clinical Nutrition, 40, 611-622.

Received on July 27, 1987; accepted on October 10, 1988

1 Department of Psychology, University of Sydney, Sydney. On leave from Department of Psychology, California State University, Los Angeles, CA 90032, USA

2 To whom reprint requests should be sent in Los Angeles

3 Centre for Behavioral Research in Cancer, Anti-Cancer Council of Victoria, Carlton South, Victoria 3053, Australia