Outcome Measures in Rheumatology
Reflecting on a few historical pages…
Introduction
There was a time when rheumatologists described therapeutic outcomes using the measures of their own preference and convenience. Numerous articular indices, grip-strength assessment, PIP joint circumference measurement and the like were commonly used endpoints in the 1980s. Wide differences existed between American and European clinical trials with regard to the outcome measures used, rendering comparisons and meta-analyses difficult or almost impossible. There was also a lack of agreed criteria for improvement in RA, and it was difficult to extrapolate the reported results of these trials to the treatment of individual patients. The standard practice at that time was to show statistically significant differences between the means of 2 treatment groups for a given outcome measure. Statisticians as well as researchers often lamented the large number of endpoints measured in these trials, and wondered how best to interpret a trial in which some variables showed significant improvement but others did not. A controlled trial of low-dose methotrexate versus placebo published in 1985 used as many as 13 clinical efficacy variables and 4 laboratory efficacy variables, each subjected to separate statistical analysis (1). Group means were also hard to interpret: a moderate average improvement in a given endpoint could occur because all patients improved modestly or because half of the patients experienced dramatic improvement and the other half had no improvement at all. Moreover, multiple comparisons inflated the risk of chance (false-positive) findings, which required statistical corrections, making interpretation even more difficult.
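The two statistical problems described above can be illustrated with a small sketch. The patient values are hypothetical, the function names are invented for this illustration, and the familywise calculation assumes that the endpoints are statistically independent, which is unlikely to hold exactly for correlated clinical variables.

```python
from statistics import mean

# Hypothetical percent improvements in one endpoint for two groups of patients.
uniform_group = [25, 25, 25, 25]  # every patient improves modestly
split_group = [50, 50, 0, 0]      # half improve dramatically, half not at all


def responder_rate(improvements, threshold=20):
    """Fraction of patients whose improvement reaches the threshold (in percent)."""
    return sum(x >= threshold for x in improvements) / len(improvements)


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false-positive result among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


# Same mean improvement, very different proportions of responders.
print(mean(uniform_group), responder_rate(uniform_group))
print(mean(split_group), responder_rate(split_group))

# 13 clinical + 4 laboratory efficacy variables, as in the 1985 trial.
n_endpoints = 13 + 4
print(round(familywise_error_rate(n_endpoints), 2))
print(round(bonferroni_threshold(n_endpoints), 4))
```

Under these assumptions, testing 17 endpoints at the 5% level gives a chance of roughly 0.58 that at least one shows a spurious "significant" difference, while both hypothetical groups share a mean improvement of 25% despite responder rates of 100% and 50%.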
Concept of a composite index to measure improvement
Harold Paulus, Professor of Rheumatology at UCLA, was quick to recognize this anomaly and pioneered the concept of a composite index to measure improvement in RA. Paulus et al, in their seminal publication in 1990, showed that 20% improvement in 4 of 6 selected measures (morning stiffness, Westergren ESR, joint tenderness score, joint swelling score, patient’s and physician’s overall assessments of current disease activity) clearly discriminated between an active drug and placebo (2). Using this algorithm, patients in the placebo group rarely met the improvement criteria (about 5%). This model was subsequently utilized by the ACR subcommittee in creating a definition of improvement in RA.
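The rule as summarised above (20% improvement in at least 4 of 6 measures) can be sketched as follows. This is a minimal illustration of the text's summary, not the published algorithm in full; the dictionary keys, function names and patient values are hypothetical, and the sketch assumes that a lower value means less active disease for every measure.

```python
# Measure names are illustrative labels chosen for this sketch.
PAULUS_MEASURES = (
    "morning_stiffness",
    "esr",
    "joint_tenderness_score",
    "joint_swelling_score",
    "patient_global",
    "physician_global",
)


def percent_improvement(baseline, follow_up):
    """Percent reduction from baseline (lower values = less active disease)."""
    if baseline <= 0:
        return 0.0  # no measurable room for improvement
    return 100.0 * (baseline - follow_up) / baseline


def meets_paulus_criteria(baseline, follow_up, threshold=20.0, required=4):
    """True if at least `required` measures improved by `threshold` percent or more."""
    improved = sum(
        percent_improvement(baseline[m], follow_up[m]) >= threshold
        for m in PAULUS_MEASURES
    )
    return improved >= required


# Hypothetical patient: 4 of the 6 measures improve by 20% or more.
baseline = {"morning_stiffness": 60, "esr": 40, "joint_tenderness_score": 20,
            "joint_swelling_score": 10, "patient_global": 4, "physician_global": 4}
follow_up = {"morning_stiffness": 30, "esr": 38, "joint_tenderness_score": 12,
             "joint_swelling_score": 9, "patient_global": 3, "physician_global": 3}
print(meets_paulus_criteria(baseline, follow_up))
```

Classifying each patient as a responder or non-responder in this way replaces many separate comparisons of means with a single comparison of response rates between treatment arms.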
Absolute ‘quantum’ of disease activity: ‘DAS’ and its modifications
During the same time frame, Desiree van der Heijde and coworkers from the Netherlands had been working on a slightly different track and came up with a quantitative measure of disease activity in RA, which they termed DAS [disease activity score] (3). Their methodology was quite innovative. Clinical and laboratory variables recorded periodically during routine clinic follow-up of 113 patients with early RA were analysed in a prospective study of up to three years' duration. The rheumatologists treating these patients were not aware that their day-to-day therapeutic decisions were being captured as a vital input to this study. Decisions to start treatment with DMARDs were considered indicative of high disease activity. If a DMARD was not started, or remained unchanged for at least one year, or if treatment was stopped because of disease remission, this was equated with a period of low disease activity. Factor analysis of the clinical and laboratory variables in the high and low disease activity groups yielded 5 factors. Discriminant analysis was then done to determine the extent to which each factor contributed to discrimination between the two groups. Finally, a multiple regression analysis was performed to obtain a 'disease activity score'. This score consisted of the number of tender joints (as per Ritchie’s index), the number of swollen joints (out of 44 joints), Westergren ESR and the patient’s grading of general health (global assessment), in decreasing order of importance. The calculations were complex, but programmed calculators and web-based apps were made available which provided the result (range 0-10) instantaneously. Later on, Prevoo et al developed a reduced joint version (DAS28) in which only 28 designated joints were evaluated, thus saving precious clinic time (4). Another modification is a 3-parameter version (DAS28-3) in which the patient's global assessment is omitted.
These modified versions were shown to correlate closely with the original DAS. However, DAS28-ESR and DAS28-CRP are not interchangeable. The latter underestimates disease activity. Other easy-to-calculate indices are the clinical disease activity index (CDAI) and the simplified disease activity index (SDAI). These will not be discussed here.
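As an illustration of what such a calculator does, here is a minimal sketch of the 4-variable DAS28-ESR. The coefficients are not given in this article; they are the ones commonly quoted in the literature for DAS28-ESR and should be verified against the original publication (4) before any real use. The function and parameter names are invented for this sketch.

```python
import math


def das28_esr(tender28, swollen28, esr_mm_per_hr, global_health_mm):
    """DAS28-ESR from 28-joint counts, ESR (mm/h) and patient global health (0-100 mm VAS).

    Coefficients are the commonly quoted ones (an assumption of this sketch,
    not taken from the article).
    """
    if not (0 <= tender28 <= 28 and 0 <= swollen28 <= 28):
        raise ValueError("joint counts must lie between 0 and 28")
    if esr_mm_per_hr <= 0:
        raise ValueError("ESR must be positive to take its logarithm")
    if not 0 <= global_health_mm <= 100:
        raise ValueError("global health must lie between 0 and 100 mm")
    return (
        0.56 * math.sqrt(tender28)
        + 0.28 * math.sqrt(swollen28)
        + 0.70 * math.log(esr_mm_per_hr)  # natural logarithm
        + 0.014 * global_health_mm
    )


# Hypothetical patient: 4 tender joints, 4 swollen joints, ESR 20 mm/h, global health 50 mm.
print(round(das28_esr(4, 4, 20, 50), 2))
```

The square-root and logarithmic transforms dampen the influence of very high joint counts and ESR values, which is why the score cannot be computed comfortably by mental arithmetic.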
The ‘OMERACT’ initiative
The seminal contributions of Paulus and van der Heijde marked a quantum leap in the standards of research in clinical rheumatology. Peter Tugwell (Canada) and Martin Boers (The Netherlands) joined hands to pursue the theme of outcome measures in RA on a global platform. They organized the first international conference in 1992 in Maastricht to obtain a consensus on this important issue. The conference was named OMERACT [outcome measures in RA clinical trials] (5). Three goals of this conference were:
1) To attempt to obtain agreement on the minimum number of outcome measures to be included in all RA clinical trials
2) To review the range of magnitude of differences judged to be clinically important by experienced clinicians and clinical investigators
3) To review the extent to which experienced clinicians and clinical investigators feel that composite indices are useful in the assessment of trials and individual patients
There were 92 participants in this meeting, which was highly engaging and involved extensive evidence-based discussions and debates. The Delphi technique was used to build consensus. The conference was a great success, and agreement was achieved on the RA outcome domains that later became known as the ‘WHO/ILAR core set’, often called the ‘ACR core set’ because it was subsequently approved by an ACR committee (6). The core set was as follows:
1. Tender joint count
2. Swollen joint count
3. Patient’s assessment of pain
4. Patient’s global assessment of disease activity
5. Physician’s global assessment of disease activity
6. Patient’s assessment of physical function
7. Acute-phase reactant value
8. Radiography or other imaging technique (for trial duration >1 year when evaluating a “DMARD”)
Agreement could not be reached on the preferred instruments for measurement of each of the core outcomes. It was decided to postpone discussions on this issue pending further studies comparing validity of the different instruments. Importantly, the need to develop ‘improvement’ criteria for RA and to explore composite indices of outcome was recognized by the participants.
Soon, the OMERACT movement gained so much momentum that its scope was expanded to include outcome measures in various rheumatic disorders other than RA. OMERACT conferences have been held every 2 years since 1992. The acronym now stands for ‘Outcome Measures in Rheumatology’. An OMERACT website (www.omeract.org) has been created which gives details of the various groups in charge of outcome measures pertaining to specific rheumatic disorders, as well as a downloadable OMERACT handbook. A very brief overview of the important landmarks in the OMERACT journey so far is given below.
OMERACT Filter
The OMERACT group resolved that all proposed outcome measures have to meet the requirements of the OMERACT filter, which comprises a minimum set of measurement principles, namely truth, discrimination, and feasibility (7). A more explicit and comprehensive version was subsequently created after much hard work and was termed OMERACT Filter 2 (8). Briefly, OMERACT recognizes four ‘core areas’ of outcome: death, life impact, resource use and pathophysiological manifestations. A clinical trial must include at least one measure under each of these core areas. Each core area comprises one or more ‘domains’ of interest to the condition under consideration. These domains of interest together comprise the ‘core domain set’. A clinical trial must include this core domain set at the least and could include any other domain(s) that might be relevant to the trial. Further, each core domain in turn comprises one or more valid outcome measures. Validity of an outcome measure is tested using the OMERACT filter principles of truth, discrimination and feasibility. The word ‘truth’ captures issues of face, content, construct, and criterion validity. It assesses whether the measure is truthful, measures what is intended and whether the result is unbiased and relevant. The word ‘discrimination’, on the other hand, captures issues of reliability and sensitivity to change. It examines whether the measure discriminates between situations of interest. These can be states at one time (for classification or prognosis) or states at different times (to measure change). Lastly, the word ‘feasibility’ refers to the practical applicability of the measure, given constraints of time, money, and interpretability. It is a crucial determinant of the successful use of the measure.
Using this methodology, OMERACT has played a critical role in the development and validation of clinical and radiographic outcome measures in rheumatoid arthritis, osteoarthritis, psoriatic arthritis, fibromyalgia, and other rheumatic diseases. OMERACT Working Groups have been set up to carry forward the required development and validation work, after which they report back at Special Interest Group meetings during the main OMERACT conference. In due course, it was realized that the RA core set omitted certain outcomes of major importance to patients. One of these was fatigue. Consequently, fatigue was added to the RA core set (9).
ACR preliminary definition of improvement in RA: a brilliant example
This definition was published by ACR in 1995 and is also called the ACR20 response (10). It comprises 20% improvement in tender and swollen joint counts and 20% improvement in 3 of the 5 remaining ACR core set measures (patient and physician global assessments, pain, disability, and an acute-phase reactant). It is worthwhile looking into the process by which this definition was finally derived, to get a sense of the tremendous effort that went into it. Briefly, the first step was to assess how rheumatologists decide whether a patient has improved. The 89 rheumatologists to whom the survey was sent consisted of OMERACT committee members, participants, and others chosen because of their considerable RA clinical and/or clinical trial experience. The survey presented ‘paper patients’: for each element of the core set (e.g., tender joint count), data at baseline and at 6 months were provided and the percent change was noted. Survey respondents had to opine whether each paper patient had improved or not. Sixty-eight of the 89 returned the completed survey. Patients characterized as ‘improved’ by at least 80% of the surveyed rheumatologists were further analysed. The extent to which these patients were characterized as improved according to various candidate definitions of improvement was examined. Forty possible definitions of improvement were selected from published literature or because they had been recommended by members of the ACR subcommittee or the international community. Seventeen of the 40 definitions met the previously defined threshold in the survey, i.e. a low false-positivity rate (< 25%) and a high chi-square value (> 6). Notable among these were those given by Harold Paulus et al, the WHO group (Dan Furst et al) and the OMERACT group. In the next step, these 17 candidate definitions of improvement were applied to patient data from 5 published trials on the efficacy of DMARDs (4 methotrexate trials + 1 gold trial).
A total of 320 patients (177 active drug-treated, 143 placebo-treated) remained after exclusion of patients with missing data. The percentage of active drug-treated patients identified as improved by each candidate improvement definition and the corresponding percentage of placebo-treated patients were obtained. For each improvement definition, the statistical power in discriminating active drug from placebo groups was calculated. The goal was to identify the most powerful definition(s). Where statistical power was equal, the definition showing the lower percentage improved in the placebo-treated group was rated superior. Subsequent steps included other comparative criteria such as ease of use and face validity. The final step was a review of the rheumatologists’ survey, ranking each definition by its kappa statistic (another measure of agreement between the rheumatologists’ impression of improvement and the definition’s classification of improvement).
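Two of the statistics mentioned in this process, the chi-square value and the kappa statistic, can be sketched for a simple 2x2 table as below. All counts are hypothetical and chosen only for illustration; they are not data from the survey or the trials, and the sketch is not a reconstruction of the ACR committee's actual computations.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a row or column is empty, so there is nothing to compare
    return n * (a * d - b * c) ** 2 / denominator


def cohen_kappa_2x2(a, b, c, d):
    """Cohen's kappa for two raters: a and d are agreements, b and c disagreements."""
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical trial: 50 of 100 active-drug patients and 10 of 100 placebo
# patients are classified as improved by a candidate definition.
print(round(chi_square_2x2(50, 50, 10, 90), 2))

# Hypothetical survey: the definition and the rheumatologists agree on 40
# improved and 45 not-improved paper patients, and disagree on 15.
print(round(cohen_kappa_2x2(40, 10, 5, 45), 2))
```

A definition that separates the treatment arms well gives a large chi-square value, and one that matches clinicians' judgement gives a kappa close to 1.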
This definition of improvement has stood the test of time and continues to be used today. It also increases the power of clinical trials, since it draws on information from multiple different outcome measures. As a result, the sample size needed to demonstrate differences between therapies decreases.
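The ACR20 rule stated at the start of this section can be sketched as a simple responder check. The dictionary keys, function names and patient values are hypothetical, and the sketch assumes that every measure is scored so that a lower value is better.

```python
JOINT_COUNTS = ("tender_joint_count", "swollen_joint_count")
OTHER_CORE_MEASURES = (
    "patient_global",
    "physician_global",
    "pain",
    "disability",
    "acute_phase_reactant",
)


def percent_improvement(baseline, follow_up):
    """Percent reduction from baseline (lower values = better)."""
    if baseline <= 0:
        return 0.0  # no measurable room for improvement
    return 100.0 * (baseline - follow_up) / baseline


def acr_response(baseline, follow_up, threshold=20.0):
    """True if both joint counts and at least 3 of the 5 other measures improve by threshold %."""
    joints_improved = all(
        percent_improvement(baseline[m], follow_up[m]) >= threshold
        for m in JOINT_COUNTS
    )
    others_improved = sum(
        percent_improvement(baseline[m], follow_up[m]) >= threshold
        for m in OTHER_CORE_MEASURES
    )
    return joints_improved and others_improved >= 3


# Hypothetical patient at baseline and at 6 months.
baseline = {"tender_joint_count": 20, "swollen_joint_count": 10,
            "patient_global": 60, "physician_global": 60, "pain": 70,
            "disability": 1.5, "acute_phase_reactant": 40}
follow_up = {"tender_joint_count": 14, "swollen_joint_count": 8,
             "patient_global": 45, "physician_global": 50, "pain": 50,
             "disability": 1.0, "acute_phase_reactant": 38}
print(acr_response(baseline, follow_up))
```

Unlike the 4-of-6 rule of Paulus et al, improvement in both joint counts is mandatory here, so a patient whose swollen joint count improves by only 10% is a non-responder however well the other measures respond.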
Current status
As of today, the art and science of developing outcome measures in rheumatology is very well established. Validated outcome measures have been published for almost all rheumatic diseases and there is an increasing trend to use internationally agreed outcome measures for research work. This makes comparisons between trials, and meta-analyses, far easier. Examples of validated instruments currently in use globally (this list may not be exhaustive) include:
As can be seen, multiple instruments per domain are in use for a number of rheumatic diseases, but it would be ideal to have just one internationally agreed instrument for each. OMERACT is constantly striving to bring about consensus on outcome measures across the board. It is noteworthy that ACR and EULAR have worked together in producing outcome measures for a few rheumatic diseases (as also classification criteria for certain rheumatic diseases). This is a welcome development and augurs well for the future of rheumatology. I would like to end here with a quotation attributed to Henry Ford:
“Coming together is a beginning. Keeping together is progress. Working together is success”
References
Prof Ashok Kumar, MD, FRCP
Clinical Director & Head Dept of Rheumatology
Fortis Flt Lt Rajan Dhall Hospital, New Delhi