Clinical drug trials or studies are usually conducted to assess not only how well a drug works but also whether it causes any harm (side effects or adverse effects). Adverse effects can be detected by the trial doctor examining participants, taking blood samples, or doing other kinds of tests. The trial staff can also ask participants how they are feeling after taking the trial drug. However, the way participants are asked about their health can vary from trial to trial, or even within a trial. In some trials, participants may be asked a simple open question such as 'how have you been feeling?', while in other trials, participants may be asked whether they have had any of a long list of possible symptoms (such as 'have you had a headache, stomach ache, or sore muscles?'). There has been concern that these different kinds of questions, and how they are phrased, will affect what participants report about their health during a trial. This might then affect the trial's results and what we know about the side effects of drugs.
We did this review to look at studies that compared different types of participant questioning methods in order to investigate these issues. We found 33 studies comparing mainly open questions with checklist-type questions, but also some rating scales and participant interviews. While the studies were all very different in terms of the types of disease, drugs, and patients studied, we found in general that, as would be expected, when a more specific type of question was asked (like a checklist), participants reported more symptoms. What is interesting is that, in those studies that looked more closely at the types of symptoms reported, it seems that an open question picks up the more severe or bothersome symptoms compared to a checklist-type question. However, some studies found that even quite severe or bothersome symptoms were not reported when participants were asked an open question, and that these severe symptoms were only reported with the more specific question. This makes it difficult to say whether one method is better than any other; the different questioning methods may, in fact, be complementary and therefore should be used together. It is also difficult to say what a specific question should include, as a very long list might take too long for a participant to answer. While more research is needed to resolve the remaining uncertainties, it is very important for trials to be clear about which kind of questioning was used when they publish their results. This will help readers understand the trial's findings about side effects and make it easier to make accurate comparisons between trials.
This review supports concerns that the methods used to elicit participant-reported AEs influence the detection of these data. There was a risk of under-detection of AEs in studies using a more general elicitation method compared to those using a comprehensive method. These AEs may be important from a clinical perspective or for patients. This under-detection could compromise the ability to pool AE data. However, the impact of different methods on the nature of the AEs detected is unclear. The wide variety and low quality of methods used to compare elicitation strategies limited this review. Future studies would be improved by using and reporting clear definitions and terminology for AEs (and other important variables), the frequency and time period over which they were ascertained, and how they were graded, assessed for a relationship to the study drug, coded, and tabulated/reported. While the many potential AE endpoints in a trial may preclude the development of general AE patient-reported outcome measurement instruments, much could also be learnt from how such instruments employ both quantitative and qualitative methods to better understand the data elicited. Any chosen questioning method needs to be feasible for use by both staff and participants.
Analysis of drug safety in clinical trials involves assessing adverse events (AEs) individually or by aggregate statistical synthesis to provide evidence of likely adverse drug reactions (ADRs). While some AEs may be ascertained from physical examinations or tests, there is great reliance on reports from participants to detect subjective symptoms, for which the participant is often the only source of information. There is no consensus on how these reports should be elicited, although it is known that questioning methods influence the extent and nature of the data detected. This leaves room for measurement error and undermines comparisons between studies and pooled analyses. This review investigated comparisons of methods used in trials to elicit participant-reported AEs. This should contribute to knowledge about the methodological challenges and possible solutions for achieving better, or more consistent, AE ascertainment in trials.
To systematically review the research that has compared methods used within clinical drug trials (or methods that would be specific for such trials) to elicit information about AEs defined in the protocol or in the planning for the trial.
Databases (searched to March 2015 unless indicated otherwise) included: Embase; MEDLINE; MEDLINE in Process and Other Non-Indexed Citations; Cochrane Methodology Register (July 2012); Cochrane Central Register of Controlled Trials (February 2015); Cochrane Database of Systematic Reviews; Database of Abstracts of Reviews of Effects (January 2015); Health Technology Assessment database (January 2015); CINAHL; CAB Abstracts; BIOSIS (July 2013); Science Citation Index; Social Science Citation Index; Conference Proceedings Citation Index – Science. The search used thesaurus headings and synonyms for the following concepts: (A): Adverse events AND measurement; (B): Participants AND elicitation (also other synonyms for extraction of information about adverse effects from people); (C): Participants AND checklists (also other synonyms as for B). Pragmatic ways were used to limit the results whilst trying to maintain sensitivity. There were no date or sample size restrictions, but only reports published in English were included fully because of resource constraints on translation.
Two types of studies were included: drug trials comparing two or more methods within- or between-participants to elicit participant-reported AEs, and research studies performed outside the context of a trial to compare methods which could be used in trials (evidenced by reference to such applicability). Primary outcome data included AEs elicited from participants taking part in any such clinical trial. We included any participant-reported data relevant for an assessment of drug-related harm, using the original authors' terminology (and definition, where available), with comment on whether the data were likely to be treatment-emergent AEs or not.
Titles and abstracts were independently reviewed for eligibility. Full texts of potentially eligible citations were independently reviewed for final eligibility. Relevant data were extracted and subjected to a 100% check. Disagreements were resolved by discussion, involving a third author. The risk of bias was independently assessed by two authors. The Cochrane 'Risk of bias' tool was used for reports comparing outcomes between participants, while for within-participant comparisons, each study was critically evaluated in terms of potential impact of the design and conduct on findings using the framework of selection, performance, detection, attrition, reporting, and other biases. An attempt was made to contact authors to retrieve protocols or specific relevant missing information. Reports were not excluded on the basis of quality unless data for outcomes were impossible to compare (e.g. where denominators differed). A narrative synthesis was conducted because differences in study design and presentation meant that a quantitative meta-analysis was not possible.
The 33 eligible studies largely compared open questions with checklist-type questions or rating scales. Two included participant interviews. Despite different designs, populations and details of questioning methods, the narrative review showed that more specific questioning of participants led to more AEs being detected compared to a more general enquiry. A subset of six studies suggested that more severe, bothersome, or otherwise clinically relevant AEs were reported when an initial open enquiry was used, while some less severe, bothersome, or clinically relevant AEs were only reported with a subsequent specific enquiry. However, two studies showed that quite severe or debilitating AEs were only detected by an interview, while other studies did not find a difference in the nature of AEs between elicitation methods. No conclusions could be drawn regarding the impact of questioning method on the ability to detect a statistically significant difference between study groups. There was no common statistical rubric, but we were able to represent some effect measures as a risk ratio of the proportion of participants with at least one AE. This showed a lower level of reporting for open questions (O) compared to checklists (CL), with a range for the risk ratios of 0.12 to 0.64.
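To illustrate the effect measure described above, the following minimal sketch computes a risk ratio of the proportion of participants reporting at least one AE under an open question relative to a checklist. The function name `risk_ratio` and all counts are hypothetical, chosen only for illustration; they are not taken from any included study, and the review does not describe its calculations in code.

```python
def risk_ratio(events_open, n_open, events_checklist, n_checklist):
    """Risk ratio of reporting at least one AE: open question (O) vs checklist (CL).

    Hypothetical helper for illustration only.
    """
    if n_open <= 0 or n_checklist <= 0:
        raise ValueError("group sizes must be positive")
    if events_checklist == 0:
        raise ValueError("risk ratio is undefined when the checklist group has no events")
    risk_open = events_open / n_open                  # proportion with >= 1 AE, open question
    risk_checklist = events_checklist / n_checklist   # proportion with >= 1 AE, checklist
    return risk_open / risk_checklist


# Hypothetical counts: 12 of 100 participants report an AE when asked an
# open question, and 48 of 100 report one when given a checklist.
rr = risk_ratio(12, 100, 48, 100)
print(round(rr, 2))  # 0.25
```

A value below 1, as in this made-up example, corresponds to fewer participants reporting at least one AE with the open question than with the checklist.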