A diagnostic test is any kind of medical test performed to help with the diagnosis or detection of a disease. A systematic review of a particular diagnostic test for a disease aims to bring together and assess all the available research evidence. Bibliographic databases are usually searched by combining terms for the disease with terms for the diagnostic test. However, depending on the topic area, the number of articles retrieved by such searches may be very large. Methodological filters consisting of text words and database indexing terms have been developed in the hope of improving the searches by increasing their precision when these filters are added to the search terms for the disease and diagnostic test. On the other hand, using filters to identify records for diagnostic reviews may miss relevant studies while at the same time not making a big difference to the number of studies that have to be assessed for inclusion. This review assessed the performance of 70 filters (reported in 19 studies) for identifying diagnostic studies in the two main bibliographic databases in health, MEDLINE and EMBASE. The results showed that search filters do not perform consistently, and should not be used as the only approach in formal searches to inform systematic reviews of diagnostic studies. None of the filters reached our minimum criteria of a sensitivity greater than 90% and a precision above 10%.
None of the current methodological filters designed to identify reports of primary DTA studies in MEDLINE or EMBASE combine sufficiently high sensitivity, required for systematic reviews, with a reasonable degree of precision. This finding supports the current recommendation in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy that the combination of methodological filter search terms with terms for the index test and target condition should not be used as the only approach when conducting formal searches to inform systematic reviews of DTA.
A systematic and extensive search for as many eligible studies as possible is essential in any systematic review. When searching for diagnostic test accuracy (DTA) studies in bibliographic databases, it is recommended that terms for disease (target condition) are combined with terms for the diagnostic test (index test). Researchers have developed methodological filters to try to increase the precision of these searches. These consist of text words and database indexing terms and would be added to the target condition and index test searches.
Efficiently identifying reports of DTA studies presents challenges because the methods are often not well reported in their titles and abstracts, suitable indexing terms may not be available and relevant indexing terms do not seem to be consistently assigned. A consequence of using search filters to identify records for diagnostic reviews is that relevant studies might be missed, while the number of irrelevant studies that need to be assessed may not be reduced. The current guidance for Cochrane DTA reviews recommends against the addition of a methodological search filter to target condition and index test search, as the only search approach.
To systematically review empirical studies that report the development or evaluation, or both, of methodological search filters designed to retrieve DTA studies in MEDLINE and EMBASE.
We searched MEDLINE (1950 to week 1 November 2012); EMBASE (1980 to 2012 Week 48); the Cochrane Methodology Register (Issue 3, 2012); ISI Web of Science (11 January 2013); PsycINFO (13 March 2013); Library and Information Science Abstracts (LISA) (31 May 2010); and Library, Information Science & Technology Abstracts (LISTA) (13 March 2013). We undertook citation searches on Web of Science, checked the reference lists of relevant studies, and searched the Search Filters Resource website of the InterTASC Information Specialists' Sub-Group (ISSG).
Studies reporting the development or evaluation, or both, of a MEDLINE or EMBASE search filter aimed at retrieving DTA studies, which reported a measure of the filter’s performance were eligible.
The main outcome was a measure of filter performance, such as sensitivity or precision. We extracted data on the identification of the reference set (including the gold standard and, if used, the non-gold standard records), how the reference set was used and any limitations, the identification and combination of the search terms in the filters, internal and external validity testing, the number of filters evaluated, the date the study was conducted, the date the searches were completed, and the databases and search interfaces used. Where 2 x 2 data were available on filter performance, we used these to calculate sensitivity, specificity, precision and Number Needed to Read (NNR), and 95% confidence intervals (CIs). We compared the performance of a filter as reported by the original development study and any subsequent studies that evaluated the same filter.
Ninteen studies were included, reporting on 57 MEDLINE filters and 13 EMBASE filters. Thirty MEDLINE and four EMBASE filters were tested in an evaluation study where the performance of one or more filters was tested against one or more gold standards. The reported outcome measures varied. Some studies reported specificity as well as sensitivity if a reference set containing non-gold standard records in addition to gold standard records was used. In some cases, the original development study did not report any performance data on the filters. Original performance from the development study was not available for 17 filters that were subsequently tested in evaluation studies. All 19 studies reported the sensitivity of the filters that they developed or evaluated, nine studies reported the specificities and 14 studies reported the precision.
No filter which had original performance data from its development study, and was subsequently tested in an evaluation study, had what we defined a priori as acceptable sensitivity (> 90%) and precision (> 10%). In studies that developed MEDLINE filters that were evaluated in another study (n = 13), the sensitivity ranged from 55% to 100% (median 86%) and specificity from 73% to 98% (median 95%). Estimates of performance were lower in eight studies that evaluated the same 13 MEDLINE filters, with sensitivities ranging from 14% to 100% (median 73%) and specificities ranging from 15% to 96% (median 81%). Precision ranged from 1.1% to 40% (median 9.5%) in studies that developed MEDLINE filters and from 0.2% to 16.7% (median 4%) in studies that evaluated these filters. A similar range of specificities and precision were reported amongst the evaluation studies for MEDLINE filters without an original performance measure. Sensitivities ranged from 31% to 100% (median 71%), specificity ranged from 13% to 90% (median 55.5%) and precision from 1.0% to 11.0% (median 3.35%).
For the EMBASE filters, the original sensitivities reported in two development studies ranged from 74% to 100% (median 90%) for three filters, and precision ranged from 1.2% to 17.6% (median 3.7%). Evaluation studies of these filters had sensitivities from 72% to 97% (median 86%) and precision from 1.2% to 9% (median 3.7%). The performance of EMBASE search filters in development and evaluation studies were more alike than the performance of MEDLINE filters in development and evaluation studies. None of the EMBASE filters in either type of study had a sensitivity above 90% and precision above 10%.