Physician use of red flags to screen for fractured vertebrae in patients with new back pain

This review describes what is known about a common practice for checking for spinal injuries when patients come to a family practice doctor, back pain clinic or emergency room with new back pain. Doctors usually ask a few questions and examine the back to check for the possibility of a spinal fracture. The reason for this check is that common back pain and fractures are treated differently. Fractures are usually diagnosed with an x-ray, then treated with rest, a back brace and pain relievers. Common back pain is treated with exercise, chiropractic manipulation, and pain relievers; x-rays, computed tomography (CT) and magnetic resonance imaging (MRI) scans are not useful for diagnosis. Fractures are rare, accounting for roughly 1% to 4.5% of new back pain visits to family doctors.

Eight studies including several thousand patients described 29 different questions and physical exam tests that have been used to look for spinal fractures. Most of the 29 were not accurate. The best questions asked about use of steroids (which can cause weak bones), the patient’s age (age above 74 increases the risk of fractures) and recent trauma such as a fall. Using a combination of the best questions appears to improve accuracy. For example, women above age 74 are more likely to have a fracture when they come to the physician complaining of back pain. In the emergency room, the best indication of a spinal fracture was a bruise or scrape on the painful area of the back.

Fractures are rare and generally do not require emergency treatment, so even if red flags exist, clinicians and patients can watch and wait. During the waiting period, patients should avoid treatments like exercise and manipulation that are not recommended for spinal fractures.

The worst effects of low-quality red flag screening are overtreatment and undertreatment. If the tests are not accurate, patients without a fracture may get an x-ray or CT scan that they don’t need, which means unnecessary exposure to x-rays, extra worry for the patient and extra cost. At the other extreme (and much less commonly), a real fracture might be missed, leaving the patient without the best treatment for longer.

Most of the studies were of low or moderate quality, so more research is needed to identify the best combination of questions and examination methods.

Authors' conclusions: 

The available evidence does not support the use of many red flags to specifically screen for vertebral fracture in patients presenting with low-back pain (LBP). Based on evidence from single studies, few individual red flags appear informative, as most have poor diagnostic accuracy, indicated by imprecise estimates of likelihood ratios. When combinations of red flags were used, performance appeared to improve. From the limited evidence, the findings give rise to a weak recommendation that a combination of a small subset of red flags may be useful to screen for vertebral fracture. It should also be noted that many red flags have high false positive rates; if they were acted upon uncritically, there would be consequences for the cost of management and for the outcomes of patients with LBP. Further research should focus on appropriate sets of red flags and adequate reporting of both index and reference tests.

Background: 

Low-back pain (LBP) is a common condition seen in primary care. A principal aim during a clinical examination is to identify patients with a higher likelihood of underlying serious pathology, such as vertebral fracture, who may require additional investigation and specific treatment. All 'evidence-based' clinical practice guidelines recommend the use of red flags to screen for serious causes of back pain. However, it remains unclear if the diagnostic accuracy of red flags is sufficient to support this recommendation.

Objectives: 

To assess the diagnostic accuracy of red flags obtained in a clinical history or physical examination to screen for vertebral fracture in patients presenting with LBP.

Search strategy: 

Electronic databases were searched for primary studies between the earliest date and 7 March 2012. Forward and backward citation searching of eligible studies was also conducted.

Selection criteria: 

Studies were considered if they compared the results of any aspect of the history or test conducted in the physical examination of patients presenting for LBP or examination of the lumbar spine, with a reference standard (diagnostic imaging). The selection criteria were independently applied by two review authors.

Data collection and analysis: 

Three review authors independently conducted 'Risk of bias' assessment and data extraction. Risk of bias was assessed using the 11-item QUADAS tool. Characteristics of studies, patients, index tests and reference standards were extracted. Where available, raw data were used to calculate sensitivity and specificity with 95% confidence intervals (CI). Due to the heterogeneity of studies and tests, statistical pooling was not appropriate and the analysis for the review was descriptive only. Likelihood ratios for each test were calculated and used as an indication of clinical usefulness.
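The accuracy measures named above can be illustrated with a minimal sketch, assuming a standard 2x2 table of index test result against reference standard. The function name and the counts are hypothetical and are not taken from any included study. Confidence intervals are left out because the abstract does not state which interval method was used.

```python
import math


def diagnostic_accuracy(tp, fp, fn, tn):
    """Return sensitivity, specificity, LR+ and LR- from 2x2 counts.

    tp/fn: patients with a fracture who test positive/negative.
    fp/tn: patients without a fracture who test positive/negative.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ = sensitivity / (1 - specificity); undefined (infinite) with no false positives
    lr_pos = math.inf if fp == 0 else sensitivity / (1 - specificity)
    # LR- = (1 - sensitivity) / specificity; undefined (infinite) with no true negatives
    lr_neg = math.inf if tn == 0 else (1 - sensitivity) / specificity
    return sensitivity, specificity, lr_pos, lr_neg


# Hypothetical counts for illustration only (not data from the review)
sens, spec, lr_pos, lr_neg = diagnostic_accuracy(tp=8, fp=45, fn=12, tn=935)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} LR+={lr_pos:.2f} LR-={lr_neg:.2f}")
```

An LR+ well above 1 means a positive red flag raises the likelihood of fracture; an LR+ close to 1 means the red flag changes little.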

Main results: 

Eight studies set in primary (four), secondary (one) and tertiary care (accident and emergency = three) were included in the review. Overall, the risk of bias of studies was moderate, with high risk of selection and verification bias as the predominant flaws. Reporting of index and reference tests was poor. The prevalence of vertebral fracture in accident and emergency settings ranged from 6.5% to 11% and in primary care from 0.7% to 4.5%. There were 29 groups of index tests investigated; however, only two featured in more than two studies. Descriptive analyses revealed that three red flags in primary care were potentially useful, with meaningful positive likelihood ratios (LR+) but mostly imprecise estimates (significant trauma, older age, corticosteroid use; LR+ point estimate ranging 3.42 to 12.85, 3.69 to 9.39, 3.97 to 48.50 respectively). One red flag in tertiary care appeared informative (contusion/abrasion; LR+ 31.09, 95% CI 18.25 to 52.96). The results of combined tests appeared more informative than individual red flags, with LR+ estimates generally greater in magnitude and precision.
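To show what an LR+ of this size means in practice, the sketch below converts a pre-test probability into a post-test probability using the odds form of Bayes' theorem. It combines the accident and emergency prevalence range with the contusion/abrasion point estimate reported above purely as an illustration. The function name is hypothetical, and pairing a single-study LR+ with a prevalence range drawn from several studies is an assumption, not an analysis performed in the review.

```python
def post_test_probability(prevalence, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = prevalence / (1 - prevalence)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Prevalence range (accident and emergency) and LR+ point estimate from the abstract
for prevalence in (0.065, 0.11):
    p = post_test_probability(prevalence, 31.09)
    print(f"prevalence {prevalence:.1%} -> post-test probability {p:.1%}")
```

Under these assumptions the post-test probability is roughly 68% to 79%. The same calculation with the lower bound of the confidence interval (18.25) gives smaller values, which is why the precision of the estimate matters as much as its size.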