Tools for Depression: Standardized Rating Scales

Author: Mark Zimmerman, MD




To determine the impact of treatment, it is necessary to evaluate outcome. In mental health clinical settings this evaluation is typically based on unstructured interactions that yield unquantified judgments of progress. Often judgments of outcome are based on replies to broad-based, global questions such as "How are you feeling?" or "How are you doing?" -- inquiries that are similar to everyday discourse when greeting a friend or acquaintance. Patients often reply with global responses such as "Okay" or "Fine," responses that may not accurately reflect their clinical status. Incorporating standardized depression-measurement scales into clinical practice has increasingly been recognized as a way to obtain more accurate evaluations of patients' status.

Desirable Features of a Depression Outcome Scale

Two perspectives are of primary importance in deciding which measure to choose: the patient's and the clinician's.

Patient Perspective

Patients should find the measure user-friendly and the directions easy to follow. The questions should be understandable and relevant to the patient's problem. The scale should be brief, taking no more than 2-3 minutes to complete, so that upon routine administration at follow-up visits patients are not inconvenienced by the need to arrive for their appointment 10-15 minutes early in order to complete the measure. Brief scales can feasibly be completed at each follow-up visit in the same way that blood pressure and weight are routinely assessed in primary care settings for patients being treated for hypertension and obesity.

Clinician Perspective

The instrument should have practical value to clinicians, providing clinically useful information and improving the efficiency of conducting the clinical evaluation. Of course, clinicians must be able to trust the information provided by any instrument they use. Consequently, outcome measures for depression should have a sound basis in psychometrics, demonstrating good reliability, validity, and sensitivity to change. Clinicians and clinics should also find the instrument user-friendly; it should be easy to administer and score with minimal training.

Ideal Features

With these perspectives in mind, a useful depression scale should contain the following features:

  • Brief;
  • Acceptable to patients;
  • Covers all DSM-IV diagnostic criteria for major depressive disorder;
  • Reliable (internal consistency and test-retest reliability);
  • Convergent validity (correlates with other measures of depression);
  • Discriminant validity (correlates lower with measures of other symptom domains, such as anxiety);
  • Indicator of symptom severity;
  • Indicator of remission status;
  • Case-finding capability as a screening instrument;
  • Assesses psychosocial function;
  • Assesses quality of life;
  • Assesses suicidal thoughts;
  • Sensitive to change;
  • Easy to score; and
  • Inexpensive.

Self-Report vs Clinician-Administered Measurement

An obstacle to the use of standardized scales in clinical practice is the perceived burden of scale completion. The user-friendliness of measurement tools, as well as their reliability and validity, is critical to their widespread adoption. Clinicians are already overburdened with paperwork, and adding to this load by requiring repeated detailed evaluations with clinician-administered instruments, such as the Hamilton Depression Rating Scale, is unlikely to meet with success.

Self-report questionnaires are a cost-effective option because they are inexpensive in terms of professional time needed for administration and they correlate highly with clinician ratings.[1,2] Of course, there are also limitations with self-report questionnaires.

Advantages of self-report questionnaires to measure depression:

  • Do not require clinician time for administration;
  • Improve efficiency of clinical encounter;
  • Correlate highly with clinician-administered tools;
  • Free from clinician bias to overestimate patient improvement (which might occur when there are incentives to document treatment success); and
  • May assess internal mental states more validly than clinician rating scales.

Disadvantages of self-report questionnaires to measure depression:

  • Reporting bias resulting in minimization or overreporting of symptom severity, thereby reducing validity; and
  • Cannot be completed by some individuals due to illiteracy, physical debility, or compromised cognitive functioning.

Commonly Used Self-Administered Depression Scales

So many depression scales have been developed that a compendium of available instruments has been published.[3] Scales vary in length, ranging from single-item measures to tools that include more than 100 statements. Some scales have been developed to measure depression in specific populations, such as postpartum women or patients with schizophrenia. Other scales have been developed to measure depression in specific age groups, such as adolescents and the elderly. Most scales predate the DSM-IV criteria for major depressive disorder and therefore do not fully assess those diagnostic criteria. In the past 15 years, several scales assessing the DSM-IV criteria have been developed and are considered reliable, valid measures of depression severity. Four of these are briefly described, and all are recommended for clinical use.

Beck Depression Inventory-II (BDI-II)

The original version of the BDI was published in 1961. A revised version of the BDI was published in 1996 to correspond more closely with the DSM-IV criteria for major depressive disorder.[4] The BDI-II contains 21 multiple-choice items assessing symptoms of depression. Each item is a set of 4 statements reflecting increasing levels of symptom severity; thus, the scale consists of 84 statements. It takes 5-10 minutes to complete the scale. The scale has good internal consistency and item-scale correlations, and it is sensitive to change. The BDI-II correlates highly with clinician assessments of depression severity. Total scores on the scale range from 0 to 63. Recommended severity score ranges are 0-13 (minimal depression), 14-19 (mild depression), 20-28 (moderate depression), and 29-63 (severe depression).

Clinically Useful Depression Outcome Scale (CUDOS)

The CUDOS contains 18 items assessing all of the DSM-IV inclusion criteria for major depressive disorder as well as psychosocial impairment and quality of life. Compound DSM-IV symptom criteria -- referring to more than 1 construct (eg, problems concentrating or making decisions; insomnia or hypersomnia) -- are subdivided into their respective components, and a CUDOS item is written for each component. The respondent is instructed to rate the symptom items on a 5-point Likert scale indicating "how well the item describes you during the past week, including today" (0 = not at all true/0 days; 1 = rarely true/1-2 days; 2 = sometimes true/3-4 days; 3 = usually true/5-6 days; 4 = almost always true/every day). A Likert rating of the symptom statements was preferred in order to keep the scale brief. On average, the scale takes less than 2 minutes to complete. The CUDOS has good test-retest reliability, internal consistency, and sensitivity to change, and it can be used to screen for depression.[5] Total symptom scores on the scale range from 0 to 64. Empirically derived severity score ranges are 0-10 (nondepressed), 11-20 (minimal depression), 21-30 (mild depression), 31-45 (moderate depression), and 46-64 (severe depression). A cutoff point to determine remission was reported in a study comparing CUDOS scores with scores on the Hamilton Depression Rating Scale (HAM-D).[6]
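The rating anchors and severity ranges described above can be illustrated with a short scoring sketch. It is not an official scoring program. It assumes that 16 of the 18 items are symptom items (inferred from the 0-64 range with 0-4 ratings), and the function names are hypothetical.

```python
# Severity bands quoted in the text, as (upper bound, label) pairs.
CUDOS_BANDS = [
    (10, "nondepressed"),
    (20, "minimal depression"),
    (30, "mild depression"),
    (45, "moderate depression"),
    (64, "severe depression"),
]


def days_to_rating(days):
    """Map days affected in the past week (0-7) to the 0-4 rating anchors."""
    if not 0 <= days <= 7:
        raise ValueError("days must be between 0 and 7")
    if days == 0:
        return 0  # not at all true
    if days <= 2:
        return 1  # rarely true
    if days <= 4:
        return 2  # sometimes true
    if days <= 6:
        return 3  # usually true
    return 4      # almost always true


def cudos_severity(symptom_ratings):
    """Return (total, severity label) for the symptom items only."""
    # Assumption: 16 symptom items, inferred from the 0-64 total range.
    if len(symptom_ratings) != 16:
        raise ValueError("expected 16 symptom ratings")
    if any(r not in range(5) for r in symptom_ratings):
        raise ValueError("each rating must be an integer from 0 to 4")
    total = sum(symptom_ratings)
    for upper, label in CUDOS_BANDS:
        if total <= upper:
            return total, label
```

For example, a respondent who rates every symptom item as "sometimes true" (2) has a total of 32, which falls in the moderate range.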

Patient Health Questionnaire (PHQ-9)

The PHQ-9 contains 9 items assessing all DSM-IV inclusion criteria for major depressive disorder as well as an additional item assessing psychosocial impairment. Unlike the CUDOS, the PHQ-9 assesses compound symptom criteria with a single item. For example, the PHQ-9 assesses insomnia and hypersomnia, as well as reduced or increased appetite, with a single item. The 9-item format makes it easier to apply the DSM-IV diagnostic algorithm for major depression, though at a cost of some information. Respondents are instructed to rate the symptom items on a 4-point Likert scale indicating how often they have been bothered by the symptom over the past 2 weeks (0 = not at all; 1 = several days; 2 = more than half the days; 3 = nearly every day). The scale is briefer than the CUDOS, taking less than 2 minutes to complete. The PHQ-9 has good test-retest reliability, internal consistency, and sensitivity to change.[7] It has been extensively studied as a screening measure for major depression in primary care settings. Total scores on the scale range from 0 to 27. Recommended severity score ranges are 0-4 (no depression), 5-9 (mild depression), 10-14 (moderate depression), 15-19 (moderately severe depression), and 20-27 (severe depression).
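As a rough illustration of how the PHQ-9 total and severity range follow from the 9 symptom ratings, consider the sketch below. The function name is hypothetical, and the additional impairment item is assumed not to contribute to the total, which is consistent with the 0-27 range.

```python
from bisect import bisect_right

# Lower bounds of the second through fifth severity ranges quoted in the text.
PHQ9_CUTOFFS = [5, 10, 15, 20]
PHQ9_LABELS = [
    "no depression",
    "mild depression",
    "moderate depression",
    "moderately severe depression",
    "severe depression",
]


def phq9_score(ratings):
    """Return (total, severity label) for nine symptom ratings of 0-3."""
    if len(ratings) != 9:
        raise ValueError("expected 9 symptom ratings")
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("each rating must be 0, 1, 2, or 3")
    total = sum(ratings)
    # bisect_right counts how many cutoffs the total has reached.
    return total, PHQ9_LABELS[bisect_right(PHQ9_CUTOFFS, total)]
```

A patient who answers "several days" (1) to all 9 items scores 9, at the top of the mild range; one additional point moves the total into the moderate range.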

Quick Inventory of Depressive Symptomatology (QIDS)

The items of the QIDS are constructed similarly to those of the BDI, as multiple-choice questions with 4 choices. The QIDS contains 16 items that cover the symptoms of DSM-IV major depressive disorder, though single items are used to assess indecisiveness and impaired concentration, guilt and worthlessness, and wishes for death and suicidal ideation. Not every item contributes to the total score. In scoring the QIDS, only the highest-rated item is counted from each of 3 groups: the 4 items assessing sleep disturbance (initial, middle, or terminal insomnia, or hypersomnia), the 2 items assessing psychomotor disturbance (agitation, retardation), and the 4 items assessing appetite and weight disturbance. Equivalent clinician-rated and self-report versions of the scale are available. The QIDS requires 5-10 minutes to complete. The QIDS has good internal consistency, correlates significantly with clinician ratings of depression severity, and is sensitive to change.[8] Total scores on the scale range from 0 to 27. Recommended severity score ranges are 0-5 (no depression), 6-10 (mild depression), 11-15 (moderate depression), 16-20 (severe depression), and 21-27 (very severe depression).
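Because not every QIDS item contributes to the total, the scoring rule is easier to follow in a short sketch. The one below assumes each item is rated 0-3 (inferred from the 4 choices per item and the 0-27 total across 9 domains); the function and parameter names are hypothetical, and the 6 remaining single items are simply passed in as a list.

```python
# Severity bands quoted in the text, as (upper bound, label) pairs.
QIDS_BANDS = [
    (5, "no depression"),
    (10, "mild depression"),
    (15, "moderate depression"),
    (20, "severe depression"),
    (27, "very severe depression"),
]


def qids_score(sleep, appetite_weight, psychomotor, other_items):
    """Return (total, severity label) from the 16 item ratings.

    sleep: 4 ratings; appetite_weight: 4 ratings; psychomotor: 2 ratings;
    other_items: the 6 remaining single-item ratings.
    """
    groups = [(sleep, 4), (appetite_weight, 4), (psychomotor, 2), (other_items, 6)]
    for ratings, expected in groups:
        if len(ratings) != expected:
            raise ValueError("wrong number of ratings in a group")
        if any(r not in (0, 1, 2, 3) for r in ratings):
            raise ValueError("each rating must be 0, 1, 2, or 3")
    # Only the highest rating in each multi-item group is counted.
    total = max(sleep) + max(appetite_weight) + max(psychomotor) + sum(other_items)
    for upper, label in QIDS_BANDS:
        if total <= upper:
            return total, label
```

For instance, sleep ratings of 0, 3, 1, and 0 contribute 3 points rather than 4, because only the highest-rated sleep item is counted.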

Which Scales Are in the Public Domain?

The BDI-II is copyrighted, and users must pay a fee to purchase each copy of the scale administered. Unauthorized duplication of the BDI-II for clinical use represents a violation of copyright. As of this writing, the cost of purchasing the BDI-II is $48 for a package of 25. Because there is no evidence that the BDI-II is more reliable or valid than other depression scales, it is difficult to justify its cost of use. The PHQ-9 is copyrighted by Pfizer Inc and is available for unlimited clinical use. It is accessible at various Websites (there is no single parent Website for PHQ-9), some of which require a use agreement to be signed. The QIDS and CUDOS are copyrighted by their authors, but both are available for unlimited use by clinicians. Both can be downloaded in user-ready formats. The Website for the QIDS is; for CUDOS, see

Supported by an independent educational grant from Bristol-Myers Squibb Company.
