Estimating Incidence, Prevalence, and YLD: Methods and Data
This section provides an overview of the methods, software tools, and data sources used to calculate YLD for the GBD 2001 together with a short description of the disease models, assumptions, and data sources for important cause groups. Estimating YLD is the most complex and time-consuming component of burden of disease analysis, because it requires systematic assessments of the available evidence on incidence, prevalence, duration, and severity of a wide range of conditions. The GBD study has developed various methods to reconcile often fragmented and partial estimates available from different studies. A specific software tool, DisMod, described later, has been developed to assist in the analysis of epidemiological data and the preparation of internally consistent estimates.
Assessing YLD
YLD are essentially calculated as follows (ignoring the complications of discounting):
$$\text{YLD} = I \times D \times L$$
where I is the number of incident cases in the reference period, D is the disability weight (in the range 0 to 1), and L is the average duration of disability measured in years. With discounting at rate r, the formula for calculating YLD becomes
$$\text{YLD} = I \times D \times \frac{1 - e^{-rL}}{r}$$
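The two formulas can be checked numerically with a short Python sketch. It uses I, D, L, and r exactly as defined above and assumes continuous discounting from the time of onset; the function name `yld` and the example numbers are illustrative only and are not taken from the GBD 2001 inputs.

```python
import math


def yld(incidence: float, disability_weight: float, duration: float,
        rate: float = 0.0) -> float:
    """Years lived with disability for a set of incident cases.

    Without discounting: YLD = I * D * L.
    With continuous discounting at rate r: YLD = I * D * (1 - exp(-r * L)) / r.
    """
    if rate == 0.0:
        return incidence * disability_weight * duration
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when r * L is small
    return incidence * disability_weight * (-math.expm1(-rate * duration)) / rate


# Hypothetical example: 1,000 incident cases, weight 0.2, average duration 10 years
print(yld(1000, 0.2, 10))        # undiscounted
print(yld(1000, 0.2, 10, 0.03))  # discounted at an illustrative 3 percent
```

As r approaches 0, the discounted formula converges to the undiscounted one, which is a useful sanity check on any implementation.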
To prepare consistent and unbiased estimates of YLD by cause, it is important to ensure that the disability weight and the population incidence and prevalence data relate to the same case definitions. The data required to estimate YLD are incidence, disability duration, age of onset, and distribution by severity class, all of which must be disaggregated by age and sex. These in turn require estimates of incidence, remission, and case fatality rates or relative risks by age and sex.
For some conditions, numbers of incident cases were available directly from disease registers or epidemiological studies, but for most conditions, only prevalence data were available. In these cases, the DisMod II software program was used to model incidence and duration from estimates of prevalence, remission, case fatality rates, and background mortality.
The sources of data and methods used for each of the major disease and injury groups are summarized in later subsections. Given the large number of categories analyzed and the paucity of epidemiological information for many of them, many of the disease models were necessarily simple and approximate. For most disease and injury groups, relevant experts were consulted during the development and revision of YLD estimates.
The disability weights used for the GBD 2001 are still largely based on the GBD 1990 disability weights and are summarized in annex tables 3A.6 to 3A.8 . For certain conditions for which weights were not available from the original GBD study, provisional weights were used from Mathers, Vos, and Stevenson (1999) and Stouthard and others (1997) .
As discussed earlier, the disability weights used in DALY calculations quantify societal preferences for different health states. These weights do not represent the lived experience of any disability or health state or imply any societal value of the person in a disability or health state. Thus, for example, disability weights of 0.57 for paraplegia and 0.43 for blindness quantify a social judgment that a year with blindness represents less loss of health than a year with paraplegia. It also means that, on average, a person who lives three years with paraplegia followed by death is considered to experience more equivalent healthy years than a person who has one year of good health followed by death (3 years × [1 − 0.57] ≈ 1.3 "healthy" years, which is greater than 1 year of good health).
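The paraplegia comparison is simple arithmetic: each year in a state with weight D counts as (1 − D) healthy-year equivalents. A minimal sketch, where the helper name `healthy_year_equivalents` is hypothetical:

```python
def healthy_year_equivalents(years: float, disability_weight: float) -> float:
    """Healthy-year equivalents of time spent in a state with the given weight."""
    return years * (1.0 - disability_weight)


paraplegia = healthy_year_equivalents(3, 0.57)   # three years with paraplegia
good_health = healthy_year_equivalents(1, 0.0)   # one year of good health
print(round(paraplegia, 2), good_health)
```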
Ensuring Internal Consistency Using DisMod
Estimating prevalence and incidence is usually much harder than estimating mortality. Data collection, when done, is often limited in terms of both time and geographical area, and problems of case definition abound. Not surprisingly, data are frequently incomplete, and when available, their validity may be in doubt. In particular, given differences in the way the data for incidence, prevalence, and mortality are collected, it is almost inevitable that observations are internally inconsistent. For example, when a cohort study misses more incident cases than deaths, the observed incidence will be too small to account for the observed mortality.
To address such issues, the GBD studies have exploited two kinds of knowledge. First, disease characteristics, such as remission, case fatality rates, and duration, may be relatively constant across countries and known from studies in some populations, from clinical studies, or from expert knowledge. Supplementing observed data with expert knowledge may help to overcome a lack of data. Second, because the various epidemiological variables are causally linked by a disease process, a disease model that explicitly describes these causal pathways allows us to infer missing data if existing data are sufficient to do so.
DisMod was developed for the original GBD study to help model the parameters needed for YLD calculations, to incorporate expert knowledge, and to check the consistency of different epidemiological estimates and ensure that the estimates used were internally consistent.
Figure 3.8 shows the underlying model used by DisMod.
[Figure 3.8]
Based on experience with the DisMod software tool in the original GBD study, a new version, DisMod II, was developed with a number of additional features ( Barendregt and others 2003 ). Unlike DisMod I, which used finite difference methods to "solve" the disease model, DisMod II implements an exact solution to the underlying differential equations. As well as calculating solutions when the three hazard rates (for incidence, remission, and mortality) are provided as inputs, DisMod II allows other combinations of inputs, such as prevalence, remission, and case fatality rates. In these cases, DisMod uses a goal-seeking algorithm to fit hazards such that the model reproduces the available input variables. DisMod II also has a range of advanced features, including the ability to undertake sensitivity analysis and uncertainty analysis, to give different weights to the various inputs, and to smooth inputs and specify age patterns for outputs. (The software may be downloaded from the WHO Web site at http://www.who.int/evidence/dismod .)
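To make the disease model concrete, the following Python sketch follows a single birth cohort through a healthy state and a diseased state, assuming the usual DisMod structure with four hazards: incidence (i), remission (r), case fatality (f), and mortality from all other causes (m). It uses a simple fixed-step Euler scheme with hazards held constant within each year of age; this is an approximation chosen for brevity, not the exact solution that DisMod II implements, and the function name `simulate_cohort` is hypothetical.

```python
from typing import Sequence


def simulate_cohort(inc: Sequence[float], rem: Sequence[float],
                    cf: Sequence[float], mort: Sequence[float],
                    steps_per_year: int = 100) -> list[float]:
    """Return prevalence C / (S + C) at the end of each single year of age.

    Model (annual hazards, constant within each year of age):
        dS/da = -(i + m) * S + r * C
        dC/da =   i * S - (r + f + m) * C
    """
    s, c = 1.0, 0.0  # healthy and diseased fractions of the starting cohort
    dt = 1.0 / steps_per_year
    prevalence = []
    for i, r, f, m in zip(inc, rem, cf, mort):
        for _ in range(steps_per_year):
            ds = (-(i + m) * s + r * c) * dt
            dc = (i * s - (r + f + m) * c) * dt
            s, c = s + ds, c + dc
        prevalence.append(c / (s + c))
    return prevalence
```

With no remission or case fatality, prevalence at age a should approach 1 − e^(−i·a); with remission and no excess mortality, it should settle near i / (i + r). Both are quick checks that the equations are wired correctly.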
DisMod II was extensively used in the analyses for the GBD 2001 for four main purposes:
- to estimate a set of incidence rates by age from observed prevalences for a condition, given estimates of remission rates and cause-specific mortality risk derived from population data or epidemiological studies;
- to check whether available data for a condition are consistent with each other, for example, when separate estimates of incidence and prevalence were available for a condition;
- to calculate the average duration of incident cases, needed to calculate YLD for a condition;
- to extrapolate estimates in GBD age categories from epidemiological data for different age categories.
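For the third purpose, a constant-hazard simplification shows where an average duration comes from. If remission, case fatality, and other-cause mortality are constant, an incident case leaves the diseased state at total rate r + f + m, so the expected time spent diseased is 1 / (r + f + m). DisMod II works with age-specific hazards, so its durations will generally differ; the sketch below assumes constant hazards over the remaining lifetime, and the function name `mean_duration` is hypothetical.

```python
def mean_duration(remission: float, case_fatality: float,
                  other_mortality: float) -> float:
    """Mean years spent in the diseased state by an incident case.

    With constant hazards, time in the diseased state is exponentially
    distributed with rate r + f + m, so its mean is 1 / (r + f + m).
    """
    exit_rate = remission + case_fatality + other_mortality
    if exit_rate <= 0:
        raise ValueError("at least one exit hazard must be positive")
    return 1.0 / exit_rate


print(mean_duration(0.10, 0.05, 0.05))  # hypothetical hazards
```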
Whereas different assumptions regarding remission and case fatality rates affect the age distribution of incident cases and YLD estimates, total YLD are relatively insensitive to these assumptions if they are matched to a fixed prevalence distribution. This is because YLD estimates are proportional to incidence multiplied by duration, which approximately equals the prevalence of the condition. In other words, for most conditions, the combination of incidence, case fatality, and remission rates (and thus derived durations) used in the YLD calculations makes relatively little difference to total YLD across age groups, provided the same prevalence figures are used as the basis. Discounting complicates this, however: low-incidence, long-duration conditions are discounted more heavily than high-incidence, short-duration conditions.
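The effect of discounting can be illustrated with two hypothetical conditions that have the same product of incidence and duration (and hence roughly the same prevalence) and the same disability weight. The sketch assumes continuous discounting at an illustrative 3 percent; all numbers are invented for the comparison.

```python
import math


def discounted_yld(incidence: float, weight: float, duration: float,
                   rate: float) -> float:
    """YLD = I * D * (1 - exp(-r * L)) / r, with continuous discounting."""
    return incidence * weight * (-math.expm1(-rate * duration)) / rate


# Same I * L = 2,000 case-years, so similar prevalence
acute = {"incidence": 1000, "duration": 2}     # high incidence, short duration
chronic = {"incidence": 100, "duration": 20}   # low incidence, long duration

for name, c in (("acute", acute), ("chronic", chronic)):
    undiscounted = c["incidence"] * 0.3 * c["duration"]
    discounted = discounted_yld(c["incidence"], 0.3, c["duration"], 0.03)
    print(name, round(undiscounted, 1), round(discounted, 1))
```

Both conditions have identical undiscounted YLD, but the long-duration condition loses a much larger share of its YLD to discounting, because more of its disability falls in later years.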
Figure 3.9 illustrates the use of DisMod II to calculate the incidence of diabetes mellitus in males in Sub-Saharan Africa given estimates of the age-specific prevalence of cases, the relative risk of mortality for those with diabetes compared with those without diabetes ( Roglic and others 2005 ), and the assumption that remission rates are zero.
[Figure 3.9]
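A toy version of this goal-seeking step can be written in a few lines. The sketch assumes zero remission, converts the relative risk into an excess case-fatality hazard f = (RR − 1) × m (where m is mortality among those without the condition), holds all hazards constant over age, and uses bisection to find the single incidence rate that reproduces an observed prevalence at one age. DisMod II fits full age-specific schedules with its own algorithm, so this is only an illustration of the idea; the function names and numbers are hypothetical.

```python
def prevalence_at(age: float, incidence: float, case_fatality: float,
                  other_mortality: float, steps_per_year: int = 50) -> float:
    """Modelled prevalence at `age` for a cohort starting disease-free.

    Remission is assumed to be zero; Euler steps approximate the model.
    """
    healthy, diseased = 1.0, 0.0
    dt = 1.0 / steps_per_year
    for _ in range(round(age * steps_per_year)):
        d_healthy = -(incidence + other_mortality) * healthy * dt
        d_diseased = (incidence * healthy
                      - (case_fatality + other_mortality) * diseased) * dt
        healthy, diseased = healthy + d_healthy, diseased + d_diseased
    return diseased / (healthy + diseased)


def fit_incidence(target_prevalence: float, age: float, relative_risk: float,
                  mortality_without: float, tol: float = 1e-9) -> float:
    """Bisection search for the constant incidence that reproduces the target."""
    # Cases die at RR * m, so the excess (case-fatality) hazard is (RR - 1) * m
    case_fatality = (relative_risk - 1.0) * mortality_without
    lo, hi = 0.0, 1.0  # with zero remission, prevalence rises with incidence
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prevalence_at(age, mid, case_fatality, mortality_without) < target_prevalence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip: generate a prevalence from a known incidence, then recover it
observed = prevalence_at(50, 0.004, 0.01, 0.01)
print(fit_incidence(observed, 50, 2.0, 0.01))
```

Bisection is used here only because modelled prevalence increases monotonically with incidence when remission is zero; it is not a description of the optimizer inside DisMod II.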
YLD Estimates for Regions in 2001
The GBD 2001 estimated incidence, prevalence, and YLD for 17 epidemiological regions based on the 6 WHO regions subdivided by 5 mortality strata. The five mortality strata were defined in terms of quintiles of the distribution of child and adult mortality for males in 1999 ( WHO 2002d , pp. 233-5). These regions are defined in annex table 3A.4 .
The Disease Control Priorities Project followed the World Bank approach in treating all high-income countries as one region even though they are not geographically contiguous, and then dividing the rest of the world into six geographic regions that together are referred to as low- and middle-income countries. These regions are defined in annex table 3A.1 .
To estimate YLD by cause, age, sex, and region for 2001, incidence and prevalence rates were imputed from the 17 epidemiological subregions to the country level using cause-specific methods documented by Mathers, Murray, and Salomon (2003). Absolute incidence and prevalence numbers by age and sex were then summed across all countries in each region to provide regional estimates for 2001. Because Version 3 estimates for 2000 had been prepared so that they were consistent with those for 2002, estimates for 2001 were imputed by averaging the Version 3 estimated cause-, age-, sex-, and country-specific rates for mortality, incidence, and prevalence for 2000 and 2002 and applying them to population data for 2001.
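For a single cause-age-sex cell, the imputation and aggregation steps described above reduce to averaging two rates, multiplying by population, and summing over countries. The sketch below assumes a hypothetical `CountryCell` record with invented rates and populations; it does not reproduce the Version 3 data structures.

```python
from dataclasses import dataclass


@dataclass
class CountryCell:
    """One cause-age-sex cell for one country (hypothetical structure)."""
    rate_2000: float        # estimated rate per person, 2000
    rate_2002: float        # estimated rate per person, 2002
    population_2001: float  # population in this age-sex group, 2001


def regional_total_2001(cells: list[CountryCell]) -> float:
    """Average the 2000 and 2002 rates, apply to 2001 population, sum countries."""
    return sum(0.5 * (c.rate_2000 + c.rate_2002) * c.population_2001
               for c in cells)


region = [CountryCell(0.010, 0.014, 1_000_000),
          CountryCell(0.002, 0.004, 500_000)]
print(regional_total_2001(region))
```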
Overview of Data Sources
A wide range of data sources were used to analyze incidence, prevalence, and YLD for the GBD 2001. These included
- Disease registers. Disease registers record new cases of disease based on reports by physicians and laboratories. Registers are common for infectious diseases (for instance, TB), cancer, and congenital anomalies; for some relatively rare diseases, such as cystic fibrosis or thalassaemia; and sometimes for conditions such as diabetes, schizophrenia, and epilepsy.
- Population surveys. Interview surveys, such as the National Health Interview Survey in the United States, can provide self-reported information on disabilities, impairments, and diseases; however, self-reported data are generally not comparable across countries ( Murray, Tandon, and others 2002 ; Sadana and others 2002 ). In addition, attributing impairment to the underlying causes is often difficult, and there are frequently considerable differences between lay self-reports of disease causes and the actual underlying causes as defined by GBD disease categories. In general, the results of health examination surveys have contributed more to YLD calculations than self-reported interview surveys. The Composite International Diagnostic Interview (CIDI) and Diagnostic Interview Schedule (DIS) questionnaires used for mental health surveys are examples of standard questionnaires based on self-reporting that have undergone validity testing and have been used in assessing YLD for mental disorders for the GBD 2001.
- Epidemiological studies. Some of the most useful sources of information for the GBD 2001 were population-based epidemiological studies. In particular, longitudinal studies of the natural history of a disease have provided a wealth of information about incidence, average duration, levels of severity, remission, and case fatality rates. Such studies are rare because they are costly to undertake. In addition, as they are often conducted in a particular region or town, judgment is needed when extrapolating results to the entire population.
- Health facility data. In most cases, routine data on consultations by diagnosis were not found to be of much use in estimating YLD. Unless coverage of the health system is virtually total, facility-based data will come from biased samples that do not reflect the prevalence or severity distributions of conditions in the community. Likewise, data on hospital deaths are unlikely to be useful because of the same problems of selection bias. Examples of conditions that were estimated from hospital data with national or quasi-national population coverage include perinatal and maternal conditions, meningitis, stroke, myocardial infarction, some sequelae identifiable from data on surgical interventions, and injuries.
The following sections provide an overview of data sources and methods for various specific causes and references to more detailed documentation. For some conditions, WHO programs maintain up-to-date databases based on disease registers, population surveys, and epidemiological studies. These have been used where available. Many of the epidemiological reviews underlying the GBD 2001 estimates of YLD have been documented and published in draft form on the WHO Web site ( http://www.who.int/evidence/bod ) and in peer-reviewed publications.
While it is difficult to quantify the exact numbers of data sources used for the YLD estimates for the GBD 2001, table 3.11 provides an approximate count by region. This table counts the number of data sources (registers, notifications, health facility and other official data sets, and epidemiological studies) for each of the causes included in the GBD 2001. For some causes, the only counts available were of the number of countries in each region for which country-specific data were used. In some cases, an exact recount of studies by region was not feasible, and an approximate regional breakdown was estimated from prior counts according to 17 subdivisions of the 6 WHO regions used in WHO documentation of GBD analyses and data sources (Mathers, Lopez, and others 2003). In addition, it was not always possible to be consistent in the counting of studies carried out across multiple countries or multiple years. Finally, note that there is huge variability in the information content across studies or data sets, and that small epidemiological studies are counted equally in table 3.11 with national hospital inpatient data on injuries for an entire population-year. Thus the counts in table 3.11 should be treated as reasonably indicative of the empirical bases underlying the GBD 2001 without overinterpreting differences between causes or regions.
[Table 3.11]
That said, it is striking that of the more than 8,000 data sets estimated to have been used for the GBD 2001 estimation of YLD, nearly 6,600 relate to Group I causes and only 18 to Group III causes. Furthermore, one-quarter of the data sets relate to populations in Sub-Saharan Africa and around one-fifth to populations in high-income countries. While this predominance of data relating to Group I conditions and to Sub-Saharan Africa is not entirely surprising, the paucity of data for some of the leading noncommunicable diseases is more surprising. For example, for several of the leading causes of burden among mental disorders, one or no usable population-based studies were found for some regions, and for IHD, few studies of the incidence or prevalence of angina pectoris or acute myocardial infarction were found outside high-income countries.
Assuming that, for causes in table 3.11 where the counts relate to countries rather than to data sets, there are on average two data sets per country, approximately 8,700 data sets contributed to the estimation of YLD overall. Excluding studies already counted above because they also contributed to the estimation of cause-specific mortality rates, an additional 1,370 data sets were used to estimate YLL. In total, the GBD 2001 has drawn on more than 10,000 data sets or studies, making it almost certainly the largest synthesis and analysis of global population health data carried out to date.
Communicable Diseases and Maternal, Perinatal, and Nutritional Conditions
This section gives an overview of data sources and methods for specific Group I causes and references to more detailed documentation.
Tuberculosis.
Estimates of incidence and deaths due to TB (excluding HIV-infected persons) for countries in 2001 formed the basis of estimates of TB prevalence in 2001. The methods and data used to estimate incidence and mortality for each country were described earlier. For countries with VR data for TB deaths, incidence estimates have been revised to be consistent with estimated deaths, estimated case fatality rates for treated and untreated cases, and proportion of incident cases treated.
Estimated prevalence of all forms of TB (excluding HIV-infected persons) for 2001 was calculated by multiplying estimated incidence by estimated duration. Country-specific estimates of duration were weighted by the proportions of cases that were treated and that were smear-positive.
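The prevalence calculation can be sketched as incidence multiplied by a duration that is weighted across case groups. The sketch assumes four groups (treated or untreated, smear-positive or not) and treats the treated and smear-positive proportions as independent, which is a simplification of this example; the group durations and incidence below are placeholders, not the values used for the GBD 2001.

```python
def weighted_duration(p_treated: float, p_smear_positive: float,
                      durations: dict[tuple[bool, bool], float]) -> float:
    """Average case duration in years, weighting each (treated, smear-positive)
    group by its share of cases. Assumes the two proportions are independent."""
    total = 0.0
    for treated, p_t in ((True, p_treated), (False, 1.0 - p_treated)):
        for smear_pos, p_s in ((True, p_smear_positive),
                               (False, 1.0 - p_smear_positive)):
            total += p_t * p_s * durations[(treated, smear_pos)]
    return total


# Placeholder durations in years, for illustration only
durations = {(True, True): 0.5, (True, False): 1.0,
             (False, True): 2.0, (False, False): 3.0}
incidence = 150e-5  # hypothetical: 150 new cases per 100,000 per year
duration = weighted_duration(0.5, 0.5, durations)
prevalence = incidence * duration  # prevalence = incidence x duration
print(duration, prevalence)
```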
Sexually Transmitted Infections Other Than HIV/AIDS.
More than 300 community-based and prenatal care-based prevalence and incidence studies of pregnant women were used to generate region-specific estimates of the prevalence of syphilis, chlamydia, and gonorrhea. The methodology is described in detail elsewhere ( Gerbase and others 1998 ; WHO 2001c ) and was used to update estimates to 2001.
HIV/AIDS.
The Joint United Nations Programme on HIV/AIDS and WHO have developed country-specific estimates of HIV/AIDS for most countries and revise them periodically to account for new data and improved methods ( Salomon and Murray 2001b ; Schwartlander and others 1999 ; Walker and others 2003 ). For the most recent round of estimates, they used two different types of models, one for generalized epidemics and one for epidemics concentrated in high-risk groups.
For a few countries where prevalence estimates for HIV seropositive cases were not directly available, they were derived by scaling regional prevalence estimates according to the ratio of country-specific HIV/AIDS mortality to regional HIV/AIDS mortality. Because different countries may be in different phases of the epidemic, the relationship between prevalence and mortality may vary across countries.
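The scaling step amounts to multiplying the regional prevalence by the ratio of country to regional mortality; the same kind of ratio adjustment is described below for malaria. A minimal sketch with hypothetical numbers follows. It inherits the caveat in the text: the method assumes a similar prevalence-to-mortality relationship across the countries of a region, which may not hold when epidemics are at different stages.

```python
def scale_by_mortality(regional_prevalence: float, country_mortality: float,
                       regional_mortality: float) -> float:
    """Country prevalence = regional prevalence x (country / regional mortality)."""
    if regional_mortality <= 0:
        raise ValueError("regional mortality must be positive")
    return regional_prevalence * country_mortality / regional_mortality


# Hypothetical: country mortality is twice the regional rate
print(scale_by_mortality(0.02, 0.004, 0.002))
```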
Diarrheal Diseases.
To estimate the incidence of diarrheal diseases in children under five in developing and developed countries, 357 community-based studies and population surveys were used ( Bern 2004 ; Murray and Lopez 1996d ). Point prevalences were estimated assuming an average duration of six days per episode. Work is currently in progress to update these estimates with more recent evidence from community-based studies.
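Under a steady-state assumption, point prevalence is incidence multiplied by duration, with duration expressed in years. A short sketch using the six-day average episode length stated above; the incidence of three episodes per child per year is a hypothetical input.

```python
EPISODE_DAYS = 6  # average duration per episode, as stated in the text


def point_prevalence(episodes_per_child_year: float,
                     days_per_episode: float = EPISODE_DAYS) -> float:
    """Share of children ill on a given day = incidence x duration (in years)."""
    return episodes_per_child_year * days_per_episode / 365.0


print(point_prevalence(3.0))  # hypothetical incidence
```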
Vaccine-preventable Childhood Diseases and Meningitis.
The methods used to estimate incidence for childhood-cluster diseases were summarized earlier. The incidence of meningitis due to Haemophilus influenzae type b together with the incidence of meningitis due to Streptococcus pneumoniae and Neisseria meningitides, was updated from the 1990 estimates using information from the WHO Vaccines and Biologicals Program derived from country notifications of cases and deaths, from WHO surveillance centers and, where relevant, from immunization coverage data ( WHO 2001b ).
Hepatitis B and C.
Available data on the prevalence of chronic hepatitis B and hepatitis C infection were used together with disease models to estimate regional incidence and mortality rates ( Global Burden of Hepatitis C Working Group 2004 ; Lavanchy 2004 ; WHO 2002a ; WHO 2002b ).
Malaria.
Malaria prevalence was based on regional prevalence rates for acute symptomatic episodes estimated by Murray and Lopez (1996d) . Country-specific estimates of malaria prevalence were derived by adjusting subregional prevalence by the ratio of country to subregional malaria mortality. Work is currently under way in collaboration with other WHO programs and external expert groups to refine and revise these country-specific estimates of malaria prevalence ( Korenromp 2005 ).
Schistosomiasis.
The CEGET/WHO Atlas of the Global Distribution of Schistosomiasis ( Doumenge and others 1987 ) and population-based prevalence studies were used to estimate country-specific prevalence rates. Prevalence estimates were based on regional prevalence rates for schistosomiasis infection ( Murray and Lopez 1996d ) applied to updated estimates of country-specific populations at risk in 2001 ( van der Werf and de Vlas 2001 ).
Lymphatic Filariasis.
Estimates for lymphatic filariasis were developed for six of the eight regions defined for the GBD 1990 study ( Murray and Lopez 1996d ). The established market economies and formerly socialist economies of Europe were excluded, because infection was not considered to be endemic in these countries. The prevalence data were obtained from community-based surveys and complemented with reports by the Information and Reference Service of the Parasitic Diseases Program, WHO. Prevalence estimates were based on regional prevalence rates for cases of hydrocele or lymphoedema caused by infection with filariae. These estimates were updated using estimates of country-specific populations at risk in 2001 provided by the WHO Lymphatic Filariasis Elimination Program.
Onchocerciasis.
In the early 1990s, WHO estimated the prevalence of blindness due to onchocerciasis from surveys and national reports ( WHO 1995 ). Following the continued success of the Onchocerciasis Control Program in western African countries and the introduction of population-wide administration of ivermectin in other endemic areas, the prevalence of onchocerciasis and its disabling sequelae has been dramatically reduced in all 36 endemic countries in Latin America and the Caribbean and Sub-Saharan Africa ( Richards and others 2001 ). Therefore, the prevalence of blindness from onchocerciasis was reestimated by taking into account the declining trends in prevalence and the coverage and duration of onchocerciasis control programs ( Alley and others 2001 ).
Reliable sources of information on the prevalence of blindness due to onchocerciasis are available from several population-based studies, usually as part of an overall blindness survey. However, prevalence studies of onchocerciasis-specific blindness are often carried out in hyperendemic areas and/or in local communities, and thus the estimated prevalence may not be generalizable to the country as a whole. For this reason, the current prevalence of blindness due to onchocerciasis was estimated from nationally reported data, where available, and by extrapolation from 1993 estimates using trend analysis of onchocerciasis control programs in each endemic country (Shibuya and Ezzati 2003).
Leprosy.
Regional incidence and prevalence rates for leprosy were based on case reporting and surveillance by 120 WHO member states ( Stein 2002a ; WHO 2002c ).
Dengue and Dengue Hemorrhagic Fever.
Regional incidence and prevalence rates for dengue and dengue hemorrhagic fever were based on a review of nearly 300 population-based studies, but data were sparse for regions apart from East Asia and the Pacific and Latin America and the Caribbean ( LeDuc, Esteves, and Gratz 2004 ).
Trachoma.
The baseline regional and subregional prevalence of blinding trachoma was first estimated as described elsewhere ( Frick and others 2003 ; Ranson and Evans 1995 ) and then updated using several recent population-based studies in the Middle East and North Africa and Sub-Saharan Africa. As the prevalence of blinding trachoma declines with socioeconomic development even in the absence of a specific trachoma control program ( Dolin and others 1997 ), the extrapolation from regional prevalence estimates made in the 1980s would overestimate current prevalence. For this reason, both nationally reported data and specific criteria for a regression model of time-series data were used to estimate the prevalence of blinding trachoma. The model estimates were then applied to countries that have reported cases of blinding trachoma ( Shibuya and Mathers 2003 ).
Intestinal Nematode Infections.
Updated estimates of the prevalence of intestinal nematode infections were based on WHO's new global databank on schistosomiasis and soil-transmitted helminths, which contains data derived from community-based, cross-sectional surveys for subnational administrative regions ( Brooker and others 2000 ; de Silva and others 2003 ). In areas without comprehensive data, predictions of the distribution of soil-transmitted helminths were developed using environmental data derived from satellite remote sensing ( Brooker and others 2002 ). Incidence rates and YLD for disabling sequelae of helminth infections were modeled using a mathematical model developed by Chan and others ( Bundy and others 2004 ; Chan 1997 ).
Lower Respiratory Infections.
Prevalence and incidence estimates for lower respiratory infections were based on an analysis of published data on the incidence of clinical pneumonia from 95 community-based studies published since 1961 ( Rudan and others 2004 ). Most of the studies were longitudinal and conducted over long enough periods to account for seasonal variation. Studies over short periods of time were excluded.
Maternal Conditions.
Incidence rates for maternal conditions and disabling sequelae were derived from reviews of published population-based studies supplemented by studies of hospital-based deliveries adjusted for the proportion of deliveries occurring in hospitals ( Dolea and AbouZahr 2003a , 2003b ; Dolea, AbouZahr, and Stein 2003; Dolea and Stein 2003). The incidence of unsafe induced abortion was estimated at the country level using 156 published and unpublished reports for 131 countries together with information on legal and social contexts (Ahman, Dolea, and Shah 2003; WHO 2004a ).
Perinatal Conditions.
Incidence rates for low birthweight, birth asphyxia and trauma, and disabling sequelae were derived from health service-based data and national birth registration systems in high-income countries and from mothers participating in nationally representative household surveys (such as the U.S. Agency for International Development-funded DHSs and the Multiple Indicator Cluster Surveys carried out by the United Nations Children's Fund), supplemented by reviews of published population-based and hospital-based studies (UNICEF and WHO 2005).
Protein-Energy Malnutrition.
More than 400 recent nationally representative studies from WHO's global database on child growth and malnutrition ( http://www.who.int/nutgrowthdb/ ) were used to estimate the prevalence of child stunting and wasting in every country ( de Onis and Blossner 2003 ; de Onis, Frongillo, and Blossner 2000 ; de Onis and others 2004 ). Where country estimates were not available from the database, the regional average calculated from the available studies or data from other countries with similar epidemiological characteristics were used ( Stein 2002c ).
Iodine Deficiency and Vitamin A Deficiency.
Country-specific estimates for goiter rates were obtained and used to calculate regional estimates for total goiter rates. The primary data source was the WHO Nutrition and Health for Development Program, which is developing and refining a comprehensive database of country-specific estimates of both clinical and subclinical iodine deficiency disorders from national level and subnational nutrition surveys ( Rastogi and Mathers 2002a ; WHO 2001a ; WHO Nutrition Program 2005 ).
Country-specific estimates were obtained and used to calculate regional estimates for both xerophthalmia and corneal scars resulting from vitamin A deficiency ( Rastogi and Mathers 2002c ). Again, the primary data source was the WHO Nutrition and Health for Development Program, which is also developing and refining a comprehensive database of country-specific estimates of both clinical and subclinical vitamin A deficiency from national-level and subnational nutrition surveys ( WHO Nutrition Program 2002b ). The database compiles information for all population groups, especially preschool-age children and women of childbearing age, and includes information on the prevalence of xerophthalmia, including night blindness and serum retinol distributions.
Iron Deficiency Anemia.
Country-specific prevalence estimates of iron deficiency anemia were obtained from 69 studies and used to estimate regional age- and sex-specific prevalence rates for mild, moderate, and severe anemia. The primary data source was the WHO Nutrition and Health for Development Program. The program is currently preparing a comprehensive database of country-specific prevalence estimates of both clinical and subclinical iron deficiency anemia from national-level and subnational nutrition surveys ( WHO Nutrition Program 2002a ).
All prevalence estimates were reviewed, with priority being given to the most recent national-level estimates (most were obtained from studies conducted in the last 10 years). For countries for which no studies were available, the regional average was applied ( Rastogi and Mathers 2002b ).
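Filling gaps with a regional average can be sketched as follows. This example assumes an unweighted mean of the available country estimates; the text does not say whether the GBD 2001 regional averages were population-weighted, so that choice, the function name, and the country labels are all hypothetical.

```python
from statistics import mean
from typing import Optional


def fill_with_regional_average(
        prevalence: dict[str, Optional[float]]) -> dict[str, float]:
    """Replace missing country estimates (None) with the unweighted mean of
    the countries in the same region that do have estimates."""
    observed = [p for p in prevalence.values() if p is not None]
    if not observed:
        raise ValueError("no country in the region has an estimate")
    regional = mean(observed)
    return {country: (p if p is not None else regional)
            for country, p in prevalence.items()}


filled = fill_with_regional_average({"A": 0.20, "B": 0.40, "C": None})
print(filled)
```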
Noncommunicable Diseases
This section gives an overview of data sources and methods for specific Group II causes and references to more detailed documentation.
Malignant Neoplasms.
Regional survival models were developed for each cancer site and used to estimate numbers of incident cases from estimated deaths by site for each country (Mathers, Shibuya, and others ; Shibuya and others 2002 ). The same models were used to estimate numbers of prevalent cases, defined as cases of malignant neoplasms causing death within 15 years, and cases of nonfatal malignant neoplasms (where the person is likely to survive 15 years or more) diagnosed within the last five years.
Diabetes Mellitus.
Diabetes prevalence estimates for those age 20 and older were based on an analysis of 41 representative population-based studies that used oral glucose tolerance tests and either 1980 WHO criteria to define diabetes cases or similar criteria that produced comparable prevalences ( Wild and others 2004 ). For countries for which eligible data were not available, data from a proxy country believed to have similar diabetes prevalence were used. Most studies of diabetes prevalence did not indicate the type of diabetes, and consequently the estimates refer to all diabetes. The prevalence of diabetes among people under 20 years of age was estimated from incidence data derived from 100 published studies ( Karvonen and others 2000 ).
Depressive Disorders.
Point prevalence estimates for episodes of unipolar major depression were derived from a systematic review of available published and nonpublished population studies on depressive disorders, which identified 56 studies from all World Bank regions ( Ustun and others 2005 ). Variations in the prevalence of unipolar depressive disorders in some European countries, Australia, Japan, and New Zealand were estimated directly from relevant population studies ( Ayuso-Mateos and others 2001 ). For other high-income European countries, country-specific prevalences were estimated using a regression model of available prevalence data on suicide rates (for ages 15 to 59, both sexes combined). For other regions, prevalence estimates were based on regional prevalence rates applied to country-specific population estimates for 2002. Unlike in the original GBD study, survey data on the severity of unipolar depressive disorders (mild, moderate, or severe) were used together with disability weights for these three severity classes from Stouthard and others (1997). This resulted in overall disability weights for unipolar depressive disorders ranging from 0.30 to 0.46 across regions. This compares reasonably well with a more recent analysis of the distribution of depression by severity and disability weights for a Dutch community, which resulted in an overall disability weight of 0.41 ( Kruijshaar and others 2005 ). YLD due to dysthymia not associated with major depressive episodes were estimated separately using the disability weight for mild depressive disorders.
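An overall disability weight of this kind is a severity-weighted average: the share of cases in each severity class multiplied by that class's weight, summed over classes. In the sketch below, both the severity shares and the class weights are placeholder values chosen for illustration; they are not the survey data or the Stouthard and others (1997) weights used for the GBD 2001.

```python
def overall_disability_weight(shares: dict[str, float],
                              weights: dict[str, float]) -> float:
    """Severity-weighted average disability weight."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("severity shares must sum to 1")
    return sum(shares[k] * weights[k] for k in shares)


shares = {"mild": 0.5, "moderate": 0.3, "severe": 0.2}      # hypothetical split
weights = {"mild": 0.14, "moderate": 0.35, "severe": 0.76}  # placeholder weights
print(round(overall_disability_weight(shares, weights), 3))
```

Shifting the severity distribution between regions is enough to move the overall weight, which is how region-specific values such as the 0.30 to 0.46 range can arise from a single set of class weights.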
Subregional prevalence rates for bipolar disorder were derived from a systematic review of all available published and unpublished population studies using case definitions that met the diagnostic criteria of the fourth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) of the American Psychiatric Association (1994) or of ICD-10 ( Ayuso-Mateos 2002a ).
Anxiety Disorders and Schizophrenia.
Subregional prevalence rates for panic disorder, obsessive-compulsive disorder, and post-traumatic stress disorder were also derived from systematic reviews of all available published and unpublished population studies using case definitions that met ICD-10 or DSM-IV criteria (Ayuso-Mateos 2002b, 2002c, 2002d; Ustun and Chisholm 2001). Those with comorbid depressive disorder or alcohol or drug use disorders were excluded from the prevalence estimates. For data sources and methods for schizophrenia, see Ayuso-Mateos (2002d).
Alcohol and Drug Use Disorders.
The case definition for alcohol use disorders is based on ICD-10 criteria for alcohol dependence and harmful use, excluding cases with comorbid depressive episode. DSM-IV alcohol abuse is included in the case definition. All available population-based surveys using diagnostic criteria that could be mapped to this case definition were identified. Population estimates of the point prevalence of alcohol use disorders were obtained from 55 studies (Mathers and Ayuso-Mateos 2003).
Published data on alcohol production, trade, and sales, adjusted for estimates of illegally produced alcohol, were used to estimate country averages of the volume of alcohol consumed. These preliminary estimates were then further adjusted on the basis of survey data on alcohol consumption to estimate the prevalence of alcohol use disorders for countries where recent population-based survey data were not available (Rehm and others 2004).
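The volume-of-consumption step can be illustrated with the usual apparent-consumption identity: recorded supply (production plus imports minus exports) plus an estimate of unrecorded alcohol, divided by the adult population. This is a hedged sketch under that assumption; the function name, arguments, and figures are hypothetical, and the subsequent survey-based adjustment described by Rehm and others (2004) is not reproduced.

```python
def adult_per_capita_consumption(production, imports, exports, unrecorded, adult_population):
    """Litres of pure alcohol per adult per year (hypothetical accounting identity).

    Recorded apparent consumption (production + imports - exports) is topped up
    with an estimate of unrecorded alcohol, e.g. illegally produced alcohol.
    """
    if adult_population <= 0:
        raise ValueError("adult population must be positive")
    recorded = production + imports - exports
    return (recorded + unrecorded) / adult_population

# Hypothetical country: volumes in litres of pure alcohol.
litres = adult_per_capita_consumption(
    production=50e6, imports=10e6, exports=5e6, unrecorded=15e6, adult_population=10e6
)
print(litres)
```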
Estimating the prevalence of illicit drug use is difficult, because the use of these drugs is illegal, stigmatized, and hidden. In addition, definitions differ from country to country, as does the quality of the data collected. The definition used for the GBD 2001 was based on ICD-10 criteria for opioid dependence and harmful use or cocaine dependence and harmful use, excluding cases with comorbid depressive episodes. Data on the prevalence of problematic illicit drug use were derived from a range of sources (Degenhardt and others 2003). A literature search for studies that estimated the prevalence of problematic drug use identified more than 100 such studies. Other data sources included the United Nations Drug Control Program and the European Monitoring Centre for Drugs and Drug Addiction.
Insomnia (Primary).
Subregional prevalence rates for primary insomnia were derived from systematic reviews of all available published and unpublished population studies using case definitions that met ICD-10 or DSM-IV criteria, where the insomnia causes problems with usual activity and is not secondary to other diseases. Persons with comorbid depressive disorder or alcohol or drug use disorders were excluded from the prevalence estimates.
Epilepsy and Multiple Sclerosis.
Subregional prevalence rates for epilepsy, excluding epilepsy or seizure disorder secondary to other diseases or injury, were derived from systematic reviews of available published and unpublished population studies. Subregional prevalence rates for multiple sclerosis, derived for the GBD 1990, were updated using recent epidemiological studies (Warren and Warren 2001).
Alzheimer's Disease and Other Dementias.
Subregional prevalence rates, incidence rates, and durations for Alzheimer's disease and other dementias were estimated based on 110 available population studies and assumed to apply to countries within each subregion (Mathers and Leonardi 2003).
Parkinson's Disease.
Regional incidence to mortality ratios for Parkinson's disease estimated by Murray and Lopez (1996d) were applied to the estimated country-specific mortality rates to derive country-specific estimates of incidence.
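Deriving incidence from mortality in this way amounts to multiplying each age-specific mortality rate by the corresponding incidence-to-mortality ratio. A minimal sketch follows; the age groups, rates, and ratios are hypothetical and are not the Murray and Lopez (1996d) values.

```python
def incidence_from_mortality(mortality_rates, incidence_to_mortality_ratios):
    """Age-specific incidence = age-specific mortality x incidence-to-mortality ratio."""
    return {
        age: mortality_rates[age] * incidence_to_mortality_ratios[age]
        for age in mortality_rates
    }

# Hypothetical country-specific mortality rates (per person per year) and regional ratios.
mortality = {"60-69": 10e-5, "70-79": 40e-5}
ratios = {"60-69": 3.0, "70-79": 2.0}
incidence = incidence_from_mortality(mortality, ratios)
print(incidence)
```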
Migraine.
Regional prevalence rates for people who experience migraine were estimated from 43 available population studies and assumed to apply to countries within each subregion (Leonardi and Mathers 2003). Migraine has been treated as a chronic disease lasting from 15 years to around 45 years with sporadic episodes. The case definition was taken from the International Headache Society's definition of migraine. Available population studies using this definition provided prevalence estimates that were quite similar across most regions.
Mental Retardation.
An attempt was made to assess the prevalence of all forms of mental retardation, but due to difficulties with data comparability, we decided to assess only the burden resulting from childhood exposure to environmental lead, plus mental retardation estimated as sequelae to diseases or injuries or associated with specific congenital malformations. The YLD associated with mental retardation as a sequela of diseases and injuries or as a component of a syndrome are included in the estimation of total YLD for such causes in the tables presented in annex 3C. In addition, YLD were estimated separately for mental retardation as a consequence of environmental lead exposure, because this was required for the assessment of the total attributable burden of environmental lead exposure. For details of methods and data sources, see Fewtrell and others (2004) and Pruss-Ustun and others (2004).
Low Vision and Blindness.
Both regional and subregional prevalences for blindness and low vision were updated using all available data gathered since 1980 (Resnikoff and others 2004; Thylefors and others 1995). Subregional prevalences were estimated from more than 50 cross-sectional, population-based surveys of blindness and low vision, both published and unpublished. For countries for which no data were available, prevalences were extrapolated from available data for neighboring subregions or countries with a similar epidemiological and socioeconomic environment. The DisMod software was then used to obtain internally consistent age- and sex-specific estimates of incidence, prevalence, remission, and relative risks of mortality. Ratios of blindness to low vision for each region were used to estimate the prevalence of low vision, and DisMod analyses were then carried out to ensure internal consistency among parameters.
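Because the regional ratios relate blindness to low vision, low vision prevalence follows by dividing blindness prevalence by the ratio. The sketch below assumes a single regional ratio expressed as blindness prevalence over low vision prevalence; the age groups and numbers are hypothetical, and the subsequent DisMod consistency step is not shown.

```python
def low_vision_from_blindness(blindness_prevalence, blind_to_low_vision_ratio):
    """Low vision prevalence = blindness prevalence / (blindness : low vision ratio).

    Hypothetical assumption: one ratio per region, applied to every age group.
    """
    if blind_to_low_vision_ratio <= 0:
        raise ValueError("ratio must be positive")
    return {age: p / blind_to_low_vision_ratio for age, p in blindness_prevalence.items()}

# Hypothetical age-specific blindness prevalence and regional ratio.
blindness = {"50-59": 0.006, "60-69": 0.015}
low_vision = low_vision_from_blindness(blindness, 0.3)
print(low_vision)
```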
Hearing Loss.
Despite the large number of published studies on hearing loss, many of them use different criteria and relate to subnational or nonrepresentative populations. Data from 25 representative population surveys of measured hearing loss (19 surveys for adults and 14 surveys for children) were used to estimate subregional prevalences of moderate or greater hearing loss according to the WHO definition (an average hearing threshold level of 41 decibels or greater in the better ear over 0.5, 1.0, 2.0, and 4.0 kilohertz) and of severe or greater hearing loss (an average hearing threshold level of 61 decibels or greater in the better ear over the same frequencies) (Mathers, Smith, and Concha 2003). Regional estimates of the prevalence of hearing aid use were used in calculating average disability weights for moderate, severe, and profound hearing loss in each region, and thus in calculating the YLD associated with hearing loss.
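The WHO threshold definitions quoted above can be written as a small classifier, and the role of hearing aid use in the average disability weight can be illustrated as a mixture of aided and unaided weights. Both the mixture formula and all numeric weights below are assumptions made for illustration; the text says only that regional hearing aid use entered the calculation.

```python
FREQUENCIES_KHZ = (0.5, 1.0, 2.0, 4.0)

def ear_average(thresholds_db):
    """Mean hearing threshold (dB) over the four WHO frequencies for one ear."""
    if len(thresholds_db) != len(FREQUENCIES_KHZ):
        raise ValueError("expected one threshold per frequency: 0.5, 1, 2, 4 kHz")
    return sum(thresholds_db) / len(thresholds_db)

def classify_hearing_loss(left_db, right_db):
    """Classify using the better (lower-threshold) ear and the WHO cut-offs."""
    better = min(ear_average(left_db), ear_average(right_db))
    if better >= 61:
        return "severe or greater"
    if better >= 41:
        return "moderate"
    return "below moderate"

def average_disability_weight(aid_use_share, weight_unaided, weight_aided):
    """Hypothetical mixture: aid users get the aided weight, others the unaided one."""
    return aid_use_share * weight_aided + (1.0 - aid_use_share) * weight_unaided

print(classify_hearing_loss([40, 45, 50, 55], [60, 65, 70, 75]))  # better ear averages 47.5 dB
print(average_disability_weight(0.25, weight_unaided=0.2, weight_aided=0.1))
```

Taking the better ear reflects the definition's focus on functional hearing: a person with one good ear is classified by that ear.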
Congestive Heart Failure.
The incidence of congestive heart failure following acute myocardial infarction was estimated using a model for IHD based on available population data on incidence and case fatality rates for acute myocardial infarction and on the proportion of acute myocardial infarction patients who go on to develop congestive heart failure (Mathers, Truelsen, and others 2004). The incidence of congestive heart failure as a sequela to rheumatic heart disease, hypertensive heart disease, and inflammatory heart diseases was estimated using incidence to mortality ratios from the GBD 1990 (Murray and Lopez 1996d).
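A simplified version of the post-infarction calculation multiplies the incidence of acute myocardial infarction by the proportion surviving the acute event and then by the proportion of survivors who go on to develop congestive heart failure. Applying that proportion to survivors only, and all numbers below, are assumptions for illustration; the GBD 2001 IHD model is more detailed.

```python
def chf_incidence_after_ami(ami_incidence, ami_case_fatality, proportion_developing_chf):
    """Incident CHF cases following AMI (hypothetical simplified model).

    Assumes the CHF proportion applies to patients who survive the acute event.
    """
    survivors = ami_incidence * (1.0 - ami_case_fatality)
    return survivors * proportion_developing_chf

# Hypothetical: 1,000 incident AMI cases, 30 percent case fatality, 20 percent develop CHF.
chf_cases = chf_incidence_after_ami(1000, 0.3, 0.2)
print(chf_cases)
```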
Angina Pectoris.
The GBD 2001 study developed a model for IHD based on available population data on the incidence and case fatality rates for acute myocardial infarction and on the prevalence and case fatality rates for angina pectoris (Mathers, Truelsen, and others 2004). Observed correlations between the prevalence of acute myocardial infarction survivors and the prevalence of angina pectoris (whether incident before or after acute myocardial infarction) were used to estimate the prevalence of angina pectoris from the modeled prevalences of acute myocardial infarction survivors. The latter were estimated from country-specific IHD mortality estimates together with estimated regional case fatality rates for acute myocardial infarction.
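The step from modeled survivor prevalence to angina prevalence can be sketched as a regression. The example below assumes a straight-line relationship fitted by ordinary least squares to paired observations; the functional form and the data points are hypothetical, since the text reports only that observed correlations were used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical paired prevalences: AMI survivors (x) and angina pectoris (y).
ami_survivors = [0.010, 0.020, 0.030, 0.040]
angina = [0.017, 0.032, 0.047, 0.062]
slope, intercept = fit_line(ami_survivors, angina)

# Predict angina prevalence where modeled AMI-survivor prevalence is 2.5 percent.
predicted = slope * 0.025 + intercept
print(slope, intercept, predicted)
```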
Stroke.
The GBD 2001 study developed a model for stroke based on available population data on 28-day case fatality rates for incident cases of first-ever stroke and on long-term survival among cases surviving this initial period, during which the risk of mortality is highest (Truelsen and others 2002). A consistent relationship among incidence, prevalence, and mortality was established using U.S. data. The resulting age- and sex-specific 28-day and long-term case fatality rates were used as the basis for estimating subregional case fatality rates, after adjusting for the observed relationship between GDP per capita and overall 28-day case fatality rates in published studies from various countries. Consistent epidemiological models for the prevalence of stroke survivors in each subregion were then estimated using these case fatality rates and observed mortality. Observed mortality was first adjusted because deaths recorded as resulting from stroke in vital statistics do not fully reflect the true excess risk of mortality among survivors.
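The link between case fatality and survivor prevalence can be illustrated with a simple stock-and-flow calculation: each year, first-ever strokes that survive 28 days join the pool of survivors, and the existing pool is depleted by long-term mortality. This is a deliberately simplified sketch (a single age group, constant rates, hypothetical values), not the subregional models of Truelsen and others (2002). Under constant rates the pool approaches incidence × (1 − 28-day case fatality) ÷ annual long-term mortality.

```python
def stroke_survivor_pool(incident_cases, cf_28day, annual_mortality, years):
    """Year-end number of stroke survivors for each simulated year (hypothetical model)."""
    pool = 0.0
    history = []
    for _ in range(years):
        pool *= 1.0 - annual_mortality              # deaths among existing survivors
        pool += incident_cases * (1.0 - cf_28day)   # new 28-day survivors join the pool
        history.append(pool)
    return history

history = stroke_survivor_pool(1000, cf_28day=0.2, annual_mortality=0.1, years=200)
steady_state = 1000 * (1 - 0.2) / 0.1
print(round(history[-1], 2), steady_state)
```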
Chronic Obstructive Pulmonary Disease.
Chronic obstructive pulmonary disease is characterized by airway obstruction that is not fully reversible, with a ratio of forced expiratory volume in one second (FEV1) to forced vital capacity (FVC) of less than 70 percent and a postbronchodilator FEV1 of less than 80 percent of the predicted value. Because accurate prevalence data based on spirometry are not available in many regions, an alternative approach was used: disease occurrence was inferred from regional estimates of mortality due to chronic obstructive pulmonary disease, making use of the constraints imposed by the consistent epidemiological relationships among prevalence, incidence, remission, case fatality, and mortality rates. The relative risk of mortality due to chronic obstructive pulmonary disease across subregions was estimated as a function of its two leading risk factors (tobacco smoking and indoor air pollution from solid fuel used for cooking), along with regional fixed effects (Lopez and others forthcoming). Data on risk factors were derived from the comparative risk assessment carried out for the World Health Report 2002 (Ezzati and others 2002; WHO 2002d). The estimated relative risks were validated by comparing estimated regional prevalence with data from available population studies. For regions where surveys of representative populations based on spirometry were available, both direct estimation and model estimation were used.
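The spirometric case definition above can be expressed as a simple test on an individual's postbronchodilator measurements. The sketch below encodes only the two thresholds stated in the text; the function name and argument units are hypothetical, and reversibility testing and the choice of reference equations for predicted FEV1 are outside its scope.

```python
def meets_copd_spirometry(fev1_litres, fvc_litres, fev1_percent_predicted):
    """True if postbronchodilator FEV1/FVC < 0.70 and FEV1 < 80% of predicted."""
    if fvc_litres <= 0:
        raise ValueError("FVC must be positive")
    return fev1_litres / fvc_litres < 0.70 and fev1_percent_predicted < 80.0

print(meets_copd_spirometry(1.8, 3.0, 65.0))  # ratio 0.60 and 65% of predicted
print(meets_copd_spirometry(2.8, 3.5, 90.0))  # ratio 0.80
```

Both conditions must hold, so a low ratio with preserved FEV1 (or the reverse) does not meet this definition.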
Asthma.
Asthma prevalence estimates were based on a case definition requiring a positive airway hyper-responsiveness test in addition to symptoms in the last 12 months. Specifically, the prevalence estimates related to cases defined in terms of reported wheeze in the last 12 months plus current bronchial hyper-responsiveness, defined as a mean provocation concentration of histamine of 8 milligrams per milliliter or less required to produce a 20 percent fall in FEV1.
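The stricter asthma case definition combines a symptom criterion with an airway challenge criterion. A minimal sketch, assuming that a missing PC20 value means no 20 percent fall in FEV1 was provoked (so the person is not counted as hyper-responsive); the function and argument names are hypothetical.

```python
from typing import Optional

def is_strict_asthma_case(
    wheeze_last_12_months: bool, pc20_histamine_mg_per_ml: Optional[float]
) -> bool:
    """Reported wheeze in the last 12 months plus PC20 histamine of 8 mg/mL or less."""
    if pc20_histamine_mg_per_ml is None:
        # Assumption: no 20 percent fall in FEV1 was provoked, so not hyper-responsive.
        return False
    return wheeze_last_12_months and pc20_histamine_mg_per_ml <= 8.0

print(is_strict_asthma_case(True, 2.5))
print(is_strict_asthma_case(True, None))
```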
While epidemiological studies commonly use a broader definition of asthma based on symptom reporting, the GBD 2001 study used a narrower definition in order to identify cases experiencing a significant loss of health. The disability threshold for inclusion in the prevalence estimates is mild asthma, defined as occasional wheeze that does not affect usual activities but that, if untreated, may result in occasional episodes that cause sleep disturbance and/or speech limitations.
A review of the published literature identified studies using the foregoing definition, but also many studies using self-reported symptoms only, self-reported current asthma (an asthma attack in the last 12 months or current treatment), or physician diagnosis of current asthma in the last 12 months. Using study populations for which prevalence data were available under both the foregoing stricter definition and one of these alternative definitions, we calculated adjustment factors to estimate asthma prevalence from community surveys that used other definitions of asthma.
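One plausible way to form and apply such an adjustment factor is to average, across studies that report both definitions, the ratio of strict-definition prevalence to alternative-definition prevalence, and then scale surveys that used only the alternative definition. Averaging ratios (rather than, say, pooling cases) is an assumption of this sketch, and the study values and function names are hypothetical.

```python
def adjustment_factor(paired_studies):
    """Mean ratio of strict-definition to alternative-definition prevalence.

    paired_studies: list of (strict_prevalence, alternative_prevalence) tuples
    from study populations that report both definitions.
    """
    ratios = [strict / alternative for strict, alternative in paired_studies]
    return sum(ratios) / len(ratios)

def adjust_prevalence(alternative_prevalence, factor):
    """Convert a survey estimate under the alternative definition to the strict one."""
    return alternative_prevalence * factor

# Hypothetical paired studies using self-reported symptoms as the alternative definition.
pairs = [(0.05, 0.10), (0.06, 0.15)]
factor = adjustment_factor(pairs)
strict_estimate = adjust_prevalence(0.12, factor)
print(factor, strict_estimate)
```

A separate factor would be needed for each alternative definition, since self-reported symptoms, current asthma, and physician diagnosis capture different groups.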
A total of 149 population-based studies were used to derive estimates of asthma prevalence for children, teenagers, and adults in a wide range of countries. In particular, extensive use was made of two multicountry studies: the International Study of Asthma and Allergies in Childhood, using self-reported symptoms in children ages 6 to 7 and 13 to 14 (ISAAC Steering Committee 1998a, 1998b), and the European Community Respiratory Health Survey of adults ages 20 to 44, using self-reported symptoms and bronchial hyper-responsiveness (Chinn and others 1997; Pearce and others 2000). Estimates from the population-based studies were then used to derive subregional average prevalence rates, which were assumed to apply in countries without specific population studies.
Rheumatoid Arthritis.
Subregional prevalence rates for rheumatoid arthritis were derived from available published population studies using case definitions for definite or classical rheumatoid arthritis (Symmons, Mathers, and Pfleger 2002b).
Osteoarthritis.
Subregional prevalence rates for osteoarthritis were derived from available published population studies that provided prevalence data for symptomatic osteoarthritis of the hip or knee, radiologically confirmed as Kellgren-Lawrence grade 2 or greater (Symmons, Mathers, and Pfleger 2002a).
Edentulism.
Prevalence numbers were based on regional prevalence rates for edentulism estimated by Murray and Lopez (1996d). New data from the 2002-4 WHO World Health Survey will enable revision of these estimates in the future.
Injuries
An incident episode of a nonfatal injury is defined as an episode that is severe enough for the person to be hospitalized or that requires emergency room care (if such care is available). Begg and others (2002) describe methods used to estimate injury-related prevalences and prevalence YLD. In brief, the incidence of nonfatal injuries by external cause category, age, and sex was estimated by applying regional and country-specific death to incidence ratios to the injury deaths estimated for each country in 2002.
Age- and sex-specific ratios were based on new analyses of health facility data provided by 18 countries in five World Bank regions. For most cause categories, extrapolations from observed death to incidence ratios were derived for all countries at a regional level, with final adjustments using mortality and per capita GDP as predictors of expected variability in case fatality rates.
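Applying a death-to-incidence ratio amounts to inverting it: if deaths make up a fraction r of incident episodes, incidence is deaths divided by r. The sketch below assumes the ratio is expressed as deaths per incident episode; the cause categories and figures are hypothetical, and the GDP-based adjustments described above are not reproduced.

```python
def injury_incidence(deaths_by_cause, death_to_incidence_ratio):
    """Nonfatal-plus-fatal incident episodes implied by deaths and a death:incidence ratio."""
    incidence = {}
    for cause, deaths in deaths_by_cause.items():
        ratio = death_to_incidence_ratio[cause]
        if ratio <= 0:
            raise ValueError(f"ratio for {cause} must be positive")
        incidence[cause] = deaths / ratio
    return incidence

# Hypothetical country: injury deaths in the reference year and regional ratios.
deaths = {"road traffic": 500, "falls": 120}
ratios = {"road traffic": 0.02, "falls": 0.004}
incidence = injury_incidence(deaths, ratios)
print(incidence)
```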
Prevalences for disabling injuries were estimated from the proportions of cases by injury type estimated to result in long-term disability, together with estimates of short- and long-term disability durations. The latter were based on analyses of excess mortality risks from epidemiological studies (Begg and others 2002).
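Under a steady-state assumption, prevalence is approximately incidence multiplied by average duration, which suggests how long- and short-term disability can be combined. The sketch below assumes constant incidence and takes the long-term duration as given (in the GBD 2001 it was informed by analyses of excess mortality risks); the function name and all inputs are hypothetical.

```python
def disabling_injury_prevalence(incidence, share_long_term, long_term_years, short_term_years):
    """Steady-state prevalence (incidence x duration), split by disability duration.

    Hypothetical model: a fixed share of incident cases has long-term disability,
    the remainder short-term disability.
    """
    long_term = incidence * share_long_term * long_term_years
    short_term = incidence * (1.0 - share_long_term) * short_term_years
    return long_term, short_term

# Hypothetical injury type: 10,000 cases a year, 5 percent with 20 years of disability,
# the rest with about five weeks (0.1 years).
long_term, short_term = disabling_injury_prevalence(10000, 0.05, 20.0, 0.1)
print(round(long_term), round(short_term))
```

Even a small long-term share dominates prevalence because its duration is so much longer than that of short-term cases.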
