3. The Burden of Disease and Mortality by Condition: Data, Methods, and Results for 2001

Estimating Deaths by Cause: Methods and Data

Complete death registration data cover only one-third of the world's population. Some information on another third is available through the urban death registration systems and national sample registration systems of China and India. For the remaining one-third of the world's population, including most countries in Sub-Saharan Africa, only partial information is available from epidemiological studies, disease registers, and surveillance systems.

The original (1990) GBD study was the first attempt to estimate the global and regional numbers of deaths resulting from a comprehensive set of causes while ensuring consistency with death totals provided by death registration and demographic methods ( Murray and Lopez 1996c ). Ensuring this consistency was a major advance and is an essential first step in measuring the disease burden. Estimates of numbers of deaths carried out separately for individual causes that are not constrained to sum to a demographically derived total often result in substantial overestimates of deaths from each cause ( Jamison 1996 ). In part, this occurs because in carrying out analysis for a single cause, researchers may easily be overinclusive in counting the deaths attributable to the cause of interest, even without any intent to maximize the size of the specific problem.

Thus, the first analytical step in estimating deaths by cause was to estimate age-specific total death rates by sex. The importance of this step cannot be overemphasized. The number of deaths by age and sex provided an essential "envelope" that constrained individual disease and injury estimates of deaths. Competing claims for the magnitude of deaths from various causes then had to be reconciled within this envelope.
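As a minimal sketch of how an envelope can constrain cause-specific estimates, the Python below rescales independently derived estimates for one age-sex group so that they sum to the demographic total. Proportional rescaling is an assumption made purely for illustration; the text does not state the reconciliation rule, and the function name `fit_to_envelope` and all figures are hypothetical.

```python
def fit_to_envelope(cause_estimates, envelope_total):
    """Scale cause-specific death estimates so they sum to the envelope.

    Proportional scaling is an illustrative assumption, not the study's rule.
    """
    raw_total = sum(cause_estimates.values())
    if raw_total <= 0:
        raise ValueError("cause estimates must sum to a positive number")
    factor = envelope_total / raw_total
    return {cause: deaths * factor for cause, deaths in cause_estimates.items()}


# Hypothetical single-cause estimates for one age-sex group; together they
# exceed the demographically derived total of 1,200 deaths.
independent = {"cause_a": 600.0, "cause_b": 500.0, "cause_c": 400.0}
constrained = fit_to_envelope(independent, envelope_total=1200.0)
```

Each estimate keeps its relative size, but the total can no longer exceed the number of deaths that actually occurred.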

Next, to estimate the number of deaths by cause we drew on the following four broad sources of data:

  • Death registration systems. Complete or incomplete death registration systems provide information about causes of death for almost all high-income countries and for many countries in Europe and Central Asia and in Latin America and the Caribbean. Some vital registration (VR) information is also available in all other regions.

  • Sample death registration systems. In China and India, sample registration systems for rural areas supplement urban death registration systems. Information systems now provide data on causes of death for several other large countries for which information was not available at the time of the original GBD study.

  • Epidemiological assessments. Epidemiologists have estimated deaths for specific causes, such as HIV/AIDS, malaria, and tuberculosis (TB), for most countries in the regions most affected. These estimates usually combine information from surveys on the incidence or prevalence of the disease with data on case fatality rates.

  • Cause of death models. The cause of death models used in the original GBD study ( Murray and Lopez 1996a ) were substantially revised and enhanced for estimating deaths by broad cause group in regions with limited information on mortality. The CodMod software developed for this study and described later drew on a data set of 1,613 country-years of observation of cause of death distributions from 58 countries between 1950 and 2001.

 

All-Cause Mortality for 192 Countries


According to data provided by 112 WHO member states, only about one-third of the estimated 56 million deaths occurring annually are recorded in death registration systems. If the sample registration systems of China and India are considered to provide information on their entire populations, then information is available for around 72 percent of the global population. In recent years, considerable priority has also been given to obtaining data on child and maternal mortality through such instruments as the Demographic and Health Survey (DHS) program funded by the U.S. Agency for International Development and the Multiple Indicator Cluster Survey program carried out by the United Nations Children's Fund. Table 3.1 summarizes sources of information on levels of child and adult all-cause mortality used to construct life tables for 192 WHO member states by region and by type of data.


[Table .]

For countries with death registration data, demographic techniques (Preston-Coale, Brass growth-balance, generalized growth-balance, and Bennett-Horiuchi methods) were first applied, as appropriate, to assess the extent of completeness of the recorded mortality data for adults. If the data coverage estimates were high enough to be meaningful, death rates for those aged five years and over were then adjusted accordingly. The completeness of death registration for children was assessed separately using other available sources of information on child mortality. For countries without usable VR data, other available sources of adult mortality such as surveys and censuses were used to estimate the level of adult mortality as measured by 45q15 (the probability of dying between exact ages 15 and 60). For child mortality under five, again, all available survey, census, and VR data were assessed, adjusted, and averaged to estimate the probable trend in child mortality (5q0) in recent decades.
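The adult mortality measure can be illustrated with a short calculation. The Python sketch below chains five-year probabilities of dying from exact age 15 to exact age 60. Converting death rates to probabilities with q = n*m / (1 + (n - a)*m), with a = 2.5 years, is a standard life table approximation assumed here rather than a step described in the text, and the rates are hypothetical.

```python
def interval_death_probability(rate, width=5.0, mean_years_lived=2.5):
    """Convert an age-specific death rate into a probability of dying.

    Uses q = n*m / (1 + (n - a)*m); a = 2.5 assumes that deaths occur, on
    average, midway through a five-year interval.
    """
    return width * rate / (1.0 + (width - mean_years_lived) * rate)


def adult_mortality(rates_by_age):
    """Probability of dying between exact ages 15 and 60.

    rates_by_age maps the start of each five-year age group (15, 20, ..., 55)
    to its death rate per person-year.
    """
    survival = 1.0
    for start_age in range(15, 60, 5):
        survival *= 1.0 - interval_death_probability(rates_by_age[start_age])
    return 1.0 - survival


# Hypothetical flat death rate of 4 per 1,000 person-years at every adult age.
example_rates = {age: 0.004 for age in range(15, 60, 5)}
q_45_15 = adult_mortality(example_rates)
```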

The population estimates used for all countries were those prepared by the United Nations Population Division (2003). Note that these estimates refer to de facto populations, that is, they include residents such as guest workers and refugees, rather than de jure populations, meaning citizens, and in some countries, permanent residents. Member states that report death registration data to WHO also routinely report population data for the population the death registration system covers, which in some cases is a subset of the national population. Death registration data may cover less than 100 percent of the population not only because some geographical areas may be excluded, but also because registration may be restricted to a subset of the resident population, such as citizens, and may thus exclude deaths among groups such as guest workers or refugees.

For the GBD 2001 study, age- and sex-specific death rates were calculated from the death and population data provided by countries, with adjustments made for completeness of the registration data where needed, and then total deaths by age and sex were calculated for each country by applying these rates to the United Nations Population Division estimates of de facto populations for 2001.
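The calculation in the preceding paragraph can be sketched as follows for a single age-sex group. This is a simplified Python illustration: it assumes that completeness is expressed as a single fraction of deaths registered, and the function name and all numbers are hypothetical.

```python
def deaths_in_national_population(registered_deaths, covered_population,
                                  completeness, de_facto_population):
    """Estimate total deaths for one age-sex group.

    1. Inflate registered deaths for incomplete registration.
    2. Divide by the population the registration system covers.
    3. Apply the resulting rate to the de facto national population.
    """
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must be in (0, 1]")
    adjusted_rate = (registered_deaths / completeness) / covered_population
    return adjusted_rate * de_facto_population


# Hypothetical group: 900 registered deaths at 90 percent completeness in a
# covered population of 100,000, applied to a de facto population of 120,000.
estimated_deaths = deaths_in_national_population(
    registered_deaths=900.0,
    covered_population=100_000.0,
    completeness=0.90,
    de_facto_population=120_000.0,
)
```

Keeping the covered population and the de facto population as separate inputs reflects the distinction drawn above between the population a registration system covers and the national population.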

Four methods were used to construct life tables for each country depending on the type of data available ( Lopez and others 2002 ):

  • Countries with death registration data for 2001. Such data were used directly to construct life tables for 56 countries after adjusting for incomplete registration if necessary.

  • Countries with a time series of death registration data. Where the latest year of death registration data available was prior to 2001, a time series of annual life tables (adjusted if the registration level was incomplete) between 1985 and the latest available year was used to project levels of child and adult mortality for 2001. For small countries with populations of less than 500,000, moving averages were used to smooth the time series. Projected values of child and adult mortality were then applied to a modified logit life table model ( Murray, Ferguson, and others 2003 ), using the most recent national data as the standard, to predict the full life table for 2001, and HIV/AIDS and war deaths were added to total mortality rates for 2001 where necessary. This method was applied for 40 countries using a total of 711 country-years of death registration data.

  • Countries with other information on levels of child and adult mortality. For 37 countries, estimated levels of child and adult mortality were applied to a modified logit life table model ( Murray, Ferguson, and others 2003 ), using a global standard, to estimate the full life table for 2001, and HIV/AIDS deaths and war deaths were added to total mortality rates as necessary. For most of these countries, data on levels of adult mortality were obtained from death registration data, official life tables, or mortality information derived from other sources such as censuses and surveys. The all-cause mortality envelope for China was derived from a time series analysis of deaths for every household in China reported in the 1982, 1990, and 2000 censuses. The extent of underreporting of deaths in the 2000 census was estimated at about 11.3 percent for males and 18.1 percent for females ( Bannister and Hill 2004 ). The all-cause mortality envelope for India was derived from a time series analysis of age-specific death rates from the Sample Registration System after correction for underregistration (88 percent completeness) ( Mari Bhat 2002 ).

  • Countries with information on levels of child mortality only. For 55 countries, 42 of them in Sub-Saharan Africa, no information was available on levels of adult mortality. Based on the predicted level of child mortality in 2001, the most likely corresponding level of adult mortality (excluding HIV/AIDS deaths where necessary) was selected, along with uncertainty ranges, based on regression models of child versus adult mortality as observed in a set of almost 2,000 life tables judged to be of good quality ( Lopez and others 2002 ; Murray, Ferguson, and others 2003 ). These estimated levels of child and adult mortality were then applied to a modified logit life table model, using a global standard, to estimate the full life table in 2001, and HIV/AIDS deaths and war deaths were added to total mortality rates as necessary. Evidence on adult mortality in Sub-Saharan African countries remains limited, even in areas with successful child and maternal mortality surveys.
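To give a sense of how a relational life table model turns a standard into a full life table, the sketch below implements the classic two-parameter Brass logit relation in Python. This is not the modified logit system cited above, which adds further age-specific terms that are not reproduced here; the standard survivorship values and parameter choices are hypothetical.

```python
import math


def brass_logit(survivorship):
    """Brass logit of l(x), the proportion surviving to age x (0 < l(x) < 1)."""
    return 0.5 * math.log((1.0 - survivorship) / survivorship)


def inverse_brass_logit(value):
    """Recover l(x) from its Brass logit."""
    return 1.0 / (1.0 + math.exp(2.0 * value))


def relational_life_table(standard_lx, alpha, beta):
    """Apply logit(l(x)) = alpha + beta * logit(standard l(x)) at every age."""
    return {
        age: inverse_brass_logit(alpha + beta * brass_logit(lx))
        for age, lx in standard_lx.items()
    }


# Hypothetical standard survivorship at selected exact ages.
standard = {1: 0.95, 5: 0.93, 15: 0.92, 60: 0.75}

# alpha = 0 and beta = 1 reproduce the standard; raising alpha lowers
# survivorship at every age, that is, it raises the level of mortality.
same = relational_life_table(standard, alpha=0.0, beta=1.0)
higher_mortality = relational_life_table(standard, alpha=0.3, beta=1.0)
```

In practice the parameters would be chosen so that the fitted table matches the estimated levels of child and adult mortality.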

 

Classification of Causes of Disease and Injury


Disease and injury causes of death and of burden of disease were classified using the same tree structure as in the original GBD study ( Murray and Lopez 1996c ). The first level of disaggregation comprises the following three broad cause groups:

  • Group I: communicable, maternal, perinatal, and nutritional conditions

  • Group II: noncommunicable diseases

  • Group III: injuries.

Each group was then divided into major cause subcategories; for example, cardiovascular disease (CVD) and malignant neoplasms (cancers) are two major cause subcategories of Group II. Beyond this level, two further disaggregation levels were used, resulting in a complete cause list of 136 categories of specific diseases and injuries. Annex table 3A.2 lists the GBD 2001 cause categories and their ICD codes in terms of the ICD 9th revision (ICD-9) and 10th revision (ICD-10) (WHO 1977, 1992).

Group I causes of death consist of the cluster of conditions that typically decline at a faster pace than all-cause mortality during the epidemiological transition. In high-mortality populations, Group I dominates the cause of death pattern, whereas in low-mortality populations, Group I accounts for only a small proportion of deaths. The major cause subcategories are closely based on the ICD chapters with a few significant differences. Whereas the ICD classifies chronic respiratory diseases and acute respiratory infections into the same chapter, the GBD cause classification includes acute respiratory infections in Group I and chronic respiratory diseases in Group II. Note also that the Group I subcategory of "causes arising in the perinatal period" relates to the causes included in the corresponding ICD chapter, principally low birthweight, prematurity, birth asphyxia, and birth trauma, but does not include all causes of deaths occurring during the perinatal period, such as infections, congenital malformations, and injuries. In addition, the GBD includes only deaths among children born alive and does not estimate stillbirths (see chapter 6 ).

The development and successive revisions of the ICD have facilitated the comparability of cause of death data within and across countries. Although each revision has produced some discontinuities in cause of death data, the revision from ICD-9 to ICD-10 resulted in more substantial changes than previous revisions. ICD-10 is considerably more detailed than ICD-9, with almost twice the number of codes, and includes both conceptual and classification revisions as well as changes in the coding rules used to select the underlying cause of death. Additional problems in comparing data on causes of death across countries arise from variations in the accuracy of diagnoses of causes of death.

In most developed countries, medical practitioners certify the underlying cause of death even though they may not always have had prior contact with the deceased or access to relevant medical records. In developing countries, a significant proportion of deaths may occur without medical attention and such deaths may be registered without a medical opinion about the cause of death. At the same time, selecting a single underlying cause of death is often problematic for the elderly, who have often had several chronic diseases that concurrently led to their death. This results in higher levels of uncertainty about cause of death distributions in the oldest age group. Finally, in both developing and developed countries, legal, societal, and other reasons may lead to the underreporting of causes of death of a sensitive nature, such as suicide or HIV/AIDS. For this reason, other sources of information for specific causes such as HIV/AIDS, illicit drug use, and war have been used where necessary to modify cause-specific estimates based on death registration data.

The GBD classification system does not include the ICD category "symptoms, signs, and ill-defined conditions" as one of the major causes of deaths. The GBD classification scheme has reassigned deaths assigned to this ICD category, as well as some other codes used for ill-defined conditions, to specific causes of death. This is important from the perspective of generating useful information to compare cause of death patterns or to inform health policy making, because it allows unbiased comparisons of cause of death patterns across countries or regions.

Deaths are categorically attributed to one underlying cause using ICD rules and conventions. In some cases where the ICD rules are ambiguous, the GBD 2001 follows the conventions used by the GBD 1990 study ( Murray and Lopez 1996a ). Note also that a number of causes of death act as risk factors for other diseases. Total mortality attributable to such causes may be substantially larger than the mortality estimates for the cause in terms of ICD rules for underlying causes. For example, the GBD 2001 estimates that 960,000 deaths were due to diabetes mellitus as an underlying cause, but when deaths from CVD and renal failure attributable to diabetes are included, the global total of attributable deaths rises to almost 3 million ( Roglic and others 2005 ). Other causes for which important components of attributable mortality are included elsewhere in the GBD cause list include hepatitis B or C (mortality attributable to liver cancer and renal failure), unipolar or bipolar depressive disorders and schizophrenia (mortality attributable to suicide), and blindness (mortality attributable to blindness whether from infectious or noninfectious causes).

 

Countries with Complete or Incomplete Death Registration Data


In the last decade, computerization of death registration data at the country level and electronic transmission to WHO have considerably improved the timeliness of information. In addition, the number of countries submitting their underlying cause of death data to WHO using ICD-10 increased from 4 in 1995 to 75 in 2003. Some 50 countries are still reporting data using ICD-9 and only 1 country is still using ICD-8 ( Mathers and others 2005 ).

Several new features and changes from ICD-9 to ICD-10 have a major impact on the interpretation of statistical data, and the implications of these changes have been taken into account to a limited extent when making trend comparisons and estimations for causes of death. ICD-10 is more detailed, with about 10,000 codes compared with around 5,100 in ICD-9, and the rules for selecting the underlying cause of death have been reevaluated and sometimes changed. For example, ICD-10 considers pneumonia to be a consequence of a much wider range of conditions than ICD-9, and it therefore would be less likely to be selected as the underlying cause. Modification of the death certificate with the inclusion of an additional line in part 1 of the certificate (for diseases related to the chain of events leading directly to death) as recommended by WHO may also have had an impact on the selection of the underlying cause of death.

Accuracy in diagnosing causes of death still varies substantially across countries with death registration systems. In addition, even in countries where medically qualified staff assign causes of death, some degree of misattribution or miscoding occurs during the process of coding underlying causes of death, mainly because of incorrect diagnoses or systematic biases in diagnosis, incorrect or incomplete death certificates, misinterpretation of ICD rules for selecting underlying causes, and variations in the use of categories for unknown and ill-defined causes (Mathers and others 2005).

Death registration data containing usable information on cause of death distributions were available for 107 countries, mostly in the high-income group, Europe and Central Asia, and Latin America and the Caribbean ( table 3.2 , annex table 3A.3 ). Where the latest available year was earlier than 2001, death registration data from 1980 through the latest available year were analyzed as a basis for projecting recent trends for specific causes, and these trend estimates were used to project the cause distribution for 2001. When estimating cause of death distributions for very small countries, an average of the three last years of data was used to minimize stochastic variation.


[Table .]

In the case of the few countries still reporting data using the condensed ICD-9 Basic Tabulation List, algorithms based on data from countries with more detailed coding were applied to estimate deaths due to asthma as no Basic Tabulation List code for asthma is available. Also, China and some of the newly independent states of the former Soviet Union still use some special condensed ICD-9 cause of death classifications, which were then mapped to the GBD cause list. Missing values for some GBD conditions were estimated with the use of algorithms. Similarly, algorithms were also applied for countries reporting data using the condensed ICD-10 Mortality Tabulation List 1.

Deaths resulting from war are not systematically included in the cause of death data. For example, in the United States, the Department of Defense records deaths resulting from war, and for security reasons they are not included in the death registration system. Some death registration data undercount deaths due to HIV/AIDS and drug use partly because of miscoding and partly because of reluctance to record these diagnoses. In some cases, adjustments for deaths due to war, HIV/AIDS, and drug use have been made using other sources of information as described later.

Cause of death data were carefully analyzed to take incomplete coverage of VR into account and the likely differences in cause of death patterns among the uncovered and often poorer subpopulations. When the coverage of death registration data was assessed as less than 85 percent, cause of death modeling was used to adjust the proportions of deaths occurring in Groups I, II, and III by age and sex. Table 3.2 shows the regional distribution of the 58 countries for which such adjustments were carried out. In total, useful information on cause of death distributions was available for 37 percent of the world's population, or 76 percent if China and India's sample registration and mortality surveillance systems were included. Usable death registration information was available for only four Sub-Saharan African countries: Mauritius, the Seychelles, South Africa, and Zimbabwe. Death registration data are available for several other Sub-Saharan African countries, but are largely restricted to deaths in urban hospitals, with overall coverage being too low to provide useful population-level information on cause of death distributions ( Rao, Bradshaw, and Mathers 2004 ).

Annex table 3A.3 summarizes the years of death registration data with information on underlying cause available for each country, together with information on the methods used to estimate cause of death distributions. As shown in table 3.1 , a total of 770 country-years of death registration data were used in the analysis of causes of death for the GBD 2001.

 

Redistribution of Ill-defined Causes and "Garbage Codes"


Even in countries where medically qualified staff assign causes there is substantial use of coding categories for unknown and ill-defined causes. In addition to the ICD codes for "symptoms, signs, and ill-defined conditions" (ICD-9 codes 780-799 and ICD-10 codes R00-R99), a number of other ICD codes do not represent useful underlying causes from a policy perspective and their inappropriate overuse compromises the usefulness of information on causes of death. These garbage codes or ill-defined codes include deaths from injuries where the intent was not determined (ICD-9 codes E980-989 and ICD-10 codes Y10-Y34 and Y872); CVD categories lacking diagnostic meaning, such as cardiac arrest and heart failure (ICD-9 codes 427.1, 427.4, 427.5, 428, 429.0, 429.1, 429.2, 429.9, and 440.9; and ICD-10 codes I47.2, I49.0, I46, I50, I51.4, I51.5, I51.6, I51.9, and I70.9); and cancer deaths coded to categories for secondary or unspecified sites (ICD-9 codes 195 and 199 and ICD-10 codes C76, C80, and C97). The percentage of deaths coded as ill-defined causes varies from 4 percent in New Zealand to more than 40 percent in Sri Lanka and Thailand.

Table 3.3 shows the distribution of deaths assigned to ill-defined codes for the 105 WHO member states reporting data on death registrations since 1990 with at least 50 percent completeness or coverage. The median percentage of deaths coded to ill-defined causes was 12 percent; the median percentage of symptoms, signs, and ill-defined conditions was 4.0 percent; and the median of ill-defined cardiovascular causes was 5.3 percent. In more than 15 high-income countries, more than 10 percent of deaths were coded to these ill-defined conditions, not so much because of overuse of codes for symptoms, signs, and ill-defined conditions, but because of excessive use of garbage codes for CVD, cancers, and injuries ( Mathers and others 2005 ).

To produce unbiased estimates of cause-specific death rates, and to maximize comparability across member states, deaths coded to general ill-defined categories (ICD-9, chapter XVI; ICD-10, chapter XVIII) were redistributed pro rata across all Group I and Group II causes, that is, all causes excluding injuries. Correction algorithms were also applied to resolve problems of miscoding for the cardiovascular, cancer, and injury garbage codes.
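Pro rata redistribution can be sketched in a few lines of Python. The example assumes a single age-sex group with deaths already aggregated to broad categories; the category labels and counts are hypothetical, and the actual redistribution operated on the detailed cause list.

```python
def redistribute_pro_rata(deaths_by_cause, ill_defined_deaths, eligible_causes):
    """Share ill-defined deaths among eligible causes in proportion to the
    deaths already assigned to each of those causes."""
    eligible_total = sum(deaths_by_cause[cause] for cause in eligible_causes)
    if eligible_total <= 0:
        raise ValueError("eligible causes must have a positive death total")
    result = dict(deaths_by_cause)
    for cause in eligible_causes:
        share = deaths_by_cause[cause] / eligible_total
        result[cause] += ill_defined_deaths * share
    return result


# Hypothetical group: 90 ill-defined deaths are shared between Group I and
# Group II; injuries are excluded from the redistribution.
deaths = {"group_1": 300.0, "group_2": 600.0, "injuries": 100.0}
adjusted = redistribute_pro_rata(deaths, 90.0, ["group_1", "group_2"])
```

The redistribution preserves the total number of deaths and leaves the excluded category unchanged.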

 

Ill-defined Cardiovascular Codes.


Physicians may assign deaths that are actually due to ischemic heart disease (IHD) to a number of ill-defined cardiovascular codes in ICD-9 and ICD-10, whether because of insufficient clinical information at the time of death, local medical diagnostic practices, or simple error. These ill-defined codes include those for heart failure, ventricular dysrhythmias, generalized atherosclerosis, and ill-defined descriptions and complications of heart disease.


[Table .]

Figure 3.2 illustrates the enormous variation across countries in coding practice with respect to these ill-defined cardiovascular codes. For each country, the fraction of cardiovascular deaths (excluding stroke) assigned to the ill-defined cardiovascular codes is plotted against the fraction of cardiovascular deaths (excluding stroke) assigned to IHD (ICD-9 codes 410-414 or ICD-10 codes I20-I25). The strong negative relationship between IHD mortality and that from the ill-defined CVD codes (r² = 0.90) supports the hypothesis that the quality of CVD death certification varies substantially across countries. The upper left portion of figure 3.2 shows countries where doctors certified, on average, more ill-defined CVD than IHD deaths, and these include France, Japan, Portugal, and Spain. The bottom right corner of the figure shows those countries where doctors assign, on average, a small proportion of ill-defined CVD deaths. This second group includes Australia, Canada, Finland, New Zealand, Norway, and the United Kingdom (Scotland). We refer to these two groups of countries as the high ill-defined coding and low ill-defined coding groups.
[Figure 3.2]

To correct for the likely underregistration of IHD in countries such as France, Japan, and Spain in the original GBD study, Murray and Lopez (1996a) developed an algorithm based on the assumption that the cluster of countries comprising Canada, Finland, New Zealand, and Norway, where ill-defined coding was low, would define the standard coding practice. For all other countries, the percentage of cardiovascular deaths (excluding stroke) assigned to these codes in excess of this standard percentage was then assumed to be largely miscertified IHD. For the GBD 2001, Lozano and others (2001) developed a revised method to estimate the fraction of IHD deaths assigned to ill-defined cardiovascular codes. This involved estimating age- and sex-specific regression equations predicting observed IHD death rates in terms of the ill-defined CVD death rates and the smoking impact ratio for a cross-national data set of 372 country-years of death registration data for 26 countries between 1979 and 1998. The smoking impact ratio, estimated from lung cancer mortality rates using the Peto-Lopez method ( Peto and others 1992 ), is a measure of the cumulative effects of tobacco exposure as a risk factor for IHD.
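The logic of the original (1990) correction can be sketched as follows. The Python below treats the whole excess over the standard fraction as miscertified IHD, which simplifies the "largely miscertified" assumption described above; the standard fraction of 10 percent and the death counts are hypothetical. The revised regression method used for the GBD 2001 is not reproduced here.

```python
def reassign_excess_ill_defined(ihd_deaths, ill_defined_deaths,
                                other_cvd_deaths, standard_fraction):
    """Move ill-defined CVD deaths in excess of a standard share to IHD.

    All inputs are cardiovascular deaths excluding stroke. standard_fraction
    is the share of such deaths coded to ill-defined categories in the
    reference countries (a hypothetical value in the example below).
    """
    total = ihd_deaths + ill_defined_deaths + other_cvd_deaths
    allowed = standard_fraction * total
    excess = max(0.0, ill_defined_deaths - allowed)
    return {
        "ihd": ihd_deaths + excess,
        "ill_defined": ill_defined_deaths - excess,
        "other_cvd": other_cvd_deaths,
    }


# Hypothetical country: 35 percent of non-stroke cardiovascular deaths carry
# ill-defined codes, against an assumed standard of 10 percent.
corrected = reassign_excess_ill_defined(400.0, 350.0, 250.0, standard_fraction=0.10)
```

Countries at or below the standard fraction are left unchanged, because the excess is floored at zero.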

Table 3.4 shows the resulting correction factors, that is, the proportion of ill-defined CVD deaths reassigned to IHD. As expected, the extent of miscoding at every age, for both males and females, was systematically higher in high ill-defined coding countries, where the results suggest that 50 to 95 percent of ill-defined CVD codes should be reassigned to IHD.


[Table .]

With correction, the age-standardized death rates increased in all countries, but most notably in Japan (26 percent for males and 24 percent for females), France (27 percent for males and 35 percent for females), and Greece (32 percent for males and 47 percent for females). Smaller increases were apparent for Belgium, the Czech Republic, Hungary, Italy, Portugal, and Spain (12 to 25 percent on average for males and females), and only small changes were observed for Austria, Germany, the Netherlands, and the United States (about 5 percent). In other countries, including Australia, Canada, Finland, Ireland, New Zealand, Norway, and the United Kingdom (Northern Ireland and Scotland), no corrections were suggested by this analysis.

Corrections for miscertification narrow the range in death rates across countries from a fivefold to a fourfold variation and also change the relative rankings of countries. The analysis of IHD miscertification is supported by the dramatic increase of more than 25 percent in recorded IHD mortality rates in Japan between 1994 and 1995 with the change from ICD-9 to ICD-10, whereby physicians were encouraged not to use heart failure as an underlying cause of death. Prior to the introduction of ICD-10, corrected rates were more than 80 percent higher in males and around 70 percent greater in females compared with what was recorded in vital statistics.

Lozano and others (2001) compare the miscertification levels estimated using their regression approach with those observed in the WHO Monitoring Cardiovascular Disease (MONICA) study sites. They find general agreement in relation to the existence of significant miscertification in each country, but less clear agreement on specific levels of miscertification. This latter finding is difficult to interpret given some difficulties in mapping the MONICA "possible IHD category" to ICD categories and the fact that the study sites may not be representative of national populations.

While the empirical results of applying the recoding model are encouraging, and the GBD 2001 has used it to reassign ill-defined CVD codes, two points are noteworthy. First, the fraction of ill-defined cardiovascular deaths that are due to IHD is assumed to be constant across countries within each of the low and high ill-defined code groups. Statistical models can only go so far in extracting truth from poorly coded death data, and more precise country-specific analyses require recoding studies for samples of relevant deaths, ideally involving autopsy or other clinical diagnostic information. Second, due to the nonstandard disease classification used in Russia and other newly independent states (175 categories based on ICD-9), the method cannot be applied without further evidence from autopsies as to the true cause of cardiovascular deaths. The single most important cause of cardiovascular death in these countries is "coronary atherosclerosis" (093 in the Soviet classification of diseases), which in part reflects a disease process different from what the term implies elsewhere (Chenet and others 1998; Zatonski 1998). The use of the code "sudden death" to describe mortality often associated with binge drinking in Russia and neighboring countries may also conceal cases of IHD (Kauhanen and others 1997).

 

Ill-defined Cancer Codes.


In the GBD 1990 study, deaths coded to ICD-9 195-199 (malignant neoplasm of other and unspecified sites, including those whose point of origin cannot be determined, secondary and unspecified neoplasm) were redistributed pro rata across all malignant neoplasm categories within each age and sex group, so that the category "other malignant neoplasms" included only malignant neoplasms of other specified sites ( Murray and Lopez 1996a ). For the GBD 2001, the survival model developed for estimating cancer deaths by site from cancer incidence data (Mathers, Shibuya, and others 2002) was used to compare predicted deaths from the survival model for the United States with those reported in U.S. vital statistics. This comparison identified four sites that did not appear to have any significant coding of cancer deaths to the garbage codes ICD-9 195-199. The redistribution algorithm for cancer garbage codes was therefore revised for the GBD 2001 to redistribute cancer garbage code deaths pro rata across all cancer sites except liver; pancreas; ovary; and trachea, bronchus, and lung.
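The revised rule can be sketched in Python for one age-sex group. The four excluded sites come from the text; the site labels, function name, and death counts are hypothetical.

```python
# Sites that, per the text, receive no share of the cancer garbage codes.
EXCLUDED_SITES = frozenset({"liver", "pancreas", "ovary", "trachea_bronchus_lung"})


def redistribute_cancer_garbage(site_deaths, garbage_deaths,
                                excluded=EXCLUDED_SITES):
    """Share garbage-coded cancer deaths pro rata among non-excluded sites."""
    receiving_total = sum(
        deaths for site, deaths in site_deaths.items() if site not in excluded
    )
    if receiving_total <= 0:
        raise ValueError("no deaths at receiving sites to define proportions")
    result = {}
    for site, deaths in site_deaths.items():
        if site in excluded:
            result[site] = deaths  # excluded sites are left unchanged
        else:
            result[site] = deaths + garbage_deaths * deaths / receiving_total
    return result


# Hypothetical age-sex group with 40 deaths coded to unspecified sites.
one_group = {"liver": 50.0, "stomach": 100.0, "colorectal": 300.0}
revised = redistribute_cancer_garbage(one_group, garbage_deaths=40.0)
```

Under the 1990 rule the liver category would also have received a share; here only the stomach and colorectal counts increase.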

 

Intent of Injuries Undetermined.


Deaths assigned to codes for injuries undetermined whether accidentally or purposefully inflicted (ICD-9 codes E980-989 and ICD-10 codes Y10-Y34 and Y872) are those where the person certifying the cause of death has not determined whether the injuries were unintentional or intentional, for example, an outcome of self-inflicted injury or assault. While there will remain a residue of deaths for which insufficient information is available to determine intent, this should be a small fraction of injury deaths if appropriate forensic and coronial investigations are carried out. Excluding South Africa, the proportion of injury deaths assigned to these codes varies from less than 0.5 percent in most developed countries to just over 5 percent ( table 3.3 ). To reduce bias in estimating deaths due to unintentional and intentional injuries, deaths coded as undetermined intent were redistributed pro rata by age and sex to the GBD categories for intentional and unintentional injury.

 

Data Sources and Methods for Some Specific Countries


In some cases, either because of large population size, and hence implications for global mortality estimates, or because of recent national burden of disease research involving one or more of the authors, more detailed methods to estimate mortality patterns were applied, as summarized in the following subsections.

 

China.


Cause-specific mortality data for China are available from two sources: the sample VR system administered by the Ministry of Health and the Disease Surveillance Point (DSP) system established by the Chinese Center for Disease Control (see Yang and others 2005 for an overview of the design and operational characteristics of these systems). The VR system covers a population of 120 million people at 137 sample sites and captures around 700,000 deaths per year. The DSP system has 145 surveillance points, covers a population of around 11 million, and collects information on around 50,000 deaths per year.

The Ministry of Health classifies sample sites for the DSP system into an urban stratum and four socioeconomic strata for rural areas, based on an analysis of nine indicators for rural counties from the 1990 national census. These indicators include birth and mortality rates, dependency ratios, literacy rates, and proportions of agricultural versus industrial occupations in the overall workforce. The VR system's sample sites are classified into one urban and three rural socioeconomic strata. Because the sample sites for the DSP system are considered to be nationally representative, the fraction of the national population in each socioeconomic stratum was assumed to follow the same population distribution as the DSP sites.

Data from the VR system for 2000 and a three-year average for the DSP system from 1997-9 were separately appraised for their usability in estimating national-level, cause-specific mortality for China. From the two systems, a comparison of age-standardized mortality rates for specific conditions was carried out for each socioeconomic stratum, as shown in figure 3.3 .
[Figure 3.3]

We found that the mortality rates of the DSP system reflected the broad cause group-specific mortality distribution more accurately, especially in rural areas. Also, the sampling distribution of sites in the DSP system was more nationally representative than that of the VR system. Thus, the proportional distribution of broad cause group mortality for each stratum from the DSP data was applied to each stratum-specific mortality envelope to derive the broad cause group mortality in absolute numbers of deaths by age and sex.

The VR system's data captured mortality at the level of subgroup and specific cause more accurately, and because it was based on a significantly larger sample of deaths, it showed more plausible age patterns for specific causes. Hence, the specific cause-proportionate mortality distributions from the VR system's data were used for distributions within broad cause groups.

Finally, we summed the mortality estimates by cause, age, and sex from each stratum to obtain a national estimate of cause-specific mortality that had not been corrected for underregistration. We then inflated this cause-specific mortality to the national all-cause mortality envelope from the life table analysis to obtain the final national estimate of cause-specific mortality for 2001. We adjusted these estimates with information from WHO technical programs on maternal, perinatal, and childhood-cluster conditions and from epidemiological estimates for TB, HIV/AIDS, illicit drug dependence and problem use, rheumatoid arthritis, and war deaths.
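
The sequence of steps for China can be sketched roughly as follows for one age and sex group. This is only an illustration under simplifying assumptions: the function names, the two strata, the cause labels, and all numbers are hypothetical, and the later adjustments from WHO technical programs and epidemiological estimates are not represented.

```python
def stratum_cause_deaths(envelope, broad_props, within_props):
    """Deaths by specific cause in one stratum.

    envelope:     total deaths in the stratum (one age-sex group)
    broad_props:  {group: share of all deaths}, e.g. from DSP-type data
    within_props: {group: {cause: share within group}}, e.g. from VR-type data
    """
    out = {}
    for group, group_share in broad_props.items():
        for cause, cause_share in within_props[group].items():
            out[cause] = envelope * group_share * cause_share
    return out


def national_estimate(strata, national_envelope):
    """Sum the stratum estimates, then rescale to the national envelope."""
    total = {}
    for s in strata:
        for cause, d in stratum_cause_deaths(**s).items():
            total[cause] = total.get(cause, 0.0) + d
    scale = national_envelope / sum(total.values())
    return {cause: d * scale for cause, d in total.items()}


# Hypothetical inputs: two strata sharing the same within-group distribution.
within = {"I": {"TB": 0.5, "other_I": 0.5},
          "II": {"CVD": 0.6, "cancer": 0.4},
          "III": {"injury": 1.0}}
strata = [
    {"envelope": 1000.0, "broad_props": {"I": 0.2, "II": 0.7, "III": 0.1},
     "within_props": within},
    {"envelope": 3000.0, "broad_props": {"I": 0.4, "II": 0.5, "III": 0.1},
     "within_props": within},
]
national = national_estimate(strata, national_envelope=4400.0)
```

Here the stratum totals sum to 4,000 deaths, so every cause is inflated by a factor of 1.1 to match the hypothetical national envelope of 4,400.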

 

India.


For India, separate mortality recording systems for rural and urban areas were used to estimate all-cause death rates by age and sex, and these were combined to obtain national all-cause death rates and construct a national life table. The all-cause mortality envelope was derived from a time series analysis of age-specific death rates from the Sample Registration System after correcting them for underregistration (88 percent completeness) ( Mari Bhat 2002 ).

Cause patterns of mortality were based on the Medical Certification of Cause of Death Database for urban areas of India and the Annual Survey of Causes of Death for rural areas of India. The all-cause mortality envelope was split into separate envelopes for urban and rural populations using a 70:30 ratio. Data on cause-specific mortality from separate sources for rural and urban areas were used with these mortality envelopes to build up independent estimates for urban and rural areas, which were summed to obtain national cause-specific mortality estimates.

For rural areas, the Andhra Pradesh burden of disease study ( Mahapatra 2002 ) analyzed data from the Annual Survey of Causes of Death for 1996-8. The analysis included the redistribution of ill-defined deaths to specific causes based on a verbal autopsy retest survey conducted as part of the field studies for the project. For urban areas, data from the Medical Certification of Cause of Death system for 1996 were used. This system provides data on about 400,000 deaths annually coded to a national list of ICD-9 cause groups that approximates the ICD-9 Basic Tabulation List. These data were mapped onto the GBD classification and inflated to the urban mortality envelope. The proportion of urban deaths due to injuries was adjusted based on results from a large-scale verbal autopsy study in the city of Chennai, which detected that about 2.5 percent of deaths certified as due to ill-defined medical causes were actually due to injuries ( Gajalakshmi and others 2002 ).

The summed national-level, cause-specific mortality estimates were adjusted with information from WHO technical programs on maternal, perinatal, and childhood-cluster conditions, as well as epidemiological estimates for TB, HIV/AIDS, illicit drug dependence and problem use, rheumatoid arthritis, and war deaths.

 

Egypt.


Even though Lopez and others (2002) assessed Egyptian death registration data for 2000 to be almost complete, these data contained high proportions of deaths coded to symptoms and ill-defined conditions, as well as to conditions such as heart failure and cardiac arrest that were not underlying causes of death. Hence, a model-based prediction of the broad cause proportionate distribution by age and sex was used and applied to the cause-specific mortality structure from the country data after excluding a major proportion of the ill-defined deaths.

 

Turkey.


The national life table for Turkey was estimated from separate urban and rural life tables. To estimate the urban life table, reported deaths during 1991-9 in the 81 provincial and district urban centers were evaluated for completeness using established demographic methods. These methods suggested that for more recent years, adult deaths were about 80 percent complete for males and 78 percent complete for females. These correction factors were used to estimate the level of adult mortality (45q15, the probability of dying between ages 15 and 60) in 1999, and the rate was then projected forward to 2000. The resulting estimates (0.190 for males and 0.106 for females) were similar to the levels estimated from the 2002-3 nationally representative mortality survey carried out by the Ministry of Health and Baskent University ( Baskent University 2005 ). Together with estimated child mortality values from the 1998 DHS projected to 2000, a full life table was estimated for urban Turkey, which is equivalent to about two-thirds of the national population. Death rates were projected to 2001 assuming an annual rate of mortality decline of 1.25 percent. For rural areas, child mortality was first estimated from the DHS in the same way as for urban areas. Adult mortality (45q15) was estimated from the WHO modified logit life table system (0.235 for males, 0.189 for females), values that were broadly similar to national mortality survey data, although the relatively small number of rural deaths in the survey, about 300, gave rise to substantial uncertainty about the true levels of adult mortality in rural areas. The urban and rural death rates were then weighted by population size to obtain estimated national death rates, and hence the life table.
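
The three arithmetic operations described for Turkey (correcting for incomplete registration, projecting forward under an assumed rate of decline, and weighting urban and rural rates by population) can be sketched as below. This is a simplified illustration, not the demographic procedure itself: it treats the completeness correction as a simple division of a death rate, assumes the 1.25 percent decline compounds annually, and uses hypothetical function names and input rates.

```python
def correct_for_completeness(registered_rate: float, completeness: float) -> float:
    """Inflate a registered death rate for underregistration."""
    return registered_rate / completeness


def project_rate(rate: float, annual_decline: float, years: int) -> float:
    """Project a death rate forward, assuming the decline compounds yearly."""
    return rate * (1.0 - annual_decline) ** years


def population_weighted(urban_rate: float, rural_rate: float,
                        urban_share: float) -> float:
    """Combine urban and rural death rates using population shares."""
    return urban_share * urban_rate + (1.0 - urban_share) * rural_rate


# Hypothetical age-specific rates (illustrative only).
urban_2000 = correct_for_completeness(0.008, completeness=0.80)   # 0.010
urban_2001 = project_rate(urban_2000, annual_decline=0.0125, years=1)
national_2001 = population_weighted(urban_2001, rural_rate=0.012,
                                    urban_share=2.0 / 3.0)
```

The urban share of two-thirds follows the text; the registered and rural rates are invented for the example.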

Data on causes of death were only available for urban areas of Turkey. These data were systematically reviewed for cause miscoding and adjusted based on clinical opinion and evidence on a sample of deaths from urban hospitals in Izmir and Ankara. In particular, most of the large proportion of deaths coded to "other heart disease" were reassigned to specific vascular pathologies based on this clinical evidence. For rural areas, causes of death were estimated using CodMod as described later. Adjusted proportions of Group I, II, and III deaths by age and sex were first estimated, and then the same proportionate distribution of deaths by cause as observed for urban areas was applied, after adjustment, to estimate the detailed pattern of causes of death.

 

Islamic Republic of Iran.


Data from the VR system in Iran were compiled for 18 of the country's 26 provinces for 2001. The data were coded to a condensed list of 150 cause categories using ICD-10. Because the registration system only covered part of the national population, a model-based prediction was used to estimate the broad cause proportionate mortality for the whole country. The results are shown in figure 3.4 . The model predicted a higher proportion of Group I causes for both males and females in childhood and a higher proportion of Group I causes for females ages 15 to 44, reflecting higher maternal mortality among the nonregistered population than among the registered population. The predicted distributions for the broad cause groups were then applied to the specific-cause proportionate mortality from the reported data and adjusted to the national mortality envelope derived from the life table analysis.
[Figure 3.4]

 

Thailand.


VR data were available for 2000 with an estimated coverage of about 80 percent ( Lopez and others 2002 ). However, the proportion of ill-defined conditions was nearly 50 percent, because many deaths in Thailand occur at home and the cause of death is often reported by lay persons. To improve the usability of data from the VR system, the Ministry of Public Health conducted a retest survey on a sample of about 33,000 deaths, using verbal autopsy methods, to ascertain the true cause of death ( Ministry of Public Health 2002 ). This included a sample of 12,000 deaths with ill-defined causes. The study reallocated about 66 percent of deaths with ill-defined causes to specific causes, including reclassifying many deaths as caused by HIV/AIDS. The reallocation algorithm for ill-defined causes from the verbal autopsy study was used to correct the high proportion of ill-defined deaths from the VR data, and then the resultant cause-specific proportionate mortality was inflated to the national mortality envelope derived from the life table analysis.

 

Epidemiological Estimates of Mortality for Specific Causes


As outlined in table 3.2 , specific epidemiological estimates for some causes were also taken into account in analyzing causes of death for countries. Table 3.5 summarizes the numbers of studies (population-based epidemiological studies, disease registers, and notification systems) that contributed to the estimation of mortality due to 21 specific causes of death, including HIV/AIDS, malaria, and TB. As the table shows, more than 2,700 data sets contributed to the estimates for these 21 causes of death, with almost one-third of these relating to Sub-Saharan Africa.


[Table .]
 

Tuberculosis.


In 1997, WHO began a study to develop country estimates of incidence, prevalence, and mortality from TB (for a detailed description of data sources and methods see Dye and others 1999 ). The study derived estimates of incidence from case notifications adjusted by estimated case detection rates, prevalence data on active disease combined with estimates of average case durations, or estimates of infection risk multiplied by a scalar factor relating the incidence of smear-positive pulmonary TB to annual risks of infection.

Since the original estimates for 1997 were completed, revised and updated estimates have been prepared. Most countries reporting to WHO have provided notification data with interpretable trends and with no other evidence for any significant change in the case detection rate. Trends in notification rates were assumed to represent trends in incidence rates for most countries except those with evidence of changes in case detection rates. China carried out a countrywide disease prevalence survey during 2000, and the results were used to reevaluate incidence for 1999. For other countries with evidence of changes in case detection rates, the trend for one of eight groups of epidemiologically similar countries was assumed to apply ( Corbett and others 2003 ). Annual reports on TB control have included further details on surveillance methods, case notifications, and incidence estimates by country ( WHO 2003a ).

Deaths due to all forms of TB (excluding HIV-infected persons) were estimated for 2001. For countries with VR data, these estimates were based on the most recently available VR data. For other countries, estimates were based on the estimated TB incidence rates (excluding HIV-infected persons) multiplied by the estimated case fatality rates and weighted for the proportion of cases treated and the proportion smear-positive.
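
For countries without VR data, the incidence-based calculation can be sketched as follows. The exact weighting scheme is not given here, so this sketch assumes that treatment status and smear status are independent and that a separate case fatality rate applies to each combination; the function name and all input values are hypothetical.

```python
def tb_deaths(incident_cases: float, p_treated: float, p_smear_pos: float,
              cfr: dict) -> float:
    """Estimate TB deaths from incident cases.

    cfr maps (treated, smear_positive) -> case fatality rate.
    Assumption: treatment and smear status are independent, so the share
    of cases in each cell is the product of the two marginal proportions.
    """
    total = 0.0
    for treated in (True, False):
        w_treated = p_treated if treated else 1.0 - p_treated
        for smear_pos in (True, False):
            w_smear = p_smear_pos if smear_pos else 1.0 - p_smear_pos
            total += incident_cases * w_treated * w_smear * cfr[(treated, smear_pos)]
    return total


# Hypothetical inputs (not estimates from the study).
example_cfr = {(True, True): 0.10, (True, False): 0.05,
               (False, True): 0.70, (False, False): 0.20}
deaths = tb_deaths(1000.0, p_treated=0.5, p_smear_pos=0.4, cfr=example_cfr)
```

With these illustrative inputs, 1,000 incident cases yield 235 deaths, most of them among untreated smear-positive cases.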

 

HIV/AIDS.


The Joint United Nations Programme on HIV/AIDS and WHO have developed country-specific estimates of HIV/AIDS mortality and revise them periodically to account for new data and improved methods ( Schwartlander and others 1999 ; Walker and others 2003 ). For the most recent round of estimates, they used two different types of models depending on the nature of the epidemic in a particular country. For generalized epidemics, in which infection is spread primarily through heterosexual contact, they used a simple epidemiological model to estimate epidemic curves based on sentinel surveillance data on HIV seroprevalence ( UNAIDS Reference Group on Estimates Model and Projections 2002 ). For countries with epidemics concentrated in high-risk groups, they used prevalence estimates derived from the estimated population size and prevalence surveillance data in each high-risk category, and then employed simple models to back-calculate incidence and mortality based on these estimated prevalence trends ( Stover and others 2002 ).

For countries with death registration data, HIV/AIDS mortality estimates were generally based on the most recently available VR data except where miscoding of HIV/AIDS deaths was evident. In such cases, a time series analysis of causes was carried out to identify and reassign miscoded HIV/AIDS deaths. For other countries, estimates were based on the Joint United Nations Programme on HIV/AIDS and WHO estimates of HIV/AIDS mortality for 2001, or in some cases where these were not available, on the estimated prevalence of HIV/AIDS in 2001 multiplied by the average subregional mortality to prevalence ratio.

 

Diarrheal Diseases.


For countries with usable death registration data, deaths due to diarrheal diseases were estimated directly from those data. For other countries, a regression model was used to estimate proportional mortality from diarrhea for children under five (Boschi-Pinto and others forthcoming). The regression model included the logit of the proportional mortality from diarrheal diseases in children from birth through four as a dependent variable and gross domestic product (GDP) per capita in international dollars, time, and region indicator variables as explanatory variables. The regression data were drawn from more than 60 community-based studies carried out since 1980 with study durations of multiples of 12 months. This model was validated and supplemented with vital statistics from developing countries where coverage was high.
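
Because the dependent variable is a logit, a prediction from such a model has to be transformed back to obtain a proportion between 0 and 1. The sketch below shows that back-transformation only. The coefficients are hypothetical placeholders rather than fitted values, the region labels are invented, and entering GDP per capita in logged form is an assumption made for the example.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Map a linear predictor back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical coefficients, for illustration only (not fitted values).
COEFS = {"intercept": -1.0, "log_gdp": -0.4, "year": -0.01,
         "region": {"region_a": 0.0, "region_b": 0.3}}


def predicted_proportion(gdp_per_capita: float, year: int, region: str,
                         coefs: dict = COEFS) -> float:
    """Predicted share of under-five deaths due to diarrhea."""
    linear = (coefs["intercept"]
              + coefs["log_gdp"] * math.log(gdp_per_capita)  # assumed log form
              + coefs["year"] * (year - 1980)
              + coefs["region"][region])
    return inv_logit(linear)


low_income = predicted_proportion(1000.0, 2001, "region_b")
high_income = predicted_proportion(10000.0, 2001, "region_b")
```

Multiplying the predicted proportion by the under-five mortality envelope for a country would then give an estimated number of deaths.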

 

Vaccine-Preventable Childhood Diseases.


Mortality for measles was estimated using two approaches. In countries where routine vaccine coverage was low (less than 80 percent), incidence data were derived from a natural history model using country-specific vaccine coverage and attack rates from population-based studies ( Crowcroft and others 2003 ). For countries with higher routine coverage and in the elimination phase, case notification and country-specific correction factors were used to estimate incidence. To obtain mortality in countries where VR data were not available, age- and region-specific case fatality rates from community-based and outbreak studies were applied to incidence estimates derived from both approaches.

Pertussis cases and deaths were based on a natural history model using vaccine coverage and age-specific case fatality rates from community-based studies, where available ( Crowcroft and others 2003 ). The model is a revision of Galezka and Robertson's (2004) approach.

The incidence estimates for polio and diphtheria ( Stein 2002b ; Stein and Robertson 2002) were based on country-specific reported cases of acute flaccid paralysis with adjustments for underreporting and on country-specific notifications of diphtheria cases with an assumed notification efficiency of 20 percent, respectively. A case fatality rate of 10 percent was assumed for diphtheria in countries without high death registration coverage.
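
As a worked illustration of the diphtheria assumptions, the sketch below scales notified cases by the assumed 20 percent notification efficiency and applies the assumed 10 percent case fatality rate. The function name and the number of notified cases are hypothetical.

```python
def diphtheria_estimates(notified_cases: float,
                         notification_efficiency: float = 0.20,
                         case_fatality_rate: float = 0.10):
    """Return (estimated incident cases, estimated deaths)."""
    incidence = notified_cases / notification_efficiency
    deaths = incidence * case_fatality_rate
    return incidence, deaths


# Hypothetical country reporting 50 notified cases in a year.
cases, deaths = diphtheria_estimates(50.0)
```

Under these assumptions, 50 notified cases correspond to about 250 incident cases and about 25 deaths.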

 

Acute Respiratory Infections.


Community-based studies with durations of one year or longer, published since 1980, were used to estimate the proportional mortality from acute respiratory infections in children under five in developing countries ( Williams and others 2002 ). The results confirmed earlier findings that the proportion of deaths attributable to acute respiratory infections diminishes as general mortality diminishes. Much of the variability across studies in the proportion of child deaths attributed to acute respiratory infections was due to the use of verbal autopsies to determine the cause of death. Data from seven studies that compared verbal autopsies with hospital-based diagnoses indicated that the percentage of deaths due to acute respiratory infections could be underestimated by up to 4 percent. The modeled estimates were supplemented with vital statistics from developing countries where coverage was high to develop regional and global estimates.

 

Malaria.


Malaria mortality estimates for all regions except Sub-Saharan Africa were derived from the cause of death data sources described earlier. For Sub-Saharan Africa, country-specific estimates of malaria mortality were based on analyses by Snow and others (1999) and updated using the most recent geographical distributions of risks from the Mapping Malaria Risks in Africa International Collaboration. Subsequent adjustments were made to the estimated country-specific malaria deaths to ensure that total mortality for Group I causes, particularly in the 0-4 year age group, and including estimates for other specific causes such as TB, HIV/AIDS, and measles, added to the total all-cause mortality envelopes for the relevant countries. Work is currently under way to refine and revise these country-specific estimates of malaria mortality in collaboration with other WHO programs and external expert groups ( Korenromp and others 2003 ; Rowe and others 2004 ).

 

Chagas' Disease.


Chagas' disease estimates were obtained from recent intensive surveillance activities in the Southern Cone American countries and community-based studies ( Moncayo 2003 ; Moncayo, Guhl, and Stein 2002). These estimates were supplemented with and validated against vital statistics from Latin American countries where coverage was high.

 

Maternal Mortality.


Mortality from maternal conditions was estimated following a similar approach to earlier analyses ( Abdallah and Zehani 2000 ; Hill, AbouZahr, and Wardlaw 2001 ), using the most recently available mortality data for developing countries, together with improved estimates of the impact of HIV/AIDS as a competing cause of mortality ( WHO, UNICEF, and UNFPA 2003 ). Depending on the availability and quality of data on detailed causes of maternal deaths, the methods used to estimate the proportion of deaths of women of reproductive age that is due to maternal causes (PMDF) varied and included vital records, DHSs and other surveys, Reproductive Age Mortality Study surveys, and epidemiological models. For countries without death registration data, both nationally reported data and specific criteria for a regression model were used to estimate maternal mortality. The dependent variable in this model was the logit of the PMDF after subtracting HIV/AIDS deaths and the explanatory variables were the proportion of deliveries with skilled birth attendance, the GDP per capita in international dollars, and the general fertility rate plus region dummy variables. The total number of deaths from maternal causes for each country was estimated by multiplying the PMDF by the overall mortality envelope for women aged 15 to 49 after subtracting HIV/AIDS deaths.
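
The final multiplication step can be illustrated as follows. The function name and all numbers are hypothetical; the PMDF would in practice come from data or from the regression model described above.

```python
def maternal_deaths(envelope_15_49: float, hiv_deaths_15_49: float,
                    pmdf: float) -> float:
    """Maternal deaths = PMDF x (female deaths at ages 15-49 minus HIV/AIDS deaths)."""
    if not 0.0 <= pmdf <= 1.0:
        raise ValueError("PMDF must be a proportion between 0 and 1")
    non_hiv_deaths = envelope_15_49 - hiv_deaths_15_49
    return pmdf * non_hiv_deaths


# Hypothetical country: 10,000 female deaths at ages 15-49, of which
# 2,000 are due to HIV/AIDS, with a PMDF of 5 percent.
estimate = maternal_deaths(10000.0, 2000.0, pmdf=0.05)
```

In this example the estimate is about 400 maternal deaths, that is, 5 percent of the 8,000 deaths remaining after HIV/AIDS deaths are subtracted.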

Abortion-related mortality occurs mainly as a result of unsafe induced abortion. It has been estimated using published and unpublished reports for 131 countries together with other information on legal and social contexts and summed to give regional totals ( WHO 2004a ).

 

Perinatal Causes.


The cause category "perinatal causes" refers to the ICD cause group "conditions arising in the perinatal period" (ICD chapter 16, P-codes). Deaths from these causes, primarily low birthweight, prematurity, and birth trauma or asphyxia, may occur at any age, and can include some maternal or placental causes, such as multiple pregnancy. Deaths from these causes should not be confused with deaths that occur during the perinatal period, which include stillbirths and neonatal deaths from other causes such as tetanus and congenital malformations. However, acknowledging that nearly all deaths due to perinatal causes occur during the neonatal period, we first estimated the envelope of neonatal mortality for every country (for details of the method see Murray and Lopez 1998 ). The analysis has been updated using recent death registration data and DHS data. Work is currently under way in collaboration with other WHO programs and external expert groups to refine and revise these country-specific estimates of mortality due to perinatal causes ( Lawn, Cousens, and Zupan 2005 ).

 

Cancer.


For countries without good VR data to estimate the site-specific distribution of cancer mortality, a site-specific model for relative interval survival was developed and applied to cancer incidence estimates by site (Mathers, Shibuya, and others 2002; Shibuya and others 2002 ). This age-period-cohort model of cancer survival was based on data from the Surveillance, Epidemiology, and End Results program of the National Cancer Institute ( Ries and others 2002 ). The model was further adjusted by site for each country based on observed correlations in regional and country survival probabilities and level of economic development (GDP per capita in international dollars) (Mathers, Shibuya, and others 2002). Combined with available incidence data from the International Agency for Research on Cancer ( Ferlay and others 2001 ), cancer death distributions were estimated and the model estimates were validated against available VR data from countries other than the United States.

 

Drug Use Disorders.


This category includes dependence on and nondependent problem use of both licit and illicit drugs, excluding tobacco and alcohol (see table 3A.2 ). Estimating mortality directly attributable to drug use disorders, such as death from overdose, is difficult because of variations in the quality and quantity of mortality data. For some regions with a substantial prevalence of illicit drug use, available data sources do not record any deaths as due to drug dependence. As a result, it is necessary to make indirect estimates based on estimates of the prevalence of illicit drug use and of case fatality rates, on the assumption that almost all mortality directly attributable to drug use disorders is associated with illicit drugs. However, making even indirect estimates is difficult because the use of these drugs is illegal, stigmatized, and hidden.

The comparative risk assessment work carried out for the World Health Report 2002 ( WHO 2002d ) included estimating the prevalence of illicit drug dependence and direct mortality based on available data ( Degenhardt and others 2003 ; Ezzati and others 2002 ). Data on the prevalence of problematic illicit drug use were derived from a range of sources, including a formal literature search of all studies that estimated the prevalence of problematic drug use, the United Nations Drug Control Program, and the European Monitoring Centre for Drugs and Drug Addiction (2002).

A search was also conducted for cohort studies of drug users that had estimated mortality due to individual causes of death (overdose, suicide, and trauma) and to all causes of death (updating previous systematic reviews). Data on the number of years of follow up were extracted from each study and a weighted average annual mortality rate was calculated for each cause of death and for their sum.
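
A weighted average of this kind can be sketched as below. The weighting variable is an assumption: the sketch weights each study's annual mortality rate by its person-years of follow-up, and the study values are invented for illustration.

```python
def weighted_annual_rate(studies):
    """Pool annual mortality rates across cohort studies.

    studies: list of (annual_rate, person_years) tuples.
    Assumption: each study is weighted by its person-years of follow-up.
    """
    total_person_years = sum(py for _, py in studies)
    if total_person_years <= 0:
        raise ValueError("total follow-up must be positive")
    return sum(rate * py for rate, py in studies) / total_person_years


# Hypothetical cohorts: 1 percent per year over 1,000 person-years and
# 2 percent per year over 3,000 person-years.
pooled = weighted_annual_rate([(0.01, 1000.0), (0.02, 3000.0)])
```

The pooled rate here is 1.75 percent per year, closer to the larger study's rate because that study contributes more follow-up time.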

The total regional deaths due directly to illicit drug use were then distributed among countries in each region in proportion to estimated prevalences of drug dependence and problem use. For developed countries with good VR data, evidence suggests that deaths due to drug use disorders are underrecorded ( European Monitoring Centre for Drugs and Drug Addiction 2002 ; Single and others 2002 ). For these countries, mortality figures were adjusted for age groups in which the estimated deaths derived from the comparative risk assessment analysis exceeded the number of deaths recorded, on the assumption that these additional deaths were originally miscoded as due to accidental poisoning or ill-defined causes.

 

War Deaths.


Country-specific estimates of war deaths and corresponding uncertainty ranges were obtained from a variety of published and unpublished databases. The Armed Conflict Report ( Project Ploughshares 2001 , 2002 ), a report that supplies several databases with mortality estimates (see, for example, Center for Research on the Epidemiology of Disasters 2001 ), was the primary source used for time trend and mortality estimates. This report was a preferred source of information, because it includes war deaths by country and year, a departure from the typical practice of supplying estimates by conflict and across years. The report's data were checked against historical and current estimates by other research groups, such as those of the Uppsala Conflict Data Project ( Gleditsch and others 2002 ) and the Center for International Development and Conflict Management at the University of Maryland ( Marshall and Gurr 2003 ).

These data sets rely on press reports of eyewitness accounts and official announcements of combatants, which are, unfortunately, the main and often only possible method of estimating casualties in armed conflicts. Murray, King, and others (2002) summarize the issues involved in estimating war deaths and emphasize the considerable uncertainty in the GBD 2000 and GBD 2001 estimates. Many of the available data sources on conflict deaths only count deaths in conflicts that involve the armed forces of at least one state or one or more armed factions seeking to gain control of all or part of the state, and in which more than a certain number of people have been killed, for instance, more than 1,000 total or more than 25 per year. Some sources count only battlefield deaths and deaths that occur concurrently with conflict.

In contrast, the GBD 2001 estimated deaths occurring in 2001 in which the underlying cause (following ICD conventions) was an injury due to operations of war or civil insurrection, whether or not that injury occurred during the time of war or following the cessation of hostilities, which in some cases occurred many years earlier than 2001. The GBD 2001 estimates included injury deaths resulting from all civil insurrection, whether or not the state was involved. They also included deaths due to terrorism carried out by organized groups. The GBD 2001 estimates of war deaths did not include deaths from other causes, such as starvation, infectious disease epidemics, or lack of medical intervention for chronic diseases, that may be counterfactually attributable to war or civil conflict.

Deaths due to landmines and unexploded ordnance were estimated separately by country. The primary sources for these data were the Landmine Monitor Report of the International Campaign to Ban Landmines ( Human Rights Watch 2001 ) and Handicap International's annual report on landmine victims ( Handicap International 2001 ).

Whereas total injury deaths for most countries were derived either from death registration data or from cause of death models, war deaths were treated as "outside the envelope," and for countries for which life tables were estimated from data for earlier years not affected by war, war deaths were added to the total deaths estimated from the life tables.

 

Cause of Death Modeling for Countries with Poor Data


Although epidemiological studies and other data sources described in the previous section allow the estimation of deaths due to certain causes in populations without death registration data, they do not cover many important causes of death in these populations, such as CVD or injuries. To address these information gaps, models for estimating broad cause of death patterns can serve as the starting point for indirect methods of estimating attributable mortality for a comprehensive list of causes.

Preston (1976) was the first to develop indirect methods for estimating cause of death structure. Preston modeled the relationship between total mortality and cause-specific mortality for 12 broad groups of causes using historical VR data for the industrial countries and a few developing countries. In particular, Preston postulated that cause-specific mortality was a linear function of total mortality. The GBD 1990 study ( Murray and Lopez 1996a ) used cause of death models to estimate mortality for the three major cause groups (Groups I, II, III) as a function of mortality from all causes, based on regression analysis of observations on recent mortality patterns from 67 countries. The log of cause-specific mortality was postulated to be a linear function of the log of total mortality, and poorly coded deaths were redistributed before estimating the regression equations.

The cause of death model used in the GBD 1990 has been substantially revised and enhanced for estimating deaths by broad cause group in regions with limited information on mortality. The statistical model has been improved by adapting models for compositional data that were previously developed in other areas, and a substantially larger data set of 1,613 country-years of observations was used for analysis. Income per capita has been added to the model as an explanatory variable in addition to the level of all-cause mortality ( Salomon and Murray 2002a ).

This section provides an overview of the new model, CodMod, developed by Salomon and Murray for the GBD 2001, and describes its application for estimating (a) broad cause patterns for populations where no cause of death information is available, and (b) broad cause of death patterns when incomplete death registration data are available. The estimation of broad cause of death patterns is critical to avoid overemphasizing or underemphasizing specific causes due to biases in the data sets available to estimate national mortality patterns, for example, if data are derived from urban hospital statistics.

Statistical Methods and Data.


The statistical basis for cause of death models has also been enhanced by the adaptation of models for compositional data that were previously developed in other areas ( Katz and King 1999 ). These models take account of the key features of this type of data, namely, that the fraction of deaths attributable to each cause is bounded by 0 and 1 and that all the fractions must sum to unity. Violations of both constraints were possible with the regression models used in the GBD 1990; an additional normalization step was undertaken to impose these constraints. The new model explicitly ensures both these constraints using a seemingly unrelated regression model (for a full description of this model and its application to analysis of the epidemiological transition, see Salomon and Murray 2002a ).
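
The two constraints can be illustrated with a log-ratio transformation, which is one common device in compositional data analysis. The sketch below assumes an additive log-ratio transform with Group III as the reference category; this section does not give the exact specification used by Salomon and Murray, and the function names and cause fractions are hypothetical.

```python
import numpy as np

def alr(fractions):
    """Additive log-ratio: map K fractions that sum to 1 to K-1 unbounded values."""
    f = np.asarray(fractions, dtype=float)
    return np.log(f[:-1] / f[-1])

def alr_inverse(y):
    """Map K-1 unbounded values back to K fractions in (0, 1) that sum to 1."""
    e = np.exp(np.append(np.asarray(y, dtype=float), 0.0))
    return e / e.sum()

# Hypothetical Group I, II, III fractions (illustrative only, not GBD data).
observed = [0.5, 0.4, 0.1]
y = alr(observed)

# Any real-valued prediction on the log-ratio scale, however extreme,
# maps back to valid fractions, so no separate normalization step is needed.
shifted = alr_inverse(y + np.array([-3.0, 2.0]))
```

Because regression is carried out on the unbounded log-ratio scale, predicted fractions satisfy both constraints by construction rather than through a separate normalization step.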

In addition to revising the statistical model used in the previous study, Salomon and Murray also considered additional covariates beyond all-cause mortality. The objective was to identify variables likely to have a strong relationship to cause-specific mortality, but also variables for which estimates would be available in all countries, because one of the goals of the exercise was to use the model to predict broad patterns of mortality for countries without VR data. The variables that were selected based on these criteria were all-cause mortality, as before, plus income per capita in international dollars. Both variables were included in logged form, because this formulation tended to provide a better fit than the linear form.

Perhaps most important, the new cause of death model incorporated a more extensive database on mortality by age, sex, and cause than previous efforts, with substantially more representation of middle-income countries. Increasing the range of observed cause of death patterns should improve the validity of extrapolations from countries with registration systems to data-poor settings.

Separate models were estimated for each sex and the following age groups: younger than 1 month, 1-11 months, 1-4 years, 5-9 years, 10-14 years, and so on by five-year age groups up to 80-84 years and 85 years and older. For the two youngest age groups, a smaller number of observations were available because some countries for some periods reported only on the age range from birth to 11 months. A total of 586 country-years of observations were available for the first two age groups and 1,613 country-years of observations for each of the other 18 age groups. The regression results provided insights into the relationships between cause of death patterns, all-cause mortality levels, and increases in income per capita ( Salomon and Murray 2002a ).

Salomon and Murray also used Monte Carlo simulation techniques to estimate the probability distributions of the predicted cause of death components given a particular set of values for all-cause mortality and GDP per capita ( Salomon and Murray 2001a ). The results from this approach were useful in estimating cause of death patterns for residual areas in countries where VR covers only part of the population and in defining regional cause of death patterns.
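
A minimal sketch of this kind of simulation follows. It assumes a log-ratio regression on logged all-cause mortality and logged income with correlated residuals across equations. The coefficients, residual covariance, and input values are invented for illustration and are not the published estimates. The sketch simulates residual variation only, and whether the original analysis also propagated coefficient uncertainty is not stated in this section.

```python
import numpy as np

# Hypothetical coefficients (intercept, ln mortality, ln income) for two
# log-ratio equations: Group I vs. Group III and Group II vs. Group III.
COEF = np.array([[2.0, 1.0, -0.5],
                 [0.5, -0.2, 0.15]])
# Hypothetical residual covariance; the off-diagonal term links the equations.
RESID_COV = np.array([[0.09, 0.02],
                      [0.02, 0.04]])

def simulate_cause_fractions(mortality, income, n_draws=10000, seed=0):
    """Draw cause fractions conditional on all-cause mortality and income."""
    rng = np.random.default_rng(seed)
    x = np.array([1.0, np.log(mortality), np.log(income)])
    mean = COEF @ x
    draws = rng.multivariate_normal(mean, RESID_COV, size=n_draws)
    # Append the reference category (log-ratio 0) and map back to fractions.
    expd = np.exp(np.column_stack([draws, np.zeros(n_draws)]))
    return expd / expd.sum(axis=1, keepdims=True)

# Illustrative inputs only; units are arbitrary in this sketch.
fractions = simulate_cause_fractions(mortality=8.0, income=6000.0)
lower, median, upper = np.percentile(fractions, [2.5, 50, 97.5], axis=0)
```

Each row of `fractions` is one simulated cause composition, so the percentiles summarize the predicted distribution of each group's share at the given mortality and income levels.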

Application of CodMod for Countries without Good Registration Data.


As with the GBD 1990, one of the useful applications of cause of death models is to examine patterns of deviation from the expected cause composition across countries or regions based on the probability distribution for a predicted cause of death pattern. In other words, the models permit comparison of the observed pattern with the pattern that would be predicted conditional on the levels of all-cause mortality and income per capita associated with that observation.

Given some assumptions about the stability of this pattern of deviation over short time intervals within a country or across countries in the same mortality stratum, it is possible to use the observed cause of death pattern in a reference population to estimate the cause of death pattern for some other population while taking into account differences in the explanatory variables. Some examples of applications would be

  • estimating the cause of death pattern in nonregistration areas for a country in which part of the population is covered by a VR system,

  • forecasting the cause of death pattern for a country where the most recent VR data are for several years in the past, and

  • estimating the cause of death pattern for a country for which information is not available but is available for other countries in the same region.

All these applications are based on the assumption that patterns of deviation from the cause compositions predicted by the model will have some stability across time and place. For example, if young adults in Canada tend to have a low proportion of Group I deaths and a high proportion of Group II deaths in one year given the levels of all-cause mortality and income in that year, a reasonable assumption would be that the next year's composition will be similarly low in Group I and high in Group II given that year's income and total mortality. This hypothesis builds on the notion that all-cause mortality and income per capita explain only some of the variation in cause of death patterns, while the other sources of this variation are unmeasured but are assumed to be relatively stable. In other words, the cause of death pattern in Canada differs from what we would predict based only on total mortality and income because other factors influence the pattern. We assume that these other factors will change gradually over time, which would imply that the deviation from the prediction should also move gradually.

Using similar arguments, Salomon and Murray (2001a) suggested that it may be possible to use patterns of deviation from one country to predict cause of death patterns in another country in the same demographic region. They demonstrated an example of this for mortality data from Chile and Mexico for women aged 35 to 39 for 1965-94. They estimated the percentiles at which the observed cause fractions for the two countries fell in the probability distribution of predicted fractions produced by the Monte Carlo simulations conditional on the mortality and income levels in those years for each country and found similarities in the deviation patterns. Overall, this example suggested that deviation patterns in groups of similar countries may be similar, allowing predictions of cause of death patterns in countries where registration data are not available but for which neighboring countries do have data.

The application of this method has been formalized in a simple spreadsheet program called CodMod ( Salomon and Murray 2001a ). The program incorporates the regression models described earlier and uses Monte Carlo simulation methods to generate probability distributions around predicted cause of death patterns conditional on values for all-cause mortality and income per capita. CodMod allows two main operations: (a) analysis of deviations in observed cause of death patterns given levels of mortality and income, and (b) predictions of cause of death patterns conditional on a reference pattern of deviation and levels of mortality and income.

Thus, for example, if the VR system covers only one region in a country, CodMod may be used to examine the pattern of deviation in that region from the predicted cause of death pattern at local income and total mortality levels. If we assume that a similar pattern of deviation will hold in the nonregistration areas of the country, then we can use information on total mortality levels and income in the nonregistration areas to predict cause of death patterns in these areas. The GBD 2001 used CodMod for countries with incomplete death registration data to adjust for biases in cause composition. Annex table 3A.3 lists countries for which such adjustments were carried out.
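
The two operations can be sketched as follows. This is not the CodMod spreadsheet itself. It assumes that deviations are measured as percentiles on a log-ratio scale and carried over unchanged to the target population, and all coefficients, covariances, and input values are hypothetical.

```python
import numpy as np

# Hypothetical coefficients (intercept, ln mortality, ln income) for the
# Group I vs. III and Group II vs. III log-ratio equations.
COEF = np.array([[2.0, 1.0, -0.5],
                 [0.5, -0.2, 0.15]])
RESID_COV = np.array([[0.09, 0.02],
                      [0.02, 0.04]])  # hypothetical residual covariance

def simulate_logratios(mortality, income, n=20000, seed=0):
    """Monte Carlo draws of predicted log-ratios at given covariate values."""
    rng = np.random.default_rng(seed)
    x = np.array([1.0, np.log(mortality), np.log(income)])
    return rng.multivariate_normal(COEF @ x, RESID_COV, size=n)

def deviation_percentiles(observed_fractions, mortality, income):
    """Operation (a): where the observed pattern falls in the predicted distribution."""
    f = np.asarray(observed_fractions, dtype=float)
    obs = np.log(f[:-1] / f[-1])
    draws = simulate_logratios(mortality, income)
    return (draws < obs).mean(axis=0)

def predict_with_deviation(percentiles, mortality, income):
    """Operation (b): apply a reference deviation at new mortality and income levels."""
    draws = simulate_logratios(mortality, income, seed=1)
    y = np.array([np.quantile(draws[:, k], percentiles[k])
                  for k in range(draws.shape[1])])
    e = np.exp(np.append(y, 0.0))  # reference category has log-ratio 0
    return e / e.sum()

# Registered region (illustrative values), then a nonregistration area with
# higher mortality and lower income.
reference = [0.15, 0.68, 0.17]
pct = deviation_percentiles(reference, mortality=8.0, income=6000.0)
predicted = predict_with_deviation(pct, mortality=12.0, income=2500.0)
```

Applying the reference percentiles at the reference area's own mortality and income approximately reproduces its observed pattern, which is a useful sanity check before extrapolating to other areas.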

CodMod was also used to develop regional patterns of deviation from predicted cause compositions, which were then used to estimate mortality by broad causes for countries for which no registration data were available. Annex table 3A.3 summarizes details of these regional models. In the case of the Sub-Saharan Africa region, where good VR data were available for only three countries, a regional pattern of specific causes of death was based on VR data from urban and rural South Africa. For the Middle East and North Africa, a similar pattern was built for the Gulf states based on the four latest years of data from Bahrain and Kuwait. For other countries in that region, regional models were based on weighted death rates using Egyptian and Iranian VR data. The weights used were determined by the income levels of the individual countries and overall death rates. For the Pacific islands, a regional pattern was based on data available from islands reporting death registration data.

Whereas the original GBD study used a more detailed cause of death model for 12 causes of death to estimate deaths below the broad group level for countries without death registration data, the increased availability of death registration data in most regions has enabled us to use detailed proportional cause distributions within Groups I, II, and III based on death registration data from within each region (see annex table 3A.3 for more details). Specific causes were further adjusted on the basis of epidemiological evidence from registries, verbal autopsy studies, disease surveillance systems, and analyses from WHO technical programs as described earlier.
