A Bayesian approach for estimating age-adjusted rates for low-prevalence diseases over space and time (2024)

  • Journal List
  • HHS Author Manuscripts
  • PMC9575652

As a library, NLM provides access to scientific literature. Inclusion in an NLM database does not imply endorsem*nt of, or agreement with, the contents by NLM or the National Institutes of Health.
Learn more: PMC Disclaimer | PMC Copyright Notice

A Bayesian approach for estimating age-adjusted rates for low-prevalence diseases over space and time (1)

Link to Publisher's site

Stat Med. Author manuscript; available in PMC 2022 Oct 17.

Published in final edited form as:

Stat Med. 2021 May 30; 40(12): 2922–2938.

Published online 2021 Mar 16. doi:10.1002/sim.8948

PMCID: PMC9575652

NIHMSID: NIHMS1828780

PMID: 33728679

Melissa Jay,1 Jacob Oleson,1 Mary Charlton,2 and Ali Arab3

Author information Copyright and License information PMC Disclaimer

Associated Data

Supplementary Materials
Data Availability Statement

Abstract

Age-adjusted rates are frequently used by epidemiologists to compare disease incidence and mortality across populations. In small geographic regions, age-adjusted rates computed directly from the data are subject to considerable variability and are generally unreliable. Therefore, we desire an approach that accounts for the excessive number of zero counts in disease mapping datasets, which are naturally present for low-prevalence diseases and are further innated when stratifying by age group. Bayesian modeling approaches are naturally suited to employ spatial and temporal smoothing to produce more stable estimates of age-adjusted rates for small areas. We propose a Bayesian hierarchical spatio-temporal hurdle model for counts and demonstrate how age-adjusted rates can be estimated from the hurdle model. We perform a simulation study to evaluate the performance of the proposed model vs a traditional Poisson model on datasets with varying characteristics. The approach is illustrated using two applications to cancer mortality at the county level.

Keywords: cancer, disease mapping, hurdle, risk, standardization

1 |. INTRODUCTION

Disease mapping is an invaluable tool in spatial epidemiology used to assess geographic patterns in incidence or mortality of a given disease.1,2 Trustworthy rates can help health departments efficiently allocate resources and promote prevention or intervention efforts in the regions that need them most.13 They can also be used to examine trends in incidence or mortality over time. When interest lies in comparing incidence or mortality rates among fine areal units, such as counties, ZIP codes, or census tracts, rates derived from raw data will be subject to high variability. This is especially true for low-prevalence diseases, in which there are a small number of cases in a given region, low population sizes in certain regions, or both.1 Bayesian hierarchical modeling approaches can be used to create reliable estimates of disease risk on a fine areal scale.

Numerous studies have estimated disease-specific standardized mortality/morbidity ratios (SMR) or relative risks across fine spatial units using a Bayesian hierarchical modeling framework, but fewer studies have estimated rates.1,47 Rates are an important metric used by epidemiologists and public health professionals to directly compare the health of multiple populations of interest. A crude rate is defined as the number of cases divided by the population at risk, multiplied by some number to compare values against, typically 100 000 for cancer rates. While a crude rate can be a meaningful measure of which regions have high death counts for a particular disease, the rate is often highly influenced by the underlying age distribution of the population. Alternatively, age-adjusted rates provide a fairer comparison of disease risk when the disease is highly influenced by the underlying age distribution.8,9

To compute age-adjusted rates, one approach is the direct standardization method. The direct standardization method assumes each region has the same underlying population distribution as a selected standard population. To apply direct standardization, a crude rate is calculated separately for each age group. An overall age-adjusted rate is then computed by taking a weighted average of the age-specific rates based on the population proportions from the standard population.9 The use of direct standardization allows for a fair comparison of rates between regions and time periods, as long as the same standard population is used throughout.8,9 In application, researchers frequently compute the age-adjusted rates first, and then assume the rates are fixed and normally distributed for analysis. This approach comes with many downsides. It does not allow for individual level data, it gives a sample of size one to each areal unit, and it ignores variation in the calculation of the rate.

Further complicating the analysis for low-prevalence diseases, the number of incident cases or deaths in each county, ZIP code, or census tract can be low, which results in a large proportion of zero counts in the dataset. The number of zeros is further inflated when stratifying counts in each small region by age group, which is required when calculating age-adjusted rates. It is important to note that there are often far fewer zeros in datasets that are used to calculate SMRs as compared with age-adjusted rates. This is because modeling SMRs requires an upfront calculation of the expected count per county and year prior to the modeling step due to the indirect standardization procedure that is employed. The additional zero-inflation that results from age group stratification is therefore unique to modeling age-adjusted rates.

In the presence of excess zero counts, common models like the Poisson regression model may fail to adequately fit the data, predicting fewer zeros than observed in the dataset.6 For this scenario where excess zeros are expected, models have been developed to appropriately analyze count data that contain excess zeros. Two popular approaches include hurdle models and zero-inflated models such as the Poisson hurdle model and the zero-inflated Poisson (ZIP) model.10,11

Both the Poisson hurdle and ZIP models are mixture models. The Poisson hurdle model is a mixture of a point mass at zero and a zero-truncated Poisson distribution, which generates only positive counts. The hurdle model includes two stages. First, the probability that a count is nonzero vs zero is modeled via a Bernoulli distribution. Next, the nonzero count values are modeled via a zero-truncated Poisson distribution. Under the hurdle model, it is assumed that each individual in the study population is “at risk” of the outcome. On the other hand, the ZIP model is a mixture of a point mass at zero and a Poisson distribution, which also generates zeros. This model distinguishes between those who are at risk and those who are not. Thus, the point mass at zero refers to the population not at risk, while the Poisson zeros can be interpreted as zeros occurring from the at risk population.10,11 In this work, we use the Poisson hurdle model, since the assumption that everyone in the population is at risk is most reasonable in the disease mapping settings that we have encountered, and in our illustrating examples.

Two studies that estimate rates using zero-excess Bayesian hierarchical models include a study on maternal mortality and a study on chronic obstructive pulmonary disease (COPD).12,13 Loquiha et al12 used ZIP and zero-inflated negative binomial models to estimate facility-based maternal mortality rates in Mozambique. They accounted for the large number of zeros in their dataset and captured both spatial correlation and nonspatial heterogeneity in their models. However, these rates were crude rates and not age-adjusted rates, so they did not encounter additional zero-inflation due to age group stratification. Nandram et al13 modeled age-specific and age-adjusted COPD mortality rates for white males in the United States. They used three different approaches for estimating age-adjusted rates for each health service area. Each of their models estimated separate age group-specific mortality rates and computed age-adjusted rates via direct standardization. The authors acknowledged the high proportion of zero counts, particularly in younger age groups, and one of their approaches estimated rates for the younger age groups using a ZIP model without spatial random effects. Neither of these studies used a hurdle model to handle the excess zeros present in their respective datasets. Furthermore, both studies focused on mortality rates for one time point and not a longitudinal series of time points.

Another study motivating this work is a study on cancer mortality in the United States. Mokdad et al3 created estimates of age-adjusted cancer mortality rates for each county in the United States. They used a Bayesian hierarchical Poisson regression model with space and time random effects to create annual small area estimates for 29 cancer types over 35 years. They, too, modeled mortality rates for each age group separately, and obtained age-adjusted rates using direct standardization. Since death counts were stratified by county, year, age group, and sex in the modeling process, they likely encountered a large proportion of zeros in their dataset for certain cancer types. Because we are working with cancer data from the same source, we propose our method as an alternative approach for estimating age-adjusted cancer mortality rates at the county-level.

In this article, we propose a novel Bayesian hierarchical hurdle model framework to create small area estimates of age-adjusted rates for low-prevalence diseases. To the best of our knowledge, this is the first article to use a Bayesian hierarchical hurdle model to estimate age-adjusted rates from spatio-temporal datasets with excess zero counts. While ZIP models have been used for this purpose,13 their assumptions about two zero-generating processes are not appropriate for all diseases. Our proposed methodology is motivated by the need for reliable estimates of age-adjusted cancer mortality rates in rural areas.

Our modeling framework accounts for an excess of zeros in the dataset that are present in part due to the low prevalence of the disease on the spatial scale of interest and in part due to the age group stratification that occurs when modeling the age-adjusted rates. We develop the calculation of the age-adjusted rate that needs to be modified to account for the mixture model. We also incorporate spatial and temporal smoothing, allowing estimates to borrow strength from nearby regions and prior time points.

The remainder of this article is organized as follows. In Section 2, we describe our modeling approach in detail. In Section 3, we present the results of a simulation study. In Section 4, we use this framework to estimate county-level cancer mortality rates, and in Section 5, we discuss the advantages and limitations of our modeling framework.

2 |. PROPOSED APPROACH FOR ESTIMATING AGE-ADJUSTED RATES

2.1 |. Model

We propose a Bayesian hierarchical Poisson hurdle model for estimating age-adjusted rates for each spatial unit and time point. Without loss of generality, we will refer to each rate as a “mortality rate,” each spatial unit as a “county,” and each time point as a “year.” This modeling framework accounts for an excess of zeros, which are present in part due to the rarity of the disease at the county-level and in part due to age group stratification in the modeling process. The proposed model accounts for both spatial correlation between neighboring counties and temporal correlation between years in the dataset, to produce more reliable estimates.

2.1.1 |. Likelihood

Let Yi,j,k denote the raw death count for county i during year j for age group k, and assume that there are I counties, J years, and K age groups in the dataset. The Poisson hurdle model incorporates two components: a Bernoulli distribution to model the probability of at least one death for a given county, year, and age group, and a zero-truncated Poisson distribution to generate a death count, given a county has at least one death for the age group and year of interest. Thus, the data are modeled in two steps:

P(Yi,j,k=yi,j,kπi,j,k,θi,j,k,ni,j,k)={1πi,j,k,yi,j,k=0πi,j,kθi,j,kyi,j,kexp{θi,j,k}yi,j,k!(1exp{θi,j,k}),yi,j,k>0,

(1)

where 0<πi,j,k<1 and θi,j,k>0. We allow each age group in a given county and year to have its own πi,j,k, which is the probability of at least one death. We choose to model πi,j,k, rather than use the same probability throughout, since death from certain diseases is highly dependent on age. Furthermore, Corpas-Burgos et al14 found that using a common π for every county in a hurdle model can result in inflated risk estimates for smaller counties and deflated risk estimates for larger counties, and we wish to minimize this bias. We also include county, age, and year-specific θi,j,k’s, which represent the mean counts from a Poisson distribution.

On the left-hand side of the regression equation, a transformation of πi,j,k is used. We denote this transformation g(πi,j,k) and suggest using either a complementary log-log or logit link function, depending on the dataset. The right-hand side of the regression equation includes a fixed effect for each age group, a fixed effect for the interaction between age group and log of the population size, a random effect for space, and a random effect for time:

g(πi,j,k)=xkTα1+log(ni,j,k)xkTα2+γ1,i+δ1,j.

(2)

Here, xk is a K × 1 vector of age group indicators, ni,j,k is the population size for age group k in county i during year j, γ1,i is a spatial random effect term for county i, and δ1,j is a temporal random effect term for year j. Details on the spatial and temporal random effects are provided in Section 2.1.2. The parameter vectors, α1 and α2, represent the coefficients for the fixed age effects and the interaction terms between age group and log of the population size, respectively. They are parametrized using a reference age group (k*) as the intercept and an additional effect for each of the remaining age groups. For example, if k* = 1 and k = 2, then x2 = (1, 1, 0, …, 0). An intercept term and a main effect for the log of the population size is embedded in these products. We will let α be the 2K × 1 vector containing both α1 and α2.

It is important to adjust for population size in stage one of the hurdle model in disease mapping applications. While there is a clear way to account for population size in a Poisson or truncated Poisson model, by incorporating a population offset term, there is no analogous way to account for population size when modeling the probability of a nonzero count. In many applications, it is reasonable to assume that the probability of a nonzero count is related, at least in part, to population size. Furthermore, for many diseases, a change in population size in older age groups has a larger impact on the probability of a nonzero count than the same change in population size in younger age groups, since individual risk is higher among older individuals. We incorporate the log of the population size as a covariate, following the work of Corpas-Burgos et al.14 They discuss considerations for using the hurdle model for estimating SMR, and suggest incorporating a similar adjustment in the Bernoulli regression as one way of adjusting for population size. We further extend this idea to include both a main effect and an interaction term between the log of the population size and age group category, to account for the difference in effect of an increase in population size across age groups.

In the second stage of the hurdle model, each θi,j,k is estimated with a log-linear regression model. The model includes a set of coefficients (β) for the age group fixed effects, a spatial random effect (γ2,i), a temporal random effect (δ2,j), and an uncorrelated heterogeneity term (ϵi,j). The uncorrelated heterogeneity term is added to account for potential overdispersion in the count data but can be omitted if this is not a concern in the dataset being analyzed. The log-linear regression model also accounts for population size in each specific age group, county, and year as an offset term, since the number of deaths is related to the size of the at-risk population. This term is included as log(ni,j,k) and θi,j,k is therefore:

log(θi,j,k)=log(ni,j,k)+xkTβ+γ2,i+δ2,j+ϵi,j.

(3)

The coefficients of the fixed effects are once again parametrized by using the reference age group. Details on the random effects in this stage of the hurdle model are provided in Section 2.1.2. Because we are modeling age-adjusted rates, θi,j,k depends on the population size, rather than an expected count.

The Bayesian hierarchical modeling framework conveniently allows us to assume the Yi,j,k’s are conditionally independent within the likelihood level of the model.10 The full likelihood can therefore be expressed by the product:

𝓛(π,θy,n,χ)=i=1Ij=1Jk=1KP(Yi,j,k=yi,j,kπi,j,k,θi,j,k,ni,j,k,xk).

2.1.2 |. Priors

A noninformative or weakly informative prior is utilized for each parameter in the proposed model so that the posterior distribution is largely shaped by the data. We assign independent diffuse normal priors to the coefficients of the fixed effects. Thus, α ~ Normal(0, 0.001 * I) and β ~ Normal(0, 0.001 * I), where the normal distributions are parametrized in terms of the precision.

To allow for spatial smoothing, we apply intrinsic conditional autoregressive (ICAR) priors to the spatial random effect terms in each stage of the hurdle model. We use the ICAR prior, as it is commonly employed in areal data settings and has the benefit of a convenient interpretation.15 The ICAR prior leverages the spatial correlation between each county and its neighbors. Let γ1 and γ2 be the vectors of spatial random effects in stage one and stage two of the hurdle model, respectively, where γ1=(γ1,1,γ1,2,,γ1,I) and γ2=(γ2,1,γ2,2,,γ2,I). The ICAR priors are specified as:

γ1τγ1MVN(0,τγ1(DW))

γ2τγ2MVN(0,τγ2(DW))

where τγ1 and τγ2 are overall precision terms and D = (di,i) is a diagonal matrix with di,i equal to the number of counties that neighbor county i has. W = (wi,h) is a sparse matrix of 0’s and 1’s, where wi,h = 1 if counties i and h are neighbors and wi,h = 0 if counties i and h are not neighbors, or if i = h.15

It is important to note that the ICAR prior is improper, since DW is singular. However, the prior generally does not lead to posterior impropriety and provides a convenient conditional interpretation:

γ1,i{γ1,h:hi},τγ1Normal(1di,ih=1Iwi,hγ1,h,di,iτγ1)

for i=1,,I where each spatial random effect for county i is the average of the corresponding spatial random effects of all neighboring counties. The prior distribution for γ2τγ2 can also be expressed using the conditional form and provides the equivalent conditional interpretation for stage two of the hurdle model.

We specify weakly informative prior distributions on the overall precision terms, τγ1 and τγ2. We utilize a Half-Cauchy prior on the overall SDs, so that 1τγ1 Half-Cauchy(10) and 1τγ2 Half-Cauchy(10), where 10 represents the scale parameter. The Half-Cauchy distribution has a gradual slope, and setting the scale parameter to 10 ensures that there is a probability of 0.76 that the overall SDs are below 25. This distribution aligns with our prior beliefs while allowing the data to primarily drive the posterior inference.

The temporal random effects in Equations (2) and (3) capture the correlation between each year’s counts and the prior year’s counts using autoregressive 1 (AR1) priors and allow for temporal smoothing. Conditional on τδ1 and ρ1, the first AR1 prior is specified as:

δ1,0Normal(0,τδ1(1ρ12))

δ1,j=ρ1δ1,j1+κ1,jforj=2,,J

where κ1,j ~ Normal(0, τδ1). Conditional on τδ2 and ρ2, the second AR1 prior is specified as:

δ2,0Normal(0,τδ2(1ρ22))

δ2,j=ρ2δ2,j1+κ2,jforj=2,,J

where κ2,j ~ Normal(0, τδ2). The correlation terms are assumed to follow Uniform(−1, 1) priors to reflect the lack of prior knowledge on the strength of the temporal correlation. We assign prior distributions to the marginal precisions of the temporal random effects, τδ1(1ρ12) and τδ2(1ρ22), since the software package we use internally parametrizes the AR1 model in terms of the marginal precision.16 As with the spatial random effects, we expect that the marginal SDs of the temporal random effects are well below 25. Furthermore, since there are often a limited number of years in temporal datasets, the use of a weakly informative prior, rather than a completely uninformative uniform prior, may allow for a narrower marginal posterior distribution.17 Thus, we let 1τδ1(1ρ12) Half-Cauchy(10) and 1τδ2(1ρ22) Half-Cauchy(10).

Finally, the prior distribution for the uncorrelated heterogeneity terms is ϵi,j ~ Normal(0, τϵ). Once again, we use a weakly informative Half-Cauchy(10) prior on the SD of these random effects. See Gelman17 for a detailed discussion of the use of weakly informative priors, such as the Half-Cauchy prior, on the SD parameters in Bayesian hierarchical models.

We also perform a sensitivity analysis to assess the effect of different choices of priors on both the posterior distribution and the estimated rates in the data examples in Section 4. Specifically, we change the prior distributions for the precision parameters τγ1,τγ2,τδ1(1ρ12),τδ2(1ρ22) and τϵ. We run four additional hurdle models where (1) all of the SD parameters are assigned Half-Cauchy(100) priors, (2) all of the SD parameters are assigned improper flat priors, (3) all of the precision parameters are assigned Gamma(0.01, 0.01) priors, and (4) all of the precision parameters are assigned Gamma(0.001, 0.001) priors.

2.1.3 |. Age-adjusted rates

In this section, we derive the expression used to compute age-adjusted rates from the hurdle model output. This expression combines estimates of πi,j,k from stage one of the hurdle model with estimates of θi,j,k from stage two. We calculate the expected count for each county, year, and age group, divide by the corresponding population sizes, and multiply by 100 000 to get K rates per county and year. We then take a linear combination of the age group-specific rates using the standard population weights to arrive at an age-adjusted rate.

We first derive the expectation of Yi,j,k.P(Yi,j,k=yi,j,kπi,j,k,θi,j,k,ni,j,k,xk), which we will refer to as P(Yi,j,k=yi,j,k) moving forward, can be expressed as the product of two probability mass functions with corresponding indicator variables. Thus,

P(Yi,j,k=yi,j,k)=(1πi,j,k)1(yi,j,k=0)[πi,j,kθi,j,kyi,j,kexp{θi,j,k}yi,j,k!(1exp{θi,j,k})]1(yi,j,k>0).

By definition, the expectation of Yi,j,k is E(Yi,j,k)=yi,j,k=0yi,j,kP(Yi,j,k=yi,j,k) which simplifies to πi,j,k(1exp{θi,j,k})θi,j,k via the following derivation:

E(Yi,j,k)=yi,j,k=0yi,j,k(1πi,j,k)1(yi,j,k=0)[πi,j,kθi,j,kyi,j,kexp{θi,j,k}yi,j,k!(1exp{θi,j,k})]1(yi,j,k>0)=0+yi,j,k=1yi,j,kπi,j,kθi,j,kyi,j,kexp{θi,j,k}yi,j,k!(1exp{θi,j,k})=yi,j,k=0yi,j,kπi,j,kθi,j,kyi,j,kexp{θi,j,k}yi,j,k!(1exp{θi,j,k})=πi,j,k(1exp{θi,j,k})yi,j,k=0yi,j,kθi,j,kyi,j,kexp{θi,j,k}yi,j,k!E(Yi,j,k)=πi,j,k(1exp{θi,j,k})θi,j,k.

In the final step, the sum simplifies to θi,j,k since it is the expected value of a Poisson random variable with parameter θi,j,k. Note that it is sufficient to compute each expectation marginally rather than jointly, since each Yi,j,k is conditionally independent at the likelihood level of the Bayesian hierarchical model.

E(Yi,j,k) under the hurdle model formulation is the ratio of the probabilities of nonzero counts under the Bernoulli process and the Poisson process multiplied by the expected count under the Poisson process. When πi,j,k<1exp{θi,j,k}, which often occurs when there are excess zeros in a dataset, the expected count is lower than the expected count from the standard Poisson regression model. While this is our primary scenario of interest, it should be noted that the hurdle model can also be applied in situations with underdispersion or fewer zero counts than would be fit by the Poisson model.

The age group-specific rate for county i during year j is calculated by dividing E(Yi,j,k) by ni,j,k and then multiplying by 100 000 individuals:

Ri,j,k=(πi,j,kθi,j,k1exp{θi,j,k}/ni,j,k)100 000.

Finally, an overall age-adjusted rate is calculated by combining the Ri,j,k’s across age groups via direct standardization. Given a vector w of age group weights from a standard population, Ri,j=k=1KwkRi,j,k.

2.2 |. Model implementation

To implement our modeling framework, we recommend using the INLA package in R.16,18 INLA performs approximate Bayesian inference via integrated nested Laplace approximations.16,19 Given the complexity of the proposed modeling framework and the computationally intensive task of estimating I × J × K individual rates, INLA greatly reduces computation time compared with software packages based on Markov Chain Monte Carlo techniques.16,18

We run the hurdle model using two separate model fits for stage one and stage two. To calculate rates, we draw 1000 posterior samples of the linear predictors from each model fit, to obtain 1000 samples of each πi,j,k and 1000 samples of each θi,j,k. We compute a rate for each county, year, and age group using each sample and treat the mean of these rates as the corresponding point estimate. These rates are then combined into an overall age-adjusted rate for each county at each year using the 2010 United States standard population weights. The complete R code for implementing our proposed approach using INLA is available in the supplementary materials.

3 |. SIMULATION STUDY

We perform a simulation study to assess the performance of the proposed hurdle model on simulated data sets with varying characteristics. In this simulation study, we have two primary objectives. First, we aim to assess the overall improvement in fit between the hurdle model and a simple Poisson model when fit to datasets that are generated from a hurdle model. We use a Poisson model for comparison because prior studies have used Bayesian hierarchical Poisson regression models to compute age-adjusted rates.3,13 In particular, Mokdad et al3 used a Poisson model to estimate age-adjusted cancer mortality rates with the same dataset that we use in this work. We also aim to evaluate the performance of a hurdle model fit under model misspecification (ie, when the datasets are generated from a Poisson model).

3.1 |. Simulation set-up

We simulate count data under 12 scenarios. In each scenario, 100 datasets are generated from an underlying model of interest. The neighborhood structure and population sizes from 44 counties in southern Minnesota from 2014 to 2017 are used to simulate each dataset. These counties were selected because they include a mix of rural counties and large metropolitan areas, with ni,j,k values ranging from about 500 to nearly 700 000. Annual population counts were obtained from the Surveillance, Epidemiology, and End Results Program (SEER) website. Individuals are classified into one of six age groups: <40, 40 to 49, 50 to 59, 60 to 69, 70 to 79, and 80+. Thus, I = 44, J = 4, and K = 6.

The following simple hurdle model, which uses the likelihood described in Equation (1) and the priors detailed in Section 2.1.2, is fit to each dataset:

logit(πi,j,k)=xkTα+γ1,i+δ1,j

log(θi,j,k)=log(ni,j,k)+xkTβ+γ2,i+δ2,j+ϵi,j

This model is similar to the hurdle model proposed in Section 2 but excludes the interaction terms in stage one of the hurdle model for simplicity and specifies a logit link function. The following Poisson regression model, with the priors described in Section 2.1.2, is also fit to each dataset:

Yi,j,kPoisson(θi,j,k)

log(θi,j,k)=log(ni,j,k)+xkTβ+γi+δj+ϵi,j

After fitting both models on each dataset, the performance metric ΔDIC=DIChDICp is computed, where DICh and DICp correspond to the deviance information criterion20 (DIC) values from the hurdle model fit and the Poisson model fit, respectively.

3.1.1 |. Scenarios 1 to 9

In scenarios 1 to 9, the true data generating process is the simple hurdle model detailed in Section 3.1. The spatial random effect terms are generated from proper conditional autoregressive models with τγ1 = 100 and τγ2 = 100 and correlation terms ργ1 = 0.9 and ργ2 = 0.9, to approximate the ICAR model. The temporal random effects are generated according to AR1 models with precision terms τδ1 = 100 and τδ2 = 100 and correlation terms ρ1 = 0.4 and ρ2 = 0.4. Independent normal distributions are used to generate the uncorrelated heterogeneity terms, with τϵ = 100.

The coefficients for the age group fixed effects, α and β, change across scenarios to evaluate the differences in model performance when values for π and θ vary. A display of the possible parameter combinations is provided in Figure 1. In the left plot, πk is the average probability of a nonzero count for age group k, computed by setting all of the random effects equal to 0. In the right plot, θk controls the nonzero count distribution and is plotted for each age group by setting the random effects equal to 0 and setting the population size to 100 000. Note that across all possible simulations, πk and θk are nondecreasing functions of age, since many diseases exhibit this pattern.

Open in a separate window

FIGURE 1

Display of parameter value combinations for scenarios 1 to 9. The population size is fixed at 100 000 to compute θk

There are three possible trends for πk: low values, medium values, and high values across all age groups. These trends were designed to evaluate how the proportion of zeros in the generated datasets affect ΔDIC. These choices for πk result in simulated datasets that have proportions of zeros in the neighborhood of 0.4, 0.6, and 0.8. The three trends for θk were designed to examine whether the shape and variation in the nonzero count distribution affects the difference in model fit between the hurdle and Poisson models. In addition to the low and high curves for θk, a high (modified) curve is included. This curve has the same values as the high curve for age groups 3 to 6 but generates larger counts for age groups 1 and 2 than the high curve generates. While the younger age groups often have very low rates of certain diseases, we hypothesized that model fitting errors might occur if there is no variation or very little variation in unique nonzero count values for at least one of the two lowest age groups. We perform nine simulations to account for all combinations of πk and θk.

3.1.2 |. Scenarios 10 to 12

Scenarios 10 to 12 assume the data generating process is the Poisson model described in Section 3.1. The random effects are generated using the process detailed for scenarios 1 to 9. We set τγ = 100 and ργ = 0.9 to generate the spatial random effects, τδ = 100 and ρδ = 0.4 to generate the temporal random effects, and τϵ = 100 to generate the uncorrelated heterogeneity terms. We vary β in each of the three scenarios to assess the model fits on datasets with low, medium, and high values of θk. These sets of parameter values were selected to produce a similar proportion of zeros in each dataset to the proportion of zeros produced in scenarios 1 to 9. See Figure 2 for a display of the three sets of parameter values. We fit the hurdle model and the Poisson model to each generated dataset.

Open in a separate window

FIGURE 2

Display of parameter values for scenarios 10 to 12. The population size is fixed at 100 000 to compute θk

3.2 |. Simulation results

The results of scenarios 1 to 9 and 10 to 12 are summarized in Tables 1 and ​and2,2, respectively. The tables display the mean proportion of zeros across datasets, the mean ΔDIC values, and the SDs of ΔDIC. They also include a column documenting the number of simulated datasets (out of 100) for which there were model fitting problems. Model fitting problems were identified if a model produced a DIC value that was infinite, near infinite, or undefined. Any datasets with a model fitting problem were excluded from the calculations of the mean and SD of ΔDIC.

TABLE 1

Simulation results for Scenarios 1 to 9 in which datasets were generated under a hurdle model

ScenarioθkπkProportion zeros (Mean)ΔDIC (Mean)ΔDIC (SD)Model fitting problems
1LowHigh0.413−685820
2LowMedium0.587−657910
3LowLow0.831−48711213
4HighHigh0.417−10931570
5HighMedium0.591−13031630
6HighLow0.832−11841846
7High (modified)High0.415−19542870
8High (modified)Medium0.590−19753130
9High (modified)Low0.832−15403021

Open in a separate window

Note: 100 datasets were generated for each scenario.

TABLE 2

Simulation results for Scenarios 10 to 12 in which datasets are generated under a Poisson model

ScenarioθkProportion zeros (Mean)ΔDIC (Mean)ΔDIC (SD)Model fitting problems
10High0.44045100
11Medium0.6384290
12Low0.78334911

Open in a separate window

Note: 100 datasets are generated for each scenario.

Abbreviation: DIC, deviance information criterion.

In scenarios 1 to 9, the hurdle model largely outperformed the Poisson model with average ΔDIC values ranging from −487 to −1975. Scenario 8, which generated datasets with the highest nonzero counts and a mean proportion of zeros of 0.590, showed the strongest preference for the hurdle model compared with the Poisson model with a ΔDIC value of −1975. In this setting, the datasets were comprised of primarily zero counts mixed with the highest nonzero count values that were generated. The divide between the zero counts and the nonzero count values was large, resulting in a highly inadequate Poisson model fit. The scenario in which datasets were generated with the largest proportion of zeros and the lowest nonzero counts (scenario 3), had the smallest difference in DIC between the hurdle and Poisson model fits. Intuitively, we would expect the Poisson fit to perform least badly compared with the hurdle fit in the setting in which all of the count values (both zeros and nonzeros) were close together, so this result seems reasonable.

Interestingly, when the data were hurdle-generated, changing the proportion of zeros with πk had less of an effect on ΔDIC than changing the nonzero count distribution. The distribution of nonzero counts, controlled by the θk values, had a large impact on the mean ΔDIC. In Table 1, there is a clear difference in the mean ΔDIC values for scenarios with the low, high, and high (modified) θk values. Scenarios 1 to 3, which generated datasets with low nonzero values, had mean ΔDIC values that were smaller in magnitude than the mean ΔDIC values for either of the higher θk curves. By contrast, the mean ΔDIC values for scenarios 7 to 9, which generated the highest nonzero count values, were −1540 or lower. Note that for higher θk values, the variability of ΔDIC also increased. These findings suggest that the hurdle model has the largest improvements in DIC over the Poisson model when average θk values are high for each of the age groups.

As hypothesized, model fitting problems occurred more frequently for scenarios involving low or high θk values, compared with the high (modified) setting. Only one model fitting problem occurred when generating datasets with high (modified) θk values (scenario 9), whereas more than five model fitting problems occurred in scenarios 3 and 6, which had low and high θk values, respectively. These problems occurred due to at least one of the two lowest age groups having no variability in nonzero counts in a simulated dataset. This demonstrates that the proposed hurdle model is not suitable when there is no variability in the nonzero count values for one or more of the age groups.

In scenarios 10 to 12, we see that the Poisson model outperformed the hurdle model in all three cases, with average ΔDIC values ranging from 34 to 45. Since the underlying data generating process was a Poisson model, both zero and nonzero counts were generated using the θk values. Thus, the simulation scenario with high values of θk was also the scenario in which the simulated datasets had the lowest proportion of zeros. Given that hurdle models were designed to account for zero inflation, it makes sense that the hurdle model fit performed more adequately as the proportion of zeros increased. The model fitting problems once again occurred when low nonzero count values were generated, due to difficulties fitting stage two of the hurdle model.

The magnitude of ΔDIC was much larger in scenarios 1 to 9 compared with scenarios 10 to 12. This pattern suggests that in the disease mapping context presented in this article, fitting a Poisson model to hurdle-generated data may have larger consequences than fitting a hurdle model to Poisson-generated data, in terms of DIC.

4 |. APPLICATION TO COUNTY-LEVEL CANCER MORTALITY

We next illustrate our approach by estimating age-adjusted mortality rates for liver cancer and colorectal cancer. In this illustration, we use county-level data from three states in the Midwest region of the United States: Iowa, Minnesota, and Wisconsin. These states were chosen because they are predominantly rural, with several large metropolitan areas. Cancer mortality datasets for this region therefore include a large proportion of zero counts, coupled with unique nonzero count values. The cancer mortality data come from the National Center for Health Statistics vital statistics data files21 and the population counts come from the SEER website.

In each example, we compute annual age-adjusted rate estimates for 258 counties between the years of 2000 and 2017. Death counts and population sizes are grouped according to the following 11 age categories: < 40, 40 to 44, 45 to 49, 50 to 54, 55 to 59, 60 to 64, 65 to 69, 70 to 74, 75 to 79, 80 to 84, and 85+. Though each dataset includes more granular age groupings for individuals under 40 years old, we combine the counts and population sizes since there are very few cancer deaths in the individual age groups. Upon aggregating these counts, the <40 age group has a similar number of deaths to the 40 to 44 age group. Both the proposed hurdle model from Section 2 and the Poisson model described in Section 3.1 are fit to each dataset for comparison. We utilize a complementary log-log link function in stage one of the hurdle model fits, as it provides a better fit to these datasets compared with the logit link function. The model fits are compared using DIC and the Watanabe-Akaike information criterion (WAIC).22

The liver cancer and colorectal cancer datasets vary in the proportion of zero counts, the magnitude of the nonzero count values, and the trends in mortality over time. Liver cancer mortality is rare in this region, with 88% zero counts when stratifying the death counts by county, age group, and year. The nonzero counts have a mean value of 1.5 and a median value of 1. Each age group has between 3 and 16 unique nonzero count values. Furthermore, liver cancer mortality exhibits an increasing pattern over time. Colorectal cancer mortality has decreased substantially over time. It is more common in this region, with 63% zero counts in the dataset. The nonzero count values have a mean of 2.3 and a median of 1, and each age group has between 5 and 47 unique nonzero count values.

When fitting the hurdle and Poisson models to the liver cancer data, the hurdle model provided a more adequate fit to the data based on both DIC and WAIC. The DIC values for the hurdle and Poisson models were 35 846 and 35 861, respectively, resulting in a ΔDIC value of −16. The WAIC values were similar to the DIC values, and ΔWAIC was −15. See Table 3 for a full comparison of the two model fits based on DIC and WAIC.

TABLE 3

DIC and WAIC values for the hurdle and Poisson models fit to the liver cancer and colorectal cancer datasets

CancerModelDICΔDICWAICΔWAIC
LiverHurdle35 846−1635 855−15
Poisson35 86135 869
ColorectalHurdle89 0074189 01339
Poisson88 96688 975

Open in a separate window

Note: Note that DIC, WAIC, ΔDIC, and ΔWAIC values are rounded to the nearest whole number.

Abbreviations: DIC, deviance information criterion; WAIC, Watanabe-Akaike information criterion.

Maps of the age-adjusted rates estimated from the hurdle and Poisson models for the years 2000, 2010, and 2017 are provided in Figure 3. These rates ranged from about 1.5 to 6.9 deaths per 100 000 people. There was a strong linear association between the rates produced by the hurdle and Poisson models, and both sets of estimates captured the increasing pattern in death rates over time. However, the hurdle model tended to produce lower age-adjusted mortality rates than the Poisson model indicating that stage one of the hurdle model produced higher estimated probabilities of zero counts than did the Poisson model. Treating the data as zero-inflated appears to be reasonable and helpful when modeling this region’s liver cancer mortality rates.

Open in a separate window

FIGURE 3

Maps of age-adjusted liver cancer mortality rates estimated from the hurdle model and from the Poisson model

In contrast to the liver cancer results, the Poisson model outperformed the hurdle model when fit to the colorectal cancer dataset. The DIC values for the hurdle and Poisson models were 89 007 and 88 966, respectively, yielding a ΔDIC value of 41. The WAIC values were similar to the DIC values and ΔWAIC was 39, supporting the conclusion that the Poisson model provides a better fit to the data than the hurdle model in this example. See Table 3 for a full comparison of the performance metrics. We would therefore recommend selecting the Poisson model over the hurdle model in this setting. Interestingly, this difference in DIC values is very similar to the ΔDIC values from Scenarios 10 and 11 of the simulation study, when comparing the hurdle and Poisson model fits on datasets that were truly Poisson-generated. Based on these model fits, it is unlikely that the true generating process of the colorectal cancer data was the proposed hurdle model, despite the large number of zeros in the dataset.

Figure 4 displays maps of the estimated age-adjusted colorectal cancer mortality rates from each model for the years 2000, 2010, and 2017. Rate estimates ranged from about 10.0 to 29.2 deaths per 100 000 people in the population. The maps indicate a strong decline in colorectal cancer mortality from 2000 to 2017 with the rates in 2000 primarily falling into the highest two categories and the rates in 2017 primarily falling into the lowest two categories. Both models produced rate estimates that were strongly correlated and generally in agreement with respect to the counties and years that had the highest and lowest rate estimates. Some of the highest age-adjusted mortality rates occurred in counties in southern and central Iowa, Northern Minnesota, and Northern Wisconsin. When qualitatively comparing the rates from the two models, the estimated rates from the hurdle model tended to be higher than the estimated rates from the Poisson model.

Open in a separate window

FIGURE 4

Maps of age-adjusted colorectal cancer mortality rates estimated from the hurdle model and from the Poisson model

In addition to the rate estimates, we examined the variability in the age-adjusted rates produced by both the hurdle and Poisson models. In both examples, the hurdle model produced more precise estimates of age-adjusted rates than the Poisson model. Figure 5 displays the variances of the posterior samples drawn from the hurdle and Poisson models for each county and year’s age-adjusted rate. The left plot shows the liver cancer variances and the right plot shows the colorectal cancer variances. The majority of the Poisson variances exceeded the hurdle variances in both examples, as evidenced by the points falling above the one-to-one lines. Figures 6 and ​and77 map the variances of the age-adjusted rates for liver cancer and colorectal cancer, respectively. These maps once again illustrate the increased precision in age-adjusted rates when estimated from the hurdle model. They also indicate where age-adjusted rates were most precise, with the lowest variances generally occurring in highly populated counties and the highest variances generally occurring in more sparsely populated counties.

Open in a separate window

FIGURE 5

Scatter plots comparing the variances of the posterior samples of age-adjusted rates from the hurdle and Poisson models. The plot on the left displays the variances of the liver cancer age-adjusted rates and the plot on the right displays the variances of the colorectal cancer age-adjusted rates

Open in a separate window

FIGURE 6

Maps of the variances of the age-adjusted liver cancer mortality rates estimated from the hurdle model and from the Poisson model

Open in a separate window

FIGURE 7

Maps of the variances of the age-adjusted colorectal cancer mortality rates estimated from the hurdle model and from the Poisson model

These examples illustrate that the proposed hurdle model provides advantages over the Poisson model for computing age-adjusted rates for some cancers and geographic regions but that the best model fit is ultimately disease-dependent. We have evidence in favor of the hurdle model for computing liver cancer age-adjusted mortality rates in this region, which contained a very large proportion of zero counts along with some sizeable nonzero counts. However, the simpler Poisson model was a more appropriate choice for estimating colorectal cancer mortality rates in this region in terms of both DIC and WAIC. The assumption of two distinct data-generating processes (one for zeros and one for nonzero counts) is likely more realistic for estimating counts for certain cancer types and less realistic for others. In both cases, the hurdle model produced more precise estimates of the age-adjusted rates.

We also performed a sensitivity analysis to assess the effect of the choices of prior on age-adjusted rate estimates and posterior inference. See Table 4 for a summary of the results. The age-adjusted rates modeled using each set of priors described in Section 2.1.2 were nearly identical to those produced in our analysis. Table 4 displays the minimum and maximum difference in age-adjusted rates per 100 000 produced using the specified prior, compared with the priors in our analysis. The largest difference in rates occurred when applying the Gamma(0.01, 0.01) prior to the precision parameters. However, the minimum and maximum differences were on the magnitude of −0.29 and 0.26 per 100 000, and thus still aligned very closely with the rates produced in our analysis.

TABLE 4

Results of the sensitivity analysis

CancerPriorDifference in rate estimatesPosterior mean and 95% credible interval
MinimumMaximumτγ1τδ1(1ρ12)τγ2τδ2(1ρ22)τϵ
LiverHalf-Cauchy(10)NANA9.53 (5.65,15.40)141.33 (13.07, 530.43)10.41 (4.54, 21.00)43.51 (1.97,179.92)95.32 (26.22, 290.82)
Half-Cauchy(100)−0.030.029.56(5.62,15.31)137.45 (11.09, 535.67)10.50 (4.49, 20.91)39.25 (1.37,168.23)96.85 (26.15, 303.12)
Flat−0.010.029.55(5.62,15.33)128.93 (11.71,451.88)10.55 (4.46, 20.96)35.95 (1.00,152.43)89.35 (25.27, 260.67)
Gamma(0.01, 0.01)−0.030.029.77(5.78,15.88)98.65 (15.29, 320.57)11.19 (4.78, 22.27)39.31 (3.80,126.74)73.55 (25.12, 186.54)
Gamma(0.001, 0.001)−0.040.079.84(5.79,16.06)195.82 (47.44, 554.10)11.19 (4.84, 22.55)68.72 (7.17, 237.40)132.62 (29.74, 477.42)
ColorectalHalf-Cauchy(10)NANA33.09 (18.72, 55.84)42.98(2.46,188.81)21.47 (12.89, 32.94)13.67 (0.33, 62.57)552.84 (174.93, 1629.86)
Half-Cauchy(100)−0.030.0233.10 (18.78, 56.01)29.44 (0.94,122.25)21.19 (13.09, 33.01)8.09 (0.16, 32.41)569.30 (177.88, 1673.01)
Flat−0.050.0333.01 (18.80, 56.01)37.29 (1.44,163.50)21.37 (13.01, 33.00)11.35 (0.24, 48.77)550.82 (174.81, 1621.97)
Gamma(0.01, 0.01)−0.290.2633.57 (19.00, 56.57)39.28 (4.75,122.65)21.80 (13.43, 34.13)21.04 (1.85, 70.35)264.31 (135.60, 483.25)
Gamma(0.001, 0.001)−0.040.0734.12 (19.37, 60.14)49.32 (4.84,158.52)21.78 (13.40, 33.92)24.09 (2.17, 78.87)541.21 (180.75, 1485.15)

Open in a separate window

Note: The prior corresponds to the prior distribution applied to either the precision or SD parameters in the hurdle model. The difference in rate estimates correspond to the rates estimated in the analysis using the Half-Cauchy(10) priors minus the rates estimated using the listed prior distribution. In addition, the posterior mean and 95% credible intervals are displayed for the precision parameters.

Table 4 also displays the posterior mean and 95% credible interval for each precision parameter under each of the scenarios described in Section 2.1.2. The choice of prior had little effect on the estimates and credible intervals of the spatial precision parameters, yet a larger effect on the posterior distributions of the marginal precisions of the temporal random effects. This is likely due to the fact that there were only 18 unique times, whereas there were 258 unique regions. The use of a Gamma(0.01, 0.01) vs Gamma(0.001, 0.001) prior also affected the posterior inference on many of the precision parameters. This is not surprising, as Gelman17 acknowledges that posterior inference in a Bayesian hierarchical model can be highly sensitivity to the choice of small rate and shape values when using the Gamma distribution. We thus stand by our decision to use weakly informative rather than noninformative priors for the precision terms. While not shown, the marginal posterior distributions of the α and β coefficients were nearly identical, regardless of the choice of prior.

5 |. DISCUSSION

Age-adjusted rates are an important epidemiological tool for comparing disease incidence and mortality across populations of interest. For low-prevalence diseases, using raw counts to compute age-adjusted mortality rates for small geographic areas results in highly imprecise and unstable estimates. Several studies have implemented Bayesian hierarchical modeling approaches to estimate age-adjusted rates,3,13,23 though most studies have used these approaches to estimate SMR. We add to the body of literature by introducing a flexible modeling approach that addresses the additional zero-inflation that is created in the process of computing age-adjusted rates for small geographic regions using a hurdle model.

Our approach has several advantages over Bayesian approaches that have been previously introduced for computing age-adjusted rates over space and time. First, it accounts for the excess of zeros that are created in the process of computing age-adjusted rates. As evidenced by the liver cancer dataset, which was 88% zeros, the hurdle model provided an improved fit to the data over the Poisson model. We also incorporate a novel population adjustment in stage one of the hurdle model, since the probability of a nonzero count is related, at least in part, to population size and age group in the disease mapping setting. Finally, this approach assumes that there is one zero-generating process, which may be more appropriate than the assumption of two zero-generating processes for certain diseases.

The modeling approach we present is also highly flexible. We produce reliable rate estimates with age group as the only model covariate by employing spatio-temporal smoothing. However, the proposed model can also be implemented with additional county-level covariates, depending on a researcher’s modeling objectives. A set of covariates can easily be included to explain whether an age group within a county has a zero or nonzero count. Furthermore, a different set of covariates can be used, if desired, to explain the size of the nonzero counts in stage two. We intend to include county-level covariates in future work to see how these covariates impact the estimates of cancer mortality rates and the overall model fits that were illustrated in the application.

In addition to the strengths, our proposed modeling framework has some limitations. As demonstrated in the simulation study, model fitting problems arise if at least one age group exhibits no variability in nonzero count values. This occurs because a coefficient is estimated for each age group in stages one and two of the hurdle model. For very rare diseases or particularly small geographic areas, all of the nonzero counts for a given age group could be one. Both Mokdad et al3 and Nandram et el13 treated age group as a continuous variable when estimating age-adjusted rates, and a similar adaptation of our proposed hurdle model could be implemented in these cases to reduce the number of model parameters. A second limitation is the relative computational expense of the proposed approach. The hurdle model is more computationally expensive to fit than a simpler Poisson model since it is a two stage model. Though more computational resources might be required, the assumption of independence between the stages of the hurdle model does allow for a parallel implementation which could reduce model runtime.

While the primary objective of this research is to estimate and visually compare age-adjusted rates across space and time, it is often of interest to compare age-adjusted rates between specific (possibly nested) geographic regions using rate ratios. Jiang et al23 recently developed an approach to compare the age-adjusted rate for a county of interest to the overall age-adjusted rate for the entire study region. They use a Bayesian convolution model to obtain estimates of age-adjusted rate ratios and their associated credible intervals. Their method accounts for uncorrelated heterogeneity as well as spatially correlated heterogeneity but does not account for excess zero values using a hurdle model.

As noted in the application, the assumption of two distinct data-generating processes for zero and nonzero counts may be more realistic for certain cancers and less realistic for others. It may be more reasonable to assume the data generating process is separable for cancer types that are largely caused by environmental or lifestyle factors. Another promising application is the use of our approach for modeling age-adjusted rates of environmental diseases. Emphysema is an environmental disease that is primarily caused by exposure to airborne irritants including smoke, air pollution, and chemical fumes. Excess zero counts are likely to occur in regions with low levels of air pollution, low proportions of cigarette smokers, and regions with minimal occupational exposures, whereas nonzero count values might occur in regions with high levels of these external exposures. Separate generating processes for zero and nonzero counts could lead to an improved model fit over the standard Poisson model in the environmental disease application.

In conclusion, we present a new Bayesian modeling approach for estimating age-adjusted rates for small geographic areas in the presence of many zero counts. Our approach creates smoothed estimates using data points from neighboring counties and nearby years to improve estimate stability. We describe the calculation of age-adjusted rates using the hurdle model and provide INLA code for this method in the supplementary materials to facilitate the implementation of this approach. Reliable estimates of age-adjusted rates can aid public health professionals in community-level decisionmaking to promote tailored interventions.

Supplementary Material

supinfo1

Click here to view.(5.7K, r)

ACKNOWLEDGEMENTS

This research was funded by the National Institute of Environmental Health Sciences through the University of Iowa Environmental Health Sciences Research Center, NIEHS/NIH P30 ES005605 and by the National Science Foundation Graduate Research Fellowship Program under Grant No. 000390183. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. We thank the associate editor and referee for their constructive comments to improve the article.

Funding information

National Institute of Environmental Health Sciences through the University of Iowa Environmental Health Sciences Research Center, Grant/Award Number: NIEHS/NIH P30 ES005605; National Science Foundation Graduate Research Fellowship Program, Grant/Award Number: Grant No. 000390183

Footnotes

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of this article.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available from the National Center for Health Statistics. Restrictions apply to the availability of these data. Access to the restricted-use “Detailed Mortality - All Counties” dataset may be requested at the following url: https://www.cdc.gov/nchs/nvss/nvss-restricted-data.htm.

REFERENCES

1. Elliott P, Wartenberg D. Spatial epidemiology: current approaches and future challenges. Environ Health Perspect. 2004;112(9):998–1006. [PMC free article] [PubMed] [Google Scholar]

2. Hay SI, Battle KE, Pigott DM, et al. Global mapping of infectious disease. Philos Trans R Soc Lond B Biol Sci. 2013;368(1614):20120250. [PMC free article] [PubMed] [Google Scholar]

3. Mokdad AH,Dwyer-Lindgren L, Fitzmaurice C,et al. Trends and patterns of disparities in cancer mortality among US counties, 1980–2014. J Am Med Assoc. 2017;317(4):388–406. [PMC free article] [PubMed] [Google Scholar]

4. Neyens T, Lawson AB, Kirby RS, et al. Disease mapping of zero-excessive mesothelioma data in Flanders. Ann Epidemiol. 2017;27(1):59–66. [PMC free article] [PubMed] [Google Scholar]

5. Ward C, Oleson J, Jones K, Charlton M. Showcasing cancer incidence and mortality in rural ZCTAs using risk probabilities via spatio-temporal Bayesian disease mapping. Appl Spat Anal Policy. 2019;12(4):907–921. [Google Scholar]

6. Aregay M, Lawson AB, Faes C, Kirby RS, Carroll R, Watjou K. Zero-inflated multiscale models for aggregated small area health data. Environmetrics. 2018;29(1):e2477. [PMC free article] [PubMed] [Google Scholar]

7. Asmarian N, Ayatollahi SMT, Sharafi Z, Zare N. Bayesian spatial joint model for disease mapping of zero-inflated data with R-INLA: a simulation study and an application to male breast cancer in Iran. Int J Environ Res Public Health. 2019;16(22):4460. [PMC free article] [PubMed] [Google Scholar]

8. Klein RJ, Schoenborn CA. Age Adjustment Using the 2000 Projected US Population. Healthy People Statistical Notes, No. 20. Hyattsville, MD: National Center for Health Statistics; 2001. [PubMed] [Google Scholar]

9. Buescher PA. Age-adjusted death rates. statistical primer, No. 13. State Center for Health Statistics, North Carolina Division of Public Health; 2010. [Google Scholar]

10. Arab A. Spatial and spatio-temporal models for modeling epidemiological data with excess zeros. Int J Environ Res Public Health. 2015;12(9):10536–10548. [PMC free article] [PubMed] [Google Scholar]

11. Rose CE, Martin SW, Wannemuehler KA, Plikaytis BD. On the use of zero-inflated and hurdle models for modeling vaccine adverse event count data. J Biopharm Stat. 2006;16(4):463–481. [PubMed] [Google Scholar]

12. Loquiha O, Hens N, Chavane L, et al. Mapping maternal mortality rate via spatial zero-inflated models for count data: a case study of facility-based maternal deaths from Mozambique. PLoS One. 2018;13(11):e0202186. [PMC free article] [PubMed] [Google Scholar]

13. Nandram B, Sedransk J, Pickle LW. Bayesian analysis and mapping of mortality rates for chronic obstructive pulmonary disease. J Am Stat Assoc. 2000;95(452):1110–1118. [Google Scholar]

14. Corpas-Burgos F,García-Donato G,Martinez-Beneito MA. Some findings onzero-inflated and hurdle Poisson models for disease mapping. Stat Med. 2018;37(23):3325–3337. [PubMed] [Google Scholar]

15. Banerjee S, Carlin BP, Gelfand AE. Hierarchical Modeling and Analysis for Spatial Data. Boca Raton, FL: Chapman & Hall/CRC Press; 2014. [Google Scholar]

16. Rue H, Martino S, Chopin N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Series B Stat Methodol. 2009;71(2):319–392. [Google Scholar]

17. Gelman A. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Anal. 2006;1(3):515–534. [Google Scholar]

18. R Core Team (2020). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, https://www.R-project.org/. [Google Scholar]

19. Gómez-Rubio V. Bayesian Inference with INLA. Boca Raton, FL: CRC Press; 2020. [Google Scholar]

20. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J R Stat Soc Ser B Stat Methodol. 2002;64(4):583–639. [Google Scholar]

21. National Center for Health Statistics. Detailed mortality - all counties (1968–2017), as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program; 2017.

22. Watanabe S. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J Mach Learn Res. 2010;11(Dec):3571–3594. [Google Scholar]

23. Jiang Y, Lawson AB, Zhu L, Feuer EJ. Interval estimation for age-adjusted rate ratios using Bayesian convolution model. Front Public Health. 2019;7:144. [PMC free article] [PubMed] [Google Scholar]

A Bayesian approach for estimating age-adjusted rates for low-prevalence diseases over space and time (2024)
Top Articles
Latest Posts
Article information

Author: Stevie Stamm

Last Updated:

Views: 5559

Rating: 5 / 5 (80 voted)

Reviews: 87% of readers found this page helpful

Author information

Name: Stevie Stamm

Birthday: 1996-06-22

Address: Apt. 419 4200 Sipes Estate, East Delmerview, WY 05617

Phone: +342332224300

Job: Future Advertising Analyst

Hobby: Leather crafting, Puzzles, Leather crafting, scrapbook, Urban exploration, Cabaret, Skateboarding

Introduction: My name is Stevie Stamm, I am a colorful, sparkling, splendid, vast, open, hilarious, tender person who loves writing and wants to share my knowledge and understanding with you.