Linking Health and Environmental Data in Geographical Analysis: It’s So Much More than Centroids

719813…in Spatial and Spatio-Temporal Epidemiology, Volume 1, Issue 1…

Linda J. Young, Carol A. Gotway, Jie Yang, Greg Kearney, Chris DuClos

“Programs and studies increasingly use existing data from multiple sources (e.g., surveillance systems, health registries, or governmental agencies) for analysis and inference. These data usually have been collected on different geographical or spatial units, with each varying from the ones of interest. Combining such disparate data creates statistical challenges. Florida’s efforts to move toward implementing the Centers for Disease Control and Prevention (CDC)’s Environmental Public Health Tracking (EPHT) program aptly illustrate these concerns, which are typical of studies designed to measure the association between environmental and health outcomes. In this paper, we develop models of spatial associations between myocardial infarctions (MIs) and ambient ozone levels in Florida during August 2005 and use these models to illustrate the problems that can occur when making inferences from aggregated data, the concept of spatial support, and the importance of correct uncertainty assessment. Existing data on hospital discharges and emergency department visits were obtained from Florida’s Agency for Health Care Administration. Environmental data were obtained from Florida’s Department of Environmental Protection; sociodemographic data were obtained from the US Census Bureau; and data from CDC’s Behavioral Risk Factor Surveillance System were used to provide additional information on other risk factors. We highlight the opportunities and challenges associated with combining disparate spatial data for EPHT analyses. We compare the results from two different approaches to data linkage, focusing on the need to account for spatial scale and the support of spatial data in the analysis. We use geographically weighted regression, not as a visual mapping tool, but as an inferential tool designed to indicate the need for spatial coefficients, a test that cannot be made by using the majority of Bayesian models. Finally, we use geostatistical simulation methods for uncertainty analysis to demonstrate its importance in models with predicted covariates. Our focus is on relatively simple methods and concepts that can be implemented with ESRI’s ArcGIS software.”
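
The role of spatial support in data linkage can be illustrated with a small sketch. The example below is not the authors' method or data. It assumes a hypothetical smooth ozone surface (the function `ozone`) and a hypothetical square "county", and compares two ways of attaching an exposure value to that county: reading the surface at the centroid (point support) and averaging the surface over the whole area (areal support). All names and numbers are invented for illustration.

```python
import math

def ozone(x, y):
    """Hypothetical ozone surface (ppb): a 30 ppb background plus a
    local peak centred at (0.2, 0.2). Not real monitoring data."""
    return 30.0 + 40.0 * math.exp(-((x - 0.2) ** 2 + (y - 0.2) ** 2) / 0.05)

def centroid_value(xmin, xmax, ymin, ymax):
    """Point-support linkage: evaluate the surface at the rectangle's centroid."""
    return ozone((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)

def areal_average(xmin, xmax, ymin, ymax, n=200):
    """Areal-support linkage: average the surface over the rectangle,
    approximated with the midpoint rule on an n-by-n grid."""
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += ozone(xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy)
    return total / (n * n)

# Hypothetical county: the unit square, with the ozone peak off-centre.
county = (0.0, 1.0, 0.0, 1.0)
at_centroid = centroid_value(*county)
over_area = areal_average(*county)

print(f"centroid value: {at_centroid:.2f} ppb")
print(f"areal average:  {over_area:.2f} ppb")
print(f"difference:     {over_area - at_centroid:.2f} ppb")
```

Because the assumed peak lies away from the centroid, the centroid reading understates the county-wide average. Health counts aggregated to the county are on areal support, so pairing them with a point-support exposure mixes two different spatial scales, which is the kind of mismatch the abstract warns about.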