Geostatistical interpolation model selection based on ArcGIS and spatio-temporal variability analysis of groundwater level in piedmont plains, northwest China

SpringerPlus, Published 11 April 2016

By Yong Xiao, Xiaomin Gu, Shiyang Yin, Jingli Shao, Yali Cui, Qiulan Zhang, and Yong Niu

“Based on the geo-statistical theory and ArcGIS geo-statistical module, datas of 30 groundwater level observation wells were used to estimate the decline of groundwater level in Beijing piedmont. Seven different interpolation methods (inverse distance weighted interpolation, global polynomial interpolation, local polynomial interpolation, tension spline interpolation, ordinary Kriging interpolation, simple Kriging interpolation and universal Kriging interpolation) were used for interpolating groundwater level between 2001 and 2013. Cross-validation, absolute error and coefficient of determination (R²) was applied to evaluate the accuracy of different methods.

Groundwater level drawdown during 2001 and 2013.

“The result shows that simple Kriging method gave the best fit. The analysis of spatial and temporal variability suggest that the nugget effects from 2001 to 2013 were increasing, which means the spatial correlation weakened gradually under the influence of human activities. The spatial variability in the middle areas of the alluvial–proluvial fan is relatively higher than area in top and bottom. Since the changes of the land use, groundwater level also has a temporal variation, the average decline rate of groundwater level between 2007 and 2013 increases compared with 2001–2006. Urban development and population growth cause over-exploitation of residential and industrial areas. The decline rate of the groundwater level in residential, industrial and river areas is relatively high, while the decreasing of farmland area and development of water-saving irrigation reduce the quantity of water using by agriculture and decline rate of groundwater level in agricultural area is not significant.”
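The abstract above compares interpolators by cross-validation, absolute error and R². A minimal sketch of that idea follows, using leave-one-out cross-validation of inverse distance weighting only. The well coordinates and levels are hypothetical, the function names are illustrative, and R² is computed here as 1 - SS_res/SS_tot, which may differ from the definition the authors or ArcGIS used.

```python
import math

def idw_predict(known, target, power=2.0):
    """Inverse distance weighted estimate at `target` from (x, y, value) tuples."""
    num = 0.0
    den = 0.0
    for x, y, value in known:
        dist = math.hypot(x - target[0], y - target[1])
        if dist == 0.0:
            return value  # target coincides with a sample point
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    return num / den

def leave_one_out(samples, power=2.0):
    """Predict each sample from all the others; return (observed, predicted)."""
    observed, predicted = [], []
    for i, (x, y, value) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        observed.append(value)
        predicted.append(idw_predict(others, (x, y), power))
    return observed, predicted

def mean_absolute_error(observed, predicted):
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical wells: (x_km, y_km, groundwater level in m); not data from the paper.
wells = [(0, 0, 30.0), (1, 0, 28.5), (0, 1, 29.0),
         (1, 1, 27.0), (2, 1, 25.5), (2, 2, 24.0)]

# Compare candidate settings by their cross-validated error, as one would
# compare candidate interpolation methods.
for power in (1.0, 2.0, 3.0):
    obs, pred = leave_one_out(wells, power)
    print(power, round(mean_absolute_error(obs, pred), 3), round(r_squared(obs, pred), 3))
```

The same loop could score any other interpolator by swapping `idw_predict` for a function with the same signature.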

Geographically Weighted Regression to Measure Spatial Variations in Correlations between Water Pollution versus Land Use in a Coastal Watershed

Ocean & Coastal Management, Volume 103, January 2015, Pages 14–24

By Jinliang Huang, Yaling Huang, Robert Gilmore Pontius Jr., and Zhenyu Zhang


  • GWR reveals spatial variation in water pollution-land use linkages.
  • Water pollution is associated more with built-up than with cropland or forest.
  • More built-up is associated with more pollution for less urbanized sub-watersheds.
  • Forest has a stronger negative association with pollution in urban sub-watersheds.
  • Cropland has a weak association with water pollution among 21 sub-watersheds.

“Land use can influence river pollution and such relationships might or might not vary spatially. Conventional global statistics assume one relationship for the entire study extent, and are not designed to consider whether a relationship varies across space. We used geographically weighted regression to consider whether relationships between land use and water pollution vary spatially across a subtropical coastal watershed of Southeast China. Surface water samples of baseflow for seven pollutants were collected twelve times during 2010–2013 from headwater sub-watersheds. We computed 21 univariate regressions, which consisted of three regressions for each of the seven pollutants. Each of the three regressions considered one of three independent variables, i.e. the percent of the sub-watershed that was cropland, built-up, or forest.

Local R² values and local parameter estimates for GWR cropland models among three types of sub-watershed.

“Cropland had a local R² less than 0.2 for most pollutants, while it had a positive association with water pollution in the agricultural sub-watersheds and a negative association with water pollution in the non-agricultural sub-watersheds. Built-up had a positive association with all pollutants consistently across space, while the increase in pollution per increase in built-up density was largest in the sub-watersheds with low built-up density. The local R² values were stronger with built-up than with cropland and forest. The local R² values for built-up varied spatially, and the pattern of the spatial variation was not consistent among the seven pollutants. Forest had a negative association with most pollutants across space. Forest had a stronger negative association with water pollution in the urban sub-watersheds than in the agricultural sub-watersheds. This research provides an insight into land-water linkages, which we discuss with respect to other watersheds in the literature.”
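To make the idea of local parameter estimates and local R² concrete, here is a minimal sketch of a univariate geographically weighted regression. It assumes a fixed-bandwidth Gaussian kernel and made-up data; the abstract does not state the kernel, bandwidth or software the authors used, and `gwr_univariate` is an illustrative name.

```python
import math

def gwr_univariate(coords, x, y, bandwidth):
    """Fit y = a + b*x separately at each location with Gaussian distance weights.

    Returns one (intercept, slope, local_r2) tuple per location.
    """
    results = []
    for cx, cy in coords:
        # Gaussian kernel: nearby observations count more than distant ones.
        w = [math.exp(-0.5 * (math.hypot(px - cx, py - cy) / bandwidth) ** 2)
             for px, py in coords]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
        slope = sxy / sxx
        intercept = my - slope * mx
        # For weighted least squares with an intercept, the weighted R² equals
        # the squared weighted correlation.
        local_r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
        results.append((intercept, slope, local_r2))
    return results

# Hypothetical sub-watersheds along a west-east transect (not data from the paper):
# the land-use/pollution slope is positive in the west and negative in the east.
coords = [(float(i), 0.0) for i in range(8)]
land_use_pct = [10, 20, 30, 40, 15, 25, 35, 45]
pollutant = [1.0, 2.1, 2.9, 4.2, 4.0, 3.1, 1.9, 1.1]

for (cx, _), (a, b, r2) in zip(coords, gwr_univariate(coords, land_use_pct, pollutant, 1.5)):
    print(cx, round(b, 3), round(r2, 3))
```

A single global regression on these data would report one slope; the local fits show the sign changing across space, which is the kind of variation the study examines.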

Geo-Based Statistical Models for Vulnerability Prediction of Highway Network Segments

ISPRS International Journal of Geo-Information, 2014, 3(2), 619-637

By Keren Pollak, Ammatzia Peled, and Shalom Hakkert

“This study describes four statistical models—Poisson; Negative Binomial; Zero-Inflated Poisson; and Zero-Inflated Negative Binomial—which were devised in order to examine traffic accidents and estimate the best probability estimating model in terms of future risk assessment at interurban road sections. The study was conducted on four sets of fixed-length sections of the road network: 500, 750, 1000, and 1500 m. The contribution of transportation and spatial parameters as predictors of road accident rates was evaluated for all four data sets separately. In addition, the Empirical Bayes method was applied. This method uses historical accidents information, allowing regression to the mean phenomenon so as to improve model results.

Expected number of accidents comparing real number of accidents and predicted number after applying EB method (road section of 500 m)—observation 3000 until 3300.

“The study was performed using Geographic Information System (GIS) software. Other analyses, such as statistical analyses combined with spatial parameters, interactions, and examination of other geographical areas, were also performed. The results showed that the short road sections data sets of 500 and 750 m yielded the most stable models. This allows focused treatment on short sections of the road network as a way to save resources (enforcement; education and information; finance) and potentially gain maximum benefit at minimum investment. It was found that the significant parameters affecting accident rates are: curvature of the road section; the region and traffic volume. An interaction between the region and traffic volume was also found.”
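The Empirical Bayes step mentioned in the abstract can be illustrated with the weighting commonly used in road-safety work: the estimate is a weighted average of the model prediction and the observed count. This is a sketch under the assumption of a negative binomial model with variance mu + k·mu²; the abstract does not give the authors' exact formula, and the numbers below are hypothetical.

```python
def eb_weight(predicted, dispersion):
    """Weight given to the model prediction: w = 1 / (1 + k * mu)."""
    return 1.0 / (1.0 + dispersion * predicted)

def eb_estimate(observed, predicted, dispersion):
    """Empirical Bayes expected accident count for one road section.

    observed   : accidents recorded on the section in the study period
    predicted  : count predicted by the regression model for similar sections
    dispersion : negative binomial overdispersion parameter k (assumed known)
    """
    w = eb_weight(predicted, dispersion)
    return w * predicted + (1.0 - w) * observed

# Hypothetical 500 m section: 6 recorded accidents, model predicts 2.0, k = 0.5.
# The estimate is pulled from the observed count toward the model prediction,
# which is how the method accounts for regression to the mean.
print(eb_estimate(observed=6, predicted=2.0, dispersion=0.5))
```

With no overdispersion (k = 0) the weight is 1 and the estimate equals the model prediction; larger k shifts the estimate toward the section's own accident history.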

Identification of Optimum Scopes of Environmental Factors for Snails using Spatial Analysis Techniques in Dongting Lake Region, China

Parasites & Vectors 7:216, Published Online 09 May 2014

By Jin-Yi Wu, Yi-Biao Zhou, Lin-Han Li, Sheng-Bang Zheng, Song Liang, Ashley Coatsworth, Guang-Hui Ren, Xiu-Xia Song, Zhong He, Bin Cai, Jia-Bian You, and Qing-Wu Jiang

“Owing to the harmfulness and seriousness of Schistosomiasis japonica in China, the control and prevention of S. japonica transmission are imperative. As the unique intermediate host of this disease, Oncomelania hupensis plays an important role in the transmission. It has been reported that the snail population in Qiangliang Lake district, Dongting Lake Region has been naturally declining and is slowly becoming extinct. Considering the changes of environmental factors that may cause this phenomenon, we try to explore the relationship between circumstance elements and snails, and then search for the possible optimum scopes of environmental factors for snails.

“Moisture content of soil, pH, temperature of soil and elevation were collected by corresponding apparatus in the study sites. The LISA statistic and GWR model were used to analyze the association between factors and mean snail density, and the values in high-high clustered areas and low-low clustered areas were extracted to find out the possible optimum ranges of these elements for snails.


“A total of 8,589 snail specimens were collected from 397 sampling sites in the study field. Besides the mean snail density, three environmental factors including water content, pH and temperature had high spatial autocorrelation. The spatial clustering suggested that the possible optimum scopes of moisture content, pH, temperature of the soil and elevation were 58.70 to 68.93%, 6.80 to 7.80, 22.73 to 24.23°C and 23.50 to 25.97 m, respectively. Moreover, the GWR model showed that the possible optimum ranges of these four factors were 36.58 to 61.08%, 6.541 to 6.89, 24.30 to 25.70°C and 23.50 to 29.44 m, respectively.

“The results indicated the association between snails and environmental factors was not linear but U-shaped. Considering the results of two analysis methods, the possible optimum scopes of moisture content, pH, temperature of the soil and elevation were 58.70% to 68.93%, 6.6 to 7.0, 22.73°C to 24.23°C, and 23.5 m to 26.0 m, respectively. The findings in this research will help in making an effective strategy to control snails and provide a method to analyze other factors.”

Spatial Distribution of Soil Organic Carbon and Total Nitrogen Based on GIS and Geostatistics in a Small Watershed in a Hilly Area of Northern China

PLOS One, Published Online 31 December 2013

By Gao Peng, Wang Bing, Geng Guangpo, and Zhang Guangcan

“The spatial variability of soil organic carbon (SOC) and total nitrogen (STN) levels is important in both global carbon-nitrogen cycle and climate change research. There has been little research on the spatial distribution of SOC and STN at the watershed scale based on geographic information systems (GIS) and geostatistics. Ninety-seven soil samples taken at depths of 0–20 cm were collected during October 2010 and 2011 from the Matiyu small watershed (4.2 km²) of a hilly area in Shandong Province, northern China. The impacts of different land use types, elevation, vegetation coverage and other factors on SOC and STN spatial distributions were examined using GIS and a geostatistical method, regression-kriging.

Distribution map of SOC and STN concentrations by regression-kriging (a, b) and ordinary kriging (c, d) in Matiyu small watershed.

“The results show that the concentration variations of SOC and STN in the Matiyu small watershed were moderate variation based on the mean, median, minimum and maximum, and the coefficients of variation (CV). Residual values of SOC and STN had moderate spatial autocorrelations, and the Nugget/Sill were 0.2% and 0.1%, respectively. Distribution maps of regression-kriging revealed that both SOC and STN concentrations in the Matiyu watershed decreased from southeast to northwest. This result was similar to the watershed DEM trend and significantly correlated with land use type, elevation and aspect. SOC and STN predictions with the regression-kriging method were more accurate than those obtained using ordinary kriging. This research indicates that geostatistical characteristics of SOC and STN concentrations in the watershed were closely related to both land-use type and spatial topographic structure and that regression-kriging is suitable for investigating the spatial distributions of SOC and STN in the complex topography of the watershed.”

Geostatistical Approach for Site Suitability Mapping of Degraded Mangrove Forest in the Mahakam Delta, Indonesia

Journal of Geographic Information System, Vol.5 No.5, October 2013

By Ali Suhardiman, Satoshi Tsuyuki, Muhammad Sumaryono, and Yohanes Budi Sulistioadi

“As part of operational guidance of mangrove forest rehabilitation in the Mahakam delta, Indonesia, site suitability mapping for 14 species of mangrove was modelled by combining 4 underlying factors—clay, sand, salinity and tidal inundation. Semivariogram analysis and a geographic information system (GIS) were used to apply a site-suitability model, while kriging interpolation generated surface layers, based on sample point data collection. The tidal inundation map was derived from a tide table and a digital elevation model from topographic maps. The final site-suitability maps were produced using spatial analysis technique, by overlaying all surface layers. We used a Gaussian model to adjust a semivariogram graph in order to help to understand the variation of sample data values, and create a natural surface layer of data distribution over the area of study.”

Site suitability map of our study sites generated using geostatistical analysis and GIS operations.

“By examining the statistical value and the visual inspection of surface layers, we saw that the models were consistent with the expected data behavior; therefore, we assumed that interpolation has been carried out appropriately. Our site-suitability map showed that Avicennia species was the most suitable species and matched with 50% of the study area, followed by Nypa fruticans, which occupied about 42%. These results were actually consistent with the mangrove zoning pattern in the region prior to deforestation and conversion.”

Understanding Spatial Filtering for Analysis of Land Use-transport Data

Journal of Transport Geography, Volume 31, July 2013, Pages 123–131

By Yiyi Wang, Kara M. Kockelman, and Xiaokun (Cara) Wang


  • We explore use of spatial filtering (SF) for regression model estimation.
  • We compare SF models and SAR-type models, and a distance decay parameter.
  • Data sets contain appraised values for private properties across Texas’ Travis County.
  • SF methods allow focus on the marginal effects of policy variables and other covariates.

“This paper summarizes the literature on spatial filtering (SF) for analysis of spatial data. Given the scarcity of its application in transportation and its fledgling nature, preliminary case studies were conducted using continuous and discrete response data sets, for land values and land use, in comparison with results from spatial autoregressive (SAR) models with distance decay parameters estimated using Bayesian techniques. For both the continuous land value and binary land use cases, the SF approach demonstrates great potential as a worthy competitor to more conventional SAR-based models. In addition to offering high fit statistics, somewhat shorter computing times, and more straightforward computations, the SF approach makes explicit the patterns of spatial dependency in the land value and land use data. By controlling for these spatial relationships, the SF approach yields more reliable marginal effects of policy variables of interest. Model results confirm the important role of transportation access (as quantified using distances to a region’s central business district, and various roadway types).”

Typhoid Fever and Its Association with Environmental Factors in the Dhaka Metropolitan Area of Bangladesh: A Spatial and Time-Series Approach

PLOS Neglected Tropical Diseases, 24 January 2013

By Ashraf M. Dewan, Robert Corner, Masahiro Hashizume, and Emmanuel T. Ongee

“Typhoid fever is a major cause of death worldwide with a major part of the disease burden in developing regions such as the Indian sub-continent. Bangladesh is part of this highly endemic region, yet little is known about the spatial and temporal distribution of the disease at a regional scale. This research used a Geographic Information System to explore, spatially and temporally, the prevalence of typhoid in Dhaka Metropolitan Area (DMA) of Bangladesh over the period 2005–9. This paper provides the first study of the spatio-temporal epidemiology of typhoid for this region. The aims of the study were: (i) to analyse the epidemiology of cases from 2005 to 2009; (ii) to identify spatial patterns of infection based on two spatial hypotheses; and (iii) to determine the hydro-climatological factors associated with typhoid prevalence. Case occurrences data were collected from 11 major hospitals in DMA, geocoded to census tract level, and used in a spatio-temporal analysis with a range of demographic, environmental and meteorological variables. Analyses revealed distinct seasonality as well as age and gender differences, with males and very young children being disproportionately infected. The male-female ratio of typhoid cases was found to be 1.36, and the median age of the cases was 14 years. Typhoid incidence was higher in male population than female (χ² = 5.88, p<0.05). The age-specific incidence rate was highest for the 0–4 years age group (277 cases), followed by the 60+ years age group (51 cases), then there were 45 cases for 15–17 years, 37 cases for 18–34 years, 34 cases for 35–39 years and 11 cases for 10–14 years per 100,000 people. Monsoon months had the highest disease occurrences (44.62%) followed by the pre-monsoon (30.54%) and post-monsoon (24.85%) season.

Spatial regression between typhoid incidence (per 100,000 people) and distance to water bodies. A) Shows spatial distribution of the t-value, B) shows the parameter estimates.

“The Student’s t test revealed that there is no significant difference on the occurrence of typhoid between urban and rural environments (p>0.05). A statistically significant inverse association was found between typhoid incidence and distance to major waterbodies. Spatial pattern analysis showed that there was a significant clustering of typhoid distribution in the study area. Moran’s I was highest (0.879; p<0.01) in 2008 and lowest (0.075; p<0.05) in 2009. Incidence rates were found to form three large, multi-centred, spatial clusters with no significant difference between urban and rural rates. Temporally, typhoid incidence was seen to increase with temperature, rainfall and river level at time lags ranging from three to five weeks. For example, for a 0.1 metre rise in river levels, the number of typhoid cases increased by 4.6% (95% CI: 2.4–2.8) above the threshold of 4.0 metres (95% CI: 2.4–4.3). On the other hand, with a 1°C rise in temperature, the number of typhoid cases could increase by 14.2% (95% CI: 4.4–25.0).”
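The abstract reports Moran's I as its measure of spatial clustering. The sketch below shows how the global statistic is computed, assuming binary distance-band weights and hypothetical values; the abstract does not state the weighting scheme the authors used, and significance testing is omitted.

```python
import math

def distance_band_weights(coords, threshold):
    """Binary spatial weights: 1 if two distinct sites lie within `threshold`."""
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist = math.hypot(coords[i][0] - coords[j][0],
                                  coords[i][1] - coords[j][1])
                if dist <= threshold:
                    weights[i][j] = 1.0
    return weights

def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i**2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)

# Hypothetical incidence rates at four sites on a line, 1 unit apart.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
w = distance_band_weights(sites, threshold=1.0)

print(morans_i([1.0, 2.0, 3.0, 4.0], w))    # smooth gradient: positive I
print(morans_i([1.0, -1.0, 1.0, -1.0], w))  # alternating values: negative I
```

Values near +1 indicate that similar rates cluster together, values near 0 indicate no spatial pattern, and negative values indicate that neighbouring sites tend to differ.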

The Effects of City Streets on an Urban Disease Vector

PLOS Computational Biology, 17 January 2013

By Corentin M. Barbu, Andrew Hong, Jennifer M. Manne, Dylan S. Small, Javier E. Quintanilla Calderón, Karthik Sethuraman, Víctor Quispe-Machaca, Jenny Ancca-Juárez, Juan G. Cornejo del Carpio, Fernando S. Málaga Chavez, César Náquira, and Michael Z. Levy

“With increasing urbanization vector-borne diseases are quickly developing in cities, and urban control strategies are needed. If streets are shown to be barriers to disease vectors, city blocks could be used as a convenient and relevant spatial unit of study and control. Unfortunately, existing spatial analysis tools do not allow for assessment of the impact of an urban grid on the presence of disease agents. Here, we first propose a method to test for the significance of the impact of streets on vector infestation based on a decomposition of Moran’s spatial autocorrelation index; and second, develop a Gaussian Field Latent Class model to finely describe the effect of streets while controlling for cofactors and imperfect detection of vectors. We apply these methods to cross-sectional data of infestation by the Chagas disease vector Triatoma infestans in the city of Arequipa, Peru.

Spatial distribution of Triatoma infestans presence in households of Paucarpata, Arequipa, Peru. Map of the study area. Black indicates infested households, white non-infested households, and grey non-inspected households. The area encircled by dashes was used to fit the Gaussian Field Latent Class model; the remaining area was used as a validation dataset. The close-up shows the urban grid underneath and the aggregation of vectors within city blocks.

“Our Moran’s decomposition test reveals that the distribution of T. infestans in this urban environment is significantly constrained by streets (p<0.05). With the Gaussian Field Latent Class model we confirm that streets provide a barrier against infestation and further show that greater than 90% of the spatial component of the probability of vector presence is explained by the correlation among houses within city blocks. The city block is thus likely to be an appropriate spatial unit to describe and control T. infestans in an urban context. Characteristics of the urban grid can influence the spatial dynamics of vector borne disease and should be considered when designing public health policies.”

Comparison of Geostatistical Interpolation and Remote Sensing Techniques for Estimating Long-Term Exposure to Ambient PM2.5 Concentrations across the Continental United States

Environmental Health Perspectives, 120:1727–1732 (2012)

By Seung-Jae Lee, Marc L. Serre, Aaron van Donkelaar, Randall V. Martin, Richard T. Burnett, and Michael Jerrett

“Background: A better understanding of the adverse health effects of chronic exposure to fine particulate matter (PM2.5) requires accurate estimates of PM2.5 variation at fine spatial scales. Remote sensing has emerged as an important means of estimating PM2.5 exposures, but relatively few studies have compared remote-sensing estimates to those derived from monitor-based data.

“Objective: We evaluated and compared the predictive capabilities of remote sensing and geostatistical interpolation.

Map of the United States indicating the month of the year when the monthly average PM2.5 concentration was highest; circles indicate individual monitoring sites.

“Methods: We developed a space–time geostatistical kriging model to predict PM2.5 over the continental United States and compared resulting predictions to estimates derived from satellite retrievals.

“Results: The kriging estimate was more accurate for locations that were within about 100 km of a monitoring station, whereas the remote sensing estimate was more accurate for locations that were > 100 km from a monitoring station. Based on this finding, we developed a hybrid map that combines the kriging and satellite-based PM2.5 estimates.

“Conclusions: We found that for most of the populated areas of the continental United States, geostatistical interpolation produced more accurate estimates than remote sensing. The differences between the estimates resulting from the two methods, however, were relatively small. In areas with extensive monitoring networks, the interpolation may provide more accurate estimates, but in the many areas of the world without such monitoring, remote sensing can provide useful exposure estimates that perform nearly as well.”