Analysis of Syntactic and Semantic Features for Fine-grained Event–Spatial Understanding in Outbreak News Reports

Journal of Biomedical Semantics, 2010, 1:3 (31 March 2010)

Hutchatai Chanlekha and Nigel Collier

“Background: Previous studies have suggested that epidemiological reasoning needs a fine-grained modelling of events, especially their spatial and temporal attributes. While the temporal analysis of events has been intensively studied, far less attention has been paid to their spatial analysis. This article aims at filling the gap concerning automatic event-spatial attribute analysis in order to support health surveillance and epidemiological reasoning.

“Results: In this work, we propose a methodology that provides a detailed analysis on each event reported in news articles to recover the most specific locations where it occurs. Various features for recognizing spatial attributes of the events were studied and incorporated into the models which were trained by several machine learning techniques. The best performance for spatial attribute recognition is very promising; 85.9% F-score (86.75% precision / 85.1% recall).

“Conclusions: We extended our work on event-spatial attribute recognition by focusing on machine learning techniques, which are CRF, SVM, and Decision tree. Our approach avoided the costly development of an external knowledge base by employing the feature sources that can be acquired locally from the analyzed document. The results showed that the CRF model performed the best. Our study indicated that the nearest location and previous event location are the most important features for the CRF and SVM model, while the location extracted from the verb’s subject is the most important to the Decision tree model.”