A Study on Text Classification for Webmining Based Spatio Temporal Analysis of the Spread of Tropical Diseases

Proc. of International Conference on Advance Computer Science & Information System (ICACSIS) 2010, pp.311-314, Bali, Indonesia, 2010

Fatimah Wulandini, Iqbal Yasin, Anto Satriyo Nugroho, Bowo Prasetyo, Mohammad Teduh Uliniansyah, Vitria Pragesjvara, Made Gunawan, Gunarso, Ratih Irbandini, and Dwi Handoko

“The rapid growth of tropical diseases in Indonesia has led to countless victims. Experts have tried to address the problem by monitoring the spread of these diseases and collecting useful information about them. Web mining is one technique for collecting data from the Internet. Spatio-temporal data on tropical diseases can be collected through web mining so that useful information can be extracted for further analysis. The main objective of this study is to create a text classification system that classifies web documents using several learning methods: naïve Bayes, nearest neighbor, decision tree, and support vector machine (SVM) trained with the Sequential Minimal Optimization (SMO) algorithm. The classification is intended to support spatio-temporal analysis of the documents classified as health-related. The results show that naïve Bayes and SVM-SMO achieve good performance (naïve Bayes: 95%; SVM-SMO: 92%). The multinomial naïve Bayes model is able to normalize for document length, while SVM-SMO performs well on high-dimensional data.”
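
The multinomial naïve Bayes classifier named in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the six-sentence toy corpus, the whitespace tokenizer, the two labels ("health" and "other"), the function names, and the smoothing constant alpha = 1 are all assumptions made here for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit log priors and Laplace-smoothed log word likelihoods per class."""
    docs_per_class = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # assumed tokenizer: lowercase + whitespace
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"vocab": vocab, "log_prior": {}, "log_like": {}}
    for label, n_docs in docs_per_class.items():
        model["log_prior"][label] = math.log(n_docs / len(docs))
        # Denominator: all word occurrences in the class plus smoothing mass.
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_like"][label] = {
            w: math.log((word_counts[label][w] + alpha) / total) for w in vocab
        }
    return model


def predict(model, text):
    """Return the class with the highest posterior log score."""
    # Words never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    scores = {
        label: prior + sum(model["log_like"][label][t] for t in tokens)
        for label, prior in model["log_prior"].items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus; the study's web documents are not reproduced here.
docs = [
    "dengue fever cases rise in jakarta",
    "malaria outbreak reported in papua",
    "hospital treats dengue patients",
    "football team wins national cup",
    "stock market closes higher today",
    "new smartphone launched this week",
]
labels = ["health", "health", "health", "other", "other", "other"]

model = train_multinomial_nb(docs, labels)
print(predict(model, "dengue outbreak in bali"))
print(predict(model, "stock market wins"))
```

Each class is scored by summing log word likelihoods, so the model depends only on word counts. The other methods listed in the abstract (nearest neighbor, decision tree, SVM-SMO) would be trained on the same kind of word-count features.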