EXAMINING THE POTENTIAL OF BIG DATA ANALYTICS FOR EARLY DETECTION OF INFECTIOUS DISEASE OUTBREAKS: A COMPREHENSIVE REVIEW OF GLOBAL SURVEILLANCE SYSTEMS

Authors

  • Md Shafiqul Islam, Amir Hamza Akash, Md Rashedul Bari, Md Ariful Islam, Mohammad Nowsher Ali

Keywords:

Big Data Analytics, Infectious Disease Outbreaks, Early Detection, Global Surveillance Systems, Logistic Regression, Random Forest, Anomaly Detection, ROC Curve Analysis, Public Health

Abstract

Big data analytics can strengthen global surveillance mechanisms and improve early warning of infectious disease outbreaks. This paper compares Logistic Regression and Random Forest for classification, and applies Isolation Forest for anomaly detection, on a synthetic dataset that includes reported cases, hospital admissions, and environmental data from various countries. The analysis shows strong positive correlations of 0.76 between Year and Reported Cases and between Year and Vaccination Coverage, and a strong negative correlation of -0.77 between Week Number and Weather Index; these variables may be regarded as key influencing factors. Logistic Regression and Random Forest achieved accuracies of 0.8510 and 0.8529, respectively. Their confusion matrices indicate strong performance on the majority class, with Random Forest performing slightly better on class 1 (412 versus 409 true positives). Isolation Forest flags outliers in the relationship between Lab Confirmed Cases and Hospital Admissions, which suggests possible targets for subsequent research. The feature importance analysis identifies clinical variables (e.g., Hospital_Admissions, Lab_Confirmed) as essential predictors, while social and mobility variables contribute additional information through anomaly detection. The results highlight the importance of combining diverse data sources in surveillance systems to enhance early warning. Future research may add real-time, multi-country data to the existing dataset and investigate ensemble approaches to improve accuracy and reduce overfitting. The paper adds to the literature on data-driven strategies and could serve as a basis for improved global health monitoring systems.
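
The workflow summarized in the abstract (two supervised classifiers compared by accuracy and confusion matrix, feature importances from the Random Forest, and Isolation Forest applied to the clinical variables) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn and NumPy are available, the data generator `make_synthetic` and its labelling rule are placeholders invented for the example, `Weather_Index` and `Mobility_Index` are hypothetical column names, and all hyperparameters (tree count, contamination rate, test split) are arbitrary choices. It will not reproduce the figures reported in the paper.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Only the first two names appear in the abstract; the others are hypothetical.
FEATURES = ["Hospital_Admissions", "Lab_Confirmed", "Weather_Index", "Mobility_Index"]


def make_synthetic(n_rows=2000, seed=0):
    """Generate placeholder surveillance-like data (NOT the paper's dataset)."""
    rng = np.random.default_rng(seed)
    admissions = rng.poisson(40, n_rows).astype(float)
    lab_confirmed = 0.6 * admissions + rng.normal(0, 5, n_rows)
    weather = rng.normal(0, 1, n_rows)
    mobility = rng.normal(0, 1, n_rows)
    X = np.column_stack([admissions, lab_confirmed, weather, mobility])
    # Assumed labelling rule: outbreak probability rises with clinical signals.
    logit = 0.2 * (admissions - 40) + 0.1 * (lab_confirmed - 24) + 0.5 * mobility
    y = (rng.random(n_rows) < 1 / (1 + np.exp(-logit))).astype(int)
    return X, y


def run_comparison(seed=0):
    """Fit both classifiers, then flag anomalies on the clinical columns."""
    X, y = make_synthetic(seed=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        # Scaling helps the logistic regression solver converge.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            # Rows are true classes, columns are predicted classes.
            "confusion_matrix": confusion_matrix(y_test, pred),
        }
    importances = dict(zip(FEATURES, models["random_forest"].feature_importances_))
    # Unsupervised step: Isolation Forest on admissions vs. lab-confirmed cases.
    iso = IsolationForest(contamination=0.02, random_state=seed)
    flags = iso.fit_predict(X[:, :2])  # -1 marks an anomaly, 1 a normal row
    return results, importances, flags


if __name__ == "__main__":
    results, importances, flags = run_comparison()
    for name, r in results.items():
        print(name, round(r["accuracy"], 4))
        print(r["confusion_matrix"])
    print(importances)
    print("anomalies flagged:", int((flags == -1).sum()))
```

In a confusion matrix produced this way, the bottom-right cell is the true-positive count for class 1, which is the quantity the abstract compares between the two models (412 versus 409).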

Published

2025-09-27

Section

Articles