DATA SCIENCE TECHNIQUES FOR ENHANCING CYBERSECURITY THROUGH ANOMALY DETECTION
Keywords:
Anomaly Detection, Cybersecurity, Machine Learning, Deep Learning, Hybrid Framework, Real-Time Monitoring, Scalability, Threat Intelligence, False Positives, User Awareness
Abstract
The escalating sophistication of cyber threats demands detection methods that can proactively identify novel attacks. Data science, and specifically machine learning-based anomaly detection, offers a way to move beyond the limitations of traditional, signature-based security systems. This study aimed to systematically develop, compare, and evaluate four machine learning models (Logistic Regression, Random Forest, Support Vector Machine, and Neural Network) for network intrusion detection, and to identify the most effective approach. A quantitative, experimental design was employed using a public network intrusion dataset. The methodology involved comprehensive data preprocessing, including feature selection using Recursive Feature Elimination (RFE) and class imbalance handling with the Synthetic Minority Over-sampling Technique (SMOTE). The four models were trained and then evaluated using stratified 5-fold cross-validation. Performance was assessed via accuracy, precision, recall, F1-score, and a confusion matrix analysis to quantify classification errors. A clear performance hierarchy emerged. The Random Forest model achieved near-perfect performance (approximately 100% across all metrics), followed closely by the Neural Network model (99%). Both clearly outperformed the Support Vector Machine (97%-98%) and Logistic Regression (92%-93%) models. Error analysis showed that the Random Forest model produced few false positives (7) and false negatives (11), indicating high reliability. These results suggest that ensemble learning models such as Random Forest, along with deep learning models, are highly effective for network anomaly detection, and that a rigorous data science workflow is critical for developing reliable, high-performing cybersecurity solutions capable of addressing the complexity of the modern threat landscape.
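As an illustration of how the reported metrics relate to confusion matrix counts, the following minimal sketch computes accuracy, precision, recall, and F1-score from the four cells of a binary confusion matrix. The class and function names are hypothetical and not taken from the study. The false positive (7) and false negative (11) counts come from the abstract; the true positive and true negative counts are placeholder assumptions, since the abstract does not report them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (attack = positive class)."""
    tp: int  # attacks correctly flagged
    fp: int  # normal traffic wrongly flagged as attack
    tn: int  # normal traffic correctly passed
    fn: int  # attacks that were missed


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, precision, recall, and F1 as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# FP=7 and FN=11 are the values reported in the abstract.
# TP and TN are ASSUMED placeholders; the abstract does not state them.
example = ConfusionCounts(tp=10_000, fp=7, tn=10_000, fn=11)
metrics = classification_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.4%}")
```

Under these placeholder counts, every metric falls slightly below 1.0 yet rounds to 100% at whole-percent precision, which is one way a model with a handful of misclassifications can still be reported at 100%.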