An Empirical Study on the Effectiveness and Efficiency of Machine Learning Classifiers for Liver Disease Prediction
DOI:
https://doi.org/10.58681/ajrt.25090106
Keywords:
Liver Disease Prediction, Machine Learning Classification, Class Imbalance, Hyperparameter Tuning, Ensemble Methods
Abstract
Liver disease poses a significant global health burden, and its high mortality is exacerbated by the difficulty of early detection. Machine learning (ML) offers a promising route to automated diagnostic tools that address this critical need. Although various ML classifiers have been explored for liver disease prediction, guiding model selection for practical application requires a comprehensive, systematic comparison of a wide range of modern algorithms that incorporates robust pre-processing, class-imbalance handling, hyperparameter tuning with cross-validation, and an analysis of computational efficiency. This study systematically evaluates thirteen diverse ML classification algorithms on the Liver Patient Dataset (LDPD). The methodology comprises data pre-processing (imputation, encoding, and standardization) inside a pipeline to prevent data leakage, class-imbalance handling with SMOTE, a split into training and testing sets, and hyperparameter optimization with RandomizedSearchCV using Stratified K-Fold cross-validation. Performance was assessed on an independent test set using Accuracy, Precision, Recall, Specificity, F1-Score, and ROC AUC, and training time was recorded for each model. The results show that ensemble and advanced tree-based methods achieve the strongest predictive performance among the evaluated classifiers. Hyperparameter tuning improved performance further: Tuned Random Forest achieved the highest ROC AUC (0.9995) and Specificity (0.9973), and Tuned LightGBM achieved the highest Recall (0.9996). The study highlights a crucial trade-off: although tuning yields peak performance, the default configurations of efficient models such as LightGBM and XGBoost deliver exceptionally high performance (ROC AUC ≥ 0.9993) with substantially shorter training times (≤ 0.41 seconds), offering a favorable balance for practical application.
This research identifies highly effective and efficient ML models for liver disease prediction, contributing empirical evidence to support the development of automated diagnostic aids.
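The test-set metrics named in the abstract can be made concrete with a short, dependency-free Python sketch. It is illustrative only and is not the authors' code: it assumes binary labels with 1 as the liver-disease (positive) class, reports a metric as 0.0 when its denominator is zero, and computes ROC AUC with the rank-based Mann-Whitney formulation. The function names (`confusion_counts`, `classification_metrics`, `roc_auc`) and the demo data, whose hard predictions come from an assumed 0.5 threshold on the scores, are hypothetical. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used, with Specificity typically derived from the confusion matrix as shown here.

```python
"""Dependency-free sketch of the abstract's evaluation metrics (illustrative only)."""
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, tn, fp, fn); assumes 0/1 labels with 1 as the positive (liver disease) class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, tn, fp, fn


def _ratio(num: float, den: float) -> float:
    # Assumed convention: a metric whose denominator is zero is reported as 0.0.
    return num / den if den else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics computed from hard 0/1 predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)       # sensitivity, true positive rate
    specificity = _ratio(tn, tn + fp)  # true negative rate
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic, giving tied scores their average rank."""
    pairs = sorted(zip(scores, y_true))
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")
    pos_rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Sorted positions i..j-1 share one score, so each gets the average 1-based rank.
        avg_rank = (i + 1 + j) / 2
        pos_rank_sum += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical toy data; preds are probs thresholded at 0.5.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.6, 0.55]
    preds = [1 if p >= 0.5 else 0 for p in probs]
    print(classification_metrics(labels, preds))
    print(roc_auc(labels, probs))
```

The rank-based AUC runs in O(n log n) because of the sort, whereas comparing every positive with every negative costs O(n_pos × n_neg), which grows quickly on a test set of tens of thousands of rows. Note that Accuracy through F1 depend on the chosen decision threshold, while ROC AUC uses only the ordering of the scores.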