Abstract
Accurate and reliable prediction of dust event frequency, grounded in stable environmental factors and an understanding of
their contributions, is essential for reducing harmful effects on human health and the environment. This study proposes a novel
consensus-based voting strategy that combines six feature selection methods (correlation analysis, mutual information, elastic
net, genetic algorithm, recursive feature elimination, and random forest) with variance inflation factor analysis to
identify the most stable environmental factors of dust event frequency in Iran’s Central Plateau, West Asia. Six tree-based
machine learning models (Decision Tree, Random Forest, Extra Trees, XGBoost, LightGBM, and CatBoost) were trained to
predict dust event frequency. Their performance was compared using a multi-criterion ranking approach based on R-squared,
root mean square error, mean absolute error, median absolute error, mean absolute percentage error, and uncertainty analyses
on training and test sets. Shapley additive explanation values were applied to interpret predictor importance. Among
thirty-four environmental variables, twelve were identified as key factors affecting monthly dust event frequency variability.
According to the proposed approach, CatBoost outperformed the other models, followed by Random Forest, XGBoost, Decision
Tree, LightGBM, and Extra Trees. The best model achieved R-squared values of 0.94 (training) and 0.61 (testing), with
corresponding training and testing values for root mean square error (8.8, 20.6), mean absolute error (7.1, 16.6), mean
absolute percentage error (42.2, 101.3), and median absolute error (6.1, 13.7). Monthly evapotranspiration, surface pressure,
rain, and soil moisture variability
were the strongest governing factors. These results provide a foundation for early warning systems, dust abatement planning,
and policymaking in arid environments.
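
The consensus-based voting idea can be illustrated with a minimal sketch. It assumes each feature selection method has already returned its own list of selected variables, and it keeps any variable chosen by at least a given number of methods. The function name `consensus_features`, the `min_votes` threshold, the variable names, and the per-method selections are all hypothetical placeholders, not the study's actual voting rule or results.

```python
from collections import Counter


def consensus_features(selections, min_votes):
    """Return features chosen by at least `min_votes` methods, most-voted first.

    `selections` maps a method name to the features that method selected.
    The threshold rule is an assumption made for illustration only.
    """
    votes = Counter()
    for chosen in selections.values():
        # Each method casts at most one vote per feature.
        votes.update(set(chosen))
    kept = [feature for feature, n in votes.items() if n >= min_votes]
    # Sort by descending vote count, breaking ties alphabetically.
    return sorted(kept, key=lambda feature: (-votes[feature], feature))


# Hypothetical per-method selections (placeholder data, not from the study).
selections = {
    "correlation": ["evapotranspiration", "surface_pressure", "rain"],
    "mutual_information": ["evapotranspiration", "soil_moisture", "rain"],
    "elastic_net": ["surface_pressure", "rain", "wind_speed"],
    "genetic_algorithm": ["evapotranspiration", "soil_moisture"],
    "recursive_feature_elimination": [
        "evapotranspiration", "surface_pressure", "soil_moisture",
    ],
    "random_forest": ["evapotranspiration", "rain", "soil_moisture"],
}

stable = consensus_features(selections, min_votes=4)
print(stable)
```

In a sketch like this, a higher `min_votes` yields a smaller and more conservative feature set. A collinearity screen such as variance inflation factor analysis would be applied as a separate step.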