December 22, 2024
Zohre Ebrahimi-Khusfi

Academic rank: Associate professor
Education: PhD in de-desertification

Research

Title: Predicting the number of dusty days around the desert wetlands in southeastern Iran using feature selection and machine learning techniques
Type: Article
Keywords: Dust emissions; Boruta; MARS; Recursive feature elimination; Multicollinearity; Stochastic gradient boosting; Hamoun wetlands
Researchers: Zohre Ebrahimi-Khusfi, Alireza Nafarzadegan, Fatemeh Dargahian

Abstract

In recent decades, some desert wetlands in the world's arid and semiarid regions have become critical sources of dust. Accurately predicting the number of dusty days (NDDs) in these areas is therefore of great importance. Machine learning (ML) is one of the most popular approaches for predicting climatic and environmental variables. However, it has been applied far more often to spatial prediction than to the temporal prediction of these variables. This work is the first attempt to predict NDDs in the main dust source of southeastern Iran using ML models combined with different feature selection (FS) techniques. For this purpose, monthly data on 21 predictor variables over the study period (1988–2017) were used to predict the target variable (NDDs). The main aim was to evaluate the support vector machine (SVM), conditional inference random forest (CRF), and stochastic gradient boosting (SGB) models for predicting NDDs around the Hamoun wetlands. Each model was combined with three FS algorithms: Boruta, multivariate adaptive regression splines (MARS), and recursive feature elimination (RFE). Collinearity was analyzed first, and independent variables with a tolerance below 0.11 were removed. The best attributes were then selected to train the SVM, SGB, and CRF models. The dataset was randomly split into training (70%) and verification (30%) sets. Model performance on the holdout data was evaluated using four measures: the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and the Nash-Sutcliffe efficiency (NSE). The results indicated that SGB-MARS, SGB-RFE, and SGB-Boruta outperformed the other model and FS combinations, with an R² of 0.9, RMSE of 2.5, MAE of 1.9, and NSE of 0.9.
Furthermore, five factors were identified as the most important predictors of NDDs in the study area: surface wind speed, maximum air temperature, relative humidity, the dried wetland bed, and erosive wind frequency. These findings support using the SGB model with various FS techniques to predict NDDs around desert wetlands. The results can help decision-makers reduce the risks of dust emission and improve the safety of residents around desert wetlands.
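The preprocessing and evaluation steps summarized in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' code, and it rests on several assumptions:

- Tolerance is computed as 1/VIF from the inverse of the predictor correlation matrix.
- Flagged variables are removed one at a time, lowest tolerance first, until every remaining tolerance reaches 0.11.
- R² is taken as the squared Pearson correlation between observed and predicted NDDs, while NSE uses the standard 1 − SSE/SST form.
- The 70/30 split is a simple seeded shuffle.

All function names (`screen_by_tolerance`, `split_70_30`, `rmse`, `mae`, `nse`, `r_squared`) are hypothetical. The Boruta, MARS, and RFE selectors and the SVM, CRF, and SGB models themselves are not reproduced.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlation matrix of predictor columns (equal-length lists)."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj) for cj, nj in zip(centered, norms)]
        for ci, ni in zip(centered, norms)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def tolerances(columns):
    """Tolerance_j = 1 - R_j^2 = 1 / VIF_j; VIF_j is the j-th diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [1.0 / inv[j][j] for j in range(len(columns))]


def screen_by_tolerance(names, columns, threshold=0.11):
    """Assumed procedure: drop the lowest-tolerance predictor until all tolerances >= threshold."""
    names, columns = list(names), list(columns)
    while len(columns) > 1:
        tol = tolerances(columns)
        worst = min(range(len(tol)), key=tol.__getitem__)
        if tol[worst] >= threshold:
            break
        del names[worst], columns[worst]
    return names


def split_70_30(records, seed=0):
    """Shuffle records with a fixed seed and split them into ~70% training / ~30% verification."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squares around the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def r_squared(obs, pred):
    """Assumed definition: squared Pearson correlation of observed vs. predicted values."""
    return correlation_matrix([obs, pred])[0][1] ** 2
```

Suppose the series holds one record per month over 1988–2017, giving 360 months. Under that assumption, `split_70_30(range(360))` returns 252 training and 108 verification records. Reporting both R² and NSE is informative because R² (as a correlation) ignores systematic bias, while NSE penalizes it.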