Predicting hair mercury (Hg) levels and identifying the
factors that influence them are necessary for developing
preventive strategies to reduce Hg exposure in different
regions. This study is the first effort to
investigate the effectiveness of eight machine learning
(ML) models (including multiple linear regression,
decision tree regression, least absolute shrinkage and
selection operator, multivariate adaptive regression
splines, random forest, extreme gradient boosting,
K-nearest neighbor, and Gaussian process) for predicting
hair Hg levels and identifying the most important
factors affecting them in residents of southwestern
Iran. All ML models were trained with 70% of
the dataset and their performance was evaluated using
the coefficient of determination (R²), root mean square
error (RMSE), and mean absolute error (MAE) based
on the remaining dataset. Finally, the Permutation
Feature Importance (PFI) method was used to determine
the relative importance (RI) of influencing factors.
The mean hair Hg concentration (3.31 μg g⁻¹) exceeded the
limits set by the United States Environmental Protection
Agency (US EPA) and the World Health Organization (WHO),
indicating a high exposure risk for some people in this
region. The extreme gradient boosting
(XGB) model outperformed other algorithms in
modeling hair Hg levels, with R² = 0.61, RMSE = 2.2,
and MAE = 1.25. According to the PFI analysis,
weight (RI: 43.4%) and geographic location (RI: 41.8%)
were the most important demographic factors
influencing Hg variation in the study population.
Additionally, occupation (RI: 46.1%) and the
frequency of fish and canned fish consumption (RI:
22%) were identified as the most significant exposure
factors controlling hair Hg variability in southwestern
Iran. These findings can be useful for formulating
appropriate strategies to reduce the health risk of Hg
exposure and improve human health.
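
The evaluation workflow summarized above (a 70/30 train/test split, scoring with R², RMSE, and MAE, then permutation feature importance) can be illustrated with a minimal sketch. It is not the study's code. The data are synthetic, the two features (`x_signal`, `x_noise`) are hypothetical, and a small K-nearest-neighbor regressor stands in for the eight models. The importance score is taken here as the mean increase in test RMSE after shuffling a feature, and rescaling those scores to percentages is only one assumed way of obtaining a relative importance (RI).

```python
import math
import random


def r2_rmse_mae(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n), mae


def knn_predict(X_train, y_train, X_query, k=5):
    """K-nearest-neighbor regression: average the targets of the k closest rows."""
    preds = []
    for q in X_query:
        order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], q))
        preds.append(sum(y_train[i] for i in order[:k]) / k)
    return preds


def permutation_importance(predict, X_test, y_test, n_repeats=10, seed=0):
    """Mean increase in test RMSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    base_rmse = r2_rmse_mae(y_test, predict(X_test))[1]
    importances = []
    for j in range(len(X_test[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X_test]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_test, column)]
            increases.append(r2_rmse_mae(y_test, predict(X_perm))[1] - base_rmse)
        importances.append(sum(increases) / n_repeats)
    return importances


def relative_importance(importances):
    """Assumed RI convention: clip negatives to zero, rescale to sum to 100%."""
    clipped = [max(v, 0.0) for v in importances]
    total = sum(clipped)
    return [100.0 * v / total if total > 0 else 0.0 for v in clipped]


# Synthetic data (NOT the study's data): the target depends only on x_signal.
rng = random.Random(42)
feature_names = ["x_signal", "x_noise"]  # hypothetical feature names
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [3.0 * x_signal + rng.gauss(0.0, 0.1) for x_signal, _ in X]

# 70% of the rows for training, the remaining 30% for evaluation.
n_train = len(X) * 7 // 10
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]


def predict(X_query):
    return knn_predict(X_train, y_train, X_query)


r2, rmse, mae = r2_rmse_mae(y_test, predict(X_test))
ri = relative_importance(permutation_importance(predict, X_test, y_test))

print(f"R2={r2:.2f} RMSE={rmse:.2f} MAE={mae:.2f}")
for name, value in zip(feature_names, ri):
    print(f"{name}: RI={value:.1f}%")
```

In this toy setting the informative feature should receive nearly all of the relative importance, because shuffling it destroys the model's predictive signal while shuffling the noise feature changes the test error very little. The metrics and importances are computed only on the held-out rows, which mirrors the held-out evaluation described above.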