Please use this identifier to cite or link to this item: http://hdl.handle.net/2122/14629
Authors: Bui, Dieu Tien* 
Khosravi, Khabat* 
Karimi, Mahshid* 
Busico, Gianluigi* 
Sheikh Khozani, Zohreh* 
Nguyen, Hoang* 
Mastrocicco, Micol* 
Tedesco, Dario* 
Cuoco, Emilio* 
Kazakis, Nerantzis* 
Title: Enhancing nitrate and strontium concentration prediction in groundwater by using new data mining algorithm
Journal: Science of The Total Environment 
Series/Report no.: /715 (2020)
Publisher: Elsevier
Issue Date: 1-May-2020
DOI: 10.1016/j.scitotenv.2020.136836
Keywords: Data mining; Gaussian process; Italy; Nitrate; Prediction; Strontium
Abstract: Groundwater resources constitute the main source of clean fresh water for domestic use and are essential for food production in the agricultural sector. Groundwater plays a vital role in water supply in the Campanian Plain in Italy, and hence the future sustainability of the resource is essential for the region. In the current paper, novel data mining algorithms including Gaussian Process (GP) were applied to a large groundwater quality database to predict nitrate (a contaminant) and strontium (potentially increasing in the future) concentrations in groundwater. The results were compared with those of the M5P, random forest (RF) and random tree (RT) algorithms, used as benchmarks to test the robustness of the modeling process. The dataset includes 246 groundwater quality samples originating from different wells, both municipal and agricultural. For the modeling process, it was divided into two subgroups by using the 10-fold cross validation technique: 173 samples for model building (training dataset) and 73 samples for model validation (testing dataset). Different water quality variables, including T, pH, EC, HCO3-, F-, Cl-, SO42-, Na+, K+, Mg2+, and Ca2+, were used as inputs to the models. In the first stage, different input combinations were constructed based on the correlation coefficient, and the optimal combination was chosen for the modeling phase. Different quantitative criteria, along with a visual comparison approach, were used to evaluate the modeling capability. The results revealed that, to obtain reliable results, variables with low correlation should also be considered as inputs to the models, together with those variables showing high correlation coefficients. According to the model evaluation criteria, the GP algorithm outperforms all the other models in predicting both nitrate and strontium concentrations, followed by RF, M5P and RT, respectively. The results also revealed that the model's structure, together with the accuracy and structure of the data, can have a relevant impact on the model's results.
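The input-selection and data-splitting steps described in the abstract can be illustrated with a minimal Python sketch. It is only one plausible reading, not the authors' procedure: it assumes that "input combinations based on correlation coefficient" means ranking the inputs by absolute Pearson correlation with the target and forming nested combinations (best input alone, best two, and so on), and it uses a plain random 173/73 split. The data are synthetic placeholders, and the helper names (`pearson`, `ranked_combinations`, `split_indices`) are hypothetical.

```python
import math
import random

# Input variable names as listed in the abstract.
VARIABLES = ["T", "pH", "EC", "HCO3-", "F-", "Cl-", "SO42-",
             "Na+", "K+", "Mg2+", "Ca2+"]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranked_combinations(columns, target):
    """Rank inputs by |r| with the target, then build nested combinations.

    Assumption: combination k holds the k most correlated inputs, so the
    later combinations also include the weakly correlated variables.
    """
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


def split_indices(n_samples, n_train, seed=0):
    """Shuffle sample indices and split them into training and testing sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:]


# Synthetic placeholder data, NOT the study's measurements.
rng = random.Random(42)
n = 246
columns = {name: [rng.gauss(0, 1) for _ in range(n)] for name in VARIABLES}
# Hypothetical target: strongly tied to Cl-, weakly tied to K+, plus noise.
target = [2.0 * columns["Cl-"][i] + 0.3 * columns["K+"][i] + rng.gauss(0, 1)
          for i in range(n)]

combos = ranked_combinations(columns, target)
train_idx, test_idx = split_indices(n, 173)

print("Most correlated input:", combos[0])
print("Number of candidate combinations:", len(combos))
print("Training / testing samples:", len(train_idx), "/", len(test_idx))
```

Each combination in `combos` would then be passed to the regression models in turn, with the best-scoring one on the testing samples kept for the modeling phase.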
Appears in Collections: Article published / in press

Files in This Item:
Bui et al. (2020).pdf (2.55 MB, Adobe PDF)