
Data mining in suspect screening data; searching for relevant new chemicals in different water types

Data from suspect screening analyses collected over the past few years contain a wealth of information. The data originate from samples of drinking water, groundwater, surface water and wastewater effluents. However, it is not easy to distil the information and knowledge contained in all these data. New software makes it possible to compare large numbers of samples with each other and to screen them simultaneously for many known and unknown compounds (data mining).

In this research project, additional useful information is extracted from the large volumes of existing suspect screening data. The project’s first objective is to search the data for chemicals which, on the basis of their production volumes, usage or toxicological properties, may be relevant to the (drinking) water cycle. The second objective is to characterise the water quality of a variety of water types.

Data mining of suspect screening data: searching for relevant chemicals and identifying patterns and relations in the occurrence of chemicals in different water types.

The differences in concentration sums of different chemicals per sample (in internal standard equivalents) among different water types (EFF = effluent, OW = surface water, GW = groundwater, DW = drinking water, n = number of samples).


Suspect screening and statistical analysis

In the first phase of the research, a large set of suspect screening data (151 samples) was screened against a preselected list of relevant anthropogenic chemicals on the European market (5,200 chemicals). The chemicals identified were then prioritised in a separate study (BTO 2015.003). The second phase focused on identifying patterns in the occurrence of chemicals in various water types, for which the samples were compared with each other using statistical methods. Both the large-scale suspect screening [linking (unknown) compounds from suspect screening data to potential candidates] and the statistical methods employed are new to the data analysis of chemical suspect screening research.
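The linking step described above can be illustrated with a minimal sketch. It assumes that a detected feature is linked to a suspect when its measured accurate mass falls within a small relative tolerance (in ppm) of the suspect's theoretical mass. This matching criterion, the 5 ppm tolerance, and all names and masses below are hypothetical illustrations; the text does not specify how the project linked features to candidates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suspect:
    name: str
    monoisotopic_mass: float  # theoretical neutral mass in Da (made-up values below)


def ppm_error(measured: float, theoretical: float) -> float:
    """Relative mass deviation in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def match_suspects(feature_masses, suspects, tol_ppm=5.0):
    """Map each measured mass to the suspects lying within tol_ppm of it.

    Features without any candidate are left out of the result.
    The 5 ppm default is an assumption, not a value from the project.
    """
    matches = {}
    for mass in feature_masses:
        hits = [
            s.name
            for s in suspects
            if abs(ppm_error(mass, s.monoisotopic_mass)) <= tol_ppm
        ]
        if hits:
            matches[mass] = hits
    return matches


# Hypothetical suspect list and measured features, for illustration only.
suspects = [
    Suspect("suspect_A", 194.0804),
    Suspect("suspect_B", 236.0950),
]
features = [194.0809, 236.0948, 300.1234]

result = match_suspects(features, suspects)
print(result)
```

A mass match of this kind yields only a potential candidate; as noted elsewhere in the text, additional identification is needed before a suspect counts as confirmed.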

Valuable method for interpreting screening data

Suspect screening is a powerful technique for the detection of new anthropogenic chemicals, provided that additional identification is carried out. Of the 5,200 ‘suspects’, 1,260 potential candidate chemicals were identified.

The application of suspect screening offers a possible means of realising the objective of researching ‘other anthropogenic compounds’ laid out in the Dutch Decree on Water Quality. The identification of emerging contaminants guides the process of monitoring and safeguarding water quality.

Following the identification of suspects, it is in principle possible, on the basis of chemical properties, to determine the relations between a specific water type and the occurrence of a chemical. Such relations help in assessing the behaviour of non-analysed chemicals in the water.
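As a simple illustration of relating occurrence to water type, the sketch below computes the fraction of samples of each water type in which a given chemical was detected. This is one basic descriptive statistic chosen for the example, not necessarily the statistical method used in the project; the sample records and chemical names are hypothetical.

```python
from collections import defaultdict


def detection_frequency(samples, chemical):
    """Return, per water type, the fraction of samples containing `chemical`.

    `samples` is an iterable of (water_type, set_of_detected_chemicals) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for water_type, detected in samples:
        totals[water_type] += 1
        if chemical in detected:
            hits[water_type] += 1
    return {wt: hits[wt] / totals[wt] for wt in totals}


# Hypothetical samples, using the water type codes from the figure caption.
samples = [
    ("EFF", {"suspect_A", "suspect_B"}),
    ("EFF", {"suspect_A"}),
    ("OW", {"suspect_A"}),
    ("OW", set()),
    ("GW", set()),
    ("DW", set()),
]

freq = detection_frequency(samples, "suspect_A")
print(freq)
```

Comparing such frequencies across chemicals with different properties is one way to start exploring the relations mentioned above.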