A method to identify geochemical mineralization on linear transects

Mineral exploration in biogeochemistry is related to the detection of anomalies in soil, which is driven by many factors and is thus a complex problem. Mikšová, Rieser, and Filzmoser (2019) have introduced a method for the identification of spatial patterns with increased element concentrations in samples along a linear sampling transect. This procedure is based on fitting Generalized Additive Models (GAMs) to the concentration data, and computing a curvature measure from the pairwise log-ratios of these fits. The higher the curvature, the more likely one or both elements of the pair indicate local mineralization. This method is applied to two geochemical data sets which have been collected specifically for the purpose of mineral exploration. The aim is to test the technique for its ability to identify pathfinder elements to detect mineralized zones, and to verify whether the method can indicate which sampling material is best suited for this purpose. Reference: Mikšová D., Rieser C., Filzmoser P. (2019). "Identification of mineralization in geochemistry along a transect based on the spatial curvature of log-ratios." arXiv (1912.02867).


Introduction
The identification of mineralized zones is one of the important challenges in applied geochemistry. The difficulty is that the targeted mineralizations can be of arbitrary size and at any depth, depending on the type of mineralization. The common procedure to discover mineralized zones is based on sampling, using strategic sampling designs in order to be as economical as possible. Samples can be taken from different soil layers, but also from different trees and plants around the presumed target. Since sampling and the analysis of element concentrations are cost-intensive, the sampling is often done on linear transects crossing the presumed mineralized zones. If there is more evidence, drilling is also used in order to obtain a depth profile of the element concentrations. More information on different sampling strategies can be found in Mikšová et al. (2019a).
In this work we assume that the sampling has been carried out on a linear transect, or that the available samples can be aggregated to such linear transects. This means that the spatial locations can be considered along a line, and thus the spatial variability of the measured elements can be investigated graphically by plotting the element concentrations against the locations (Torppa and Middleton, 2017). However, in modern geochemistry, the number of elements that can be reliably measured is in the range of 30-60, and if the samples have been obtained from several different sampling media, it is a challenging task to study all resulting plots for abrupt changes in the concentration values. Such changes could indicate mineralized zones, since their signals could lead to sudden increases of element concentrations. There is, however, the problem that due to the (economical) sampling procedure, only very few samples might have been taken on top of the mineralizations, and together with measurement and analysis uncertainties, the resulting concentration changes might not be clearly expressed. The second problem is that there is an interplay of the concentration values among the elements, because geochemical data are compositional by their nature (Aitchison, 1986; Filzmoser et al., 2018).
Consider a composition $x^m_1, \ldots, x^m_{D_m}$, consisting of $D_m$ chemical elements, measured in $m = 1, \ldots, M$ different sample materials. For the analysis of compositional data it has become popular to use the so-called log-ratio methodology, introduced by Aitchison (1986). This refers to the use of logarithms of ratios, and the basic units of information are the log-ratio pairs $\ln(x^m_j / x^m_l)$, for $j, l \in \{1, \ldots, D_m\}$. The use of log-ratios eventually leads to a sound geometrical concept, referred to as the Aitchison geometry (Pawlowsky-Glahn et al., 2015). There are, however, also practical reasons why log-ratios are useful, such as symmetry around zero, and equal variance if numerator and denominator are exchanged.
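The two practical properties mentioned above can be checked numerically. The following minimal sketch uses made-up concentration values (not data from the surveys) and hypothetical helper names; it only illustrates that exchanging numerator and denominator flips the sign of a log-ratio and leaves its variance unchanged.

```python
import math

def log_ratio(a, b):
    """Natural logarithm of the ratio a/b of two positive concentrations."""
    return math.log(a / b)

def variance(values):
    """Sample variance (denominator n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

# Made-up concentrations of two elements at four locations (illustration only).
conc_j = [12.0, 15.0, 40.0, 14.0]
conc_l = [3.0, 3.0, 4.0, 3.5]

lr_jl = [log_ratio(a, b) for a, b in zip(conc_j, conc_l)]
lr_lj = [log_ratio(b, a) for a, b in zip(conc_j, conc_l)]

# Exchanging numerator and denominator only flips the sign ...
print(all(math.isclose(u, -v) for u, v in zip(lr_jl, lr_lj)))
# ... so the variance of the log-ratio is unchanged.
print(math.isclose(variance(lr_jl), variance(lr_lj)))
```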
A further argument for considering (log-)ratios is the assumption that there could be elements which are stable and thus not affected by a mineralization, and others are very indicative of mineralized zones. The log-ratio of such elements could even better express the local change around a mineralization, because measurement and analysis uncertainties could cancel each other out (Mikšová et al., 2019b).
On the other hand, $D_m$ different elements would lead to $D_m (D_m - 1)/2$ different (and relevant) pairwise log-ratios, which makes a visual inspection practically impossible. For this reason, Mikšová et al. (2019b) have introduced a procedure to rank the list of log-ratio pairs according to their ability to indicate mineralization. This is done by first approximating the individual element concentrations by a smooth fit, taking log-ratios of the smooth fits, and computing a measure of curvature. The higher the curvature, the more likely (at least) one of the elements in the log-ratio pair shows sudden changes. In addition, the visualization of the smooth fits and their log-ratio makes it possible to localize the presumed mineralized zones.
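To give a feeling for how quickly the number of pairs grows, the short sketch below enumerates the unordered pairs for a small, arbitrarily chosen list of element symbols and evaluates the pair count for 30 and 60 elements, the range of measurable elements mentioned earlier. The element list is an illustrative assumption.

```python
from itertools import combinations
from math import comb

# Arbitrary illustrative subset of element symbols.
elements = ["Fe", "Ti", "V", "Sc", "Ca", "P"]

# Each unordered pair corresponds to one relevant log-ratio.
pairs = list(combinations(elements, 2))
print(len(pairs))  # 6 * 5 / 2 = 15

# Pair counts for the typical range of reliably measured elements.
for d in (30, 60):
    print(d, comb(d, 2))  # 30 -> 435, 60 -> 1770
```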
In this paper we briefly review the method of Mikšová et al. (2019b). Then we apply this procedure to two geochemical data sets, originating from surveys carried out in Greenland and France, respectively, within the framework of the ongoing project "UpDeep" (UpDeep, 2017-2020), which aims at developing and implementing a methodology to identify mineralization.

Methodology
As already indicated in the introduction, the main idea of the methodology developed in Mikšová et al. (2019b) is that at the beginning and end of a transect crossing a potential mineralization, important log-ratios of an element pair display a very quick spatial change which can be captured by a measure based on the curvature of the latter.
The first step of the methodology consists in fitting a so-called generalized additive model (GAM), see Wood (2017), to each element, with concentration values $y_i$ at locations $x_i$, for $i = 1, \ldots, n$. After considering the nature of our data and inspecting the corresponding residual plots, we decided to model the data with a distribution from the Tweedie family, using a log-link and additional weights. Modelling the element concentrations in this way means that for each element an optimization problem based on the log-likelihood function $l$ is solved to obtain a linear predictor $\eta$; the problem involves predefined weights $\omega_i$, which upweight certain points, a suitable function space $H$, and a smoothing parameter $\lambda$.
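For orientation, a weighted and penalized likelihood problem of this kind is commonly written as sketched below. The second-derivative roughness penalty and the exact placement of $\omega_i$ and $\lambda$ are assumptions made for illustration; the precise formulation used in the method is given in Mikšová et al. (2019b).

```latex
% Sketch under stated assumptions: weighted log-likelihood with a
% second-derivative roughness penalty (a common smoothing-spline choice).
\hat{\eta} \;=\; \arg\max_{\eta \in H}
  \left\{ \sum_{i=1}^{n} \omega_i \, l\bigl(y_i, \eta(x_i)\bigr)
  \;-\; \lambda \int \bigl(\eta''(x)\bigr)^2 \, dx \right\}
```

In this form, a larger $\lambda$ favours smoother predictors, while larger weights $\omega_i$ force the fit to follow the corresponding observations more closely.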
This results in GAM fits $\hat{f}_{el_1}$ and $\hat{f}_{el_2}$ for each pair of elements $el_1$ and $el_2$, and the log-ratio of the fits for any location $x$ along the transect can be obtained subsequently, where $h(\cdot)$ stands for the link function. Its curvature $\kappa(x)$ is then computed, including a scaling factor $k$ which allows the curvatures to be comparable amongst different pairs. Finally, for each pair of elements, a quantity $c(el_1, el_2)$ is introduced to measure quantitatively important spatial changes potentially indicating the beginning and the end of a mineralization. This measure is denoted as the c-value in the following. In its definition, $(\cdot)_+$ denotes $\max(\cdot, 0)$, $T$ is a threshold, $L$ is the number of times that the curvature $\kappa$ crosses the threshold, and $[x_{j_{2l-1}}, x_{j_{2l}}]$ are the corresponding intervals where this happens. It is easy to see that only points $x$ for which the curvature is above the threshold influence the measure $c(el_1, el_2)$. This avoids any influence of small values of $\kappa(x)$, meaning that only very high signal changes of the log-ratio are taken into account. Summing up over all maxima leads to a quantity measuring the mean number of high signal changes. For the explicit formulas and a more detailed description of the weights $\omega_i$, the smoothing parameter $\lambda$, the scaling factor $k$, the threshold $T$, and the numerical computation of the derivative, as well as the measure $c(el_1, el_2)$, we refer to Mikšová et al. (2019b).
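The idea of a thresholded curvature measure can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and not the exact definition of Mikšová et al. (2019b): the "fits" are synthetic curves rather than GAM fits, the log-ratio is min-max scaled to [0, 1], the curvature is the usual plane-curve curvature computed with central finite differences, and the hypothetical function `c_value` averages the largest exceedance over the threshold in each above-threshold run. The threshold value 10 is arbitrary.

```python
import math

def minmax(values):
    """Rescale a list of numbers linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def curvature(x, g):
    """Curvature |g''| / (1 + g'^2)^(3/2) at the interior points of an
    equally spaced grid, using central finite differences."""
    h = x[1] - x[0]
    kappa = []
    for i in range(1, len(x) - 1):
        d1 = (g[i + 1] - g[i - 1]) / (2 * h)
        d2 = (g[i + 1] - 2 * g[i] + g[i - 1]) / h ** 2
        kappa.append(abs(d2) / (1 + d1 ** 2) ** 1.5)
    return kappa

def c_value(kappa, threshold):
    """Assumed aggregation: mean, over runs of consecutive points above
    the threshold, of the largest exceedance (kappa - T)_+ in each run.
    Returns 0.0 if the curvature never exceeds the threshold."""
    peaks, current = [], None
    for k in kappa:
        excess = max(k - threshold, 0.0)
        if excess > 0:
            current = excess if current is None else max(current, excess)
        elif current is not None:
            peaks.append(current)
            current = None
    if current is not None:
        peaks.append(current)
    return sum(peaks) / len(peaks) if peaks else 0.0

# Synthetic smooth curves on the concentration scale (not real data).
x = [i / 200 for i in range(201)]  # transect locations scaled to [0, 1]
fit_peak = [5 + 20 * math.exp(-((xi - 0.5) / 0.03) ** 2) for xi in x]
fit_trend = [5 + 0.5 * xi for xi in x]
fit_stable = [4.0 for _ in x]

def pair_c_value(fit_1, fit_2, threshold=10.0):
    """c-value sketch for the log-ratio of two fitted curves."""
    log_ratio = [math.log(a / b) for a, b in zip(fit_1, fit_2)]
    return c_value(curvature(x, minmax(log_ratio)), threshold)

c_peak = pair_c_value(fit_peak, fit_stable)    # element with a sharp local peak
c_trend = pair_c_value(fit_trend, fit_stable)  # element with a gentle trend
print(c_peak > c_trend)
```

In this toy setting the pair containing the sharply peaked curve receives a positive value, while the pair with only a gentle trend never exceeds the threshold and receives zero, which mirrors the intended ranking behaviour.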
Since we are dealing with compositional data, one could argue that not the absolute element concentrations should be used for the GAM fits, but rather the log-ratios of all pairs of elements. Although this could be a reasonable approach, there are arguments against this idea: (a) The GAM fits may require some manual adjustment and tuning, which is not feasible for all pairwise log-ratios. (b) Typically, the number of observations is rather low, and there could be some data quality issues as well. GAM fits on the raw data could, to some extent, "repair" this effect, particularly if there is uncertainty in small concentrations, and ideally the data quality increases once the log-ratios of the GAM fits are taken.

Results
One important part of the UpDeep project has been to take samples in two countries, namely in Greenland and France. This sampling was successfully accomplished by the sample providers GEUS (Geological Survey of Denmark and Greenland) and BRGM (The French Geological Survey), respectively. The sampling was performed by executing geochemical sampling surveys according to the established protocols in geologically well-known mineralized areas. The samples in both countries were taken in the years 2017 and 2018; here, however, we focus on the samples from 2018.

GEUS data
The sampling areas were chosen due to known mineralization and exploration in the area. The interest is in the area Isortoq, which is situated in the very south of Greenland, see Figure 1 for a detailed map, where the map background is obtained using Google Maps (Kahle and Wickham, 2013). In total, three traverses were sampled, which are 300 meters apart. The samples from the different traverses are shown in different colors in the map. Green color refers to the locations of known mineralizations. In this case the deposit is an iron (Fe), vanadium (V), and titanium (Ti) deposit. A possible proxy for V could be scandium (Sc), since V tends to be analyzed poorly. In our analyses we merge the samples from the three traverses into one linear transect, which means that all samples are retained, but their locations are set to a linear transect in the center of the three traverses. The individual samples are now at a distance of between 50 and 400 meters from each other. The total length of the transect is about 12 km.
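The text does not specify how the sample locations are mapped to the central transect. One simple possibility, shown here purely as an assumption, is an orthogonal projection of each sample onto a central line, keeping only the distance along that line. The coordinates and the function name `project_onto_line` are hypothetical.

```python
import math

def project_onto_line(points, start, end):
    """Distance along the line start -> end of the orthogonal projection
    of each (x, y) point, measured from `start`."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    ux, uy = dx / length, dy / length  # unit vector along the transect
    return [(px - start[0]) * ux + (py - start[1]) * uy for px, py in points]

# Hypothetical samples (in meters) from three parallel traverses 300 m apart;
# the central transect runs along the x-axis.
samples = [(100.0, -300.0), (400.0, 0.0), (250.0, 300.0)]
positions = project_onto_line(samples, start=(0.0, 0.0), end=(12000.0, 0.0))

print(positions)          # [100.0, 400.0, 250.0]
print(sorted(positions))  # merged transect order: [100.0, 250.0, 400.0]
```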
Two plant species, Salix Glauca and Empetrum Nigrum, with 49 samples, as well as soil, with 47 observations consisting only of so-called routine samples, have been investigated. Following the procedure of Section 2, Figure 2 shows the log-ratio of the GAM fits of the elements Ti and Ca (Calcium) measured in soil, and Figure 3 displays the resulting log-ratio for Fe and P (Phosphorus) in soil. Both log-ratios yield top-ranked c-values, see Equation (1). In these plots, the mineralized zones are shown by red points. The blue points are the predicted mineralizations, where the curvature exceeds the threshold $T$, which is indicated by the horizontal dashed line. The predictions confirm the presumed mineralized zones very well, and they do not indicate new mineralized areas.
A useful tool to display the overall information about meaningful log-ratios is the heatmap. The input for the heatmap is a matrix of c-values, computed from the GAM fits of all pairwise log-ratios of a specific sampling material (plants or soil). Figure 4 shows the resulting heatmap for Salix Glauca (left), Empetrum Nigrum (right), and soil (bottom). The heatmaps are symmetric due to the symmetry of the log-ratios. The darker the blue color, the higher the c-value obtained from the corresponding log-ratio. A dark blue row or column in the heatmap indicates so-called pathfinder elements, which potentially refer to mineralization. From a geochemical point of view, most of the elements with higher c-values are related to the deposit. These heatmaps can also be used to identify potentially interesting pathfinder elements that could indicate new mineralized zones. For example, the heatmap for soil (bottom plot in Figure 4) shows a high c-value for the pair Na (Sodium) and Pb (Lead).
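The structure of the heatmap input can be sketched as follows. The c-values below are made-up numbers for illustration only, not results from the Greenland data, and summarizing each row by its mean is an assumption used here as a numerical stand-in for spotting a "dark" row by eye.

```python
# Made-up c-values for all unordered pairs of four elements (illustration only).
elements = ["Fe", "Ti", "Ca", "P"]
pair_c = {
    ("Fe", "Ti"): 0.10, ("Fe", "Ca"): 0.60, ("Fe", "P"): 0.80,
    ("Ti", "Ca"): 0.70, ("Ti", "P"): 0.20, ("Ca", "P"): 0.05,
}

# Fill a symmetric matrix: the c-value of a pair does not depend on which
# element is the numerator of the log-ratio.
index = {el: i for i, el in enumerate(elements)}
n = len(elements)
matrix = [[0.0] * n for _ in range(n)]
for (a, b), c in pair_c.items():
    matrix[index[a]][index[b]] = c
    matrix[index[b]][index[a]] = c

# Assumed summary: mean c-value of each element over all its partners.
row_mean = {el: sum(matrix[index[el]]) / (n - 1) for el in elements}
ranking = sorted(elements, key=lambda el: row_mean[el], reverse=True)
print(ranking)  # ['Fe', 'Ca', 'P', 'Ti']
```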

BRGM data
The second data set originates from the Vendée area in central-western France, which was sampled in 2018. The area was investigated because of some historical knowledge of the occurrence of rare elements. Moreover, easy access allowed for a valuable reconnaissance of the area prior to sampling. Figure 6 shows a satellite map of the area where the samples have been taken from three different sites. Each of these subareas contains two traverses, which are again merged into one transect in our procedure in order to increase the number of observations per site. The first site in the south-west of Figure 6 holds approximately 30 samples, the second (middle) site about 40, and the third (north-eastern) site only 18 samples. The presumed mineralization type on all sites is antimony (Sb) and gold (Au). Preliminary studies showed that the second site has the highest concentrations of Sb. The element Au is in any case difficult to measure.
This data set provides in total 6 different sample materials, namely Ah horizon with Aqua Regia leach (AhAQ), Ah with deionized water leach (AhL1), Ah with sodium pyrophosphate leach (AhL3), Bramble branch (BB), Bramble leaves (BL), and Oak bark (OB). Rather than investigating the curvature plots again, we now focus on the task of identifying the most promising sample material for indicating mineralization. An answer would be highly relevant, because sampling of the different materials is very time- and cost-intensive. Figure 7 presents for each sample site the top-ranked 70 c-values from all pairwise log-ratios of the GAM fits, separated by sample material. Since the log-ratios of the fitted values are scaled to the interval [0, 1], the c-values are comparable, regardless of sample site and sample material. We obtain the highest c-values for Site 2, which is the most reliable sample site due to the higher number of observations. The plot for Site 2 reveals clear differences among the sample materials in the top-ranked c-values, with the soils appearing highly informative. All sites show that sample material OB performs worst in terms of the c-values, and thus this is the least interesting sample material. The heatmaps in Figure 8 confirm our findings. The left plot for the soil material AhAQ identifies Sb (and to a lesser extent Zn) as an important pathfinder element of mineralization. The right plot for plant BB uses the same color scheme, but represents much lower c-values (see Figure 7, middle). This heatmap shows a rather inhomogeneous structure and thus no clear pathfinder elements.
Figure 6: Map with the sample locations taken by BRGM in the Vendée area in 2018 (Site 1, Site 2, Site 3).
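The comparison of sample materials by their top-ranked c-values can be sketched in a few lines. The material names and numbers below are placeholders and not values from the survey, a cutoff of 3 stands in for the 70 used in Figure 7, and summarizing the top values by their median is an assumption made for illustration.

```python
from statistics import median

def top_k(values, k):
    """The k largest values, in decreasing order."""
    return sorted(values, reverse=True)[:k]

# Placeholder c-values per sample material (made up for illustration).
c_by_material = {
    "material_A": [0.9, 0.4, 0.7, 0.2, 0.8],
    "material_B": [0.3, 0.1, 0.25, 0.05, 0.2],
    "material_C": [0.1, 0.02, 0.08, 0.01, 0.05],
}

k = 3  # the comparison in the text uses the top-ranked 70 c-values
summary = {m: median(top_k(v, k)) for m, v in c_by_material.items()}
order = sorted(summary, key=summary.get, reverse=True)
print(order)  # ['material_A', 'material_B', 'material_C']
```

Such a comparison is only meaningful because the c-values are on a common scale across materials and sites, as noted in the text.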

Summary
Due to technological developments, mineral exploration nowadays belongs to the most important tasks in geochemistry. Although many chemical elements can be investigated for their concentration in different sample materials, sampling is still time- and cost-intensive, and this is the reason why usually only 20-60 samples are available at a potentially mineralized zone. The common strategy is to position the samples on (a) linear transect(s), crossing the mineralized zones, and mineralization would then appear in terms of increased element concentrations.
Rather than investigating single element concentrations, Mikšová et al. (2019b) have developed a method based on considering log-ratios of all pairs of elements. Since the number of possible pairs increases quickly with the number of investigated chemical elements, a strategy has been proposed to rank the element pairs according to their relevance for mineral exploration. This strategy uses a measure of curvature for log-ratios of smooth fits of the concentration values. The resulting c-values are normalized and can be compared across different pairs, and even across different sample materials and sites.
In this paper we have demonstrated the usefulness of this procedure based on two data sets that have been collected specifically for the purpose of mineral exploration. For the first data set, originating from Greenland, it has been shown that the c-values indeed identify important pathfinder elements to confirm presumed mineralized zones, and that they also seem promising for pointing to new locations with potential mineralization. The second data set, from France, was employed to investigate which sample material is most promising to detect mineralization. It turned out that the soil samples are much more informative than the plant samples, but this may again depend on the type of mineralization, and probably even on further factors.
In our future work we will extend the methodology of Mikšová et al. (2019b) to the case where the samples are not necessarily taken along a linear transect, but on a sample grid with different x- and y-coordinates. This means that the smooth fits as well as the curvature measure need to be extended to the two-dimensional case.