Farmer’s data sourcing: A best practise example for mapping soil properties in permanent crops in South Tyrol (northern Italy)

In agriculture, detailed knowledge of soil properties is a key element for high-quality food production. However, soil data at a single parcel scale are generally unavailable. In this study, a best practice framework is presented where, through an operational chain from an individual farmer through a centralised database and with a geostatistical approach, new knowledge has been generated that enables application far beyond a single soil sample at the parcel scale. This study was carried out in intensively managed permanent crops in South


Introduction
Knowledge of soil chemical and biophysical properties and their spatial variability is a key element in soil management, providing farmers, land managers, and policy makers valuable information for the implementation of effective management strategies and sustainable agriculture (Alliaume et al., 2010;Rüdisser et al., 2015, McBratney et al., 2014Plant, 2001).
To maintain profitability and deliver high-quality food whilst maintaining and enhancing sustainable agriculture, integrated farming regulations are widely adopted throughout European agriculture (Edwards et al., 1993;Morris and Winter, 1999). Particularly, the integrated farming program aids in the improvement of soil quality via a guideline for soil management (Hendrickson et al., 2008). In this framework, farmers using these management practices collect a large amount of valuable spatio-temporal agronomic data. As done in some citizen science projects for collecting and disseminating soil information (Rossiter et al., 2015), farmers are performing valuable data sourcing by engaging in interdisciplinary collaborations with the various stakeholders (Bouma, 2015). Promoting the public availability of soil information from agricultural fields generates enormous interest among retailers interested in better leveraging data and exploring new potential applications (Drakos et al., 2015;GODAN, 2015;Woodard, 2016).
Throughout their agronomic practices and implementation of integrated agriculture guidelines, farmers regularly monitor several chemical-physical properties of soil. Among these, soil texture, soil organic matter (SOM), and pH are the most relevant. Soil texture influences nearly all soil processes and contributes to water, heat, and nutrient fluxes and holding capacity (Saxton et al., 1986;Sinowski et al., 1997;Grashey-Jansen, 2010). SOM affects soil-water relationships (Bot and Benites, 2005;Lal, 2007) and is a soil fertility indicator (Herrick, 2000). The pH directly affects nutrient availability, nutrient uptake, and heavy metal dissolution, which may be toxic to plants. Thus, non-optimal soil pH can cause plant stress (Läuchli and Grattan, 2017).
Digital soil mapping allows pointwise soil information to be extrapolated and soil properties in space to be predicted through a robust interpolation method (Minasny and McBratney, 2016;Robinson and Metternicht, 2006). These techniques have been extensively tested by various authors (Scull et al., 2003;Hengl et al., 2004). The most advanced techniques usually employed are the geostatistical methods for data interpolation by means of Kriging. For continuous variables such as pH, SOM, and several other chemical soil properties, Kriging has been proven to have good predictive capability (Hengl et al., 2004;Vieira et al. 1983). Meanwhile, the spatial prediction of compositional data such as soil texture can be estimated by means of compositional Kriging (De Gruijter et al., 1997;Kerry and Oliver, 2007;Zhang et al., 2013).
Collecting comprehensive datasets is a challenging task; national-level soil surveying covers medium-to large-scale and generates coarse-scale soil maps (Scull et al., 2003). Smallscale soil surveying covers short-distance changes in soil properties (Corstanje et al., 2007) related to topography (Schaetzl and Anderson, 2005) and land use management (Barrios et al., 2015;Sun et al., 2003). Single institutions collect limited amounts of soil data with highquality standards and methods generally used for larger areas, while projects with a citizen scientist approach collect large amounts of data with diverse quality and collection protocols.
In the South Tyrol case study, farmers, private stakeholders, and the public administration have established a rather unique and long tradition of collaboration. The local public research centre of agriculture and forestry provides important services (Dalla Via and Mantinger, 2012) to local farmers. It has also stored decades of soil data with reliable protocols and standards.
In this study, a best practice framework is presented, linking individual farmers' agronomic practices with a centralised soil database and a geostatistical approach, providing further insights into spatial soil properties, knowledge of which can be used to address issues beyond the single parcel of land. This contribution aims for the following objectives: (i) to present the best practice framework for farmer data sourcing of spatial soil information in intensively managed permanent crops in South Tyrol; (ii) to show the capability of a high-density soil database to predict accurate soil property maps; (iii) to briefly discuss the maps produced of soil texture, SOM, and pH.

Study area
This study covered the Venosta/Vinschgau and Adige/Etsch Valleys, in the Province of Bolzano/Bozen, South Tyrol, Italy ( Figure 1). South Tyrol lies on the southern side of the main alpine ridge, and the area has a typical continental alpine precipitation regime, with low total annual precipitation (450-850 mm), a summer characterised by convective rainfall events, and a relatively cold winter with weak Atlantic frontal systems and little precipitation (Hydrographic office South Tyrol). The prevalent soil types on hillsides are Leptosols and Cambisols, while in the valley floor they are gleyic Cambisols (partially calcaric) Fluvisols or Gleysols (Grashey-Jansen, 2010; ). South Tyrol is Europe's largest apple-growing area, at ca.
19,000 ha. The production is sold at about 50% within Italy, 15% to the rest of Europe, and 2% worldwide outside of Europe (Dalla Via and Mantinger, 2012). Vineyards cover about 5500 ha, representing ca. 1% of Italian production (49% regional consumption, 18% rest of Italy, 26% Europe, 7% worldwide). The rest of the agricultural land is covered by pastures of ca. 120,000 ha, meadows of ca. 70,000 ha, and other cultures of ca. 15,000 ha (Agricultural office, South Tyrol). Overall, 22.5% of the land surface is used for agriculture (Tasser et al., 2008).

Soil sample data sourcing
In South Tyrol, viticulturists do not follow specific guidelines for soil management, while apple farmers conduct integrated farming with precise guidelines and regulations that prescribe soil sampling repetition every 5 years (AGRIOS, 2017). About 70% of the farmers in South Tyrol send soil samples to the Public Research Centre of Agriculture and Forestry, Laimburg. The other 30% send them to smaller local laboratories, and thus the data is inaccessible. Thus, the current study focuses on data collected by the Laimburg laboratory during 2006-2013 with a dataset coming from apple orchards and vineyards located in the Venosta/Vinschgau and Adige/Etsch Valley. For each orchard or vineyard parcel, a minimum of 15 soil subsamples were taken in different locations at the same profile depth (0-20 cm) by each individual farmer. Samples from the same profile depth were mixed, and 1 kg of soil material was submitted to the laboratory. It was analysed to determine pH and chemical compositions, and textural soil classes were defined by a feel test (Thien, 1979 (Table 1). Each sample was assigned a unique cadastral parcel code and was joined to the cadastral map, with the sample point position selected as a random location within each cadastral parcel to avoid spatial artefacts caused by parcel geometry. Because of several mismatches between the reference parcel code in the soil sample database and the cadastral maps of some municipalities, it was not possible to georeference all the available soil samples. Finally, the dataset comprised ca. 16,139 sample points. For analysis only, the soil texture, pH and SOM are considered in this study (

Model calibration and validation
For each soil sample, two types of variables were generated: categorical (soil texture classes) and numerical (chemical analyses). Thus, two different geostatistical 8 methods were chosen to spatialise the studied variables: compositional Kriging (Gruijter et al. 1997, Walvoort andde Gruijter 2001) for soil texture classes and ordinary local Kriging for pH and SOM. Although the kriging estimator does not depend on any distribution assumption, if the empirical distribution of the data is skewed then the kriging estimators are sensitive to outliers (Lark, 2000). Thus, for pH and SOM, to properly estimate the patterns of dominant processes with minimal influence of spurious points indicating polluted data, the outliers were investigated before spatial interpolation, and if they exceeded the 1.5 interquartile range (Mcgill et al., 1978) they were excluded from further spatial analysis.
The optimal parameters for the interpolator in both methods were estimated using auto-calibration, which consists of running the geospatialisation function n times (i.e., compositional kriging/ordinary local kriging) for a range of input parameters and selecting the parameter combination that returns the best goodness of fit (GOF). The hydroPSO package (Zambrano-Bigiarini and Rojas 2014) within the R software was used for auto-calibration. For the numerical variables, the set of parameters that returned the lowest root mean square error (RMSE) was selected for the validation and final interpolation. Meanwhile, for the categorical variable, the best parameter set was chosen by selecting the iteration with the highest percentage of correspondence between the estimated and observed texture class.
To verify the model's performance, a 5-fold cross-validation approach was adopted to determine the accuracy and uncertainty for both methods. This approach randomly divides the observed data into five datasets and then trains the model using all except one subset (Hastie et al., 2009). The subset left aside is used for model validation (i.e., GOF estimation).

Spatial interpolation of soil properties
Georeferenced soil samples of pH and SOM were spatialised using ordinary local kriging (OLK) in the R environment (Gräler et al., 2016;Pebesma, 2004). OLK computes the spatial continuity of the dataset by variogram analysis. The model training process consists of computing an experimental variogram on a training dataset and then fitting a variogram model on the experimental one to obtain the model parameters. In this case, the exponential model showed better performance.
The model parameters are then passed to the kriging estimator for interpolation. A comprehensive description of the Kriging method can be found in (D. G. Krige, 1951;Matheron, 1963). The interpolated values are checked against a validation dataset, and the RMSE is computed to assess the accuracy of the interpolation. This process (auto-calibration) is repeated n times, changing the input parameters of the experimental variogram to find the best possible accuracy.
Soil texture classes were derived by feel test; thus, for each reference soil class, the average soil composition in terms of sand, silt, and clay percentage content was assigned using the USDA classification (FAO, 1990;HG, 1937). Soil texture is a compositional variable characterised by particle size fractions that are positivedefinite and sum to a constant, which usually equals 100% (Walvoort and Gruijter, 2001). A change in one component, therefore, results in a shift in all other components. (Pawlowsky, 1984). Therefore, soil texture classes, after characterisation by composition of sand, silt, and clay, were spatialised using compositional kriging in R (Pawlowsky-Glahn and Olea, 2004;Tolosana-Delgado et al., 2009). Compositional kriging is a straightforward extension of ordinary kriging, and its description can be found in (Walvoort and Gruijter, 2001). The spatialised soil components were finally aggregated into USDA soil texture classes. The original eight classes of the German soil texture classification (Ad-Hoc-AG, 2005) were aggregated to five classes according to the USDA texture (Soil Survey Staff, 1999) classification ( Figure 2 and Table 2), which were cross-validated against the validation dataset to quantify accuracy. A raster mask using the Corine Land Cover 2012 was adopted to constrain the interpolation to only agricultural fields in the valley bottom.
The final spatial resolution is 100 m × 100 m.

Farmer data sourcing and spatial analysis
The two main farm land uses resulted in an uneven sample distribution (Figure 3 points per km 2 with an average of 24.9 points per km 2 . The mean sample density for the entire dataset is 54 with an average sample distance of 136 m (Figure 3). None of the continuous variables is normally distributed (Figure 4). The raw data highlighted that several outliers might influence the magnitude of the skewness (see box plot in Figure 4). Soil pH ranged from very acidic (pH 4.09) to moderately alkaline (pH 7.93), with a median of pH 6.94 and standard deviation of ±0.43. pH showed a low relative variability 13 with CV of 6.3% and quasi-normal distribution with a moderate negative skewness of -0.73.
Soil SOM ranged from 0.59% to 44% with median of 3.89% and standard deviation of ±2.12.
The large skewness of 4.83 and high relative variability of 49.2% in SOM is due to the presence of extreme values on the far-right side of the distribution. Log transformation of the data was performed, and outliers exceeding the 1.5 interquartile ranges were excluded from the geostatistical analysis to overcome issues due to skewness.  During spatial interpolation, urban areas, industrial sites and forests were masked out to avoid inaccurate and unvalidated spatial prediction for these land uses. Thus, overall, the large data set combined with an extensive calibration and validation approach created robust estimates of spatially distributed soil properties.

15
The topsoil map of USDA textural classes is presented in Figure 5. The soil texture composition due to sediment deposition changes from upstream (upper part of Figure

Farmer data sourcing
The synergies between farmers as data providers, regulatory bodies for integrated farming, soil management, and a centralised database indicate a successful best practice for extensive and long-term soil surveying and monitoring in agricultural areas (Figure 8). Linking soil management by means of collecting time series of soil property data in a centralised database can provide information, spatial patterns and trends that are more valuable than disentangled point-wise analysis ("ISRIC -World Soil Information," 2014). Furthermore, the large number of samples with an average density of about 50 points per km 2 allow a good characterisation of soils in mountainous areas. This is due to the interplay of river floods, complex topography and geology, debris flow and glacier retreatment (Geitner, 2007;Grashey-Jansen and Schröder, 2009). Moreover, the relative sample density indicates the possibility to largely increase sample density and map accuracy. Nevertheless, the highest sample density was in apple orchard fields while vineyards had generally lower sample density. This is due to the soil management guidelines for integrated fruit production (AGRIOS, 2017). These results highlight how apple orchard production guidelines, prescribing precise soil management practices, have allowed the collection of a large amount of soil information. Adopting similar guidelines in vineyards would greatly improve data availability and spatial distribution, hence improving soil mapping spatial prediction. However, apple orchards still represent 80% of the permanent crop fields in South Tyrol. Soil sampling repetition creates a robust time series for long-term monitoring of physical and chemical properties for tracking fertility changes over time (Havlin et al., 2005).

Spatial analysis
Despite the large number of georeferenced soil samples, still 10% of the utilised data could not be georeferenced owing to mismatching between the soil sample associated parcel code and the local cadastral map. The development of an up-to-date and common spatial data infrastructure is currently ongoing to automate data ingestion and to prevent loss of data.
Auto calibration of the variogram fitting model and local kriging parameters permitted a robust selection of the optimal parameters with the best estimate performance. Generally, with more than 10,000 samples, validation can be carried out using simple fold crossvalidation; 10-20% of the data are randomly selected for validation, and the remaining 80-90% for training, respectively. However, a 5-fold cross-validation was preferred in favour of a more robust validation. The high sample number and density in this study ensure satisfactory performance of the interpolation models (Stahl et al., 2006;Webster and Oliver, 1992). The low nugget value indicates an adequate sample number and spatial variability of the studied variables. Nevertheless, the ranges for all the variables were lower than the corresponding mean sampling distances, indicating irregular sample distribution. Thus, it should be noted that the accuracy of Kriging models requires a sound sample design in terms of quality, number, and spatial distribution (Hengl et al., 2007a). For pH and SOM maps, the best fitting models were as found in similar studies (Bogunovic et. al., 2017;Mousavifard et al. 2013;Reza et al. 2016). Soil texture spatial prediction showed an accuracy comparable to what is generally presented in the literature (Hengl et al., 2007b). Moreover, with soil texture data almost doubling in quantity every 5 years, future soil texture spatial prediction will improve in accuracy.

Soil property maps
The results presented in this work show the complex interplay of fluvial processes as well as anthropogenic influences on the variability of soil texture, pH and SOM. The large number of samples combined with the even spatial distribution have contributed to the robust estimation of the soil texture, pH and SOM patterns. The feel test for soil texture classification, despite being an empirical soil texture estimation method, proved to be sufficiently accurate and could replace laboratory analysis for soil mapping purposes (Vos et al., 2016). Moreover, while chemical properties change over time, in dam-regulated networks like the Adige/Etsch river, soil texture generally maintains its properties. Thus, soil texture samples that cover a longer time span can be used to improve future soil spatial prediction. A general limitation of the presented results is that they refer only to the topsoil; this is, however, relevant in agriculture as this is where organic matter and microorganism biological activity are concentrated (Nair, 2016). Data on subsurface soil (20 -40 cm) as well as other agricultural land uses exist, but with a 10-fold lower sampling density. Therefore, spatial prediction of subsoil physical and chemical properties cannot be accurate. However, with the data growing over time, subsoil texture spatial prediction will become doable. Moreover, a detailed laboratory particle-size analysis would allow the estimation of the exact percent fraction distribution for a more detailed soil texture classification and hydraulic property estimation (Clapp and Hornberger, 1978;van Genuchten, 1980). Grashey-Jansen (2014) (Atucha et al., 2011;Mcgourty and Reganold, 2005). Finally, the integration of soil texture, SOM and pH data with the existing soil chemical data not analysed in this study can provide further information for estimating the spatial availability of specific nutrients (Fageria et al., 2002;Singh, 1994), predicting plant water availability (Saxton and Rawls, 2006), estimating soil water hydraulic properties for hydrological and hydropedological application (Thompson et al., 2012), fertilisation management (Grant, 2016) and carrying out ecological risk assessments (Suter II, 2016) .

Conclusions and outlook
The synergies between farmers as data source, farming guidelines and the public administration as in South Tyrol provide a promising framework for extensive and long-term soil monitoring in agricultural areas. Linking all individual farmers' agronomic practices can potentiate soil knowledge to address issues beyond the individual farmer's agronomic production. The production of detailed maps of soil properties is a crucial step in precision agriculture and helps in monitoring the use of agricultural compounds for pest management and fertilisation as well as to improve water use efficiency. These results can be used to make recommendations for best management practices within the local agriculture consultancy and public authorities. This study presents few of the available variables; further studies should consider the rest of the data, paving the way to digitally assess spatial patterns of nutrient availability and carrying out ecological risk assessments for sustainable agriculture and overall soil security. The integrated farming long-term data collection program will enhance the reliability of the produced maps over time and the estimated trends and changes in physical-chemical soil properties. This would open new paths toward accurate monitoring in the field of sustainable agriculture and creating long-term plans for soil security.