The spatial dynamics of wheat yield and protein content at the field scale 1

10 Wheat is a staple crop that is critical for feeding a hungry and growing planet, but its nutritive value has 11 declined as global temperatures have warmed. The price offered to producers depends not only on yield but 12 also grain protein content (GPC), which are often negatively related at the field scale but can positively 13 covary depending in part on management strategies, emphasizing the need to predict their variability within 14 individual fields. We measured yield and GPC in a winter wheat field in Sun River, Montana, USA and 15 tested the ability of normalized difference vegetation index (NDVI) measurements from an unpiloted aerial 16 vehicle (UAV) on spatial scales of ~10 cm and from Landsat on spatial scales of 30 m to predict them. 17 Landsat observations were poorly related to wheat measurements. A multiple linear model using 18 information from four (three) UAV flyovers was selected as the most parsimonious and predicted 26% 19 (40%) of the variability in wheat yield (GPC). We sought to understand the optimal spatial scale for 20 interpreting UAV observations given that the ~ 10 cm pixels yielded more than 12 million measurements 21 at far finer resolution than the 12 m scale of the harvester. The variance in NDVI observations was 22 ‘averaged out’ at larger pixel sizes but only ~ 20% of the total variance was averaged out at the spatial scale 23 of the harvester on some measurement dates. Spatial averaging to the scale of the harvester also made little 24 difference in the total information content of NDVI fit using Beta distributions as quantified using the 25 Manuscript Click here to access/download;Manuscript;UAV_precisionag_submit.docx

Kullback-Leibler divergence. Radially-averaged power spectra of UAV-measured NDVI revealed 26 relatively steep power law relationships with exponentially less variance at finer spatial scales. Results 27 suggest that larger pixels can reasonably capture the information content of within-field NDVI, but the 30 28 m Landsat scale is too coarse to describe some of the key features of the field, which are consistent with 29 topography, historic management practices, and edaphic variability. Future research should seek to 30 determine an 'optimum' spatial scale for NDVI observations that minimizes effort (and therefore cost) 31 while maintaining the ability of producers to make management decisions that positively impact yield and 32 GPC. 33 34

Introduction 35
Crop yields are often quite variable within individual fields due to differences in soil fertility and 36 topography, weediness, and management efforts, but also for reasons that are not entirely clear [1]. Canopy 37 spectral reflectance indices like the normalized difference vegetation index (NDVI) are useful for 38 estimating crop yield at multiple scales in space [2-4] because the absorption and reflectance of red and 39 near infrared wavelengths is a good proxy for leaf area, which in turn is a good proxy for growth [5] and 40 yield [6]. Following this notion, the yields of many different crops have been estimated using NDVI and 41 related vegetation indices using aerial and satellite-based platforms [7][8][9]. 42 Other crop attributes also determine price, like grain protein content (GPC) for the case of wheat 43 (Triticum aestivum L.) [10,11]. Understanding GPC is critical not only for agricultural management [12] 44 but also the global food system as it is predicted to decrease in a changing climate [13]. Wheat yield and 45 GPC are often inversely related within a field [14-16] because water stress during grain filling increases 46 GPC but decreases yield [17]. Despite this, yield and GPC can be positively related depending on edaphic 47 properties and management interventions [16,18], with great advantage to producers. Field-scale 48 management can therefore be improved by understanding relationships between NDVI, yield, and GPC. 49 The spatial variability of GPC has been successfully estimated from NDVI and other vegetation 50 We combined the four dates of UAV NDVI imagery into a single raster file for spatio-temporal 111 classification. We used k-means unsupervised classification in Erdas Imagine (Hexagon Geospatial, 112 Norcross, GA), with 50 initial classes. From these, we used the Grouping Tool to create three classes from 113 the 50 original classes using expert knowledge of the field (topography, geology, soil distribution, etc.) to 114 logically combine classes. We then imported the three-class classified map into ArcMap (Esri, Inc., 115 Redlands, CA), created masks for each group, and extracted the NDVI values for each of the four dates. 116 We averaged the NDVI values for each date and class to create four-date trajectories of average NDVI. 117 118

Comparison of NDVI to yield data 119
Georeferenced ('GPS-tagged') wheat yield and GPC measurements were made using a combine yield 120 monitor during harvest (Fig. S1). These data were cleaned using a Yield Editor tool ( We calculated the change in total variance of NDVI that results from averaging with increasingly large 132 pixels to understand how variance is "averaged out" at coarser spatial scales, often called the 'grain' of the 133 image, not to be confused with the grain crop. NDVI varies between 0 and 1 in the absence of water bodies 134 and, if unimodal, can be modeled as a Beta distribution [38] as increasingly used for studies of plant cover 135 [39]. We fit Beta distribution parameters using observations from the original images and the spatially-136 averaged images using maximum likelihood methods. We then calculated the change in information content 137 that results from spatial averaging using the Kullback-Leibler divergence (DKL) for the case of a Beta (1) 140 where and are the shape parameters of the Beta distribution of NDVI from the original image, ′and 141 ′are the parameters of the Beta distribution after spatial averaging, B is the beta function, and ( )is the 142 digamma function: 143 where ( )is the gamma function. 145 To quantify scaling relationships within the field on the different measurement days we calculated the 146 radially-averaged power spectral density (Y) of each NDVI image [40,41] with Fatiando a Terra v0.5 for 147 Python [42], and interpreted the resulting spectra in terms of its power law exponent b [43,44]: 148 where k is scale (m −1 ) and c is a normalization constant. relatively high, medium, or low NDVI trajectories across the growing season ( Figure 3). This classification 156 -and the images themselves -reveal NDVI patterns with different characteristic length scales from 157 centimeters to hundreds of meters, with implications for yield, GPC, and within-field management 158 opportunities. 159 160

Relationships between NDVI and wheat yield 161
NDVI measurements from each UAV flyover were significantly related to yield (P < 0.05, Figure 4), but 162 Landsat NDVI observations only explained 1% of its variability. NDVI measurements from June 8 163 (NDVIJune8) and July 12 (NDVIJuly12) explained 20% or more of the variability of wheat yield (Figure 2 top), 164 but NDVIMay19 and NDVIJuly20 explained less than 14%. Linear model selection using AIC indicated that a 165 model that summed NDVI measurements from all periods ( NDVI) explained nearly 25% of the variability 166 in yield ( Figure 5A) and represented 59% of the weight -the relative likelihood -across all models tested. 167 Assuming a linear relationship between each NDVI observation and time, creating a NDVI product for 168 every day, and summing the subsequent interpolated values did not improve the model ( Figure 5B). The 169 model with the highest R 2 , 170 Yield = −11520 + 963.2 ×NDVIJuly1 + 3750 ×NDVIJuly20 + 7254 × NDVIJune8 + 8617 × NDVIMay19, 171 explained 26% of the observed variability in yield, similar to the linear model as a function of NDVI. In 172 other words, a model with four discrete NDVI measurements explained slightly more variability in yield 173 than a measurement that included only their sum but was penalized by the AIC analysis for having more 174 parameters. 175 176

Relationships between NDVI and grain protein content 177
NDVIMay19 explained 30% of the variability in GPC. NDVIJuly19 was also significantly related to GPC (P < 178 0.05) but only explained 6% of its variability (Fig. 6). Model selection using AIC chose a model that 179 includes NDVIMay19, NDVIJuly20, and a negative relationship with NDVIJune8, but not NDVIJuly12: 180 This model explained 40% of the variability in GPC and represented 59% of the weight across all models 182 tested (Fig. 7). The remaining 41% weight was represented by a model that includes NDVI on all dates 183 including a negative term for NDVIJune8, meaning that the most parsimonious model would be represented 184 by a combination of 59% of the model that included three NDVI dates and 41% of the model that included 185 all four. We also explored Red Edge as an alternative to NDVI, but this explained about 1% less of the 186 variability in GPC and likewise did not improve the model for yield. Over 50% (75%) of the total variance of the NDVIMay19 (NDVIJuly20) image was lost when aggregating to 195 the scale of the harvester and Landsat, but only ⅓ of the total variance of the NDVIJune8 image was lost at 196 the 30 m Landsat scale. The earlier NDVI measurements (May 19 and June 8) had substantial negative 197 skew (Fig. 8D), indicating the presence of areas in the field with far lower NDVI than the mean that are 198 likely candidates for management intervention. This skewness was also 'averaged out' at larger spatial 199 scales, especially the NDVIMay19 image whose skewness changed from −4 to −0.5 upon averaging to the 200 Landsat scale. 201 The DKL quantifies the change in information content between the original and spatially-averaged 202 images. It increased rapidly at spatial scales larger than 30 m ( Figure 9A) but was less than 0.15 (0.25) at 203 the harvester (Landsat) scale for the NDVIMay19, NDVIJune8, and NDVIJuly1 images. (The DKL for the 204 NDVIJuly20 image was consistently much larger and is not shown in the figures for clarity.) Changes to the 205 parameter (i.e. ') dominated DKL for the May 19 and June 8 images as spatial grain became larger, and 206 changes to the β parameter (i.e. β') dominated DKL for the July 1 image. 207 The power law exponent (i.e. b) of the radially-averaged power-density spectra was constant at b 208 = 2.3 (2.4) for the June 8 (July 1) images across all scales (Fig. 10) noting that the July 1 image has more 209 total variance than the June 8 image (Fig. 8B). There was notable variability in all spectra and a scale break 210 in the May 19 and July 20 images on the order of 6 m −1 (i.e. ~17 cm) and b decreased faster at spatial 211 frequencies larger than this value, especially in the May 19 image when it decreased from −2 to −3.2 (Fig.  212 10). There was also notable variability in all spectra at 20.6 m −1 , about 5 cm (Fig. 10). Some of the minor 213 peaks at lower spatial frequencies present in the other images were absent in the June 8 image which 214 suffered from less information loss at larger spatial scales than the other images (Fig. 8C). it is essential to explore the boundaries of what is practical and necessary to improve agricultural 220 management and sustainable production. We discuss how the interpretation of NDVI at fine spatial scales 221 can provide producers with the correct amount of information -not too much and not too little -to 222 understand within-field variability. 223

Spatio-temporal patterns of NDVI 224
Areas of consistently higher NDVI values through the growing season were located in the SW portion of 225 the study field in an area of lower topography that likely benefits from water drainage in characteristically 226 dry north-central Montana (Figs. 1 & 3). There was an E-W swath of higher NDVI values that was identified 227 as an old fence line where blowing soil likely accumulated in prior decades and improved fertility. Areas 228 of moderately high NDVI values were widely distributed throughout the field and were clearly observed 229 along thin linear features, especially in the NE portion of the field, thought to be associated with the edges 230 of shale cracks and improved plant access to deeper soils. Areas of consistently lower NDVI values through 231 the growing season were primarily clustered in the northern, higher elevation portion of the field, likely 232 associated with lower water retention and thinner soils. Such observations can guide further soil sampling, 233 which are key to further improve yield prediction [45]. Note that these patterns are not readily apparent to 234 the human eye, to which the field appears largely homogeneous (Fig. 1B). 235 From this analysis it is apparent that NDVI observations provide rich spatial information to 236 producers, but all four UAV flights were necessary to identify key features; note for example that many of 237 the features identified by the unsupervised classification (Fig. 3) were not apparent in the May 19 image 238 ( Fig. 2A). NDVI measured early in the growing season can predict eventual yield [46] but feature 239 identification relied on all of the images, as did the best model for yield prediction (Figs. 4 & 5). NDVI 240 from the May 19 image alone was able to explain 30% of the variability in GPC (Fig. 6A), and additional 241 observations increased predictive power by 10% (Fig. 7). Management interventions during earlier dates, 242 especially during the wheat heading stage, are candidates for N top dressing, the major within-season 243 management correction that producers can take to enhance GPC [47]. In other words, all of the images 244 produced information that can be useful for understanding the idiosyncrasies of an individual field but 245 earlier information can guide management. One potential approach to maximize information and minimize 246 effort is to make multiple flyovers during initial investigations to understand the properties of individual 247 fields, then reserve flights in future years for early periods of the growing season to identify deficiencies 248 from expected crop growth patterns. 249

NDVI as a function of spatial scale 250
It is readily apparent that the high-resolution information from the UAV flyovers greatly exceeds the yield 251 and GPC information that the harvester is able to provide, creating a scale mismatch that can be understood 252 by exploring the consequences of spatial averaging of the NDVI images. At least 22% (June 8) and up to 253 75% (July 20) of the observed NDVI variance is averaged out at the scale of the harvester, 12 m ( Fig. 8B-254 C), which makes much of the information content of the UAV NDVI images irrelevant for understanding 255 yield and GPC collected at coarser scales. Notably, many of the underperforming areas visible early in the 256 May 19 image by its negative skew (Fig. 8D) were averaged out at larger spatial scales. That being said, 257 the practical consequences of high skewness in the case of the study field may be unimportant; less than 258 0.1% (10,000) of the nearly 12.3 million NDVIMay19 observations had an NDVI of less than 0.8 on May 19. 259 Instead of dwelling on information loss with spatial averaging, there are many features of NDVI at coarser 260 spatial scales that might be considered promising for a simpler description of its spatial variability. 261 In addition to the relatively low loss of variance in the June 8 image, the DKL analysis reveals low 262 information loss compared to the other images (Fig. 9A). This means that the shape of the Beta distribution, 263 as defined by its parameters (Fig. 9B-C), was largely maintained upon spatial averaging. In other words, 264 parameters fit from data at coarser spatial scales are a reasonably good approximation for those fit from 265 data at finer scales. It helps that NDVI in our case follows unimodal distributions in all cases. that 'oversamples' within-field wheat spectral reflectance at hyperspectral, 'hypertemporal', and 281 hyperspatial resolution to quantify the information that is necessary to predict yield and GPC, as well as the 282 information that is unnecessary. By quantifying the benefits, but also the costs, of information acquisition, 283 producers can gain a richer understanding of the most cost-effective information to collect to manage wheat 284 yields and GPC and continue feeding a growing populace.