Frequent satellite observations of leaf area index will improve empirical estimation of grain yield

Large-area empirical yield estimation from space has long been constrained by the lack of suitable combinations of spatial and temporal resolutions of satellite image time series. Consequently, it is likely that the selection of metrics, i.e., time series descriptors used to predict grain yield, was driven by practicality and data availability rather than by systematic targetting of critically sensitive periods suggested by knowledge of crop physiology. The availability of data from new satellites is driving a trend towards increased spatial and temporal resolutions. This raises the question of how temporality affects performance and the choice of metric required to achieve optimal performance. To answer this question, we followed an in silico approach and generated daily time series of Leaf Area Index (LAI) using a crop growth model. In comparison with currently limited observational data, this approach can generate any observation frequency, explore a wider range of growing conditions and drastically reduce the cost of measuring yields in situ. Wheat crops were thus simulated under nine management strategies for 30 years at 50 locations representative of the Australian grain zone. Six types of LAI metrics were extracted to predict grain yields. Between 30% to 80% of the yield variability could be explained depending on the metric. The high accuracy achieved by advanced metrics (Senescence fit and Fourier decomposition) is remarkable since it is obtained from linear models calibrated for the whole of the grain zone, contrasting seasons, and multiple cultivars. Adding weather variables doubled the R of models based on simple metrics (peak LAI and Early/Late window) but did not significantly improve models based on advanced metrics. This constitutes strong evidence that metrics intensively exploiting the temporal dimension of LAI already integrate the information content of weather variables. Simple metrics remain nonetheless the best choice when data are sparse. As we progress in a data-rich era, these results provide evidence for a shift towards the inclusion of metrics that truly harness the temporal dimension of LAI time series.


Introduction
Estimating crop production, particularly that subject to international trade, is becoming an urgent imperative for global food security due to the growing world population, shifts in diets, and the development of biofuels. Instead of quantifying production directly, it is common practice to address its constituent terms: crop area and crop yield. The latter can itself be thought of as the product of Genetics, Environment, and Management (G × E × M). Agrometeorological models have long been used with some success to estimate yield on a regional basis. Satellite remote sensing has gradually become instrumental in assessing crop yields because several Vegetation Indices (VIs) derived from spectral data integrate some G×E×M effects, from meteorological factors such as precipitation and solar radiation to cropping practices such as fertilization and irrigation. For instance, Colwell, Rice and Nalepka (1977) concluded that a considerable amount of the variance in individual field yield which is not explainable by meteorological data can be accounted for by multi-spectral satellite data.
The simplest approach to estimate crop yields is to establish empirical relationships between ground-based yields and VIs or metrics describing the VI time series. These empirical models rely on the correlation of spectral bands (and of their combination) with some biophysical properties of the crops, such as Leaf Area Index (LAI), which are themselves related to final yields (Ferencz et al., 2004). A large range of VIs have been tested, with mixed results, in different regions and for different crops including the well-known Normalised Vegetation Index (NDVI; see Labus et al. 2002), the Enhanced Vegetation Index (EVI; see Duncan, Dash and, Atkinson 2015) and the Green Chlorophyll Vegetation Index (GCVI; see Lobell et al. 2015). Capitalising on the ability to retrieve biophysical variables from satellite data (Li et al., 2015;Verrelst et al., 2015), some attempts have empirically correlate biophysical variables to yield (see Baez-Gonzalez et al., 2005 andLambert et al., 2018, for instance). The principle remains the same as with VIs: metrics are extracted from time series of biophysical data which are then related to measured yields.
The lack of sufficient field-or pixel-level yield measurements for model calibration and validation has long hindered attempts to deploy empirical models at scale. In addition, empirical models are specific to the crop cultivars, the crop growth stages, and the geographical regions they were calibrated on (Doraiswamy, 2004;Fang, Liang and Hoogenboom, 2011) and therefore they do not generalise well in data-poor contexts. Recently, Lobell et al. (2015) proposed an elegant solution to solve the problem of calibration data. Their approach calibrates an empirical model off outputs from the Agricultural Production System sIMulator (APSIM; Holzworth et al. 2014), a thoroughlyvalidated crop model, which can then be rapidly applied to the remotely-sensed data. Their model also combines weather co-variates and multiple image dates, which relaxes the constraints of being useful for a single combination of site, year, and image date. This kind of approach was successfully deployed across large areas in the US and India and has explained between 30% and 80% of the yield variability, depending or the crop, the study area, and the implementation (Lobell et al., 2015;Azzari, Jain and Lobell, 2017;Jin, Azzari and Lobell, 2017). As a result, empirical models could be calibrated anywhere in the world based on outputs of locally-calibrated crop models in lieu of costly in situ yield measurements.
Empirical yield estimation from space has also been constrained by the trade-off between the spatial and temporal resolution which restricted the use of high spatial and temporal images for agricultural applications (Inoue, 2003). Data availability hampered multi-temporal analyses or these were limited to time series with coarser spatial resolutions, leading to pixel purity issues (Duveiller and Defourny, 2010) which are particularly challenging in complex landscapes (Waldner, Duveiller and Defourny, 2018). Consequently, it is likely that the choice of metrics, i.e., time series descriptors used to predict grain yield, was driven by practicality and data availability rather than by systematic targetting of critically sensitive periods suggested by knowledge of crop physiology. Constellations of satellites, e.g., Sentinel-2 A and B (5-day revisit, 10-m resolution; Drusch et al. 2012), or the Dove constellation from Planet (daily global coverage at 3 m with 175+ satellites; Butler 2014), have opened an avenue for overcoming these spatio-temporal restrictions. Therefore the advent of hyper-temporal data offers an unprecedented opportunity to revisit empirical yield estimation and explore new alternatives to exploit finer and denser temporal patterns.
Our overarching goal is to advance routine yield assessment across the Australian grain zone based on satellite observations and to do so by leveraging the capabilities of the most recent and upcoming imaging systems. Here we evaluated how the density of LAI observations affects performance and the choice of metric required to achieve optimal performance. We premised our work on three observations widely supported by evidence from the literature: (1) crop growth models accurately model yield and leaf area. For instance, wheat model within APSIM has been extensively validated across Australia and internationally in a range of experimental and farm conditions (Arango et al., 2004;Hochman et al., 2007;Carberry et al., 2009;Zheng et al., 2013;Brown et al., 2014;Holzworth et al., 2014); (2) leaf area index can be retrieved from satellite images (see Verrelst et al., 2015 for a review); (3) empirical yield model calibrated off simulated yield in lieu of measured yield have successfully predicted actual yield based on remotely-sensed data (Lobell et al., 2015;Azzari, Jain and Lobell, 2017). The direct implication is that crop models such as APSIM can be used as data generators to evaluate different forms of empirical models in silico.
This in silico approach has the following advantages: (1) the G×E×M conditions that can be explored in silico are more diverse than what observational data would otherwise allow, which improves generalisation; (2) in silico data can simulate forthcoming temporal resolutions or mimic current imaging systems lacking sufficient archive data, which facilitates systematic comparisons; and (3) in silico testing yields these results for a fraction of otherwise prohibitive costs associated with image acquisition and in situ yield data. We deployed APSIM to simulate wheat growth during 30 consecutive seasons under 10 management scenarios at 50 locations across Australia. These simulation provided time series of LAI and yields which were used to calibrate and evaluate different types and configurations of empirical models. It should be emphasised that our objective was neither to predict past wheat yields nor to apply our findings to remotely-sensed data. Rather we sought to generate likely LAI time series and yields under a variety of growing conditions and evaluate present and forthcoming observation capabilities.
Our main contributions are three-fold: • We provide a systematic comparison of LAI metrics and highlight that linear empirical models with advanced metrics (e.g., Senescence Fit, or Fourier Decomposition) capture up to 80% of the yield variability. This is remarkable because the generalisation of empirical models has often been criticised. Therefore, this suggests that models using metrics that truly harness the temporal dimension of the LAI data can achieve regional to national relevance; • We evaluate the contribution of weather variables to the overall performance and show that they can double the accuracy of models calibrated with simple metrics such as peak LAI. Average and cumulative maximum temperature as well as cumulative rainfall after anthesis are particularly strong predictors. with advanced metrics, there is no significant improvement when adding weather variables because their effects are already captured by the metrics; • We quantify the loss of accuracy that occurs when the temporal resolution decreases. In particular, we show that simple metrics remain competitive in data-poor contexts.
These results can serve as a guideline for selecting an appropriate metric depending on the temporal availability of earth observation data at hand.

Data
Australia is one of the top five wheat exporting countries in the world and accounts for 11% of global wheat trade during 2015 (Workman, 2017). It is estimated that 55% of Australian cropland is occupied by the current wheat area of ̴ 14 Mha. Wheat yields in Australia have experienced substantial increase but evidence suggests that they have stalled at an average of 1.7 t ha -1 since 1990 (Hochman, Gobbett and Horan, 2017). Wheat is sown around mid-May and is harvested from November to January (Hochman et al., 2009). The average field size exceeds 100 ha and irrigation is marginal. We deployed the APSIM-Wheat model Version 7.8 (Holzworth et al., 2014) to obtain LAI time series and their corresponding yields. APSIM simulates growth and development of a wheat crop at a daily time-step in response to weather (radiation, temperature, rainfall), soil water and soil nitrogen, and management practices. It calculates daily biomass accumulation using light interception and radiation use efficiency which is penalised under current water and nitrogen stress. Growth of leaf area is modelled daily using initial leaf area, leaf appearance rate and the relationship between plant leaf area and their processes are sensitive to daily temperatures. Grain yield is a function of grain number and grain weight. Grain number is determined pre-anthesis by stem weight and subject to reduction due to water stress during anthesis. Final weight per grain is determined by carbohydrate remobilization, photosynthesis during grain filling, and the grainfilling period which is accelerated by temperature and water stresses.
Continuous wheat was grown from 1981 to 2015 at 50 high-quality weather stations ( Figure 1) using the dominant soil type and nine management rules proposed by Hochman and Horan (2018). Note that similar model calibrations at these locations explained 67% and 65% of the national and sub-national wheat yield variability, respectively Hochman, Gobbett and Horan, 2017) and therefore no further validation was undertaken here. Therefore the model outputs are reliable. Model runs from 1981 to 1999 were used to reach a credible soil water content and were thus discarded in further analysis. We selected five wheat cultivars spanning a range of maturity types, namely: Bolac, Endure, Wyalkatchem, Derimut, and Correll. The parameterisation of these varieties were kept to their default values.
The management scenarios were built around nine variants of standard simulation rules (Table 1 and Table 2) so as to cover a range of management practices. All sites in Queensland and northern New South Wales above latitude -32.24° were classed as northern sites and used the northern sowing rule, all other sites used the southern sowing rule. If the sowing criteria were not met during the sowing window, a crop was automatically sown on the 15 th of July. Variants of the standard rules included changes in the rate of nitrogen fertilisation (N-fertilisation), plant density (50, 75, 100, and 125 plants ha -1 ), sowing rule, and fallow management. Simulations with a maximum LAI value < 1 were discarded because they were likely associated with simulations of failed crops. The final data set had 7,712 entries.

Southern sites
Sow if rain ≥ 25 mm over 3 days regardless of soil moisture from 26 April to 15 July.

Sow-2 Northern sites
Sow 2 weeks after rain ≥ 15 mm over 3 days and PAW ≥ 30mm from 26 April to 15 July.

Southern sites
Sow 2 weeks after rain ≥ 15 mm over 3 days regardless of soil moisture from 26 April to 15 July.
If sowing criterion is not met by 15 July, sow 2 weeks after 15 July.
Sow-3 Sow using highest yielding sowing date from analysis of crops sown every 7 days from 5 April to 21 June using highest yielding cultivar.

Fallow
To simulate the effect of weeds during the fallow, plant available water was reduced by up to 30 mm on 25 April by removing 70% of the plant available water from each layer starting from the top layer Other data sources included daily weather data and monthly average cloud frequencies.
Three key weather variables driving crop growth were sourced from the Australian Bureau of Meteorology, namely air temperature (T), vapour pressure deficit (VPD), and precipitation (P) (Australian Bureau of Meteorology, 2015). Monthly mean cloud frequencies were sourced from Wilson and Jetz (2016). This data set integrates 15 years of twice-daily remotely sensed cloud observations at 1-km resolution. We applied a linear interpolation to generate daily cloud probability assuming the monthly average was representative of the 15 th of each month. To avoid artifacts, values for December and January were duplicated at the end and the start of the time series, respectively. Therefore, daily cloud probabilities were interpolated based on an input time series of 14 values and the first 16 (December 15 th -December 31 st ) and last 15 values (January 1 st -January 15 th ) were discarded.

Methodology
The empirical yield prediction model followed the following form: where is a vector of LAI metrics, is a vector of weather attributes over the season, and 0 , 1 , and 2 are the associated coefficients. In section 3.1, we retricted the empirical model to LAI metrics ( = 0 + 1 ) and assessed its performance for three time scales (calendar time, thermal time, and phenological time). The addedvalue of weather variables ( ) for yield prediction was evaluated in section 3.2, where the importance of weather variables was quantified by partitioning the coefficient of determintation. In section 3.3, we investigated the loss in accuracy due to reduced observation frequencies. Finally, we mapped optimal metrics for three temporal revisit frequencies (5, 10 and 16 days) across the Australian wheat area.

LAI metrics
To quantify the yield variability that LAI metrics can explain, we extracted a set of metrics from the LAI time series provided by the simulations according to several procedures (Table 3). The first approach (Peak LAI) identifies the maximum LAI value from the time series as it corresponds to the onset of the reproductive stage which is a critical period for the determination of wheat yield (Fischer, 1975). Empirical evidence has also shown that the best singledate correlation between wheat yield and LAI occurs at the time of highest LAI which concurs with the transition from the vegetative stage to grain filling (Lopresti, Di Bella and Degioanni, 2015;Johnson, 2016).
Using the second method (Early/Late Windows), maximum LAI values observed during two windows, one early (day of year 203 -day of year 253) and the other late in the season (day of year 274 -day of year 314) were derived (Lobell et al., 2015).
Integration of seasonal LAI profiles was also examined as a third and fourth methods for feature extraction. The integration of satellite observations over time was shown to represent the intensity and the duration of the photosynthetic activity of the crop throughout the growing cycle well and, as a result, it was highly correlated with the actual yield (Tucker et al., 1981;Rudorff and Batista, 1990;Labus et al., 2002). Benedetti and Rossini (1993) argued that time interval chosen is critical, and recommended to start the integration not from the beginning of the crop cycle, but from the beginning of nutrient substance accumulation in storage organs, which corresponds to flowering in wheat. Therefore we evaluated two integration approaches and computed the area under the curve for the entire LAI profile (Integral) and from peak LAI to harvest (Partial Integral).
The fifth approach estimated wheat yield from the senescence rate and other parameters as suggested by Idso et al. (1980) and Baret and Guyot (1986). We fitted a modified logistic model (Gooding et al., 2000) that describes the senescent part of the curve with three parameters : where refers to the maximum value of LAI; is the position of the inflection point in the decreasing part of the LAI curve; is the relative senescence rate; and is input time.
To further exploit the temporal information, we adopted a Fourier analysis approach which is known to capture the temporal dynamics while reducing the dimension and the noise (Geerken, Zaitchik and Evans, 2005). Fourier or harmonic analysis transforms an input signal from the time domain into the frequency domain. In a closed interval [0; ], this approach assumes that the signal ( ) can be decomposed into a series of sine-waves with increasing frequencies (Schönwiese, 2006): The result of a discrete Fourier transform is a complex number with a real ( ) and an imaginary ( ) part that can be converted to polar form. Then, each harmonic wave can be defined by a phase and an amplitude (Jakubauskas., Legates. and Kastens., 2001): = √ 2 + 2 (4) = arctan (5) Together with the additive term ( 0 ), the harmonic components can together reconstruct the initial signal. By discarding higher order harmonics, it is possible to retrieve lower noise signal. In this study, we kept the additive term and the first two harmonics as predictors of yield. Phase of the two first harmonics

Three time scales
Measuring time in calendar days has been the dominant approach in remote sensing because it matches the acquisition dates of the satellite images. However, this approach might be limited when dealing with large G×E×M variations, e.g., for estimating yield at national scale. There is a considerable advantage in describing crop development based on thermal time units as the duration in thermal time required to reach a certain ontogenetic phase is relatively constant, while that in calendar days may considerably vary (Purcell, 2003). Relying on phenological stages is a further refinement that accounts for vernalisation and/or photoperiod requirements which affect the rate of crop development.
All LAI metrics were derived for these three time scales, i.e., calendar time, thermal time, and phenological stages. Thermal time was computed following Zheng et al. (2013) and the phenological stages were described using Zadok's decimal scale as simulated by APSIM.

Model evaluation
Empirical models were calibrated using 50% of the data set (n = 3,845) and validated with the remaining 50% (n = 3,867). Note that, to avoid any optimistic bias, the split between the calibration and validation sets was done to guarantee that all simulations relative to a station-year would either belong to the calibration or validation set. The performance of the models was quantified using the Root Mean Square Error (RMSE) and the coefficient of determination (R 2 ). The RMSE gives the weighted variations in error (residual) between the predicted and observed yields while the R 2 expresses the percentage of variance explained by the model.

Contribution of weather metrics
First, the accuracy of a yield model only based on weather features ( = 0 + 2 ) was quantified. To that aim, we extracted 14 weather variables by averaging daily observations (VPD, minimum T and maximum T) and summing daily observations (P, VPD, minimum and maximum T) before and after the peak of LAI. Then we evaluated the merit of adding meteorological features to the empirical model in order to boost the prediction accuracy. All models were recalibrated to consider the weather variables described in section 2 and the net effect on the R 2 and the RMSE was measured. To identify the most relevant variables, relative importance metrics for linear models were computed by partitioning the R 2 (Lindeman, 1980;Chevan and Sutherland, 1991).

Robustness to reduced temporal frequencies
So far, all models were calibrated on LAI metrics derived from gap-less daily time series. While these set the upper limit in terms of reachable accuracy, their performances might significantly change with sparser time series resulting from coarser temporal resolutions or missing values due to cloud/cloud shadow contamination. To provide insights on their generalisation potential, we applied the previously calibrated models to LAI metrics extracted from time series with lower temporal resolution, accounting for cloud conditions.
Daily, 5-day, 10-day, and 16-day LAI time series were created to simulate the revisit cycles of the Dove constellation, the Sentinel-2A or -2B, Sentinel-2A and -2B, and Landsat-8. The remaining LAI values were further removed according to their corresponding daily cloud probability. Missing values were then linearly interpolated prior to yield estimation. A Monte Carlo approach was used and this process was repeated ten times. The impact on the prediction was measured using the average RMSE across the ten realisations. As an additional evaluation criterion, the number of times the metrics computation failed due to a lack of input data was computed.
Finally, the optimal strategies were identified for each temporal resolution. These were then interpolated to the Australian wheat production area at a 1-km 2 resolution based on the most similar station in terms of cloud frequency. In other words, pixels were attributed to the best-performing metrics of the station they were the most similar to in the cloud frequency space. Similarity between cloud patterns was measured with the Euclidean distance.

Simulation outputs
We summarised the main characteristics of the simulation outputs in Figure 2. Emergence started as early as April 4 th (Sow-3) and finished as late as August 18 th (Sow-2). Flowering ranged from July 22 nd (Sow-3) to November 17 th (Sow-2) which covers the flowering periods reported in other studies (see for instance Wang et al., 2015;Lawes, Huth and Hochman, 2016;Flohr et al., 2017). Maturity occurred from September 9 th (Sow-3) to December 19 th (Sow-1), with strong differences across treatments. Under the Sow-1 strategy, wheat was sown before the cutoff date of July 15 th in about 50 per cent of the cases. Note that Bolac was never the highest yielding variety so no Sow-3 simulation was available for that variety. Maximum LAIs averaged 3.93 across simulations with maximum values up to 8.34 (Sow-3). Simulated yields averaged 3247 kg ha -1 with a range of 105 kg ha -1 to 5824 kg ha -1 . Harvest indices (the ratio of grain yield and biomass) averaged 0.375 across simulations, spanning from 0.178 (Sow-1) to 0.518 (Plants 100). Yields and harvest indices and within the range of values reported in an exhaustive search of the literature for dryland wheat in Australia (Unkovich, Baldock and Forbes, 2010). Therefore, the simulation outputs produced likely scenarios of dryland wheat growth across the Australian wheat belt.

Accuracy of the prediction models
We relied on the R 2 and the RMSE to evaluate the performance of wheat yield estimation for the six types of LAI models without the weather variables and using calendar, thermal or phenological time (Figure 3 A and B). Between 30% and 78% of the yield variability can be explained by LAI features depending on the choice of features and time scale. Note that the Peak LAI and the Early/Late Windows approaches are by definition insensitive to a change in the time scale.

Figure 3: Performance indicators of the empirical models without weather variables(A and B) and with weather variables (C and D).
The peak LAI approach consistently explained the least variance (R 2 = 0.36, RMSE = 1,100 kg ha -1 ) followed closely by the two windows metrics (R 2 = 0.41, RMSE = 1,032 kg ha -1 ). Using calendar time, the Partial Integral method reached the highest coefficient of determination (R 2 = 0.77, RMSE = 560 kg ha -1 ) followed by the Integral and the Fourier approaches. Interestingly, the ranking of the best performing methods changed with respect to the time scale. For instance, the Senescence Fit and the Fourier Decomposition were both improved when the time series was normalised for thermal time or phenology. R 2 values increased from 0.60 to 0.69 and 0.78 with the Senescence Fit and from 0.72 to 0.82 and 0.80 with the Fourier Decomposition approach. It is worth noting that the best-performing methods (Fourier Decomposition and Senescence Fit) under phenological time are both related to a smoothing of the time series. Switching to phenological and thermal times can lead to worse results than calendar time in some cases, e.g., Integral and Partial Integral. This drop might be partly attributed to the current calculation of thermal time in APSIM, to the LAI features themselves, and to the abrupt transitions inherent at some Zadok stages.

Contribution and importance of weather variables
The R 2 of the linear model based on the weather features only reached 0.36 and the corresponding RMSE was 1,044 kg ha -1 , which was slightly better than the accuracy reached by the peak LAI metric (R 2 = 0.36, RMSE = 1,100 kg ha -1 ). Adding weather features was particularly beneficial to those models using simpler metrics but had little effect otherwise (Figure 3 C and D). They help reduce by half the difference in accuracy between the poorest and best models. For instance, the R 2 of the peak LAI method reached 0.56 whereas the R 2 of the Fourier approach only increased by 0.01.
We evaluated the contribution of the weather variables to the model R 2 (Figure 4). The weather variable with the highest contribution is the cumulative rainfall after the LAI peak. The sum and average maximum temperature post peak are also important. The remaining variables only exhibited a very low contribution to the R 2 (<0.03). The contribution of weather variables decreased as they were combined with LAI metrics derived from more advanced methods. Differences between the contribution of the variables computed for different temporal scales were small.

Robustness to reduced temporal frequencies
Reducing the temporal frequency and accounting for cloud contamination reduced the prediction accuracy and increased the number of predictions with missing values ( Figure 5). However, this effect was metric-specific and three groups of metrics could be defined. The first group contained simple metrics (Peak LAI and Early/Late windows) that displayed robustness to a reduction of the temporal density, with little impact on the accuracy and the proportion of missing values. The second group (Integral and Senescence Fit) maintained a relatively stable accuracy but this was achieved at the expense of a higher rate of missing values. The third group (Partial Integral and Fourier Decomposition) was sensitive to a reduction of the temporal frequency both in terms of accuracy and failed predictions. This underscores that some metrics, in order to explain yield variations, require a higher temporal density, i.e., less signal contamination can be tolerated, while others are more robust and can be applied on sparser time series. This also varied with respect to the time scale: the Partial Integral approach was effectively best for calendar time whereas phenological time was best for the Fourier Decomposition and Senescence Fit approaches when the observation frequency was > 5 days.

Mapping optimal metrics
A nearest neighbour interpolation approach was employed to map the best metrics across the Australian wheat production area ( Figure 6). The optimal metrics varied by location and by time scale and were consistent with the results obtained previously (Figure 3). One can clearly see that, as the temporal frequency becomes sparser, the Peak LAI and the Early/Late season approaches become increasingly the preferred choice. This underscores that, while the metrics derived from the Partial Integral, the Senescence Fit, and the Fourier Decomposition approaches yield higher accuracy, their use can only be recommended when ≤ 5-day time series are at hand.

Predicting yield with hyper-temporal data
In this study, daily LAI time series and their corresponding yields were simulated for 15 years, 10 management practices at 50 locations that are representative of the Australian wheat crop area. Metrics were extracted from the simulated LAI time series to predict grain yield in empirical relationships. We compared their respective prediction power for different time scales. Peak LAI consistently registered some of the worst predictions despite its widespread use in the remote sensing literature. The strength of the peak LAI relationship to yield (R 2 = 0.36) was weaker than what previous studies reported. For instance, Lopresti et al. (2015) and Johnson (2016) reported coefficients of determination of 0.52 and 0.59, respectively. Part of this difference could be explained by the larger range of G×E×M encountered in this study. This approach completely disregards the critical period of grain filling (Lawes, Huth and Hochman, 2016) and therefore cannot capture the impact of post-peak events such as terminal drought, which is often experienced in Australia (Chenu et al., 2011). Besides, large biomass early in the season does not necessarily result in large grain yield. The shortcomings of peak LAI is illustrated in Figure 7 where three time series reach similar peak LAI values but end up with drastically different yields. Therefore, peak LAI on its own is mostly useful to provide early estimates of grain yield. Integrating temporal profiles outperforms the peak LAI approach because the cumulative effect of photosynthetic apparatus efficiency during the entire growing period is taken into account (Benedetti and Rossini, 1993). Conditions affecting the flag leaf and the second leaf, which are the most active parts from a photosynthetic perspective, greatly influence final grain yield.
Wheat yields were accurately estimated from the parameters of a double logistic function fitted to the senescence part of the curve, which partly reflects the conclusions of Idso et al. (1980) and Baret and Guyot (1986). In addition, the prediction error was similar to the one reported by Kouadio et al. (2012) for regional wheat yield estimation (RMSE = 570 kg ha -1 ). The Fourier descriptors consistently ranked among the most accurate metrics tested, especially when using phenological time. They can reconstruct the full temporal profile and provide the most comprehensive description of the time series.
The performances of metrics differed when thermal or calendar time was employed which was expected because thermal time adjusts time series to account for growth rate and phenology, while shifts can occur in calendar time due to differences in growing conditions or sowing date. Considering thermal time can thus result in more coherent time series of canopy biophysical variables (Duveiller, López-Lozano and Baruth, 2013). Consequently, metrics describing the timing of events (such as the Senescence Fit or the Fourier Decomposition) are a priori more explicative of yield when using calendar time, while metrics describing the length of the season are expected to be more reliable using thermal time.
To some extent, the R 2 values obtained here represent the theoretical upper bound of what could be achieved with satellite data because the LAI time series were not affected by retrieval errors from spectral images. Therefore it might be surprising that peak LAI yields less accurate results than previously reported but this can be explained by two reasons. Firstly, studies often report R 2 values at a higher aggregation level which levels out of outliers and therefore increases the model fit (De Koning, Veldkamp and Fresco, 1998). Secondly, empirical models have recurrently been criticised for their lack of generalisation, i.e., they are only applicable for specific crop cultivars, crop growth stages, and certain geographical regions (Doraiswamy et al., 2003;Fang, Liang and Hoogenboom, 2011). Our in silico approach allowed us generating a diverse set of LAI time series and yields from 15 seasons and across the Australian wheat growing area. The low accuracy obtained by simple metrics suggests they cannot capture such diversity with single national-scale models and that locally-tuned models could improve their prediction skills (Labus et al., 2002). Advanced metrics achieved high accuracies with single empirical models, which provides evidence that metrics harnessing the temporality of the data have national relevance.

On the value of weather variables
Weather variables explained 36% of the yield variability and helped increase the accuracy when combined with metrics. Yet the magnitude of the increase was metric-specific. With a two-fold increase in R 2 , simple metrics benefited the most from their inclusion. This suggests that half of the accuracy of the previous studies using SCYM parametrised with the Early/Late Windows approach, e.g., Jin et al. (2017) and Lobell et al. (2015), can be attributed to weather data. This needs to be empirically verified in the corresponding study areas. Furthermore, it confirms that the explicit consideration of weather is the main reason for the better performance of the original SCYM implementation compared to a peak VI approach (Azzari, Jain and Lobell, 2017).
The three most important variables were cumulative rainfall, cumulative maximum temperature and average postpeak maximum temperature when the evaporative demand is higher. They all relate to water and heat/drought stresses and, by extension, to stored soil water which is critical for the growth of rainfed wheat in Australia. During grain filling, high temperature decreases leaf chlorophyll content and accelerates senescence (Zhao et al., 2007), leading to a shorter grain filling duration with an ultimate decrease in individual grain weight and yield that cannot be compensated by the higher grain filling rate under high temperatures (Pradhan et al., 2012). Colwell et al. (1977) argued the appropriate combination of data sources depends on the cost of obtaining and using such data compared to the benefits. The choice of adding weather covariates in the prediction model depends on the availability of such data as well as on the temporal resolution of the LAI time series at hand. This suggests that, if accurate and appropriate weather data are not readily available and if the temporal resolution is sufficient as to derive more advanced metrics (e.g., Integral, Senescence Fit, or Fourier Decomposition), the prediction model may be shrunk to LAI variables.

On the impact of the temporal resolution
In advancing routine yield assessment, our study illustrates the importance of the temporal resolution. Results show that empirical crop yield estimates yield significantly lower accuracies if critical periods are imaged every 16 days. This finding converges with Labus et al. (2002) who underlined that, at the pixel level, biweekly composites cannot adequately characterise crop productivity if crop critical periods are smoothed by the compositing algorithm.
Integrating radar data (Bériaux et al., 2015;Jin et al., 2015) and blending lower resolution time series (Verger, Baret and Weiss, 2011;Löw et al., 2017) are two mitigation options to increase the data frequency in areas affected by dense cloud coverage. Attention should be paid when fusing LAI data because LAI correlates non-linearly with the spatial resolution (Friedl et al., 1995;Liang, 2000). Jiang et al. (2018) and Wu et al. (2016) provide techniques for correcting the spatial scaling bias of LAI.
Another important contribution of this research is the identification of the optimum range of application of different metrics based on the temporal resolution. While advanced methods such as the Senescence Fit or the Fourier Decomposition outperform simple methods with hyper-temporal data, the latter should be preferred when sparse time series are at hand. Not only do these simple methods perform better in data-poor contexts but they are also less prone to missing values. This implies that the use of maximum LAI or Early/Late Windows LAI was driven by practicality and data availability rather than by systematic targetting of critically sensitive periods suggested by crop physiology. With the unprecedented revisit cycle of current observing systems and the prospects of future missions, our modelling suggests that time is ripe for a shift towards the use of data-intensive metrics for empirical yield estimation.

Considerations for satellite-based yield estimation
Some considerations ought to be made before transferring these findings to remotely sensed data. Firstly, satellitederived LAI is expected to be affected by retrieval errors which introduce noise in the time series. Methods have been devised to recover smooth time series of biophysical variables such as double logistics, smoothing splines, interpolation for data reconstruction, adaptive Savitzky-Golay filters (Moreno et al., 2014), or canopy structural dynamic models (Lauvernet, 2005). Some of these methods can robustly reconstruct trajectories, reducing the noise by up to 80%.
Secondly, satellites sense Green LAI not LAI because the electromagnetic radiation reflected from the crop canopy is contributed by all the aerial plant organs (Duveiller et al. 2011). Therefore, even in the absence of noise, there might be a less than perfect agreement between LAI obtained from APSIM and retrieved from satellite images.
A third shortcoming is that LAI is strongly non-linearly related to reflectance, inducing more uncertainty in the retrieval of higher LAI values. This well-known saturation for dense canopies prevents some retrieval methods from accurately discriminating canopies with LAI values > 4 (Weiss and Baret, 1999;Bsaibes et al., 2009). Duveiller et al. (2011) introduced a linearisation approach to correct this saturation effect when ground measurements of LAI are available. Nonetheless, there has been a convergence of empirical evidence emphasising the importance of rededge bands for operational estimation of biophysical parameters (Delegido et al., 2011(Delegido et al., , 2013Dong et al., 2019). Red-edge indices have been shown to reduce impacts due to leaf angle distribution (Dong et al., 2019) and alternative methods and indices based on red-edge bands have been proposed to bypass this saturation effect (Delegido et al., 2013). Therefore LAI estimates from Sentinel-2, which carries three red-edge bands, are expected to improve in the near future.
Finally, the availability of phenological data constrains the choice of the time scale, e.g., to identify the emergence date. Despite some recent contributions to extract phenological growth stages from satellite image time series (Sakamoto et al., 2005;De Bernardis et al., 2016;Gao et al., 2017), there is still a lack of validated approaches. Other data sources include field surveys or growers in the case of precision farming service.

Conclusions
The lack of suitable combinations of spatial and temporal resolutions of satellite image time series has long constrained large-area empirical yield estimation from space. Here we sought to systematically evaluate how the temporal resolution affects the choice of metric and their performance as well as to quantify the benefit of denser time series provided by recent and forthcoming constellations. To that aim, we developed an in silico approach which uses the crop growth model APSIM as data generator. This approach allowed us to explore a wider range of G×E×M combinations than what observational data currently permit, as well as to explore forthcoming temporal resolutions.
Empirical models only based on LAI metrics captured between 30 to 80% of the wheat yield variability, the highest accuracy being achieved by advanced metrics (Senescence fit and Fourier decomposition). This provides evidence that metrics truly harnessing the temporal dimension of LAI data can exhibit national relevance. Adding weather variables doubled the R 2 values of models based on simple metrics (peak LAI and Early/Late windows) but had no significant improvement for those based on advanced metrics. This indicates that metrics intensively exploiting the temporal dimension already reflect most of the influence of weather on crop yield. Finally, simple metrics emerged as the best choice when dealing with sparse (e.g., 16 days) time series but they were gradually outperformed by advanced metrics as the temporal resolution increased.
As we progress in a data-rich era, our findings support a general shift in the use of large-area empirical yield mapping towards the inclusion of metrics that truly harness the temporal dimension.