Back to the fields: COVID-19 impact on agricultural activity detected with satellite data

In response to the COVID-19 pandemic, policymakers worldwide adopted unprecedented measures to limit disease spread, with major repercussions on labour markets and economic growth. Here we provide empirical evidence of their impact on agricultural activity due to sectoral labour reallocation in tourist-dependent regions, a widely covered phenomenon by the international press. Analysing daily satellite data in a nonparametric machine learning statistical framework over cropland in Badung, a highly populated regency of Bali, Indonesia, we generate a counterfactual synthetic EVI based on gradient boosted decision trees trained on a set of environmental variables assuming no lockdown occurrence. Based on the counterfactual, we estimate a significant increase in the EVI vegetation index over cropland after the beginning of a lockdown period. The finding is robust to a placebo test. We then exploit the heterogeneity of the region analysed, where the South is dominated by tourism and the tertiary sector and the North is already mostly agricultural, and we find a stronger effect in the former. This results suggests that the observed increase in remotely sensed vegetation indexes is likely driven by a labour force reallocation towards the primary sector to compensate for the income lost from previous employment. Overall, our results show that statistical analysis of satellite data can be an effective methodology to observe the impact of a labour force crowding into the agricultural sector in response to an exogenous shock in other labour sectors.


Introduction
Tourism-dependent economies -where a large share of the local gross domestic product and of the workforce is employed in the touristic sector -have been among the worst hit by the restrictions imposed during the COVID-19 pandemic. Bali, with a population of 4.4 million, is Indonesia's tourism engine and more than half of its economy depends directly on tourism.
Bali is also an important agriculture hotspot (see Figure Appendix A.1). 72% of the land is used for agricultural activities and the agricultural sector is the second largest contributor to Bali's income in 2020 (BPS, 2020). Aside from rice, the main cultivated crop, farmers cultivate several horticulture commodities such as chayote, cabbage, and chilli (BPS, 2018).
Since COVID-19 outbreak in Bali, Indonesia, the tourism sector has been heavily affected by the restriction measures that the Indonesian government has taken following the rest of the world (BPS, 2020). In April, Indonesia banned all foreign tourists (Minister of Law and Human Rights, 2020). Nonetheless, the number of cases has kept increasing and as per the 27th of November 2020, there have been 13,730 cases reported according to the Bali Provincial Health Office (Kesehatan, 2020).Travel restrictions, limited social activities, and interactions caused a dramatic drop down in the number of tourists around the island (Laula and Paddock, 2020). The number of foreign tourists decreased by roughly 31% as compared with January 2019 and by 16% as compared with the same month one year before (Udayana, 2020).
To cope with the dramatic condition of the tourism industry, formerly the first source of income (Antara and Sumarniasih, 2017), the Balinese labour force reallocated towards the primary sector, making agriculture-related activities central in their daily lives as a financial lifeline. These dynamics have been widely reported by the international press, such as the New York Times (Laula and Paddock, 2020) and the Deutsche Welle (Welle, 2020), as well as local universities (Udayana, 2020). The process also coincided with a quick deurbanization.
Irrespective of the anecdotal and journalistic evidence, there has been no empirical assessment of this sectoral labour reallocation. In addition, there has been no official statistics release as the vast majority of the agricultural employment is either informal or family-run.
To empirically evaluate this shift, here we provide evidence of the impact of the lockdown on agricultural activity on Badung regency by comparing a time series of Moderate Resolution Imaging Spectroradiometer (MODIS) measured Enhanced Vegetation Index (EVI) and a counterfactual synthetic EVI based on gradient boosted decision trees trained on a set of environmental variables assuming no lockdown occurrence.
The key novelty of our analysis lies in the detection and statistical assessment of a labour market reallocation caused by an exogenous shock (the COVID-19 pandemic) through the use of satellite data. The paper contributes to the multidisciplinary literature linking the use of remotely sensed environmental data with machine learning algorithms and the econometric approaches to impact evaluation of policy.

The impact of COVID-19 on labour markets
The COVID-19 pandemic is exerting enormous pressure on the global economy in both higher and lower income countries. The observed labour market responses have however been remarkably different (Bartik et al., 2020): in high income countries a significant share of the population is employed in the tertiary sector and could therefore move to home working without significant changes in the work productivity. Workers of more affected sectors could receive compensation and income support schemes from national governments, which overall mitigated the short-run household income plunge.
Conversely, in middle and lower income countries many people's incomes largely depend on agriculture, handcraft or tourism (Danquah et al., 2020). The latter was one of the most affected sectors: according to recent estimates (WTTC, 2020), up to 75 million tourism sector workers are at immediate job risk as a result of COVID-19, with a travel tourism GDP loss in 2020 of up to $2.1 trillion. Yet, lower income countries' governments often lack the means to support workers who lost their jobs as a consequence of COVID-19 limitations on social interactions. In these regions, populations were thus hit more harshly by the pandemic, and faced the necessity to adopt autonomous adaptation 3 actions to try to compensate for the reduction or lack of their previous income (Ataguba, 2020). Reports have shown that many tourist workers have decided to go back to their towns of origin to pursue agricultural activities and mitigate the labour demand shock in their previous employment sectors (Laula and Paddock, 2020;Welle, 2020).

Estimating agricultural productivity with satellite data
The literature on the assessment of agricultural systems and productivity through the use of satellite data has proven the role that satellite imagery can play at a variety of scales (Jia et al., 2019) and the depth of the insights that they can provide. This is especially true for developing countries (Lobell et al., 2018), such as Indonesia (Yamamoto et al., 2019). For instance, a large body of literature has shown that composite indexes based on multi-spectral satellite observations -such as the NDVI and EVI indexes -can be effective proxies to detect and forecast yield variation of a multitude of crops such as soybeans, rice, and wheat, even in smallholder agriculture, with results consistent with official yield statistics and correlation maps of historical yield data. Vegetation indexes are also well suited to study scattered agricultural topography and to inform management decisions on the optimal timing for planting and harvesting. (2011) appraised the potential of MODIS EVI for modeling gross primary production across African Savannah ecosystems. In their study, EVI was found to correlate well with estimated GPP on a site-by-site basis.

Son et al. (2014) carry out a comparative analysis of multitemporal MODIS EVI and
NDVI indexes data for large-scale rice yield estimation in the Mekong River Delta. They find that the approach could be used to estimate rice production prior the harvesting period through the use of physical crop yield models, and that EVI-based models were more accurate than those from NDVI-based models. Their correlation coefficients (R 2 ) range from 0.62 to 0.71 for spring-winter and 0.4 to 0.56 for summer-autumn rice crops, respectively. Figueiredo et al. (2016) estimated correlation maps to assess historical yield during the soybean crop cycle from EVI data in Paraná State, Brazil. Their model was able to explain 96% to 98% of the variance in estimated yield from correlation maps. Duveiller et al. (2012) demonstrated how it is possible to monitor the growth and evolution of winter wheat over a decade using the green area index (GAI) even on a highly fragmented topography characterized by non-wheat areas within the spatial resolution of MODIS. Their results are congruent with ground trough data at both the regional and the filed level with a relative root mean square error (RRMSE) of 25.7% and 37.6%, respectively.
(Jain et al., 2017) developed a 30-meter resolution yield maps to explore the causes of yield gaps of smallholders in the Indo-Gangetic Plains, India. Using a crop model simulation, the authors were able to rely only on satellite vegetation indices without the need for any ground data. Finally using Random forest the authors proved that roughly 80% of the variance in yields is explained by differences in the weather conditions and management practices (such as the saw timing)Their results suggest that yield could be increased by 11 to 32% if more focus is given to the later.

Study area
Badung is one of the 9 regencies in the island of Bali, Indonesia. It spans over    (2020) Badung, and in particular its northern districts, is also an important agriculture hot spot: 45% (190.49 km 2 ) of its territory (418.52 km 2 ) is devoted to agricultural activities (see Figure Appendix

Conceptual framework
The empirical analysis presented in this paper builds on the following underlying conceptual framework. This is grounded on basic macroeconomic principles of labour demand and supply, factor productivity, and wage.
As a consequence of an exogenous shock (the travel restrictions imposed by the COVID-19 pandemic), individual i faces a reduction in labour demand L d from sector C where (s)he is currently employed: In the case under analysis, this is the tourism-related sector. This reduction in turn implies a decrease in wages in the sector w C ∆w C < 0 because labour supply L s is constant over the short run: In response, individual i decides to engage in the agricultural sector A, where in the short-run labour demand is constant: Agricultural productivity AP , defined as the quantity of output Y over cultivated area A at time t, is a function of labour L, capital K, and technology T : The increase in labour supply to the agricultural sector determines a short-run increase in the productivity because in general holds, although at a decreasing pace because of the law of decreasing returns to scale: As discussed in Section 2.2, there is significant empirical evidence that a change in AP t can be anticipated and proxied by vegetation indexes such as the EVI (see the relevant literature above and the Materials and Methods section) calculated from satellite data multispectral data.

Data sources and processing
The analysis is based on remotely sensed environmental and atmospheric data detailed in Table 2. The data includes the MODIS EVI and the variables used as predictors in the model. A time series is assembled based on data from the beginning of 2010 to 9th of July 2020. Raw data is processed in Google Earth Engine (Gorelick et al., 2017), where each remotely-sensed variable is first masked over cropland pixels (as from the GFSAD1000 gridded cropland product (Teluguntla et al., 2015); then, the mean value of each variable is extracted for each of the 6 districts of Badung for each day of the period considered.
The processed data are imported into the R scientific programming environment, where the data.table (Dowle et al., 2019), caret (Kuhn et al., 2008), and Xgboost (Chen et al., 2015) packages are used for statistical analysis.
Before the model training stage, missing data (19%) are imputed through a chained random forest Stekhoven and Buhlmann (2011) as implemented in the package Mis-sRanger Mayer (2019) where each variable containing missing values are imputed by using all the other variables as predictors iteratively until the mean out-of-bag (OOB) error reach its minimum value. Finally, a Savitzky-Golay filter is applied to remove the noise that characterises satellite data (see Chen et al. (2004)).

Agricultural indexes considered
The technological advances and the need to account for the specific features of a given study area have brought researchers to develop a multitude of vegetation indexes (Viña et al., 2011). For example, the EVI (Enhanced Vegetation Index, Equation 8) was developed to better account for atmospheric conditions and light reflection from the ground that causes several issues with the traditional NDVI (Normalized Difference Vegetation Index) measurement (Equation 9.
where NIR, RED, and BLUE represent the wavelength bands measured by a satellite.
In particular, the EVI includes additional wavelengths and adjustment parameters to correct for the inaccuracies of NDVI (Huete et al., 2002). Variations in solar incidence angle, atmospheric conditions like distortions in the reflected light caused by the particles in the air, and signals from the ground cover below the vegetation are corrected for using EVI. Moreover, the NDVI is mostly chlorophyll sensitive, while the EVI is more responsive to canopy structural variations (Morcillo-Pallarés et al., 2019).

Time-Series Prediction of EVI in a machine learning framework
In order to estimate the causal effect of the lockdown on agriculture activity, we compare the factual EVI time series of the study area of interest with a simulated EVI time series based on gradient boosted decision trees. In other words, we use historical data to build synthetic counterfactual. The difference between observed values of EVI and the synthetic counterfactual is the effect of the lockdown on agriculture activity.
Using the aforementioned data, we applied the extreme gradient boosted decision trees algorithm (Friedman, 2001) on the first 9 years (from the 1st of January 2010 to 1st April 2019) of data for each of the 6 districts of Badung. To enhance the accuracy of the model and to account for trends and seasonality, we generate several lagged features for each predictor and calculated the average values of temperature and humidity over the previews 15 and 30 days and the precipitation accumulation for the same time period.
We also include year, month, day, week of the year, day of the year, and sine and cosine function of time (for week, month, and year) as additional predictors.
We use a 10 fold cross-validation with a random search over 100 parameters to find the best hyperparameters (Wright and Ziegler, 2015) and evaluate the model performance with the root mean square error (RMSE). Finally, the year preceding the lockdown (from the 2nd of April 2019 to the 1st of April 2020) is used as a test set.

Lockdown effect on EVI estimation
After model training and testing to generate the synthetic projection, we estimate the effect of the lockdown as the difference between the synthetically estimated time series and the actual data (∆ Observed,Counterf actual ) before and after the lockdown. We then subtract our prediction to the observed value of EVI and we estimate as per Equation

10
: where y it is EVI measured on district i on day t, ρ it is the predicted value, and T is a dummy equal to 1 during the lockdown and 0 prior to it. We weight each observation based on the cropland area (measured in square meters) for each district to account for differences in cropland surfaces. An analogous methodology is applied in Granella et al. (2020) to estimate the effect the lockdown on pollution levels.

Robustness checks
To ensure that our results are not driven by a different treatment than the lockdown, we use a placebo test with the treatment period set one year before the actual lockdown (from the 2nd of April 2019 to the 9th July 2019). To carry out the test we: (i) drop all the outcomes for treated observations after they receive treatment (the beginning of the lockdown); (ii) insert a phantom treatment event in the middle of the remaining data (we select one year before the lockdown); (iii) run the linear estimation model and evaluate the significance of the β coefficient from Equation 10. Table 3 presents the R-squared (R 2 ), root mean square error (RMSE), and mean absolute error (MAE) for both the training dataset (from the 1st of January 2010 to 1st April 2019) and test dataset (from the 2nd of April 2019 to the 1st April 2020) for the whole study area as an average of the 6 districts. Figure 2 illustrates the average observed and predicted value of EVI for the whole regency of Badung before and after the lockdown.  The results show that our model is able to closely learn the historical EVI variation over cropland in Badung based on our set of atmospheric and seasonal predictors, with a training accuracy of 93%. When testing the trained model on unseen data (the future, compared to the training data time span), the model performs relatively well, with a testing accuracy of 74%.

ML model training and testing
13 5.2. Lockdown impact on agricultural activity  To explore the heterogeneity effect of the lockdown on the north and southern districts we run separate regression models for the two areas. It must be remarked that in the study area, the South is dominated by tourism and the tertiary sector while the North is mostly agricultural. Figures 3-4 illustrate the average observed and predicted value of EVI for the two geographical areas before and after the lockdown, In the north the EVI has increased by 1.06% while in the south 1.15%. The coefficient for the north model has higher statistical significance, but both coefficients are significant at a 5% level. The stronger effect in the south can be justified by a stronger labour force reallocation towards the primary sector to compensate for the income lost from previous employment. used to estimate the lockdown effect alongside the result of the placebo test in which the pseudo treatment is set one year before the real lockdown period.  The results of the robustness check show that the placebo estimation is not statistically significant, while the true lockdown impact has a positive and highly significant impact on the measured EVI index over cropland in Badung. The finding thus allows as to further strengthen the causal interpretation of the analysis.

Discussion and conclusions
According to our analysis, the observed increase in the EVI index over Badung, Bali, following the COVID-19 lockdown would not have occurred in the absence of the lockdown. The combination of a counterfactual synthetic EVI based on gradient boosted decision trees with high level of training and testing accuracy and of a placebo test allows confidently linking the two events.
We estimate that the lockdown has on average increased the value of EVI on the whole Badung regency by 1.17%, from the average values that would have been observed had the epidemic not occurred. We also find that in the norther part of the region the EVI has increased by 1.06% while in the south 1.15%, suggesting a stronger labour market shift towards agriculture in the more touristic south.
But what does a 1% increase in the EVI index over a short period of time represent?
It certainly highlights a significant growth in the canopy structure, such as leaf area and plant physiognomy. Yet, given the large areal extent and heterogeneity in cultivated crops of the region inquired, a direct interpretation in terms of agricultural yield change is not possible. On the other hand, the results have important implications for carbon sequestration, given the strong correlation of EVI with carbon stocks (Situmorang et al., 2016;Dai et al., 2020). As an additional remark, in our study we estimate the short run impact of the lockdown (from May to July). Longer run analysis could shade light on more structural changes in agricultural production system.
Overall, while our approach enables estimating the effect of the lockdown on EVI and linking it with a labour force shift, it does not uncover the increase in the vegetation index's underlying mechanism. In terms of plant phenology, several concurrent reasons could explain this variation such as a decrease in bare soil; an increase in plants health due to greater plant care and a decline in pollution levels or a reduction in the amount of harvested crops due to reduced food demand.
The key limitations of our analysis and its findings consist of the potential uncertainty and error in remotely sensed observations and the scale of analysis in relation to data granularity.
Overall our analysis contributes to the rapidly growing stream of literature linking remotely sensed data, machine learning modelling, and econometric causal inference. This is particularly relevant to evaluate policy impacts in near-real-time -when field or conventional data collection has not yet occurred -or in developing regions.

Code and data availability
The

Conflict of interest
Declaration of interest: none.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.