Bridging knowledge gaps with hybrid machine-learning forest ecosystem models (ML- FEMs): inferential simulation of past understory light regimes

Soil moisture is a key limiting factor of plant productivity in boreal and montane regions, producing additional climate feedbacks through evaporation, regeneration, mortality, and respiration. Understory solar irradiation – the primary driver of surface temperature and evaporative demand – remains poorly represented in vegetation models due to a lack of 3-D canopy geometry. Existing models are further unable to represent processes lacking sufficient parameterization and/or knowledge, with no land model to date utilizing machine learning (ML) to represent vegetation processes. Here, we developed the first hybrid forest ecosystem model using ML (ML-FEM), a specific case of hybrid AI land model (a concept also invented here). In this approach, ML models are trained and validated with a ground-truth dataset, whether observations or high-fidelity simulations, before being applied to vegetation model parameters for inference, internally or externally to the model. Using this approach, we simulated annual understory global solar irradiation (Iu) across 25.2 Mha in southwestern Canada at 1-ha resolution under historical climate and fire scenarios. In cross-validation, we found that linear and ML regression models performed comparably well in the prediction of angular canopy cover (ACC), due to the linearity of its relationship to predictors (linear R = 0.938, RMSE = 0.079; ML R = 0.939, RMSE = 0.074). Reduced area burned, increased ignitions, and reduced regeneration potential for recent periods resulted in stable or reduced Iu. This suggests that diminished disturbance may reduce Iu through forest aging, masking latent regeneration decline. Only in the most extreme and unconstrained scenarios did Iu increase. In these experiments, conducted in late 2015, we demonstrated an entirely new class of hybrid models that we anticipated to be of vital importance to understanding and representing pattern-based processes in Earth system models. 2 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 Introduction Light is a primary source of life for plants, as its physical energy drives the process of photosynthesis, making light a focus of plant resource competition (Hikosaka & Hirose, 1997; Katahata, Naramoto, Kakubari, & Mukai, 2005; Ruban, 2009). This include phototropism and a range of life history strategies linked to metabolic limitations per the leaf/plant economics spectrum (Enquist & Niklas, 2001; Wright et al., 2004; Enquist, West, & Brown, 2009). While plants have evolved adaptations that enable them to tolerate fluctuations in the understory light environment (Chazdon & Pearcy, 1991; Way & Pearcy, 2012), long-term light changes may affect succession through game-theoretic shade tolerance, growth, and regeneration strategies. Understory global solar irradiation (Iu) (i.e., the kinetic energy of photons incident across the sky hemisphere integrated over time and space) exerts a control on biogeochemical and energetic budgets through its effects on evaporative demand and soil moisture (Farquhar & Roderick, 2009). Global solar irradiation (I) represents the sum of direct, diffuse, and reflected solar irradiation components. Direct and diffuse radiation comprise the majority of the insolation budget (Iqbal, 1983). While direct radiation theoretically reaches the surface unimpeded, diffuse radiation is scattered by molecules in the atmosphere, and reflected radiation is returned by surface features. Although only a fraction of incident radiation can be used by plants in photosynthesis, known as the fraction of photosynthetically active radiation (fPAR or fAPAR), full-spectrum changes to radiation regimes are important in determining changes to energy balance (Paul M Rich, 1990) and thus changes to evaporative demand and soil water. Understory plants are believed to play a central role in processes from tree regeneration (Greene et al., 1999) and nutrient cycling to fire frequency, with some suggesting that understory 3 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 dynamics drive stand succession (Nilsson & Wardle, 2005). Previous studies have shown the importance of Iu in maintaining the diversity and productivity of understory plants in boreal forests (Aubin, Beaudet, & Messier, 2000; Grandin, 2004; Bartemucci, Messier, & Canham, 2006; Beaudet et al., 2011; Reich, Frelich, Voldseth, Bakken, & Adair, 2012; Pec et al., 2015), making Iu critical to the habitat of brown bear (Ursus arctos) and other boreal fauna. Improving our understanding of understory light dynamics is a key area of inquiry for scientists and managers (Lieffers, Messier, Stadt, Gendron, & Comeau, 1999). Canopy geometry, topography, and seasonality strongly affect understory light conditions. High latitude forests are characterized by narrow tree crowns, likely a population-level evolutionary adaptation to low solar elevations. While forest structure or geometry (e.g., due to thinning or disturbance) exerts direct influence on canopy light transmission (Lieffers et al., 1999; Beaudet & Messier, 2002; Bartemucci et al., 2006), or T, variation may be generalized to species-age cohort classes (Canham, Finzi, Pacala, & Burbank, 1994). While increased understory solar irradiation and temperatures may be beneficial to boreal understory plant production, given parallel increases to precipitation (Trenberth, 2011) and atmospheric CO2, long-term increases in evaporative demand may diminish soil water, limiting understory regeneration and growth potential (Adam M. Erickson, Nitschke, Coops, Cumming, & Stenhouse, 2015; D’Orangeville et al., 2018). Although boreal understory dynamics remain poorly understood, recent work in the Swedish boreal attributed an observed reduction in soil water to increased Iu (Grandin, 2004). Yet, the quality rather than quantity of light may be more important to long-term growth (Dengel & Grace, 2010). An improved understanding of Iu will facilitate the prediction of evaporative demand and thus soil water levels (Farquhar & Roderick, 4 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 2009), a primary limiting factor of productivity in the southern boreal (Adam M. Erickson et al., 2015; D’Orangeville et al., 2018). While passive spaceborne remote sensing shows promise for large-area characterization of soil moisture (Laskin, Montaghi, Nielsen, & McDermid, 2016), it remains difficult to simulate as it is a pattern-based physical process sensitive to scale effects and difficult-to-map variation in belowground composition. While physical-geometric models with detailed canopy geometries – such as 3-D procedural or stochastic L-systems tree models – may be ideal for physical models of soil moisture, their computational expense and parameterization requirements remain inhibitive for large-scale simulations. Fire plays a primary role in regulating forest structure and composition in circumpolar boreal forests (Rowe & Scotter, 1973). The evolution of boreal ecosystems was shaped by large stand-replacing fires, temperature extremes coupled to strong seasonality of the light environment, and geomorphological processes related to glaciation (Rowe, 1973; He, Pausas, Belcher, Schwilk, & Lamont, 2012). Warming has produced complex interactions between fire, productivity, and regeneration in regions of the boreal region of Alberta, Canada (Adam M. Erickson et al., 2015). Here, we simulate the combined effects of fire and climate on understory light conditions across western Alberta. We hypothesized that a climatically-driven reduction in tree regeneration potential (Adam M. Erickson et al., 2015) will reduce the forested area and increase understory global solar irradiation (Iu) given the persistence of observed 20 century climate and fire trends. Our experimental design reduces uncertainty by discarding climate projections and instead focusing on observed historical patterns applied to an initial state of year 2000 conditions. This serves as a Gedankenexperiment regarding the stability of a year 2000 landscape given the persistence of 20 century trends. 5 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 Materials and methods We modeled the combined effects of canopy structure, topography, and Earth-sun geometry on understory solar irradiation (Iu), demonstrating a new class of pattern-based hybrid vegetation model based on machine learning (Adam Michael Erickson, 2017). We developed and applied statistical or empirical regression models of canopy gap fraction (Po = 1 – ACC) to simulate a complex pattern-based process poorly represented in the LANDIS-II model: Iu. We multiplied the resulting values for Po at each simulation time-step by corresponding physicaltopographic model bare-Earth insolation values to dynamically simulate changes to Iu, providing a hybrid statistical-physical model of understory light that can be used in forecasting applications. This work began with an exploration of regression models of angular canopy closure (ACC) developed with 1 ha (100 m) airborne laser scanning (ALS) plot data at field inventory sites in western Alberta (n = 100), established for brown bear (Ursus arctos) habitat research (Nielsen, 2005). Ground measurements of ACC recorded with a convex spherical densiometer were used to train and cross-validate ALS linear and machine-learning regression models of ACC. We decided that including ALS models of ACC was an unnecessary step, as ground-based models of ACC alone would suffice for our task. Thus, we pursued an alternate methodology that only relied upon ground-truth measurements to reduce error propagation in ACC models, even if the ALS data provide greater sampling density than convex spherical densiometers. We leave discussion of implicit biases toward ground-sampled data as the ‘ground-truth’ for another paper. We simulated landscape Iu using a hybrid modeling approach combining the LANDIS-II forest landscape model (Scheller et al., 2007), the TACA biophysical tree regeneration model 6 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 (Nitschke & Innes, 2008; Adam M. Erickson et al., 2015), a physically-based solar radiation model (Fu & Rich, 1999), and two types of regression model: multiple-linear and machine learning. Models of processes learned from data require that the parameter vector of predictors be available in both the field data and the simulation model, or that these variables can be modeled using surrogates predicted with other variables; the former is needed for training while the latter is needed for inference (i.e., applying trained or calibrated models to generate predictions). Thus, a key practical challenge in applying our proposed ML-FEM approach is finding sets of field and model data that intersect for the process and spatiotemporal resolution of interest; this includes any potential remote sensing data streams. We trained and cross-validated multiple-linear and machine-learning regression models of ACC using convex spherical densiometer measurements (n = 950) for the Rocky Mountain Foothills region near Hinton, Alberta, Canada. We used 10-fold cross-validation repeated three times for each model, randomly selecting 75% of the data for model training and 25% for model testing in cross-validation. Root-mean-squared error (RMSE) and coefficient of determination (R) metrics were used to select final regression models. We ran four model scenarios to simulate landscape-level changes to ACC: Pre-suppression Era (1923-1952); Early Suppression Era (1953-1982), Global Change Era (1983-2012); and, Most Recent Decade (2003-2012). Landcover classification was performed on LANDIS-II model outputs using the ABMI Landcover 2010 scheme (Alberta Biodiversity Monitoring Institute, 2012) for the application of the trained ACC models in inference. Next, we applied a physical-topographic global solar irradiation model to simulate landscape-level changes to insolation Iu by multiplying modeled Po (1 – ACC) and bare-Earth insolation maps. 7 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 We assessed the effects of historical climate and fire patterns on landscape-scale Iu using a factorial experiment that included results from two contrasting classes of fire model: empirical and semi-mechanistic. For both fire models, we applied a new type of approximate stochastic gradient descent (Widrow & Hoff, 1960) that we previously proposed (Erickson et al., in review), characterized by the optimization of low-resolution model runs followed by fullresolution optimization for final parameter refinement, markedly improving model fit (R= 0.96; ΔR = +0.14; Supplementary Materials). Model simulations were run for a 50-year duration at annual resolution, treating the first 10 years of the simulation as the model spin-up period. We made no assumptions regarding the equilibrium state of existing vegetation communities; vegetation communities were not assumed to be in equilibrium and the model was not used to estimate equilibrium vegetation, as our study focuses on the 50-year period in order to balance initial conditions and model behavior. Further methodological details are provided below. Data Plot data used for the development of regression models include area-based canopy and terrain airborne laser scanning (ALS) metrics calculated with USDA Fusion (McGaughey, 2014), 30-year normal climate variables output from ClimateWNA (Wang, Hamann, Spittlehouse, & Murdock, 2011), Alberta Wet Areas maps derived from ALS data (Arp, Castonguay, Campbell, & Hiltz, 2009), Alberta Biodiversity Monitoring Institute (ABMI) Landcover 2010, Canada Land Inventory (CLI) forest site index, bare-Earth insolation calculated in ArcGIS, NASA SRTM digital elevation model (DEM), and ground-level GPS coordinates and vegetation survey data (Nielsen, 2005). The overall plot data included the following 58 variables: 8 160 161 162 163 164


Introduction
Light is a primary source of life for plants, as its physical energy drives the process of photosynthesis, making light a focus of plant resource competition (Hikosaka & Hirose, 1997;Katahata, Naramoto, Kakubari, & Mukai, 2005;Ruban, 2009). This include phototropism and a range of life history strategies linked to metabolic limitations per the leaf/plant economics spectrum (Enquist & Niklas, 2001;Wright et al., 2004;Enquist, West, & Brown, 2009). While plants have evolved adaptations that enable them to tolerate fluctuations in the understory light environment (Chazdon & Pearcy, 1991;Way & Pearcy, 2012), long-term light changes may affect succession through game-theoretic shade tolerance, growth, and regeneration strategies.
Understory global solar irradiation (I u ) (i.e., the kinetic energy of photons incident across the sky hemisphere integrated over time and space) exerts a control on biogeochemical and energetic budgets through its effects on evaporative demand and soil moisture (Farquhar & Roderick, 2009).
Global solar irradiation (I) represents the sum of direct, diffuse, and reflected solar irradiation components. Direct and diffuse radiation comprise the majority of the insolation budget (Iqbal, 1983). While direct radiation theoretically reaches the surface unimpeded, diffuse radiation is scattered by molecules in the atmosphere, and reflected radiation is returned by surface features. Although only a fraction of incident radiation can be used by plants in photosynthesis, known as the fraction of photosynthetically active radiation (fPAR or fAPAR), full-spectrum changes to radiation regimes are important in determining changes to energy balance (Paul M Rich, 1990) and thus changes to evaporative demand and soil water.
Understory plants are believed to play a central role in processes from tree regeneration (Greene et al., 1999) and nutrient cycling to fire frequency, with some suggesting that understory dynamics drive stand succession (Nilsson & Wardle, 2005). Previous studies have shown the importance of I u in maintaining the diversity and productivity of understory plants in boreal forests (Aubin, Beaudet, & Messier, 2000;Grandin, 2004;Bartemucci, Messier, & Canham, 2006;Beaudet et al., 2011;Reich, Frelich, Voldseth, Bakken, & Adair, 2012;Pec et al., 2015), making I u critical to the habitat of brown bear (Ursus arctos) and other boreal fauna. Improving our understanding of understory light dynamics is a key area of inquiry for scientists and managers (Lieffers, Messier, Stadt, Gendron, & Comeau, 1999). Canopy geometry, topography, and seasonality strongly affect understory light conditions. High latitude forests are characterized by narrow tree crowns, likely a population-level evolutionary adaptation to low solar elevations.
While increased understory solar irradiation and temperatures may be beneficial to boreal understory plant production, given parallel increases to precipitation (Trenberth, 2011) and atmospheric CO 2 , long-term increases in evaporative demand may diminish soil water, limiting understory regeneration and growth potential (Adam M. Erickson, Nitschke, Coops, Cumming, & Stenhouse, 2015;D'Orangeville et al., 2018). Although boreal understory dynamics remain poorly understood, recent work in the Swedish boreal attributed an observed reduction in soil water to increased I u (Grandin, 2004). Yet, the quality rather than quantity of light may be more important to long-term growth (Dengel & Grace, 2010). An improved understanding of I u will facilitate the prediction of evaporative demand and thus soil water levels (Farquhar &  2009), a primary limiting factor of productivity in the southern boreal (Adam M. Erickson et al., 2015;D'Orangeville et al., 2018).
While passive spaceborne remote sensing shows promise for large-area characterization of soil moisture (Laskin, Montaghi, Nielsen, & McDermid, 2016), it remains difficult to simulate as it is a pattern-based physical process sensitive to scale effects and difficult-to-map variation in belowground composition. While physical-geometric models with detailed canopy geometriessuch as 3-D procedural or stochastic L-systems tree models -may be ideal for physical models of soil moisture, their computational expense and parameterization requirements remain inhibitive for large-scale simulations.
Fire plays a primary role in regulating forest structure and composition in circumpolar boreal forests (Rowe & Scotter, 1973). The evolution of boreal ecosystems was shaped by large stand-replacing fires, temperature extremes coupled to strong seasonality of the light environment, and geomorphological processes related to glaciation (Rowe, 1973;He, Pausas, Belcher, Schwilk, & Lamont, 2012). Warming has produced complex interactions between fire, productivity, and regeneration in regions of the boreal region of Alberta, Canada (Adam M. Erickson et al., 2015). Here, we simulate the combined effects of fire and climate on understory light conditions across western Alberta. We hypothesized that a climatically-driven reduction in tree regeneration potential (Adam M. Erickson et al., 2015) will reduce the forested area and increase understory global solar irradiation (I u ) given the persistence of observed 20 th century climate and fire trends. Our experimental design reduces uncertainty by discarding climate projections and instead focusing on observed historical patterns applied to an initial state of year 2000 conditions. This serves as a Gedankenexperiment regarding the stability of a year 2000 landscape given the persistence of 20 th century trends.

Materials and methods
We modeled the combined effects of canopy structure, topography, and Earth-sun geometry on understory solar irradiation (I u ), demonstrating a new class of pattern-based hybrid vegetation model based on machine learning (Adam Michael Erickson, 2017). We developed and applied statistical or empirical regression models of canopy gap fraction (P o = 1 -ACC) to simulate a complex pattern-based process poorly represented in the LANDIS-II model: I u . We multiplied the resulting values for P o at each simulation time-step by corresponding physicaltopographic model bare-Earth insolation values to dynamically simulate changes to I u , providing a hybrid statistical-physical model of understory light that can be used in forecasting applications.
This work began with an exploration of regression models of angular canopy closure (ACC) developed with 1 ha (100 m 2 ) airborne laser scanning (ALS) plot data at field inventory sites in western Alberta (n = 100), established for brown bear (Ursus arctos) habitat research (Nielsen, 2005). Ground measurements of ACC recorded with a convex spherical densiometer were used to train and cross-validate ALS linear and machine-learning regression models of ACC. We decided that including ALS models of ACC was an unnecessary step, as ground-based models of ACC alone would suffice for our task. Thus, we pursued an alternate methodology that only relied upon ground-truth measurements to reduce error propagation in ACC models, even if the ALS data provide greater sampling density than convex spherical densiometers. We leave discussion of implicit biases toward ground-sampled data as the 'ground-truth' for another paper.
We simulated landscape I u using a hybrid modeling approach combining the LANDIS-II forest landscape model (Scheller et al., 2007), the TACA biophysical tree regeneration model  (Nitschke & Innes, 2008;Adam M. Erickson et al., 2015), a physically-based solar radiation model (Fu & Rich, 1999), and two types of regression model: multiple-linear and machine learning. Models of processes learned from data require that the parameter vector of predictors be available in both the field data and the simulation model, or that these variables can be modeled using surrogates predicted with other variables; the former is needed for training while the latter is needed for inference (i.e., applying trained or calibrated models to generate predictions). Thus, a key practical challenge in applying our proposed ML-FEM approach is finding sets of field and model data that intersect for the process and spatiotemporal resolution of interest; this includes any potential remote sensing data streams.
We trained and cross-validated multiple-linear and machine-learning regression models of ACC using convex spherical densiometer measurements (n = 950) for the Rocky Mountain Foothills region near Hinton, Alberta, Canada. We used 10-fold cross-validation repeated three times for each model, randomly selecting 75% of the data for model training and 25% for model testing in cross-validation. Root-mean-squared error (RMSE) and coefficient of determination (R 2 ) metrics were used to select final regression models. We ran four model scenarios to simulate landscape-level changes to ACC: Pre-suppression Era ; Early Suppression Era (1953-1982), Global Change Era (1983and, Most Recent Decade (2003. Landcover classification was performed on LANDIS-II model outputs using the ABMI Landcover 2010 scheme (Alberta Biodiversity Monitoring Institute, 2012) for the application of the trained ACC models in inference. Next, we applied a physical-topographic global solar irradiation model to simulate landscape-level changes to insolation I u by multiplying modeled P o (1 -ACC) and bare-Earth insolation maps. We assessed the effects of historical climate and fire patterns on landscape-scale I u using a factorial experiment that included results from two contrasting classes of fire model: empirical and semi-mechanistic. For both fire models, we applied a new type of approximate stochastic gradient descent (Widrow & Hoff, 1960) that we previously proposed (Erickson et al., in review), characterized by the optimization of low-resolution model runs followed by fullresolution optimization for final parameter refinement, markedly improving model fit (R 2 = 0.96; 11N (meters) coordinates, used for its high positional accuracy at regional scales

Linear and machine learning regression models of P o
Multivariate linear regression follows the classical form: Where T denotes the transpose, such that X i T β is the inner product between x i and weight vector β. The ordinary least squares method was used to solve for weights and the intercept term

The Random Forest algorithm
The Random Forest algorithm (Leo Breiman, 2001) builds on the bagging procedure (i.e., bootstrap aggregation), or the averaging of many noisy unbiased models to reduce variance by building a large collection or forest of de-correlated regression trees before performing averaging (L Breiman, 1996). Decision trees are ideal for bagging procedures, as they capture complex interactions and, have low bias and high noise (Hastie, Tibshirani, & Friedman, 2009). The bias of bagged trees is identical to that of individual trees, making variance the focus of improvement.
Random Forest was designed to improve the variance reduction of bagging by minimizing the correlation between trees without substantially increasing the variance. This is achieved by randomly selecting input variables during the tree-growing process. The Random Forest algorithm is further described below (Algorithm 1), adopted from Hastie et al. (2009). Following model training, to make a prediction at a new point x: be the class prediction of the bth Random Forest tree. Then, In short, the Random Forest algorithm creates n-trees decision trees from randomly selected variables with mtry splits at each node. Each of these trees is a weak predictor, combined through averaging to produce predictions. Here, we focus on the regression case.

Landcover classification of LANDIS-II species-age cohorts
To simulate P o (1 -ACC) at the landscape scale using the LANDIS-II model, we classified simulated annual species-age cohorts into landcover classes per the ABMI Wall-to- The following section describes the lookup table (Table 1) and algorithm (Algorithm 2) used for classifying simulated species-age cohorts into ABMI landcover classes for pixels/sites, providing landcover maps at annual resolution.

Bare-Earth global solar irradiation model
We used ArcGIS Spatial Analyst solar radiation tools (Fu & Rich, 1999) with an SRTM RADAR digital elevation model (DEM) processed using standard correction techniques to compute bare-Earth global solar irradiation across the 25.2 Mha study area at 1 ha resolution. In the following text, we provide a description of the solar radiation model used. Based on previous work (Paul M Rich, 1990;P. M. Rich, Dubayah, Hetrick, & Saving, 1994;Fu & Rich, 2002) parallel to GRASS r.sun algorithm development (Šúri & Hofierka, 2004), global solar radiation was calculated per the following: 4. Calculate half-hourly diffuse solar radiation for sectors in a 2-D polar chart 5. Calculate total direct solar radiation by masking sky sectors of (3) with pixels of (1) 6. Calculate total diffuse solar radiation by masking sky sectors of (4) with pixels of (1) 7. Calculate global solar radiation for the cell as the sum of (5) and (6) Each 2 The hemisphere calculations used were originally developed for hemispherical photography vegetation studies (Paul M Rich, 1990;Fu & Rich, 1999). In the viewshed calculation, twelve equal azimuth angles are searched from the pixel center for computation of the maximum horizon angle (unobstructed zenith). The horizon angles are then converted into a hemispherical coordinate system as zenith (θ) and azimuth ) angle sectors of a polar plot. Each cell within the hemisphere sectors takes one of two binary values, visible or occluded.
The half-hourly sun position is calculated using standard equations (Iqbal, 1983), used for calculating direct and diffuse radiation components. The calculation of direct, diffuse, and global radiation for a given sun position follows previous work (Paul M Rich, 1990; P. M. Rich et al., 1994;Fu & Rich, 2002). Global solar irradiation ( I global ) is the sum of direct I direct and diffuse I diffuse components, ignoring reflected irradiation: Direct solar irradiation I direct is computed as the sum of irradiation for each sector defined by zenith (θ) and azimuth (ϑ ) angles for each hour and month: The direct solar irradiation for a given zenith and azimuth angle sector is calculated as the solar constant for the mean Earth-sun distance ( S const ), equal to 1367 W m -2 , multiplied by the atmospheric transmissivity for the shortest path raised to the relative optical path length (β m θ ), the sky sector sun duration ( t θ ,ϑ ), equal to monthly and half-hourly intervals or spherical geometry, the gap fraction for the sun map sector ( P θ ,ϑ ), and the cosine of the angle of incidence between the sky sector centroid and the surface normal ( γ θ ,ϑ ): Diffuse solar irradiation I diffuse is computed as the sum of irradiation for each of 128 sectors, given 8 zenith (θ) and 16 azimuth (ϑ ) angle divisions: Unlike direct irradiation, I diffuse θ,ϑ sectors are calculated as the rolling sum of half-hourly values for a given time interval, due to the multi-directional nature of diffuse radiation, with each sector predefined rather than based on modeled solar position. The diffuse solar irradiation for a given zenith and azimuth angle sector is calculated as the global normal radiation ( R glb ) multiplied by the proportion of diffused global radiation flux ( p diffuse ), time interval (t), sky sector gap fraction ( P θ ,ϑ ), weighted proportion of diffuse radiation originating from a sector ( w θ ,ϑ ), and cosine of the angle of incidence ( γ θ ,ϑ ): Global normal radiation ( R glb ) is calculated as the solar constant ( S const ) multiplied by the sum of the atmospheric transmissivity for the shortest path raised to the relative optical path length (β m θ ), divided by one minus the proportion of diffused global radiation flux ( p diffuse ) to correct for direct radiation: The weighted proportion of diffuse radiation originating from a sector ( w θ ,ϑ ) is calculated as the zenith angle range for a sky sector ( cos θ 2 − cos θ 1 ) divided by the number of azimuth divisions in the sky map ( N ϑ ): w θ ,ϑ = ( cos θ 2 − cos θ 1 ) / N ϑ Each of these calculations was performed for each cell in the NASA SRTM DEM using ArcGIS solar analyst tools (Fu & Rich, 1999). For more details on model parameterization, please refer to the Supplementary Materials.

Results
Our hybrid model simulation results showed that I u levels increased as the forested area Site Index. Importantly, both variables contain latent information on disturbance legacies as well as regional climate and soil patterns.
Using only the above two predictor variables, multiple-linear regression showed model performance comparable with substantially more complex models (Table 2). Multiple-linear regression model robustness was tested for the two predictor variables by performing 10-fold cross-validation repeated three times (R 2 = 0.938; RMSE = 0.079), yielding only marginally diminished model performance compared to step-wise AIC or BIC model selection models using many variables.  (Strobl, Boulesteix, Zeileis, & Hothorn, 2007), which may be corrected with one-hot encoding, a binary class membership scheme. We proceed by applying models using only CLI site index and ABMI landcover class as predictors. The final twoparameter Random Forest model showed reliably low error using an n-tree parameter of 500, or a forest of 500 decision trees for averaging ( Figure 3).  severe disturbances. Absent disturbance, the effects of changes in regeneration on stand composition remain a latent process. Simulated reductions to the mean area burned and total area burned, due to fire suppression in recent decades, reduced mean understory light by a maximum of 8%, attributable to a demographic shift toward older stands. Meanwhile, higher burn rates generally produced higher landscape levels of I u .
Base Fire (bf) model simulations, which lack any realistic physical constraints, are notable for showing the highest landscape levels of I u . Meanwhile, semi-mechanistic Dynamic Fuels and Fire System (dffs) model simulations produced substantially lower levels of I u even when parameterized with the same empirical fire regimes. This is due to process constraints built into the dffs fire model; large fires were followed by fuel limitations. The dffs model simulations yielded reduced mean I u compared to age-only succession (ao) scenarios, while the lowest simulated levels of I u were found in the ao-dffs-extremes scenario ( Figure 5). Reduced landscape I u produced in the ao-dffs-extremes scenario may be explained by forest expansion following initial large disturbances, after which fire regimes were strongly constrained. Fuel-constrained disturbance regimes are apparent for all dffs scenarios ( Figure 6).
The bf scenarios, which forced the application of historical disturbance regimes without fuel or weather limitations, showed an increase in I u for all model scenarios, except for the Early Suppression (1953Suppression ( -1982 and Global Change (

Discussion
In this study, ALS plot data were discarded as predictors of P o due to the temporal mismatch between ALS sorties and ground validation data collection. Due to this mismatch, forest disturbance and subsequent recovery broke down the correlation structure between the two datasets. Nevertheless, ALS remains an important predictor of P o due to its broad sampling capabilities, which are estimated to provide a more accurate and complete depiction of forest geometry. Where high point-density or waveform ALS data is available, such data is preferable to coarse traditional ground measurements. The two extreme scenarios yielded divergent responses in landscape I u depending on the fire model used, due to the inclusion of constraints in the dffs fire model. It is logical that a decline in forest cover may drive a long-term increase in landscape I u , if stands fail to regenerate and/or disturbance regimes become more severe under warming. In the absence of fire-related mortality, a long-term decline in regeneration rates may overcome stand aging to shift forest composition. As the simulations do not include harvest, its contribution to mortality may energetically balance the recent decline in area burned. The interaction of harvest, fire, and biological disturbance is the subject of future research. In our simulations, conversion from forestland to grasslands/shrublands due to reduced regeneration rates was caused by modeled soil water limitations (Adam M. Erickson et al., 2015). Given the importance of regeneration to our study results, the TACA model would benefit from more extensive regional validation in future studies. Given the complexity of the TACA model, requiring many difficult-to-source speciesspecific parameters, it would greatly benefit from model reduction strategies, as has been done for the SORTIE model in work on the Perfect Plasticity Approximation (Strigul, Pristinski, Purves, Dushoff, & Pacala, 2008).
Annual bare-Earth global solar radiation and CLI forest site index were static for each site, making I u variation purely a function of simulated landcover change and modeled P o . Forest demography is not explicitly modeled in the calculation of P o . Immature trees less than ten years of age were omitted, due to a negligible effect on overstory P o conditions and no effect of Modeled landscape I u showed divergent responses to changing fire and climate conditions. Modeled I u indicated that understory light levels were highest under greater burn rates and warmer climatic conditions, where regeneration rates were lowest. Yet, this result depends on the type of fire model applied. We suggest applying empirical fire models for simulating well-described historical fire regimes, particularly if there is an absence of empirical support for the application of complex semi-mechanistic fire models. Studies concerned with forecasting into novel conditions may benefit from the mechanistic constraints of complex fire models, allowing theoretically robust extrapolation.
Here, our primary concern was replicating the continuation of recent historical fire patterns for modeling changes to canopy light transmission (T), a task for which both fire models provide useful information. Future studies should extend forest ecosystem simulations over century timescales to test for forest cover or compositional change, as model behavior may overcome initial landscape parameterization at century timescales, resulting in equilibrium or stability conditions. Yet, model uncertainty also increases with longer simulation timescales through error propagation, motivating our use of half-century simulations. Regardless of temporal scale, most critical are the simulation time-points where regime shifts are likely to occur, which signify transitions in the state-space of forests. By combining remote sensing with simulation models, dedicated state-space models designed for linear systems with random disturbances, such as the Extended Kalman filter (Kalman & Bucy, 1961), may be used to better understand the recent historical state of forest ecosystems. Evidence is provided that a diminished rate of burning likely decreased I u in recent years, attributable to a demographic shift occurring through stand development processes in the absence of fire-related mortality. This is supported by a precursory analysis of Alberta Permanent Sample Plot data for the region (Adam Michael Erickson, 2017), which showed a reduction in regeneration and mean tree height -inferred to correspond to a reduction in mean tree ageacross the Global Change Era. Future studies should incorporate the effects of harvest and biological disturbance agents with more sophisticated succession models to estimate the effects of each on understory light levels, while further incorporating remote sensing data through a state-space modeling framework.

Limitations
Here, a physical solar radiation model was combined with a regression model of P o using forest site index and simulated landcover as predictors. The layering of these models may produce error propagation, common to complex models lacking global parameter optimization (Pacala et al., 1996;Arras, 1998;Larocque, Bhatti, Boutin, & Chertov, 2008). These uncertainties were not explicitly represented given the complexity of the models and scope of this research. Additionally, the solar radiation model used assumes constant solar output, which is known to be false, but is a reasonable assumption given that work is not concerned with temporal variation in solar activity. Other limitations of the solar radiation model include its reliance on simple geometric relationships and lack of radiative transfer functions related to turbidity or cloud cover.
Of these shortcomings, the absence of cloud cover information is expected to have the largest effect on modeled radiation, as clouds may be the largest source of radiation attenuation in the atmosphere (Hammer et al., 2003). Cloud cover indices derived from geostationary weather satellite data may be used to generate atmospheric clearness indices. Such indices facilitate a simple but effective method of integrating spatiotemporally resolved atmospheric conditions with models of clear-sky solar radiation and LiDAR canopy light transmission (Tooke, Coops, Christen, Gurtuna, & Prévot, 2012). Finally, while changes to landcover were dynamically simulated, forest site index was static (Agriculture and Agri-Food Canada, 2016).
Future studies should test the application of NDVI, NIR V , or SIF for incorporating dynamic changes to stand productivity or site index.

Conclusion
Here, we developed and demonstrated the first hybrid vegetation model using machine The simple statistical fire model consistently produced the parameterized fire regime without constraint, while the semi-mechanistic fire model was strongly constrained by fuel and weather limitations. The appropriate choice of fire model depends strongly on the research question. This work applied both types of fire model in an effort to better understand forest ecosystem trajectories using ensembles based on unique scenarios. Applying the statistical fire model with empirical parameters, it was clear that weakened disturbance regimes reduced modeled I u across the landscape. However, the past decade showed an increase in the rate of burning and thus in I u , attributable to exponentially increased fire frequency linked to an increase in human activity in previously remote areas (Adam Michael Erickson, 2017). These results may be indicative of future national fire regimes as population levels and temperatures continue to rise throughout the circumpolar boreal zone, with the southern reaches of these forests likely the first to experience compositional and anthropogenic changes.

Acknowledgements
We thank funders and collaborators for making this work possible. A.E. and N.C. were funded by Foothills Research Association's Grizzly Bear Program and an NSERC grant to N.C.
A.E. proposed and implemented the presented methodology, and, led the writing of the manuscript. N.C. managed the overall research and contributed to the writing. C.N. provided TACA model parameters to conduct the regeneration modeling work and contributed to the writing. G.S. provided the necessary regional expertise and support to conduct this research.