A priori parameterisation of the CERES soil-crop models and tests against several European data sets

Mechanistic soil-crop models have become indispensable tools to investigate the effect of management practices on the productivity or environmental impacts of arable crops. Ideally these models may claim to be universally applicable because they simulate the major processes governing the fate of inputs such as fertiliser nitrogen or pesticides. However, because they deal with complex systems and uncertain phenomena, site-specific calibration is usually a prerequisite to ensure their predictions are realistic. This statement implies that some experimental knowledge on the system to be simulated should be available prior to any modelling attempt, which severely limits practical applications of models. Because the demand for more general simulation results is high, modellers have nevertheless taken the bold step of extrapolating a model tested within a limited sample of real conditions to a much larger domain. While methodological questions are often disregarded in this extrapolation process, they are specifically addressed in this paper, in particular the issue of the a priori parameterisation of models. We thus implemented and tested a standard procedure to parameterise the soil components of a modified version of the CERES models. The procedure converts routinely-available soil properties into functional characteristics by means of pedo-transfer functions. The resulting predictions of soil water and nitrogen dynamics, as well as crop biomass, nitrogen content and leaf area index were compared to observations from trials conducted in five locations across Europe (southern Italy, northern Spain, northern France and northern Germany). In three cases, the model’s performance was judged acceptable when compared to experimental errors on the measurements, based on a test of the model’s root mean squared error (RMSE). Significant deviations between observations and model outputs were however noted in all sites, and could be ascribed to various model routines.
In decreasing importance, these were: water balance, the turnover of soil organic matter, and crop N uptake. A better match to field observations could therefore be achieved by visually adjusting related parameters, such as field-capacity water content or the size of soil microbial biomass. As a result, model predictions fell within the measurement errors in all sites for most variables, and the model’s RMSE was within the range of published values for similar tests. We conclude that the proposed a priori method yields acceptable simulations with only a 50% probability, a figure which may be greatly increased through a posteriori calibration. Modellers should thus exercise caution when extrapolating their models to a large sample of pedo-climatic conditions for which they have only limited information.

soil-crop models / water balance / nitrogen dynamics / extrapolation

Communicated by Jean-François Ledent (Louvain-La-Neuve, Belgium)

Agronomie 22 (2002) 119–132
© INRA, EDP Sciences, 2002
DOI: 10.1051/agro:2002003

* Correspondence and reprints: Benoit.Gabrielle@grignon.inra.fr

Résumé – A priori parameterisation of the CERES soil-crop models and tests against European data sets. Deterministic simulation models of soil-plant systems are a powerful, and sometimes the only, tool for studying the effect of cropping practices on the productivity and environmental impacts of crops. Because they simulate the main phenomena at play, these models can in principle be applied to all types of agronomic or pedo-climatic situations. In practice, however, a local calibration of the parameters governing the functioning of the soil-crop system proves necessary. This step is an obstacle to model extrapolation that is too often overlooked by modellers. In this article we address the question underlying extrapolation, namely the a priori estimation of model parameters, by testing a standardised procedure for the CERES models. The plausibility of the parameter sets thus inferred is evaluated by comparing simulation results with observations from a network of trials on 5 European sites (southern Italy, northern Spain, northern France, northern Germany). On three sites, the error made by CERES proved comparable to the measurement error. Significant deviations were nevertheless noted for various output variables on all sites. They could be attributed to the simulation of the water balance, of soil organic matter or of crop nitrogen uptake, and were partly corrected by adjusting the parameters involved. We conclude that the proposed parameterisation method has only a 50% probability of yielding realistic results, and that CERES in its current form could not adapt to all the situations tested. Extrapolating a model over a wide domain of pedo-climatic conditions therefore requires great caution.

soil-crop models / parameterisation / water balance / nitrogen balance / nitrogen cycle / extrapolation


INTRODUCTION
Deterministic models of soil-crop systems have become indispensable tools to generalise results obtained locally under particular field conditions, whether in agronomic or environmental studies. In many instances they even play an exclusive role because direct experimental monitoring is too costly to be carried out under a wide range of pedo-climatic conditions. Examples of model applications on a large scale (whether in time or space) include: regional and national inventories [12,27,36], the impact of climate change [10,30], integrative assessment of agricultural practices [28,41], land-use change scenarios [27,33], and precision agriculture [32].
Because they simulate the major processes occurring within the bio-geochemical cycles of interest, such models may claim to be universally applicable. However, because they deal with complex systems and uncertain phenomena, site-specific calibration is usually a prerequisite to ensure realistic predictions [7,15,16]. This obviously hampers a priori extrapolation of the model to other sites, which is of prime importance in the above-mentioned applications.
There are two major reasons for which model extrapolation may fail: (i) the model's structure (i.e. its set of equations) does not apply to the particular soil type, climatic conditions or agricultural practices tested, or (ii) the model is supplied with incorrect parameter values. When faced with a failure of the model, users commonly try the second route (parameter fitting) before taking the 'structural' route. For instance, Quemada and Cabrera [29] modified the crop residues decomposition routine of CERES after realising that, even when provided with laboratory-obtained decay rates for the residues, CERES could not mimic them in the field. However, in many instances it is difficult to decide between the effect of wrong parameter values and that of unfit model structure, because both have a similar influence on the outcome of prediction. Previous comparison studies in which several N models were tested against independent data sets showed that models achieved various degrees of success, and that their errors could be attributed to both causes [6,8,20]. Thus, the origin of their errors remained to be investigated per se. One problem with isolating the role of supplied parameter values is that different models use sets of parameters that vary in nature and definition. To overcome this, Gabrielle et al. [15] proposed comparing models using the same basic information on soil and crop characteristics. They reached the conclusion that the effect of parameter values was predominant over that of structure for three models of varying complexity, albeit for a single site in France. This paper therefore focuses on the issue of estimating correct parameter values when extrapolating models to sites with contrasting climate and soil characteristics.
Usually, model extrapolation follows a test phase involving only a few sets of management / soil / climate conditions compared to the number of combinations considered in the extrapolation. The sizes of the test and extrapolation samples typically follow a ratio of 1 to 100 [4,31,38]. Higher ratios are usually associated with the prediction of more limited sets of parameters. For instance, the size of the inert organic matter pool in the RothC model was assessed based on 28 different data sets worldwide [11], prior to extrapolation to 275 representative soil profiles occurring in Central Hungary [12]. This trade-off between the size of the test sample and the number of parameters addressed originates from the high number of parameters involved in soil-crop models and the scarcity of data to estimate them. Even though parameters may be screened a priori through sensitivity analyses [25,39], the remaining set commonly comprises parameters relevant to various routines within the model (e.g., water balance, N turnover or crop phenology). Several categories are thus seldom dealt with simultaneously. In addition, within a given category of parameters, the prediction of parameters is as a general rule disconnected from model evaluation. This applies to the body of literature on pedo-transfer functions [2], with the notable exception of the 'functional' approach to water balance simulation in the Netherlands [40].
In this paper, we address the above limitations to model extrapolation by testing an a priori parameterisation procedure under a wide range of conditions in Europe. The network of trials covers a broad climatic gradient, extending from southern Italy to northern Germany, and a range of soil types. As to the procedure, it converts routinely-available soil properties (particle-size distribution, gravel content, bulk density, total soil carbon and nitrogen content) into functional characteristics involved in the simulation of water movement and soil biological transformations (Gabrielle et al., unpublished data).
Our primary objective was thus to assess the reliability of a soil-crop model in a case where no data are available to calibrate model parameters. In a second step, the model prediction errors, as revealed by the comparison against field observations, were analysed and corrected by tuning the parameters associated with the processes responsible for the discrepancies. This adjustment aimed at quantifying the distance between the a priori set and the resulting quasi-optimal set.

MATERIALS AND METHODS
The steps involved in testing the procedure a priori in the various sites are diagrammed in Figure 1, and described in the paragraphs below.

Model description and parameterisation
CERES comprises sub-models for the major processes governing the cycles of water, carbon and nitrogen in soil-crop systems. A physical module simulates the transfer of heat, water and nitrate down the soil profile, as well as soil evaporation, plant water uptake and transpiration in relation to climatic demand. Water infiltrates down the soil profile following a tipping-bucket approach, and may be redistributed upwards after evapotranspiration has dried some soil layers. The generalised Darcy's law was subsequently introduced into both of these routines in order to better simulate water dynamics in fine-textured soils [16].
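The tipping-bucket idea can be illustrated with a minimal sketch. It is not the CERES routine: it assumes that any water in excess of field capacity drains to the layer below within the same daily step, and it leaves out upward redistribution, evaporation, root uptake and the Darcy-based modification. The function name and the numerical values are hypothetical.

```python
def tipping_bucket_step(theta, theta_fc, thickness_mm, infiltration_mm):
    """One daily step of a simplified tipping-bucket cascade.

    theta, theta_fc : volumetric water content and field capacity per layer
                      (cm3 cm-3), listed from the surface downwards
    thickness_mm    : layer thicknesses (mm)
    infiltration_mm : water entering the top layer (mm)
    Returns the updated water contents and the drainage leaving the profile (mm).
    Assumption: all water above field capacity moves down within the day.
    """
    new_theta = []
    flux = infiltration_mm  # water arriving at the top of the current layer
    for water, fc, dz in zip(theta, theta_fc, thickness_mm):
        stored = water * dz + flux           # water held after inflow (mm)
        capacity = fc * dz                   # storage at field capacity (mm)
        flux = max(0.0, stored - capacity)   # excess "tips" into the next layer
        new_theta.append((stored - flux) / dz)
    return new_theta, flux


# Hypothetical two-layer profile receiving 30 mm of water
theta_new, drainage = tipping_bucket_step(
    theta=[0.20, 0.25], theta_fc=[0.30, 0.30],
    thickness_mm=[100.0, 200.0], infiltration_mm=30.0,
)
```

In this sketch the field-capacity water content fully controls how much water is passed downwards, which is consistent with its use as an adjustment parameter for downward water movement later in the paper.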
Next, a microbiological module simulates the turnover of organic matter in the plough layer, involving the immobilisation of inorganic N, along with the transformations of inorganic N (nitrification and denitrification). In this version, the NCSOIL model [26] was substituted for the original module. NCSOIL comprises three OM pools, decomposing at a fixed rate and recycling into the microbial biomass. Nitrification and denitrification follow zero-order kinetics, which are modulated by soil temperature and water content.
Crop net photosynthesis is a linear function of intercepted radiation according to the Monteith approach, with interception depending on leaf area index based on Beer's law of diffusion in turbid media. Photosynthates are partitioned on a daily basis to currently growing organs (roots, leaves, stems, fruit) according to crop development stage. The latter is driven by the accumulation of growing degree days, as well as cold temperature and day-length for crops sensitive to vernalisation and photoperiod. Lastly, crop N uptake is computed through a supply/demand scheme, with soil supply depending on soil nitrate and ammonium concentrations and root length density. Crop demand is a function of the distance between actual and critical nitrogen content in the aerial and below-ground tissues. Critical nitrogen is defined as the optimum concentration for biomass production, as evidenced from field studies for various crops [5,23]. It is a decreasing power function of crop dry matter.
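The two relationships described above, radiation interception following Beer's law and a critical nitrogen concentration decreasing as a power function of dry matter, can be sketched as follows. This is only an illustration: the function names, the radiation-use efficiency, the extinction coefficient and the dilution-curve coefficients `a` and `b` are hypothetical placeholders, not the values used in CERES.

```python
import math


def daily_biomass_gain(par_mj_m2, lai, rue_g_per_mj=3.0, k_ext=0.65):
    """Monteith-type growth: biomass gain proportional to intercepted radiation.

    par_mj_m2    : incident photosynthetically active radiation (MJ m-2 d-1)
    lai          : leaf area index (m2 m-2)
    rue_g_per_mj : radiation-use efficiency (g DM MJ-1), hypothetical value
    k_ext        : Beer's law extinction coefficient, hypothetical value
    """
    intercepted = par_mj_m2 * (1.0 - math.exp(-k_ext * lai))
    return rue_g_per_mj * intercepted  # g dry matter m-2 d-1


def critical_n_percent(dry_matter_t_ha, a=5.0, b=0.4, w_min=1.0):
    """Critical N concentration (% of dry matter) as a decreasing power function.

    a, b  : hypothetical dilution-curve coefficients
    w_min : biomass (t ha-1) below which the curve is held constant,
            an assumption made here to avoid unbounded values at low biomass
    """
    w = max(dry_matter_t_ha, w_min)
    return a * w ** (-b)


# Hypothetical example: 10 MJ m-2 of PAR on a canopy with LAI = 3
gain = daily_biomass_gain(10.0, 3.0)
n_crit = critical_n_percent(6.0)
```

In a supply/demand scheme of the kind described, crop N demand would then scale with the gap between `n_crit` and the actual tissue concentration.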
CERES runs on a daily time step, and requires daily rain, mean air temperature and Penman potential evapotranspiration as forcing variables. The models are available for a large number of crop species, which share the same soil components. Readers may refer to [22] for a more complete description of CERES.
The soil parameters of CERES which were deemed site-specific pertained to either the water balance or biological transformation routines. The former category includes: wilting point, field-capacity and saturation water contents, saturated hydraulic conductivity (layerwise), and two coefficients describing the water retention and hydraulic conductivity curves. These parameters were calculated from soil properties (namely particle-size distribution, bulk density and organic matter content) by means of several pedo-transfer functions [9,22,37].
The parameterisation of soil biological transformations amounts to breaking down the total soil organic matter (SOM) present in the plough layer into several pools featuring distinct decomposition rates and C:N ratios. Within NCSOIL, the SOM sub-model in our version of CERES, the pools comprise: crop residues, microbial biomass, actively decomposing humus and 'passive' humus. Here, we used the breakdown and pool settings proposed by [19], which is dependent on carbon management. More information on the parameters and their calculation may be found on the Internet at http://www-egc.grignon.inra.fr/ecobilan/cerca/intjavae.htm, where the estimation procedure has been implemented within an on-line front-end.
As regards the crop growth component of CERES, cultivar-related parameters were either derived from the DSSAT v3 database of varieties [18], calibrated against field observations of phenological development, or based on the dynamics of dry matter accumulation in the various plant compartments.

Field data
The trials were conducted in four European countries and included four crop species (Tab. I). Experiments were set up in replicate blocks in all sites except at the Kiel site, which had no replicates. Soil and crops were sampled every one to three months, and standard weather data as required by CERES were taken from meteorological stations located within 1 km of the experiments. In Candasnos, the solar radiation data were from a station 20 km from the site.
Soil was sampled to a depth of 60 to 120 cm by hand or using automatic augers, in 3 to 8 replicates which were pooled layer-wise in 10- to 30-cm increments. Soil samples were analysed for moisture content and inorganic N using colorimetric methods. In Candasnos, test strips were used for nitrate determination after a comparison with standard colorimetric techniques showed a good agreement between both methods. In Barrafranca, soil nitrate was monitored through its concentration in soil water using suction cups.
Individual plants were sampled in each block over areas of 0.25 to 1.00 m², and subsequently separated into leaf, stem, ear (or panicle) and grain compartments. When monitored, leaf area index was measured using an optical area meter, after which biomass samples were oven-dried for two days for dry matter determination. Lastly, biomass N content was analysed using combustion or digestion techniques except in trials where this variable was not monitored.

Model evaluation
The simulations of CERES were compared to field observations (means and standard deviations of the replicates) using graphics to capture dynamic trends, while statistical indicators gave an idea of the model's mean error. We used two standard criteria [34]: the mean deviation (MD) and the root mean squared error (RMSE). Here, they are defined as: $\mathrm{MD} = E(S_i - O_i)$ and $\mathrm{RMSE} = \left(E\left[(S_i - O_i)^2\right]\right)^{1/2}$, where $S_i$ and $O_i$ are the time series of the simulated and observed data, and $E$ denotes the expectation. MD indicates an overall bias with the predicted variable, while RMSE quantifies the scatter between observed and predicted data, which is readily comparable with the error on the observed data. The significance level of both statistics was also determined, based on the standard deviations of the observed data [34]. RMSE was thus compared with the average measurement error ($\mathrm{RMSE}_{\mathrm{ERR}}$), calculated from $\sigma_i$, the standard deviation over replicates for sampling date number $i$.
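As an illustration, the two indicators can be computed as in the sketch below, taking the expectation as an arithmetic mean over sampling dates. The aggregation of the replicate standard deviations into a single measurement error is written here as a root mean square, which is an assumption of this sketch; likewise, the final comparison is a plain inequality and does not reproduce the significance test of [34]. All function names are hypothetical.

```python
import math


def mean_deviation(sim, obs):
    """MD = E(S_i - O_i): overall bias of the simulation."""
    return sum(s - o for s, o in zip(sim, obs)) / len(obs)


def rmse(sim, obs):
    """RMSE = (E[(S_i - O_i)^2])^(1/2): scatter between simulated and observed."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def measurement_error(sd):
    """Average measurement error from replicate standard deviations.

    Assumption: root mean square of the per-date standard deviations.
    """
    return math.sqrt(sum(x ** 2 for x in sd) / len(sd))


def within_measurement_error(sim, obs, sd):
    """Plain comparison of model error and measurement error (no significance test)."""
    return rmse(sim, obs) <= measurement_error(sd)


# Hypothetical series for three sampling dates
simulated = [1.0, 2.0, 3.0]
observed = [1.0, 1.0, 5.0]
replicate_sd = [1.0, 1.0, 1.0]
bias = mean_deviation(simulated, observed)
error = rmse(simulated, observed)
acceptable = within_measurement_error(simulated, observed, replicate_sd)
```

A negative MD indicates that the model underestimates the variable on average, while an RMSE larger than the measurement error points to deviations that replicate variability alone does not explain.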

Model calibration
When discrepancies between model predictions and field observations occurred, their source was sought stepwise according to heuristic knowledge on the workings of the model. Errors were assumed to propagate from physical to chemical and biological processes. Therefore, we first checked the simulation of soil temperature and water balance, and then soil nitrate movement, crop dry matter accumulation and nitrogen uptake. The parameters associated with the routine appearing to cause the deviations were visually adjusted by trial-and-error, by looking at comparison charts (see Fig. 2 for an example).
Prior to fitting, a large sample of parameters was screened on the basis of the sensitivity of model deviations to their variations. The total set of parameters considered is presented in Appendix 1.

Model performance a priori
When parameterised a priori, CERES achieved an acceptable accuracy in a majority of sites and for most of the variables tested (Tab. II, and Figs. 2 to 6). This may be judged from the fact that in those cases the model's RMSE fell within the experimental error on the measurements with a 95% to 98% probability. At Kiel, observed standard deviations were not available, but the performance indicators were still within the range of published values for other models undergoing similar tests. Cited ranges for model RMSEs include: 0.02–0.08 cm³ cm⁻³ for water content and 10–40 kg N ha⁻¹ for topsoil nitrate content ([8], for several models in Germany); 0.8 t dry matter ha⁻¹ for crop biomass, 0.60 m² m⁻² for LAI, and 14 kg N ha⁻¹ for crop N uptake ([1], with the APSIM model in Australia); 3.4–3.9 t ha⁻¹ for crop biomass and 1.26–1.7 for LAI ([3], with CERES-Maize in Italy). Thus, there was only one site (Candasnos) in which CERES could be rejected with its default parameterisation.
As regards individual variables, there were no consistent patterns across sites for those that CERES failed to predict correctly. Significant deviations occurred for all the variables in at least one of the sites, and no particular routine could be singled out as intrinsically at fault. Crop nitrogen was the most difficult to simulate.

Table II. Statistical indicators for the goodness of fit of CERES in the simulation of soil and crop variables in the various European sites. MD and RMSE stand for the model's mean deviation and root mean squared error, respectively, and were calculated for the baseline and calibrated scenarios. The hypothesis that MD is zero was tested using a two-tailed t-test (p = 0.95). RMSE values were compared with the mean standard deviation of the measurements (RMSE_ERR, see text). The hypothesis that model and experimental errors were equivalent was tested at two levels (p = 0.95 and p = 0.98).

The extent to which the match against observed data improved through the calibration procedure varied from site to site, as may be seen by comparing the continuous and dashed simulation lines in Figures 2 to 6. Overall, most of the problems associated with the uncalibrated simulations tended to persist. Sorghum biomass was underestimated late in the season, due to a wrong timing of leaf senescence by CERES. In Châlons, although LAI dynamics were correctly simulated throughout the season, CERES underestimated final crop biomass and N content. During the second growing season in Candasnos, CERES over-predicted crop nitrogen and biomass, and the reason for it was unclear since similar discrepancies did not occur for soil water and nitrogen. Despite the change in the nitrification kinetics, CERES could not simulate the nitrate concentration peaks measured after fertiliser application in Villamblain (Fig. 2). It is likely that these discrepancies should be ascribed to a failure in some of the routines rather than to a wrong setting of their parameters. Thus, the statistical indicators of Table II may be considered as representing a structural limit of CERES in its current state, with the exception of Barrafranca where the parameterisation of leaf senescence should definitely be revised based on more thorough experimental work.

Model calibration for the various sites
In all situations, significant deviations occurred between simulated and observed data for at least one of the state variables monitored (Figs. 2 to 6).
The calibration procedure described in the Materials and Methods section was therefore undertaken to correct these biases. Its results are given in Table III, in terms of processes involved and associated parameters. Soil and crop water balance appeared to be the most critical routine, which is a logical consequence of the postulated error propagation scheme. Related parameters had to be adjusted in most sites, to improve the simulation of either downward water movement (through the field-capacity water content) or root uptake of water and nitrogen. The former process predominated under temperate climates (in the northern sites), whereas the latter prevailed under semi-arid conditions. This distinction illustrates the influence of climate type on model performance, through its effect on model sensitivity to the parameters of its various routines.
Conversely, little could be done to improve the simulation of soil N turnover. Related observed data (measurements of topsoil inorganic N) were either too infrequent over the season (in Kiel or Barrafranca), or the model was not sensitive to the associated parameters (Villamblain). In Châlons, a numerical optimisation of these parameters led to a set of values close to the default set used [14], prompting us to keep the latter. The Spanish site (Candasnos) turned out to be the exception to this rule, with simulations of topsoil nitrate improving when the size of the microbial biomass was increased from 0.9 to 2.3% of total soil carbon. With the default parameterisation, the low C:N ratio of soil organic matter resulted in high levels of simulated immobilisation of inorganic nitrogen and a systematic underestimation of topsoil nitrate.
Apart from those setting the duration of crop development phases, whose effect could be readily assessed, crop parameters were deemed too numerous and their structure too complex to be calibrated against our limited sets of observations. In some instances this conservative option caused important biases. Most notably, simulated leaf senescence began too early at Barrafranca and Villamblain. There might have been some interference of model errors in the simulation of crop growth with the calibration procedure. Indeed, we focused on soil parameters only in the calibration, and we adjusted them to variables which may have been influenced by crop processes and associated parameters. However, the fact that model errors on crop growth occurred late in the season supports our underlying assumption that they did not impact the calibration of soil parameters. Lastly, in one site (Villamblain) we decided to alter the nitrification equation by substituting the zero-order kinetics with a first-order scheme. Only through this modification could the dynamics of nitrate and ammonium be simulated within the range of concentrations observed (Fig. 2). This choice was in accordance with other similar models [21], but nevertheless goes somewhat beyond the scope of this paper.
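The difference between the two nitrification schemes can be made concrete with a small sketch. It is a generic illustration of zero-order versus first-order kinetics on a daily time step, not the equations of CERES or NCSOIL: the temperature and moisture modifiers are omitted, and the function names and rate constants are hypothetical.

```python
import math


def nitrify_zero_order(nh4, k0, days):
    """Zero-order kinetics: a fixed amount k0 is nitrified each day,
    limited only by the ammonium still available."""
    series = [nh4]
    for _ in range(days):
        nh4 -= min(k0, nh4)
        series.append(nh4)
    return series


def nitrify_first_order(nh4, k1, days):
    """First-order kinetics: the daily rate is proportional to the
    ammonium pool, so the pool decays exponentially."""
    series = [nh4]
    for _ in range(days):
        nh4 *= math.exp(-k1)
        series.append(nh4)
    return series


# Hypothetical ammonium pool (kg N ha-1) after a fertiliser application
zero = nitrify_zero_order(60.0, k0=4.0, days=10)
first = nitrify_first_order(60.0, k1=0.3, days=10)
```

With these placeholder values the first-order scheme converts a large ammonium pool much faster in the days following an application, whereas the zero-order scheme proceeds at the same pace whatever the pool size.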

Performance of the calibrated model
It is noteworthy that in the calibrated scenarios the accuracy of CERES did not improve greatly, overall. In many instances, the improvement for one variable resulted in a decreasing accuracy for the other variables. For example, fitting the crop biomass data in Barrafranca caused greater errors in the simulation of crop nitrogen. In Châlons, the visual calibration of microbial biomass against topsoil nitrate data was even associated with a higher RMSE than with the baseline set. This illustrates the limits of such a fitting procedure, although we favoured it because it relates to processes more directly than numerical adjustments do. Another reason for this choice is that CERES was poorly sensitive to some of its parameters, probably because it involves too many of them compared to the total number of model outputs. This makes the fitting of one parameter against one variable dependent on a number of other parameters.

DISCUSSION
In this extrapolation exercise, a first conclusion may be that a priori parameterisation resulted in a reasonable accuracy of CERES since its error proved acceptable in more than half of the cases tested. Thus, the procedure proposed should be considered as having a 50% probability of yielding acceptable values when employed in a new situation.
For the remaining cases, two routes may be investigated to explain the failure of CERES, as suggested in the introduction. Either the principles and equations within CERES were inadequate for the particular site considered, or the structure applied but model parameters were poorly estimated by the standard procedure. Of the two routes, we only investigated the parameterisation one here, assuming it was responsible for most of the discrepancies observed.
Calibration of the parameters which were detected as causing the discrepancies yielded slightly more acceptable simulations, with model error falling below experimental error for about 70% of tested variables in all sites. However, despite numerous attempts involving a dozen parameters, model calibration could not correct some of the deviations observed, such as the erroneous simulated spring peak in Châlons. One could object that only a thorough, multi-variable search of the minimum model error in its parameter space (through numerical optimisation techniques) would have enabled us to rule out parameterisation in the failure of CERES. In this work, however, we did not make use of such rigorous methods since they have proved difficult to apply to soil-crop models. These are indeed complex, highly non-linear and involve too many parameters to allow the automatic search of a global optimum [39]. If we trust that our 'expert-guess' calibration yielded results close to the true statistical optimum of the model, two conclusions arise: (i) the a priori error of CERES is close to its structural (calibrated) error, since the performance indicators of Table II differ by at most 30%; however, (ii) in a minority of cases the structural error is too large and adjustments in model structure are warranted.
In future work on the role of structure vs. parameterisation in determining model accuracy a priori, two lines of work may be pursued. First, the influence of structure may be further investigated by comparing the performances of different models using the same basic information for parameter estimation. Previous work on model comparisons against the same data sets has shown that predictions vary greatly between models, or even between users for a given model, and that all models featured their own domains of validity [8,35]. However, because they focused on the elusive issue of model validation rather than extrapolation, they allowed some degree of site-specific calibration which prevents the identification of pure 'structure'-related effects. Comparison exercises where modellers would be forced to make use of a given set of soil and crop properties should therefore be encouraged. This would also help delineate the respective validity domains of models, which could be made use of by adjusting model structure to soil and climate conditions. Secondly, the outcome of various parameter estimation procedures (e.g., pedo-transfer functions) may be compared for a given model. Although it is known that such procedures are all the more relevant as they are applied to pedological conditions similar to those on which they were established [2], it would be interesting to check whether their predictions (input to the model as parameters) may be applied to new conditions.
Whatever the outcome of the above studies, there is a need to extend the test presented in this paper to improve our confidence in large-scale model results. To facilitate the extension of such tests to a wide range of models and soil/crop conditions, we urge the community of model developers and users to organise itself so as to share both models and data sets to test them.

Acknowledgements:
The authors are indebted to the staff who contributed to the collection of the field data presented in this work. Special thanks are expressed to J.C. Germon (INRA, Dijon) who coordinated the research programme in Villamblain. Financial support from Gessol (French Environment Ministry), the EC FAIR project CT-96-1913 and the Comisión Interministerial de Ciencia y Tecnología (CICYT, contract AGF94-019) is acknowledged. We would also like to thank two anonymous reviewers for their valuable comments on the manuscript.

Figure 1. Diagram of the parameterisation and evaluation steps of the CERES model.

Figure 2. Simulated (lines) and observed (symbols) time course of leaf area index and aerial dry matter (left) and surface (0-30 cm) moisture and nitrate content (right) for the winter wheat crop in Villamblain.The simulation lines are dashed for the baseline parameterisation, and solid for the calibrated parameter set.

Figure 3. Simulated (lines) and observed (symbols) time course of surface (0-30 cm) soil moisture and nitrate content (right) and crop dry matter and nitrogen uptake (left) for the unfertilised control crop in Kiel.The simulation lines are dashed for the baseline parameterisation, and solid for the calibrated parameter set.

Figure 4. Simulated (lines) and observed (symbols) time course of total crop dry matter and nitrogen uptake (left) and surface (0-30 cm) soil moisture and nitrate content (right) for the moderately-fertilised winter oilseed rape crop in Châlons.The simulation lines are dashed for the baseline parameterisation, and solid for the calibrated parameter set.

Figure 5. Simulated (lines) and observed (symbols) time course of total crop dry matter and nitrogen uptake (left) and surface (0-30 cm) soil moisture and nitrate content (right) for the moderately-fertilised sweet sorghum crop in Barrafranca.The simulation lines are dashed for the baseline parameterisation, and solid for the calibrated parameter set.

Figure 6. Simulated (lines) and observed (symbols) time course of crop aerial dry matter and nitrogen uptake (left) and surface (0-25 cm) soil moisture and nitrate content (right) for the non-fertilised winter barley crop in Candasnos. The simulation lines are dashed for the baseline parameterisation, and solid for the calibrated parameter set.

Table I. Selected data for the field experiments used to test the parameterisation of CERES.

Table III. Calibrated parameters for the various experiments simulated with CERES.