Prediction of crude protein and classification of the growth stage of wheat plant samples from NIR spectra

Near-infrared spectroscopy (NIRS) was used to analyse the crude protein content of dried and milled samples of wheat and to discriminate samples according to their stage of growth. A calibration set of 72 samples from three growth stages of wheat (tillering, heading and harvest) and a validation set of 28 samples was collected for this purpose. Principal components analysis (PCA) of the calibration set discriminated groups of samples according to the growth stage of the wheat. Based on these differences, a classification procedure (SIMCA) showed a very accurate classification of the validation set samples: all of them were successfully classified in each group using this procedure when both the residual and the leverage were used in the classification criteria. Looking only at the residuals all the samples were also correctly classified except one of tillering stage that was assigned to both tillering and heading stages. Finally, the determination of the crude protein content of these samples was considered in two ways: building up a global model for all the growth stages, and building up local models for each stage, separately. The best prediction results for crude protein were obtained using a global model for samples in the two first growth stages (tillering and heading), and using a local model for the harvest stage samples.


INTRODUCTION
Near-infrared spectroscopy (NIRS) has been widely used in quality control in plant and grain material for different purposes due to its properties: speed, accuracy, precision and non-destructiveness (Williams & Sobering 1992;Fahey & Hussein 1999). One of the main fields of application of NIRS technology is the quality determination of forages (Murray 1993;Shenk & Westerhaus 1994). This technique has been widely used in forage analysis since the 1970s (Norris et al. 1976), growing because of the development of chemometric procedures used for the calibration of these instruments (Martens & Naes 1989;Osborne et al. 1993).
Grain wheat (Triticum aestivum L.) is the most important food consumed directly by humans, but a significant amount of the crop is also used for animal feeding, either as a grain or as a forage crop (Heyne 1987). Wheat growth can be divided into several physiological growth stages (Zadoks et al. 1974) : from seeding to tillering, from tillering to heading and from heading to grain ripening.
Agronomists and plant physiologists have studied the different aspects of nitrogen (N) in plants in order to understand the effects of the crop management or the environment (Melaj et al. 2003;Vetsch & Randall 2004). In wheat, N content is not only a valuable nutrient parameter per se but it is also usually well correlated with digestibility. In other cases, the influence of different fertilization timings on N accumulation and losses (Woolfolk et al. 2002;Abad et al. 2004), a subject that plays an important role in wheat breeding (Capper 1988;Noaman & Taylor 1990), is of interest. Many studies compare and follow the N contents of the plant throughout the growing cycle (Cherney & Marten 1982 a, b ;Noaman et al. 1988;Noaman & Taylor 1990), especially in later stages where the plant translocates part of the N accumulated in the plant leaves and stems to the grain where it becomes an important element that greatly influences the grain quality (Demarquilly 1970;Harper 1994).
It is, therefore, useful to obtain NIR calibrations for predicting N content as well as to determine, if possible, the growth stage of a particular sample in order to facilitate the agronomic and physiological studies. Crude protein calibrations of wheat samples at different growth stages in different plant parts have been reported (Noaman et al. 1988;Noaman & Taylor 1990), although a systematic analysis of the prediction ability of calibrations restricted to a particular growth stage with respect to the global calibrations has not been reported.
From a chemometric point of view some authors claim that populations that split into patterns are better described by local calibrations (Cleveland & Devlin 1988;Naes et al. 1990). However, the success of inverse least squares (ILS) methods are closely related to their ability to give accurate results in the presence of unknown interferents or matrix effects. Local calibrations have been used in problems with high non-linearity between X and Y variables. When large databases of samples are available, local calibrations are developed using some similarity measurements to define the calibration set (Sinnaeve et al. 1994;Dardenne et al. 2000). In the most common case, local calibrations are dressed to reduce the range of the Y-values of the samples of a calibration so that non-linearity effects are reduced. But this procedure is sometimes excessively arbitrary in terms of the ranges selected and the equation applied to a given sample. Conversely, if classification procedures to a given population can be applied successfully, the application of local calibrations appears to be meaningful.
In the present case, the practical use of calibrations for particular growth stages requires the availability of classification procedures to recognize the development stage of the samples. Classification or discrimination procedures based on NIR are scarce in agronomical work. Kallenbach et al. (2001) have reported good estimates of autumn dormancy of alfalfa using NIR, a work that indicates the ability of NIR to recognize growth characteristics of alfalfa. Delwiche & Graybosch (2002) have proved the usefulness of NIR in the identification of waxy wheat flours, or of blends of waxy and non-waxy cultivars.
The aim of the present work is to evaluate the ability of NIR spectra to classify wheat plant samples according to the growth stage as well as to discuss the best strategy to predict protein content of these samples from NIR spectra.

Sample description and management
The material used for the current study consisted of 100 wheat plant samples collected in Lleida, north-east of Spain (41x39kN, 0x51kE), from different locations where N fertilizer treatments varying from 0-200 kg/ha have been applied in order to have enough reference property range for the study. The samples covered the three different stages of the wheat plant growth : tillering (A), heading (B) and harvest (C). Wheat was also obtained from a wheat nitrogen fertilization trial from the UdL (Universitat de Lleida) -IRTA (Institut de Recerca i Tecnologia Agroalimenta`ries) research fields. Fresh samples (200 g) were dried in an oven at 65 xC for 48 h. The dried material was milled and sieved to a particle size of 1 mm before obtaining the N content by a Kjeldahl method. The near-infrared (NIR) spectrum of each sample was recorded in a Bran+Luebbe's Infra-Alyzer 500, a spectrometer that reads absorbances from 700-2500 nm, recording the spectra every 4 nm and giving a total set of 451 absorbances. All the instrument management and spectral data recording was controlled with the software SESAME (Bran-Luebbe 1998). After recording the NIR spectrum of the samples, crude protein values were obtained by multiplying the N content given by the Kjeldahl method by the factor 6 . 25. Table 1 shows the main statistics associated with the crude protein content of all the samples of wheat plant as well as the main statistics of the subsets corresponding to the different growth stages.

Multivariate models
A principal components analysis (PCA) model of the samples characterized by the NIR spectrum was used to give an overview of the whole sample set and to detect groups of samples. A classification model was developed by the Soft Independent Modelling of Class Analogy methodology (SIMCA : Brereton 2003). SIMCA is based on making a PCA model for each class of samples to be modelled in order to characterize the class. Unknown new samples are compared with the PCA-class models and assigned or not to these classes, according to their similarities to the class model measured in terms of the residual and the leverage of the sample to be classified for a given significance level. Calibration models of crude protein were calculated using Partial Least Squares (PLS) regression (Geladi & Kowalski 1986;Martens & Naes 1989). For this purpose, the full set was divided into two subsets, one for calibration and the other made up of 28 samples for validation purposes. Unscrambler software version 7.6 SR-1 (CAMO 2001) was the practical tool used for all the multivariate models applied (for analysis, classification and regression). Spectra of the samples were centred for developing all the models of the work. No other pre-treatment of the data was applied.

Classification analysis in terms of the stage growth
A PCA model of the centred spectroscopic data for the 72 samples of the calibration set has been calculated considering a maximum number of 10 principal components (PCs). Table 2 presents the explained validation variance as proportion of the total data variance. Two PCs were enough to explain 0 . 976 of the total data variance. This is a high value, and the score plot PC1 v. PC2 shown in Fig. 1 shows that samples fall into discrete groups. In order to facilitate the study of patterns in spectroscopic data, only the character concerning the growth stage (A, B or C) is presented. Samples from the harvest stage (C) cluster separately on the left of the plot, while the first two growth stages (A and B) both appear on the right, split into two different subgroups (according to the growth stage). The distance between these two groups is smaller, indicating a higher similarity among these samples in comparison with the samples collected at the harvest stage. This is consistent with the wellknown property that the main changes appear at the end of crop growth. Figure 2 plots the mean NIR spectra of the samples of each growth stage. In agreement with the PCA results, the mean spectra at growth stages A and B are much more similar than the mean spectra of samples at growth stage C.
These results suggest the ability of NIR to discriminate between at least some growth stages and the convenience of studying the performance of a global calibration model as well as particular calibration models for each growth stage in order to determine the best strategy in the prediction of the crude protein of wheat samples.
A classification procedure based on the application of the SIMCA method has been developed in the present work. PCA models for each class of samples used to build the classification procedure were made using the samples belonging to each growth stage. A brief description of the resulting PCA models is given in Table 3 in relation to the number of samples in the calibration set for each growth stage, the optimum number of PCs for each model and the resulting proportion of the explained validation variance. The percentage of samples successfully classified into the proper growth stage is shown in Table 4. The applied classification criterion is based on the comparison of the residual of a sample with the average residual of the samples of a given class to the corresponding class model. All samples are correctly classified except one sample belonging to growth stage A that is classified as belonging to both A and B stages. The classification is thus adequate because at a 5 % level of  confidence, less than 5 % of the samples (1 out of 28) of a given class appear incorrectly classified. If the leverage as well as the residuals are taken into account in the classification criterion, the classification results can be improved still further. Figure 3 plots the distance from each sample to the A-stage model (Si) v. the leverage (Hi). The distance of a sample to a class, Si, is defined as the square root of the residual variance for the sample once projected into the model class, while Hi equals the distance from the projected sample to the model centre. The A-stage model has been chosen as the origin of distances in Fig. 3 to clarify the classification results for samples belonging to A or B classes, the closest ones. Figure 3 shows a well supported pattern between classes, all A-stage samples being at the lower left corner of the figure, where both leverage and residuals to the A-class model are lower than a value determined according to the significance level. Assignation of any sample to the appropriate class appears accurate and unambiguous.
The distance between classes in the model is used to summarize the ability of a procedure to classify samples between two classes. This parameter is calculated as the ratio between the averaged residual of a class of samples to a model and the averaged residual of the proper samples of the class. It is usually said that a good separation between models is found when the model distance is higher than 3. Thus, the model distance relative to the class of growth stage A, shown in Fig. 4, indicates a good separation between all three growth stages and, especially, a very large distance between the harvest and the two first stages (notice that, according to the definition of the model distance, the distance value for the proper class, A in Fig. 4, is 1). Although the projected A samples appear close to the B samples in the PC1 v. PC2 score plot of the model (Fig. 1), the distances seen in Fig. 4 are in agreement with the results obtained in the initial PCA model. The model distance found between the growth stages A and B also reflects the fact that not only PC2 but also PC4 is able to discriminate well between these two stages. Also, the optimum number of PCs in the model that defines the samples of the growth stage B for the classification procedure is 4.
The discrimination power can be used to estimate the importance of the variables in the classification procedure. Like model distances, variables with a numerical value of the discrimination power higher than 3 can be considered important for the classification. Figure 5 shows the discrimination power between the PCA models used for the classification procedure. Variables from 1860-2060 nm discriminate best between the first two stages, while almost all  variables are able to discriminate well between either of these first two stages and the third one.

Calibration of crude protein
Once the classification procedure was available, two possibilities were considered in the calibration stage : the building up of a global calibration model taking into account all the growth stages (A+B+C), and the building up of local calibration models for each growth stage separately. All these models were developed using the PLS algorithm and validated using both Full Cross Validation and a validation set made up with eight new samples of the tillering stage (A), 10 new samples of the heading stage (B) and 10 new samples of the harvest stage (C).
A brief description of the main statistics of these models is presented in Table 5, which shows the number of samples involved in each calibration, the optimum number of PCs for each PLS model and, for the optimum number of PCs, the correlation coefficient (r) of the predicted v. measured diagram. The prediction ability of the model is also presented in the table, measured as the Root Mean Square Error of Full Cross Validation (RMSECV) or the Root Mean Square Error of Prediction (RMSEP).
For each model, the RMSECV and RMSEP values are very similar, as can be seen in Table 5. However, due to the reduced number of samples of the validation sets the discussion that follows uses the RMSECV values of the different models, although similar results are obtained using the corresponding RMSEP values. Figure 6 shows the evolution of the RMSECV with respect to the number of PCs for all the regression models. The harvest stage (C) reaches the lowest RMSECV values of all models, indicating that the local model is the best one for samples of this harvest stage. However, the RMSECV of this local model should be compared with the RMSECV of the global model (labelled A+B+C) but calculated using only the harvest-stage samples in the validation step. The new RMSECV value of the global model, restricted to the C-stage samples, is 1 . 02. Compared to the RMSECV for all samples found in Table 5 (value 1 . 14), this shows a similar accuracy of the global model in the prediction of the C-stage samples or in the prediction of samples belonging to stages A and B. Accordingly, the best model for prediction of crude protein in the harvest stage of wheat samples is the local model, suggesting that the matrix effect cannot completely be modelled, or at least not with the sample set available in the current study. In fact, the harvest stage has been found as the most different subset of samples appearing in all the steps involved in the current study.  Conversely, for both of the first two growth stages, A and B, the global model and the local model yield similar values of RMSECV. This result indicates the successful reduction of the matrix effects corresponding to the stages A and B by the calibration procedure. As seen in Fig. 1, the distance between classes A and B is very much smaller than the distance from either of these classes to C.
The groups of samples that appeared in the score plot PC1 v. PC2 of the PCA global model also appear in the corresponding diagram of the PLS global model (Fig. 7). Actually, both Figs 1 and 7 show more or less the same distribution, but in Fig. 7 stages A and B appear well defined and not overlapped. This indicates that the main sources of variance in the raw spectra, PC1 and PC2, are closely related to the PLS Factors 1 and 2, and that the content of crude protein and related properties are the main source of variance in the NIR spectra.
Because the obtained results suggested that local calibrations for classes A and B were not required, the accuracy of a global model for A and B stages built with 25 samples of the whole A+B calibration group was checked. This number of samples is similar to that used in the local models. In this way, a possible effect of the number of samples when comparing this A+B model with the local models was avoided. The new model (labelled as model A+B) was also validated using both Full Cross Validation and a Test Set Validation made up with 10 new samples of stages A+B. The results obtained showed that this model had a similar prediction ability to that of the global or the local calibrations made for each stage (Table 5 and Fig. 6). Thus, these groups can be modelled together, being the matrix effects around the first two growth stages of the wheat modelled successfully by PLS. Finally, it can be noted that considering a global model for the A+B stages, any sample classified as belonging to both stages in the classification step does not introduce any ambiguity as to which is the convenient model to be used in the crude protein prediction. The prediction errors obtained for crude protein of these samples with the local models A or B have been checked and are below the RMSEP value considered for these local models.

DI SCU SSI ON
Although the potential of NIR in the quality prediction of wheat grain or in the forages is well established (Fahey & Hussein 1999), the results reported in the present work provide evidence that NIR can be used not only to predict quality parameters within growth stages with accuracy similar to the reference methods but NIR can also be used to recognize the growth stage through the classification procedure developed.
Samples of the different wheat growth stages considered, from seeding to tillering, from tillering to heading and from heading to grain ripening, showed clear grouping when a PCA model was calculated from the raw NIR spectra of these samples. This pattern allowed an accurate classification of samples in terms of the growth stage. In the present work all of the samples were successfully classified by means of the SIMCA method (built from PCA models of the NIR spectra of the samples belonging to each growth stage) in the corresponding stage.
A PLS regression model of crude protein content was developed for all the population of wheat plant samples (stages A+B+C). The same groups detected in the score plot PC1 v. PC2 of the global PCA model appear in the score plot defined by the first two factors of this PLS model, indicating the relevance of the protein content as a source of variance of the NIR spectra and the evolution of this content along the vegetative cycle. The results of this work indicate a decrease of the mean N content in the aerial tissue along the vegetative cycle as has been previously shown (Cherney et al. 1982 a, b ;Noaman et al. 1988;Noaman & Taylor 1990).
Local PLS models of crude protein for each stage were also calculated. The comparison of these models indicate that the global (A+B) model seems to be the best alternative for crude protein prediction of samples belonging to the first two growth stages A or B, considering both the prediction error and the correlation coefficient between predicted and measured values of the property for the samples. For samples of the harvest stage the best results are obtained with a local calibration model built from only the samples corresponding to this stage. Noaman et al. (1988) also reported a good ability of the NIR for prediction of crude protein along the different growth stages. The application of their strategy requires a priori knowledge of the growth stage of the sample and no comparison with results corresponding to a global model was reported. The present work shows that NIR can be successfully used for the automatic detection of the growth stage of the sample, this procedure being an important aid in minimizing errors of the sample management in the laboratory. The present work has also addressed the assessment of the error that would arise if a sample was predicted with an improper equation (an equation corresponding to another growth stage). Samples located in the limits of two classes (which might be classified as belonging to both classes) can be predicted without a significant increase of the prediction error by the equations of both classes. This result is expected to hold for the growth stages reported in Noaman et al. (1988), which cover a lower N content range as compared with the samples used in the present study. However, the error increases when samples from one growth stage are predicted with calibration equations of other clearly distinct stages (the distance between classes A and B is much smaller than the distance from either of these classes to C).
It is concluded that NIR can be a valuable tool to discriminate the growth stage of wheat plant samples and to predict protein content of these samples. Its use should facilitate the assessment of the effects of fertilization and plant management strategies and provide wheat breeders with a powerful technique to follow the use of different cultivars.