Determination of the Recovered-Fiber Content in Paperboard Samples by Applying Mid-Infrared Spectroscopy

Paperboard is widely used in different applications, such as packaging and graphic printing, among others. Consumption of recycled paper is growing, which has led the paper-mill packaging industry to apply strict quality controls. This means that it is very important to develop methods to test the quality of recycled products. In this article, we focus on determining the recovered-fiber content of paperboard samples by applying Fourier transform mid-infrared (FT-MIR) spectroscopy in combination with multivariate statistical methods. To this end, two very fast, nondestructive approaches were applied: classification and quantification. The first approach is based on classifying unknown paperboard samples into two groups: high and low recovered-fiber content. Conversely, under the quantification approach, the content of recovered fiber in the incoming paperboard samples is determined. The experimental results presented in this article show that the classification approach, which classifies unknown incoming paperboard samples, is highly accurate and that the quantification approach has a root mean square error of prediction of about 4.1.

I. INTRODUCTION

The use of recovered paper products has expanded considerably over the last decades [1], mainly because of environmental, economic and social benefits [2]. According to the European Recovered Paper Council [3], Europe reached a recycling rate of 71.7% in 2012. In addition, paperboard is the most recycled packaging material in Europe, exceeding the recycling rates of steel, glass and aluminum. Because of the economic crisis, paper consumption in Europe has fallen by 13% since 2007, whereas recovery has dropped by only 3.5%. Therefore, it is essential to ensure the quality of the recycled material in order to guarantee the sustainability of the recycling process [2].

According to the U.S. Environmental Protection Agency [4], paper fibers are usually classified as either recovered or virgin. Virgin fibers are cellulosic elements obtained directly from trees (hardwood and softwood) and other plants; they are newly pulped and have never been used before. Recovered fibers, in contrast, are post-consumer fibers derived from diverse origins, including paper, paperboard and other fibrous materials collected mainly from manufacturing processes or municipal solid waste. Recovered fibers can also include pre-consumer material, i.e. waste material recuperated from a manufacturing process.

The use of recovered fiber has several environmental benefits: it reduces the demand for virgin fiber from forest products, thus putting less pressure on forests, and it saves energy, reduces greenhouse gas emissions and extends the available fiber supply. Recycling also minimizes landfill disposal of a valuable resource, since it reduces the amount of waste and rejected materials.

[...] too high an amount of recovered fiber can reduce environmental returns beyond a threshold percentage.
This means that the final manufactured product determines the maximum amount of recovered fiber in its formulation.

Manual and automatic paper sorting systems are commonly applied in many countries to recover usable fibers from a waste stream [5]. Sorting methods aim to recover the highest-purity raw material from the waste stream, since this minimizes chemical addition and energy requirements while facilitating the manufacture of high-quality products [6]. However, manual sorting faces several drawbacks, including unpredictable end-product quality, relatively high costs (especially in developed countries) and exposure to dust, microorganisms and other pathogenic agents, which may cause infections among workers [7]. Automated paper sorting systems are therefore gaining importance in the paper industry and are subject to constant technological improvement.

There is growing interest in developing automatic sorting systems. For example, a sorting system based on NIR spectral imaging has been described for classifying different paper types, i.e. raw and colored cardboard, newspaper and printer paper [8]. Rahman et al. [5] describe a paper sorting technique based on image processing combined with statistical reasoning and machine learning systems to identify different paper grades. A review of sorting methods for the paper industry can be found in Rahman et al. [6]. However, available sorting systems either do not provide information about the composition of the analyzed samples or have not been applied to determine the recovered fiber content of paper samples. This paper contributes to this area since, as far as we know, it is the first attempt to automatically determine the recovered fiber content of paperboard samples.

There are different analysis methods to identify paper products containing recovered fiber in their formulations.
For example, Holik [9] describes a system to determine the amount of damaged fibers. Other methods are based on the analysis of chemicals and products that remain in recovered paper fiber and reveal content other than virgin fiber [10]. However, these methods are time-consuming since they require sample preparation.

In this paper, the recovered fiber content of different paper samples is determined by analyzing the spectral data provided by a mid-infrared spectrometer. Mid-infrared spectroscopy has been applied to analyze pulp composition and paper structure [1,11] and is known to be very fast and nondestructive [12]. Here, the Fourier transform mid-infrared (FTIR) spectrum of a given sample is further processed by applying multivariate feature extraction algorithms combined with classification and statistical regression methods. Unlike other approaches, the FTIR spectrum provides information about the composition of the paperboard sample rather than its external physical appearance.

To determine the content of recovered fiber in a given paperboard sample, this paper applies two approaches. In the first, classifier-based approach, unknown paperboard samples are classified into two groups, namely low and high recovered fiber content, according to their composition, by applying two feature reduction methods, principal component analysis (PCA) and canonical variate analysis (CVA), followed by the k-nearest neighbor (kNN) classifier. In the second, quantification-based approach, the content of recovered fiber is determined by applying a multivariate regression method, in this case the partial least squares (PLS) algorithm.
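
As an illustration of the last step of the classifier-based approach, the following minimal Python sketch applies a kNN rule to two-dimensional feature scores. It is not the authors' implementation: the function name `knn_predict`, the choice k = 3, the Euclidean distance and all numerical values are assumptions made purely for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank the training samples by Euclidean distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Made-up 2-D scores standing in for the reduced features of training samples.
train_points = [(-2.1, 0.3), (-1.8, -0.2), (-2.4, 0.1),
                (1.9, 0.4), (2.2, -0.1), (2.0, 0.2)]
train_labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_points, train_labels, (-1.5, 0.0)))  # low
print(knn_predict(train_points, train_labels, (1.7, 0.1)))   # high
```

An odd k avoids tied votes when only two classes (low and high recovered fiber content) are involved.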
The proposed system for determining the recovered fiber content of an unknown sample has several appealing features: it responds very quickly, it can be applied in situ, and it does not require chemicals or reagents, which minimizes costs because neither a chemical laboratory nor a specialized technician is needed. It is worth noting that recovered paperboard samples are particularly diverse. Because of this wide range of compositions, i.e. the heterogeneity of the samples dealt with, the problem is highly complex.

The proposed quantification system, which is fast and easy to use, may be highly valuable for paperboard manufacturers, since they need to check the quality of their incoming stock. It is also useful for the packaging industry, and especially for food packagers, since they need to implement very strict quality controls to ensure that the content of recovered fiber is below a certain threshold value, in order to avoid health-related problems due to the migration of chemicals to foodstuffs.

[...] final product may include two other layers (not included in this work), which are composed of white recovered fibers (top side) and recovered paperboard (back side). When it contains these two layers, the final product is designated as fully coated white lined chipboard with grey back and is mainly used for packaging in the food, textile, beverage, detergent and cleaning-product industries, among others.

It should be pointed out that the analyzed samples were prepared in two different time periods, which increases the heterogeneity of the overall sample set, since the incoming stock has different origins and compositions. All samples were prepared in the facilities of Reno De Medici Ibérica.

A total of 31 paperboard samples were made by following the above-mentioned manufacturing procedure.
As explained, since the analyzed samples are made with different proportions of recovered fiber, this group of samples is highly heterogeneous. Therefore, the automatic quantification of the recovered fiber content is a highly challenging problem.

The whole set of 31 samples was split into a training set and a prediction set to evaluate the performance of the statistical models proposed in this paper [13]. The samples of the prediction set are different from those of the training set. Whereas the training set samples are used to calibrate the statistical classification and quantification models, the prediction set samples are used to test how well those models predict the content of recovered fiber in samples not seen during calibration.

Table I shows the paperboard samples dealt with, their origin, and the set to which they are assigned.

To acquire the spectral data, an FTIR spectrometer, model IR Spectrum One (S/N 57458) from PerkinElmer, equipped with an attenuated total reflectance (ATR) module and a lithium tantalate (LiTaO3) detector, was used. The 45° ATR top-plate module has a clamping system to ensure adequate contact between the solid sample and the single-reflection diamond crystal.

The spectra of the raw paperboard samples were acquired at 25 ± 1 °C by using an ATR cuvette over the wavenumber range 4000-650 cm⁻¹ by averaging four scans, with a resolution of 1 cm⁻¹. Three readings were taken in different parts of each sample and then averaged.

It is well known that by analyzing the ATR spectrum of a particular material, different types of components, such as organic, inorganic and polymeric molecules, among others, may be identified. In the ATR spectrum of a paperboard sample, most of the spectral bands are due to cellulose [14].
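
The acquisition scheme described above, in which three readings per sample are averaged over the 4000-650 cm⁻¹ range, can be sketched in a few lines of Python. This is only an illustration: the function name `average_readings`, the assumption of one data point per cm⁻¹ and the absorbance values are hypothetical and are not taken from the measured data.

```python
def average_readings(readings):
    """Point-by-point mean of several spectra recorded on the same axis."""
    n = len(readings)
    return [sum(values) / n for values in zip(*readings)]


# Wavenumber axis from 4000 down to 650 cm^-1, assuming one point per cm^-1.
wavenumbers = list(range(4000, 649, -1))

# Three made-up readings taken at different spots of the same sample.
readings = [[0.10 + 0.01 * r] * len(wavenumbers) for r in range(3)]

sample_spectrum = average_readings(readings)
print(len(wavenumbers), len(sample_spectrum))  # 3351 3351
```

Under the one-point-per-cm⁻¹ assumption, the axis has 4000 - 650 + 1 = 3351 points.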
In this paper, the ATR spectra of the 31 paperboard samples are acquired (one per sample), transformed to absorbance spectra and further analyzed by applying multivariate mathematical methods. The spectrum of each sample consists of 3351 data points (x, y), x being the wavenumber and y the absorbance. This large number of variables per sample, combined with the inherent difficulty of the studied problem, makes it very difficult to determine the recovered fiber content of a given paperboard sample directly from the raw [...]

Fig. 3. Link between the PCA and CVA algorithms.

Fig. 4. The two approaches applied to determine the recovered fiber content of the analyzed samples. [Diagram; surviving box labels: canonical variate analysis (CVA), dispersion map.]

All the multivariate statistical methods explained in this section were programmed by the authors of this work using the Matlab® programming language.
In this section, the results attained with both approaches, i.e. classification and quantification, are presented. All results are based on the analysis of the ATR spectra after suitable preprocessing, which includes baseline correction, smoothing, transformation to absorbance spectra, and analysis of the first and second derivatives with or without mean centering or unit variance scaling.

All results shown in this section are based on the 31 paperboard samples, whose recovered fiber content is known, since they were expressly prepared for this research work in the Reno De Medici Ibérica facilities. These samples were split into two groups, i.e. the training and prediction sets. The training set contains 21 samples and the prediction set the remaining 10, so the prediction set contains approximately one-third of the total set of samples.

The whole absorbance spectrum (4000-650 cm⁻¹) of the 31 paper samples provided a data matrix with 31 rows and 3351 columns, from which a 31×3341 first-derivative matrix and a 31×3331 second-derivative matrix were obtained by applying the Savitzky-Golay algorithm; these are shown in Fig. 5. It is worth noting that, prior to calculating the derivatives, the spectra were preprocessed by applying baseline correction and smoothing. A preliminary analysis showed more accurate results for both the classification and quantification approaches when using the first derivative of the spectra with mean centering, so all results presented in this paper are based on this preprocessing method.

The paper industry market often demands paperboard products with either high or low recycled fiber content, depending on the specific application of the final product. In these cases, a screening tool such as the one developed in the next subsection, based on PCA + CVA, may be suitable.
However, when quantification of the recovered fiber content is required, that method is not suitable. When applying the quantification approach based on the PLS algorithm, it is mandatory to prepare a calibration set of paperboard samples (the recovered fiber content of each sample must be known accurately) covering the whole interval of recovered fiber content. Therefore, this strategy requires a more complex and accurate preparation of the standard samples over the whole interval of concentrations dealt with.
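
The preprocessing retained for all results in this section, a Savitzky-Golay first derivative followed by mean centering, can be sketched in plain Python as follows. This is a hedged illustration rather than the authors' Matlab code: the 11-point window is an assumption (it is consistent with a 3351-point spectrum yielding 3341 derivative points if edge points are dropped, but the window width is not stated in the text), the function names are hypothetical, and the spectra are synthetic.

```python
def savgol_first_derivative(y, half_width=5, spacing=1.0):
    """First derivative by a Savitzky-Golay filter (polynomial order 1 or 2).

    Points without a full window on both sides are dropped, so the output
    has len(y) - 2 * half_width values.
    """
    offsets = range(-half_width, half_width + 1)
    norm = sum(i * i for i in offsets) * spacing
    return [
        sum(i * y[c + i] for i in offsets) / norm
        for c in range(half_width, len(y) - half_width)
    ]


def mean_center(rows):
    """Subtract the column-wise mean (taken over samples) from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


# Synthetic stand-in for absorbance spectra: 3 "samples" x 3351 points.
wavenumbers = [4000 - i for i in range(3351)]  # 4000 ... 650 cm^-1
spectra = [[a * 1e-4 * w for w in wavenumbers] for a in (1.0, 2.0, 3.0)]

derivs = [savgol_first_derivative(s) for s in spectra]
centered = mean_center(derivs)
print(len(spectra[0]), len(derivs[0]))  # 3351 3341
```

Mean centering is computed across samples for each variable, so in practice the column means would be estimated on the training set and then reused for the prediction set.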

A. Classification approach (PCA + CVA + kNN)

In some cases, manufacturers need a fast method to determine whether or not the incoming paper samples contain a high percentage of recycled fiber. This is the case, for example, in the packaging industry, where the use of recovered fiber in paperboard formulations for packaging materials in contact with foodstuffs is of special concern. Therefore, in such applications it is highly desirable to have a fast and nondestructive screening tool for discriminating between incoming samples with high and low content of recovered fibers.
Under the classification approach, paperboard samples were split into two groups, namely low and high recovered fiber content, as shown in Table II. This approach therefore assigns unknown incoming paperboard samples to one of these two classes by applying the feature extraction methods PCA + CVA in combination with the kNN classifier.

As detailed in Section II, PCA is applied before the CVA algorithm. It is therefore necessary to select a reduced number of PCs arising from the PCA. Although there is no standard method for selecting the appropriate number of PCs, in this paper those explaining at least 97% of the overall variance were retained. Fig. 6 shows that this condition is met when retaining the first 10 PCs. The CVA algorithm was then applied to the 10 retained PCs.

Since the content of recovered fibers in the incoming stock has a profound impact on the quality and final properties of the manufactured paperboard products, in many applications it is highly desirable to have a fast and nondestructive tool to determine the approximate content of recovered fibers.

The second approach to determine the recovered fiber content of the analyzed paperboard samples is based on the PLS regression algorithm. As in the case of the PCA algorithm, the appropriate number of latent variables must be selected to avoid overfitting the prediction model. To this end, the mean squared error of cross-validation (MSECV) of the calibration sample set was calculated as a function of the number of PLS components retained, which is shown in Fig. 8. Note that the MSECV was