On the Compositional Analysis of Fatty Acids in Pork

Fatty acid (FA) composition of pork is an important issue for the pig industry and consumers. Fatty acid composition is commonly described as the percentages of a set of FA relative to total FA and should therefore be statistically treated as compositional data. To our knowledge, there is no reference in the literature where specific methods for compositional data analysis have been applied to analyze FA composition in meat quality research. The purposes of this study were (1) to present an overview of compositional data analysis techniques, (2) to apply them to the analysis of the FA composition of muscles and subcutaneous fat from 941 pigs as a case study, and (3) to discuss and interpret the results with respect to those obtained using standard techniques. Results from both approaches indicate that FA composition differed across tissues and muscles but also, for a given muscle, with the intramuscular fat content. It is concluded that FA composition in pork did not display enough variability for the use of standard statistics to become critical, particularly if the individual FA parts remain the same across experiments. However, even in such a case, compositional analysis may be useful to correctly interpret the correlation structure among FA.


INTRODUCTION
The quality of fat is becoming increasingly important for both the industry and consumers. Currently, there is enough evidence indicating that fat quantity and quality affect the nutritional, sensory, and technological properties of animal products, particularly pork (Wood et al. 2003; Schmid 2010). Fat quality is chemically defined in terms of fatty acid (FA) composition, which is commonly presented as a set of percentages corresponding to the relative content of each individual FA (or the sum of some of them) with respect to the total content of the FA that have been determined, i.e., as a vector of positive values whose sum is a constant. Technically, this sort of data is what in statistics is known as compositional data, i.e., multivariate data where the variables represent parts of a whole (Pawlowsky-Glahn and Egozcue 2006). Compositional data are intrinsically multivariate because no component can be interpreted without relating it to the other components. They only carry relative information and, therefore, standard statistical techniques, which were conceived to deal with variables measured on an absolute scale, are inappropriate. Consequently, specific methods for compositional data analysis have been developed since the 1980s (Aitchison 1982, 1986; Aitchison and Egozcue 2005; Bacon-Shone 2011). To our knowledge, there is no reference in the literature where compositional data analysis has been applied to meat quality research.
Much research has been undertaken in recent years to assess the effect of influential factors (such as diet, genotype, gender, body weight, age, or fat content, among others) on the FA profile of pork fat and meat, mostly sampled from backfat and loin chops. However, it is also known that the pattern of FA deposition differs not only between the adipose and muscle tissues (Franco et al. 2006; Duran-Montgé et al. 2008; Yang et al. 2010) but also among muscles (Sharma, Gandemer, and Goutefongea 1987; Leseigneur-Meynier and Gandemer 1991). The University of Lleida has assembled a biorepository of pig fat and muscle specimens for conducting research studies on meat quality, including samples from a Duroc genetic line used for producing premium quality pork cuts. Currently, the dataset associated with this line, with around 1700 FA profiles from different muscles and backfat locations (Section 2), provides a valuable resource for revisiting the pattern of FA deposition in pork under a compositional data analysis setting. The purposes of this study were (1) to review the fundamentals of compositional data analysis techniques (Sections 3-4) and (2) to use this approach to examine the variations in the FA profile of pork meat and fat as a case study (Section 5). The utility of adopting the compositional data approach in the statistical analysis of FA compositions in meat products is discussed in light of the results of the case study.

DESCRIBING THE CASE STUDY
The case study comprises data from 971 purebred Duroc barrows used and referenced elsewhere (Bosch et al. 2012; Ros-Freixedes et al. 2012). The pigs were raised to a carcass market weight of around 95-100 kg (Table 1) in 12 commercial batches from 2001 to 2008. All pigs had ad libitum access to a commercial feed and were slaughtered at the same abattoir. There, a sample of the muscle gluteus medius (GM) was collected from the left ham of all pigs. Moreover, in randomly chosen subgroups, additional samples of the muscles longissimus dorsi (at the level of the third and fourth last ribs; LD), semimembranosus (SM), and latissimus dorsi (LT) were also taken, as representative muscles of the loin, ham, and shoulder, respectively. Finally, two samples of the subcutaneous backfat (SF) were obtained at the positions where the GM (SFGM) and LD (SFLD) muscle samples were taken. The samples of SM, SFGM, and SFLD were collected immediately after slaughter and frozen in liquid nitrogen until required for analysis. The samples of GM, LD, and LT were collected after chilling for about 24 h at 2°C, vacuum packaged, and stored frozen until analysis. The number of samples per muscle and backfat location by batch is detailed in Table 1. Once defrosted, a representative aliquot from pulverized freeze-dried samples was used for fat analysis. The intramuscular fat (IMF) content and FA composition were determined in duplicate by quantitative determination of the individual FA by gas chromatography (Bosch et al. 2009). Fatty acid methyl esters were directly obtained by transesterification using a solution of 20 % boron trifluoride in methanol (Rule 1997). Methyl esters were determined by gas chromatography using a capillary column SP2330 (30 m × 0.25 mm; Supelco, Bellefonte, PA) and a flame ionization detector with helium as carrier gas at 1 ml/min. The oven temperature program increased from 150 to 225°C at 7°C/min, and the injector and detector temperatures were both 250°C.
The quantification was carried out through area normalization with an external mixture of FA methyl esters (Sigma, Tres Cantos, Madrid). The internal standard was 1,2,3-tripentadecanoylglycerol. The FA composition was expressed as the percentage of each individual FA relative to total FA. The complete profile for each sample included saturated (SFA; C14:0, C16:0, C18:0, and C20:0), monounsaturated (MUFA; C16:1n-7, C18:1n-9, and C20:1n-9), and polyunsaturated (PUFA; C18:2n-6, C18:3n-3, C20:2n-6, and C20:4n-6) FA (Figure 1). The IMF content in the four muscles was calculated as the sum of the individual FA expressed as triglyceride equivalents (AOAC 1997) on a dry tissue basis. Because C20:0 is present at very low levels, it was not detectable in a few samples. Zero values represent a mathematical challenge for compositional data, which only carry relative information, because any log ratio involving a zero is undefined. To solve this problem, several replacement strategies have been proposed (Martín-Fernández and Thió-Henestrosa 2006; Palarea-Albaladejo, Martín-Fernández, and Gómez-García 2007). For its simplicity, and because the proportion of zero values is low in this dataset, we followed the strategy of Sanford, Pierson, and Crovelli (1993) and replaced the zeros by 0.55 times the lowest measured value in each tissue before calculating the FA percentages.
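The simple zero replacement described above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' code: the function name `replace_zeros_simple` is hypothetical, "lowest measured value in each tissue" is interpreted column-wise (per FA) within each tissue group, and the raw amounts are toy numbers.

```python
import numpy as np

def replace_zeros_simple(raw, groups, factor=0.55):
    """Replace zeros by `factor` times the smallest positive value of the same
    column (FA) within the same group (tissue), then close each row to 100 %."""
    out = np.asarray(raw, dtype=float).copy()
    groups = np.asarray(groups)
    for g in np.unique(groups):
        rows = groups == g
        block = out[rows]                      # boolean indexing returns a copy
        for j in range(block.shape[1]):
            col = block[:, j]                  # view into `block`
            zeros = col == 0
            positive = col[~zeros]
            if zeros.any() and positive.size:
                col[zeros] = factor * positive.min()
        out[rows] = block                      # write the filled block back
    # Closure: express every part as a percentage of its row total
    return 100 * out / out.sum(axis=1, keepdims=True)

# Toy raw amounts (rows = samples, columns = parts) for two hypothetical tissues
raw = np.array([[10.0, 0.0, 5.0],
                [8.0, 2.0, 4.0],
                [6.0, 1.0, 0.0],
                [7.0, 3.0, 2.0]])
tissue = np.array(["GM", "GM", "SM", "SM"])
filled = replace_zeros_simple(raw, tissue)
```

Replacing before closure, as in the text, means the replaced value is rescaled together with the measured parts of the same sample.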

SETTING THE PROBLEM
One of the drawbacks of analyzing compositional data with conventional methods is that the results can be subcompositionally incoherent (Aitchison 1986, Chapter 3; Pawlowsky-Glahn and Egozcue 2006). This becomes particularly evident in correlation analyses, where the correlation coefficient between two given components can differ depending on whether they are expressed relative to one set of components or another. To highlight this problem, we calculated the correlation between pairs of FA under two different compositional settings. In the first, the correlation matrix among the complete 11-part FA profile of GM was calculated (Table 2, rows a), while, in the second, each SFA, MUFA, and PUFA was expressed relative to the total SFA, MUFA, or PUFA, respectively, in such a way that, for instance, C14:0, C16:0, C18:0, and C20:0 summed up to 100 % (i.e., the SFA subcomposition was closed). Then, the correlations among the FA in each subcomposition (SFA, MUFA, and PUFA) were recalculated (Table 2, rows b, c, and d, respectively). As can be seen in Table 2, the two sets of correlations were not consistent, the discrepancy being particularly relevant for those between C16:0 and C18:0, C16:1 and C18:1, and C18:2 and C20:4, which changed, respectively, from 0.80 to −0.91, 0.11 to −0.98, and 0.59 to −0.89. These changes, both in magnitude and sign, are due to the fact that components in compositional data do not vary independently. It can be proven that for a D-part composition $\mathbf{x} = (x_1, x_2, \ldots, x_D)$ with $x_1 + x_2 + \cdots + x_D = \kappa$ (where κ is a constant, often 1 or 100 %),

$$\operatorname{cov}(x_1, x_2) + \operatorname{cov}(x_1, x_3) + \cdots + \operatorname{cov}(x_1, x_D) = -\operatorname{var}(x_1),$$

because $\operatorname{cov}(x_1, x_1 + x_2 + \cdots + x_D) = \operatorname{cov}(x_1, \kappa) = 0$. Therefore, at least one of the covariances of $x_1$ with the other components must be negative (Pearson 1897; Aitchison 1986, Chapter 3; Filzmoser and Hron 2009). Because of this negative bias, an increase in one of the components must be accompanied by a decrease in at least one other component. Hence, the correlations are not free to range over the interval [−1, 1]. The distribution of the bias over the covariance terms, along with the subsequent changes in the correlation matrix among components, depends upon which parts are included in the composition. As a consequence, the above correlations do not have any neat interpretation. This simple example highlights that the analysis of compositional data using standard techniques may lead to spurious and inconsistent results across subcompositions.

Table 2. Correlations among raw fatty acid percentages in gluteus medius when expressed relative to either the full fatty acid composition (rows a) or the corresponding saturated (SFA, rows b), monounsaturated (MUFA, rows c), and polyunsaturated (PUFA, rows d) subcompositions.
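The negative bias can be reproduced with simulated data. The sketch below uses hypothetical, independently drawn lognormal amounts for four parts (not the pork data) to check the covariance identity numerically, and shows that the raw correlation between two parts changes once they are re-closed as a subcomposition, whereas their log ratio is unaffected.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical absolute amounts of four parts, drawn independently of each other
amounts = rng.lognormal(mean=[3.0, 2.0, 1.0, 0.5], sigma=0.3, size=(500, 4))
full = 100 * amounts / amounts.sum(axis=1, keepdims=True)   # closed to 100 %

# Constant row sums force cov(x1, x2) + ... + cov(x1, xD) = -var(x1)
cov = np.cov(full, rowvar=False)
cov_sum = cov[0, 1:].sum()
neg_var = -cov[0, 0]

# Re-close the two-part subcomposition {x1, x2} and recompute the correlation
sub = 100 * full[:, :2] / full[:, :2].sum(axis=1, keepdims=True)
r_full = np.corrcoef(full[:, 0], full[:, 1])[0, 1]
r_sub = np.corrcoef(sub[:, 0], sub[:, 1])[0, 1]   # -1 up to rounding, since x1 + x2 = 100

# The log ratio of the two parts is the same in both settings
lr_full = np.log(full[:, 0] / full[:, 1])
lr_sub = np.log(sub[:, 0] / sub[:, 1])
print(f"r in full composition: {r_full:.2f}; r in subcomposition: {r_sub:.2f}")
```

The two-part case is the extreme version of the pattern in Table 2: after closure the correlation is forced to −1 regardless of how the absolute amounts co-vary.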

OVERVIEW OF COMPOSITIONAL ANALYSIS
Compositional data need to be statistically treated considering that they only carry relative information. Two general approaches have been developed to deal with them. The first is known as the staying-in-the-simplex approach. It operates in the so-called simplex space ($\mathcal{S}^D$, for D-part compositions) and uses the Aitchison geometry (Aitchison 1986, Chapter 2). The second approach resorts to log-ratio transformations (Aitchison 1986, Chapter 7; Egozcue et al. 2003) to map the simplex to real space, where the more familiar Euclidean geometry is used and standard statistical methods can be applied. Both approaches can be used complementarily depending on which geometrical framework is preferred. A brief description of both approaches is given below. Some software has been developed to easily process and analyze compositional data, such as the freeware CoDaPack (Thió-Henestrosa and Martín-Fernández 2005; Comas-Cufí and Thió-Henestrosa 2011a, 2011b) and the R packages 'compositions' (van den Boogaart, Tolosana, and Bren 2011) and 'robCompositions' (Templ, Hron, and Filzmoser 2011).

STAYING-IN-THE-SIMPLEX
The simplex vector space is defined by the internal simplicial operation of perturbation, the external operation of powering, and the simplicial metric. The operations of perturbation,

$$\mathbf{x} \oplus \mathbf{y} = \mathcal{C}(x_1 y_1, x_2 y_2, \ldots, x_D y_D), \tag{4.1}$$

and powering,

$$a \odot \mathbf{x} = \mathcal{C}(x_1^{a}, x_2^{a}, \ldots, x_D^{a}), \tag{4.2}$$

where $a$ is a scalar and $\mathcal{C}$ is the closure operator to the constant κ (rescaling by division of each part by the total sum of the parts and multiplication by κ), are the equivalents of translation and scalar multiplication in real space, respectively. The staying-in-the-simplex approach requires an algebra that differs from the one used in standard statistics. An example of this algebra is found in the calculation of descriptive statistics. The arithmetic mean and the variance are not suitable statistics for compositional exploratory analyses (Daunis-i-Estadella, Barceló-Vidal, and Buccianti 2006) and are therefore replaced in the Aitchison geometry by the center ($\mathbf{g}$) and the variation matrix ($\mathbf{T}$), respectively. The center, or closed geometric mean, is defined as

$$\mathbf{g} = \mathcal{C}(g_1, g_2, \ldots, g_D), \qquad g_i = \Big(\prod_{j=1}^{n} x_{ij}\Big)^{1/n}, \tag{4.3}$$

where $x_{ij}$ are the percentages for each part ($i = 1, 2, \ldots, D$) in sample $j$, and $n$ is the number of samples. Moreover, the compositions can be centered, i.e., moved to the barycenter of the simplex, using $\mathbf{x} \oplus ((-1) \odot \mathbf{g}) = \mathbf{x} \oplus \mathbf{g}^{-1}$ (Pawlowsky-Glahn and Egozcue 2006). Centering is equivalent to subtracting the arithmetic mean in Euclidean space. The variation matrix is defined as

$$\mathbf{T} = [t_{ij}], \qquad t_{ij} = \operatorname{var}\left(\ln\frac{\mathbf{X}_i}{\mathbf{X}_j}\right), \quad i, j = 1, 2, \ldots, D,$$

where $\mathbf{X}_i$ and $\mathbf{X}_j$ are the data vectors for the parts $i$ and $j$ across samples. Low variance of a log ratio indicates proportionality between the parts involved. The total variability of the dataset is the sum of the variances of all log ratios divided by $2D$:

$$\operatorname{totvar}(\mathbf{X}) = \frac{1}{2D}\sum_{i=1}^{D}\sum_{j=1}^{D} t_{ij}. \tag{4.4}$$
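A minimal NumPy sketch of the simplex operations and descriptive statistics in Equations (4.1) to (4.4). The function names are illustrative, the closure constant is assumed to be 100 %, the variation matrix uses the sample variance (ddof=1, our choice), and the data are toy 3-part compositions.

```python
import numpy as np

def closure(x, kappa=100.0):
    """Rescale each composition (last axis) so that its parts sum to kappa."""
    x = np.asarray(x, dtype=float)
    return kappa * x / x.sum(axis=-1, keepdims=True)

def perturb(x, y):
    """Perturbation (4.1): closed component-wise product."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))

def power(a, x):
    """Powering (4.2): closed component-wise power."""
    return closure(np.asarray(x, dtype=float) ** a)

def center(X):
    """Center (4.3): closure of the geometric mean of each part across samples."""
    return closure(np.exp(np.log(X).mean(axis=0)))

def variation_matrix(X):
    """T[i, j] = var(ln(X_i / X_j)) across samples."""
    L = np.log(X)
    D = X.shape[1]
    T = np.empty((D, D))
    for i in range(D):
        for j in range(D):
            T[i, j] = np.var(L[:, i] - L[:, j], ddof=1)
    return T

def total_variance(X):
    """Total variance (4.4): sum of all elements of T divided by 2D."""
    return variation_matrix(X).sum() / (2 * X.shape[1])

X = closure(np.array([[60.0, 25.0, 15.0],
                      [55.0, 30.0, 15.0],
                      [50.0, 28.0, 22.0],
                      [62.0, 20.0, 18.0]]))
g = center(X)
Xc = perturb(X, power(-1, g))        # centering: x ⊕ g^{-1}
tv = total_variance(X)
```

After centering, the center of `Xc` is the barycenter (100/3 % for each part), and `tv` coincides with the sum of the variances of the clr-transformed parts, which is a convenient way of computing (4.4).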

LOG-RATIO TRANSFORMATIONS
The first two log-ratio transformations below were introduced by Aitchison (1986, Chapters 4 and 6) and the third by Egozcue et al. (2003). These log-ratio transformations make it possible to work with compositional data in real space using Euclidean geometry.

Additive Log Ratio
The additive log-ratio (alr) transformation is written in terms of the log ratios of D − 1 components relative to an arbitrarily chosen divisor component, here the Dth:

$$\operatorname{alr}(\mathbf{x}) = \left(\ln\frac{x_1}{x_D}, \ln\frac{x_2}{x_D}, \ldots, \ln\frac{x_{D-1}}{x_D}\right).$$

This transformation has the obvious disadvantage that the results depend on the chosen divisor component, which itself has no transformed counterpart in further analyses. Most importantly, the alr transformation is not isometric, i.e., distances are not preserved in the new metric space (Filzmoser and Hron 2009).
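The lack of isometry can be checked on two toy compositions. In this sketch (function names are ours), the Aitchison distance is computed as the Euclidean distance between clr coordinates, a standard equivalence, and is compared with the Euclidean distance between alr coordinates.

```python
import numpy as np

def alr(x, divisor=-1):
    """Log ratios of all parts relative to the divisor part (default: the last)."""
    x = np.asarray(x, dtype=float)
    d = x[..., divisor]
    others = np.delete(x, divisor, axis=-1)
    return np.log(others / d[..., None])

def clr(x):
    """Log of each part relative to the geometric mean of the composition."""
    L = np.log(np.asarray(x, dtype=float))
    return L - L.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    return np.linalg.norm(clr(x) - clr(y))

x = np.array([50.0, 30.0, 20.0])
y = np.array([20.0, 30.0, 50.0])
alr_x = alr(x)                                   # ln(50/20), ln(30/20)
d_alr = np.linalg.norm(alr(x) - alr(y))          # depends on the chosen divisor
d_ait = aitchison_distance(x, y)                 # invariant to part ordering
```

Here `d_alr` equals ln(2.5)·√5 while `d_ait` equals ln(2.5)·√2, so the alr map distorts distances; choosing another divisor would change `d_alr` again.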

Centered Log Ratio
The centered log-ratio (clr) transformation is written in terms of the log ratio of each component relative to the geometric mean of all the components of an individual:

$$\operatorname{clr}(\mathbf{x}) = \left(\ln\frac{x_1}{g(\mathbf{x})}, \ln\frac{x_2}{g(\mathbf{x})}, \ldots, \ln\frac{x_D}{g(\mathbf{x})}\right), \qquad g(\mathbf{x}) = \Big(\prod_{i=1}^{D} x_i\Big)^{1/D}.$$

In the $\mathbf{z} = \operatorname{clr}(\mathbf{x})$ transformation all parts of the composition have a direct equivalent, so that transformed variables can be easily traced back to the originals. Although the clr transformation is isometric, it is subcompositionally incoherent. Moreover, because the clr components of each sample sum to zero, the covariance matrix of the clr-transformed variables is singular, which hampers the use of the clr transformation in multivariate statistical analyses that require the inversion of this matrix. The clr transformation is mostly used in exploratory analysis. The so-called clr-biplots allow for a graphical representation of the distribution of the samples based on their composition. Moreover, the depiction of links (i.e., the vectors connecting the apexes of two variable rays) provides an easy-to-interpret representation of the log ratio between the two components involved: the length of a link represents the standard deviation of the corresponding log ratio, and the cosine of the angle between two links represents the correlation between the two log ratios involved. A complete description of clr-biplots and their interpretation is given in Aitchison and Greenacre (2002) and Daunis-i-Estadella, Barceló-Vidal, and Buccianti (2006). Conclusions should only be drawn from biplots that explain a large percentage of the total variance. An example is presented in Section 5.1.
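The properties of the clr transformation stated above (zero row sums, singular covariance matrix, direct invertibility) can be verified numerically. The sketch below uses hypothetical Dirichlet-distributed 4-part data expressed in percent.

```python
import numpy as np

def clr(X):
    """Centered log-ratio transform applied row-wise."""
    L = np.log(np.asarray(X, dtype=float))
    return L - L.mean(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.dirichlet([8, 4, 2, 1], size=200) * 100    # hypothetical 4-part data (%)
Z = clr(X)

row_sums = Z.sum(axis=1)                  # zero for every sample
S = np.cov(Z, rowvar=False)               # clr covariance matrix
rank = np.linalg.matrix_rank(S)           # D - 1, so S cannot be inverted
back = 100 * np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)   # inverse clr
```

The rank deficiency follows directly from the zero row sums: the vector of ones is always in the null space of `S`.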

Isometric Log Ratio
The isometric log ratio (ilr) transforms the raw composition into its coordinates in an orthogonal system based upon an orthonormal basis of the simplex, represented here by the $(D-1) \times D$ contrast matrix $\boldsymbol{\Psi}$ (Egozcue et al. 2003). If $\boldsymbol{\Psi}$ is chosen following a sequential binary partition (Egozcue and Pawlowsky-Glahn 2005), the ilr-transformed components are called balances ($b_k$, where $k = 1, 2, \ldots, D - 1$). In a sequential binary partition, $\boldsymbol{\Psi}$ is constructed by successive divisions of the set of parts into two mutually exclusive groups (parts in one group are marked with the symbol +, and parts in the complementary group with the symbol −) until only one part per group is left (see Table 3 for an example). To be interpretable, partitions should be based on previous knowledge and experience. Then, $\boldsymbol{\Psi}$ is derived by replacing the symbols + and − by

$$\frac{1}{r}\sqrt{\frac{rs}{r+s}} \quad\text{and}\quad -\frac{1}{s}\sqrt{\frac{rs}{r+s}},$$

respectively, where $r$ ($s$) is the number of parts marked with + (−) in each balance, with blanks being zero. Then, the balances $\mathbf{w} = \operatorname{ilr}(\mathbf{x})$ are calculated as $\mathbf{w} = \mathbf{z}\boldsymbol{\Psi}^{T}$, with $\mathbf{z} = \operatorname{clr}(\mathbf{x})$, or directly, in terms of normalized log ratios between the geometric means of the two groups, as

$$b_k = \sqrt{\frac{r_k s_k}{r_k + s_k}}\,\ln\frac{g(\mathbf{x}^{+}_{k})}{g(\mathbf{x}^{-}_{k})},$$

where $g(\cdot)$ denotes the geometric mean and $\mathbf{x}^{+}_{k}$ and $\mathbf{x}^{-}_{k}$ represent the subsets of $r_k$ and $s_k$ parts in group + and − of the $k$th balance, respectively.

Table 3. Sequential binary partition of the 11-fatty acid composition for ilr-transformation. Columns: Balance, C14:0, C16:0, C18:0, C20:0, C16:1, C18:1, C20:1, C18:2, C18:3, C20:2, C20:4.
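The construction of $\boldsymbol{\Psi}$ from a sign-coded partition can be sketched as follows. The 4-part partition used here is hypothetical (the signs of Table 3 are not reproduced), and the helper names are ours; the sketch checks that $\boldsymbol{\Psi}$ is orthonormal and that $\mathbf{w} = \mathbf{z}\boldsymbol{\Psi}^{T}$ matches the direct balance formula.

```python
import numpy as np

def contrast_matrix(signs):
    """Orthonormal contrast matrix Psi ((D-1) x D) from a sequential binary
    partition coded with +1, -1 and 0 (part not involved in that balance)."""
    signs = np.asarray(signs, dtype=float)
    psi = np.zeros_like(signs)
    for k, row in enumerate(signs):
        r = np.sum(row > 0)
        s = np.sum(row < 0)
        coef = np.sqrt(r * s / (r + s))
        psi[k, row > 0] = coef / r
        psi[k, row < 0] = -coef / s
    return psi

def clr(X):
    L = np.log(np.asarray(X, dtype=float))
    return L - L.mean(axis=-1, keepdims=True)

def ilr(X, psi):
    """Balances w = clr(x) Psi^T."""
    return clr(X) @ psi.T

# Hypothetical partition: {1, 2} vs {3, 4}, then 1 vs 2, then 3 vs 4
sbp = [[+1, +1, -1, -1],
       [+1, -1, 0, 0],
       [0, 0, +1, -1]]
psi = contrast_matrix(sbp)
x = np.array([40.0, 30.0, 20.0, 10.0])
w = ilr(x, psi)
```

For the first balance, r = s = 2, so the direct formula gives ln(√(40·30)/√(20·10)) = 0.5 ln 6, which is what `w[0]` returns.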
Note that, as happens with the alr transformation, there are only D − 1 balances for a D-part composition, and that the balances differ for each choice of $\boldsymbol{\Psi}$. The balances are isometric and subcompositionally coherent and, as a result, they can be analyzed using standard statistical techniques. However, because they do not have a one-to-one relation to the original components, their interpretation is not straightforward. This can be overcome by choosing, if it exists, a sequential binary partition leading to interpretable balances or, alternatively, by back-transforming them into interpretable D-part compositions lying in the simplex. Because compositions are intrinsically multivariate, estimates on the full set of D − 1 balances (for instance, either least squares means or regression coefficients) must be jointly back-transformed as $\mathbf{x} = \mathcal{C}(\exp(\mathbf{w}\boldsymbol{\Psi}))$ (Tolosana-Delgado and van den Boogaart 2011). Examples of ilr-transforming and back-transforming are presented in Sections 5.2 and 5.4. However, it is not possible to back-transform the standard errors associated with least squares estimates, but they can be substituted by the corresponding back-transformed confidence intervals. The use of balances is the best choice for correlations (Filzmoser and Hron 2009), but correlations cannot be back-transformed either. If the sequential bipartition used does not lead to the desired balances, additional log ratios can be calculated as linear combinations of the initial D − 1 set derived from $\boldsymbol{\Psi}$. For example, apart from the balances derived from the sequential bipartition in Table 3 ($b_1$ to $b_{10}$), we could be interested in the log ratio of C18:1 to C18:0, $\ln(\mathrm{C18{:}1}/\mathrm{C18{:}0})$, which can be obtained as a linear combination of $b_1, \ldots, b_{10}$. The inclusion of more log ratios can enrich the interpretation of the results, but it should then be noted that the covariance matrix including the new log ratios will be singular. An example of correlation analysis using balances is given in Section 5.5.
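The joint back-transformation and the recovery of an additional log ratio from the balances can be sketched as below, again with a hypothetical 4-part partition and helper names of our own. The sketch relies on the identity $\mathbf{w}\boldsymbol{\Psi} = \operatorname{clr}(\mathbf{x})$, so that $\ln(x_a/x_b) = \mathbf{w}(\boldsymbol{\Psi}_{\cdot a} - \boldsymbol{\Psi}_{\cdot b})$.

```python
import numpy as np

def contrast_matrix(signs):
    signs = np.asarray(signs, dtype=float)
    psi = np.zeros_like(signs)
    for k, row in enumerate(signs):
        r, s = np.sum(row > 0), np.sum(row < 0)
        coef = np.sqrt(r * s / (r + s))
        psi[k, row > 0] = coef / r
        psi[k, row < 0] = -coef / s
    return psi

def clr(X):
    L = np.log(np.asarray(X, dtype=float))
    return L - L.mean(axis=-1, keepdims=True)

def ilr_inverse(w, psi, kappa=100.0):
    """Joint back-transformation x = C(exp(w Psi))."""
    e = np.exp(np.asarray(w, dtype=float) @ psi)   # w Psi gives clr coordinates
    return kappa * e / e.sum(axis=-1, keepdims=True)

def log_ratio_from_balances(w, psi, a, b):
    """ln(x_a / x_b) as a linear combination of the balances."""
    return np.asarray(w, dtype=float) @ (psi[:, a] - psi[:, b])

sbp = [[+1, +1, -1, -1], [+1, -1, 0, 0], [0, 0, +1, -1]]   # hypothetical
psi = contrast_matrix(sbp)
x = np.array([40.0, 30.0, 20.0, 10.0])
w = clr(x) @ psi.T
x_back = ilr_inverse(w, psi)                   # recovers x (in %)
lr_13 = log_ratio_from_balances(w, psi, 0, 2)  # ln(x1 / x3)
```

Log ratios obtained this way are exact functions of the D − 1 balances, which is why adding them to a covariance matrix makes it singular.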

ANALYZING THE CASE STUDY
The basics of compositional analysis are illustrated in five examples using the pork FA composition as a case study. The first is an exploratory analysis conducted to examine the differences between IMF and backfat for FA composition (Section 5.1). The second and third introduce the procedures to compare the distinct tissues and muscles in terms of centers (Section 5.2) and variation matrices (Section 5.3). In Section 5.4 a linear regression is used to assess the effect of IMF content on FA composition. Finally, Section 5.5 illustrates how to interpret correlations among biologically meaningful balances. In Sections 5.2 and 5.4 the compositional and the standard approaches are compared.

EXPLORATORY ANALYSIS
The distribution of FA composition across muscles and backfat locations was first explored by depicting the whole set of observations on a joint biplot (Figure 2). To this end, the dataset X was clr-transformed to Z and then decomposed by singular value decomposition using standard procedures (Daunis-i-Estadella, Thió-Henestrosa, and Mateu-Figueras 2011). The first two components accounted for 76 % of the total variation. The projection of the samples (Figure 2a) in the biplot showed that IMF can be clearly discriminated from SF based on FA composition. More specifically, the first component, which explained 56 % of the total variation, was enough to separate IMF from SF samples. The most important FA affecting this component was C20:4, whose ray was opposite to those of the other PUFA and formed with them a long link along the first component (as an example, the link of ln(C20:4/C18:2) is represented with a discontinuous line in Figure 2b). The length of these links, which relates to the standard deviation of the log ratio of the two FA involved, indicates that the log ratio between C20:4 and the other PUFA (C18:2 and C18:3) displayed great variation along the gradient separating IMF and SF. The SF samples were allocated in a cluster on the left side of the biplot and the IMF samples were clustered on the right side, indicating that the ratios C20:4/C18:2 and C20:4/C18:3 were greater in IMF than in SF. Despite some overlap, the samples from each muscle can also be singled out (Figure 2a), especially within batch (Figure 2b). Within batch, SM samples were mostly found in the upper region of the IMF cluster, whereas those from GM (left), LT (middle), and LD (right) were in the lower region. This could not be done for SF, where only one backfat location was analyzed per batch. The distribution pattern of the batch centers suggested that the effect of the batch on the FA composition of IMF could be, at least partially, explained by differences in the age at slaughter (Table 1).
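The clr-biplot computation can be sketched with NumPy's SVD. This is a minimal covariance-biplot sketch on hypothetical 5-part data, not the procedure of the cited software; plotting the first two columns of `rows` and `cols` would give the biplot itself. Using all components, the length of a link between two rays equals the standard deviation of the corresponding log ratio, which the 2D biplot approximates.

```python
import numpy as np

def clr(X):
    L = np.log(np.asarray(X, dtype=float))
    return L - L.mean(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
X = rng.dirichlet([20, 10, 5, 3, 1], size=300) * 100   # hypothetical data (%)
Z = clr(X)
Zc = Z - Z.mean(axis=0)                  # column-centering in clr space
U, sv, Vt = np.linalg.svd(Zc, full_matrices=False)
explained = sv**2 / np.sum(sv**2)        # share of total variance per component

n = X.shape[0]
rows = U * np.sqrt(n - 1)                # sample coordinates (covariance biplot)
cols = Vt.T * sv / np.sqrt(n - 1)        # variable rays, one row per clr part

# Link between the apexes of rays 0 and 1 versus the SD of ln(x0 / x1)
link = cols[0] - cols[1]
sd_log_ratio = np.std(np.log(X[:, 0] / X[:, 1]), ddof=1)
share_2d = explained[:2].sum()           # quality of the 2D representation
```

`share_2d` plays the role of the 76 % reported for Figure 2: conclusions from the 2D plot are only as reliable as this share is large.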
Because IMF increases with age, and saturation increases with IMF, pigs slaughtered at later ages are expected to have more saturated fat (Bosch et al. 2012). Accordingly, within muscle, the samples from pigs slaughtered at later ages (Table 1; batches 5-7 and 10-11) should tend to show greater SFA/PUFA ratios and therefore appear preferentially in the lower-left region of the biplot relative to those from pigs slaughtered at earlier ages (Table 1; batches 1-4, 8-9, and 12).
A biplot for each muscle was also set up. The effect of the batch was removed by centering the data by batch (the simplex equivalent of subtracting the batch mean) before they were clr-transformed and decomposed by singular value decomposition. The IMF content was included in the biplots as a supplementary variable (Daunis-i-Estadella, Thió-Henestrosa, and Mateu-Figueras 2011) to assess the relationship between IMF content and composition. The loading plots of the first two components by muscle are given in Figure 3. The first two components explained from 67 % (GM) to 74 % (SM) of the total variance. The loading plots showed a similar pattern among muscles, with SM being the most different. In all muscles, SFA and MUFA were on the opposite side to PUFA along the first component. The cosine of the angle between two links refers to the correlation between their log ratios. In general, the angles between links involving two SFA (C16:0, C18:0), two MUFA (C16:1, C18:1), or a SFA with a MUFA were small, indicating high correlations among them. Because C18:0 can be synthesized from its precursor C16:0 by an elongase, and both C16:1 and C18:1 are synthesized by the same Δ9 desaturase from C16:0 and C18:0, respectively (Cook and McMaster 2002; Figure 1), the product/substrate ratio C18:0/C16:0 is frequently used as an indicator of elongase activity, and the ratios C16:1/C16:0 and C18:1/C18:0 as indicators of Δ9 desaturase activity. Thus, the high correlations among ratios of these four FA are biologically consistent and in line with the correlations found by other authors (Ntawubizi et al. 2010). The links involving C14:0, in all the muscles, and C20:0, in SM, had much greater angles, and thus lower correlations, with the other links. This might be because C14:0, unlike other SFA, is mainly of dietary origin (Wood et al. 2008; Figure 1), and because C20:0 is subject to relatively larger instrumental error and a greater number of replaced zeros.
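Centering by batch, as used before the per-muscle biplots, amounts to perturbing every sample by the inverse of its batch center. The sketch below assumes hypothetical 3-part data with multiplicative batch effects; `center_by_batch` is our own helper name.

```python
import numpy as np

def closure(x, kappa=100.0):
    x = np.asarray(x, dtype=float)
    return kappa * x / x.sum(axis=-1, keepdims=True)

def center(X):
    """Closed geometric mean of each part across samples."""
    return closure(np.exp(np.log(X).mean(axis=0)))

def center_by_batch(X, batches):
    """Perturb each sample by the inverse center of its batch: x ⊕ g_b^{-1}."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    out = np.empty_like(X)
    for b in np.unique(batches):
        idx = batches == b
        g = center(X[idx])
        out[idx] = closure(X[idx] / g)
    return out

rng = np.random.default_rng(4)
batch = np.repeat([1, 2, 3], 50)
# Hypothetical multiplicative batch effects on a 3-part composition
shift = np.array([[1.0, 1.0, 1.0], [2.0, 1.0, 0.5], [0.5, 1.0, 2.0]])
X = closure(rng.dirichlet([10, 6, 3], size=150) * shift[batch - 1])
Xc = center_by_batch(X, batch)
```

After this step every batch has its center at the barycenter of the simplex, so only within-batch variation remains for the clr-biplot.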
Small angles, and thus high correlations, were also found between links corresponding to log ratios of PUFA. However, in all the muscles, the links involving two SFA, two MUFA, or a SFA with a MUFA, on one side, and the links involving PUFA, on the other side, were almost perpendicular to each other. This indicates low correlations between these two groups of log ratios, in accordance with the low association of PUFA with SFA and MUFA reported in the literature (Cameron and Enser 1991; Zhang et al. 2007; Ntawubizi et al. 2010; Yang et al. 2010). Overall, the results indicate that SFA and MUFA behave similarly to each other but differently from PUFA, in line with their different deposition patterns. Fat depots, IMF and SF, can be divided into two fractions: phospholipids and neutral lipids. Phospholipids have structural functions and are rich in PUFA, particularly C20:4, which is the major PUFA in cell membranes (Larsson et al. 2004), whereas neutral lipids, mainly composed of SFA and MUFA, have storage functions. Accordingly, IMF increases with the neutral lipid fraction while phospholipids remain relatively constant (Cameron and Enser 1991; De Smet, Raes, and Demeyer 2004), which is the reason for the positive relationship of IMF with SFA and MUFA but its negative relationship with PUFA (Cameron and Enser 1991; Zhang et al. 2007; Yang et al. 2010). The IMF content displayed a negative collinearity with C20:4 in all the muscles, supporting that increased IMF is associated with decreased C20:4, namely phospholipids, and PUFA, as well as with increased SFA and MUFA (Cameron and Enser 1991; De Smet, Raes, and Demeyer 2004; Bosch et al. 2012).

DIFFERENCES AMONG TISSUES AND MUSCLES
The centers of the FA composition of IMF and SF (Equation (4.3)) established that the most abundant FA were C18:1 (44.0-46.1 %), C16:0 (21.2-24.3 %), C18:2 (9.2-16.2 %), and C18:0 (10.6-12.1 %) in all the studied muscles and backfat locations, in agreement with the general knowledge on meat FA composition (Valsta, Tapanainen, and Männistö 2005). The centers revealed differences in FA composition among the muscles and backfat locations. These differences were estimated and tested using the balances described in Table 3. The balances were analyzed using a linear mixed model, in which fixed effects included the batch (1 to 12), tissue (the four muscles and the two backfat locations), and carcass weight as a covariate. The pig and the residual were the random effects. Variances were estimated by restricted maximum likelihood, and fixed effects were tested following a Kenward-Roger approach. The differences between tissues were contrasted with the Tukey HSD test at a significance level of 0.05. The analyses were performed using JMP 8 software (SAS Institute Inc., Cary, NC). The least squares means and confidence intervals for the balances were back-transformed as indicated in Section 4.2.3. Results were compared with those obtained using the same model for raw FA percentages instead of balances. The centers adjusted for batch and carcass weight are given in Table 4. The least squares means obtained from raw percentages differed from the compositionally adjusted centers by only 0.1 % on average (SD 0.1), with a maximum of 0.8 % (C18:1). Significant differences among muscles and backfat locations were found, with compositional and standard approaches leading to similar conclusions. The two backfat locations showed greater contents of the PUFA C18:2, C18:3, and C20:2 than IMF in all muscles, but lower contents of C20:4. By contrast, IMF was more saturated and monounsaturated, although for some FA the differences between IMF and SF were not significant. These findings were in line with the well-known result that the essential PUFA C18:2 and C18:3, which are of dietary origin (Figure 1), are preferentially deposited in SF (Kloareg, Noblet, and van Milgen 2007; Duran-Montgé et al. 2008). That C20:4 displays an opposite trend to other PUFA (see Figure 2b) could be explained by the much greater fraction of phospholipids in IMF than in SF. Among muscles, SM had higher concentrations of C18:2 and C20:4 than GM, LD, and LT, and lower concentrations of the main SFA and MUFA. The observed differences in muscle composition can be partly attributed to IMF content (Table 4).

Table 4 footnotes. 1 Adjusted centers were calculated following the compositional approach (i.e., using balances followed by back-transformation). The least squares means for each fatty acid based on the raw percentages are not shown because on average they only differed from the compositionally adjusted centers by 0.1 % (SD 0.1). 2 See footnote in Table 1 for abbreviations. A-F Differences tested on ilr-transformed variables; within a row, centers without a common superscript letter differ (P < 0.05). a-e Differences tested on raw percentages; within a row, means without a common subscript differ (P < 0.05). Subscripts are given only for comparison purposes with superscripts.
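The back-transformation of least squares means can be illustrated with a deliberately simplified model. The sketch below fits only a one-way fixed tissue effect by ordinary least squares on the balances (the batch, carcass-weight, and random pig effects of the actual mixed model are not reproduced), using hypothetical 4-part data and partition. In this simple case the jointly back-transformed least squares means coincide exactly with the compositional centers of each tissue.

```python
import numpy as np

def contrast_matrix(signs):
    signs = np.asarray(signs, dtype=float)
    psi = np.zeros_like(signs)
    for k, row in enumerate(signs):
        r, s = np.sum(row > 0), np.sum(row < 0)
        coef = np.sqrt(r * s / (r + s))
        psi[k, row > 0] = coef / r
        psi[k, row < 0] = -coef / s
    return psi

def clr(X):
    L = np.log(np.asarray(X, dtype=float))
    return L - L.mean(axis=-1, keepdims=True)

def ilr_inverse(w, psi, kappa=100.0):
    e = np.exp(np.asarray(w, dtype=float) @ psi)
    return kappa * e / e.sum(axis=-1, keepdims=True)

def center(X):
    g = np.exp(np.log(X).mean(axis=0))
    return 100 * g / g.sum()

rng = np.random.default_rng(5)
psi = contrast_matrix([[+1, +1, -1, -1], [+1, -1, 0, 0], [0, 0, +1, -1]])
tissue = np.repeat([0, 1, 2], 40)                      # three hypothetical tissues
alpha = np.array([[20, 10, 6, 3], [18, 12, 5, 4], [15, 9, 9, 5]])
X = np.vstack([rng.dirichlet(a, size=40) for a in alpha]) * 100
W = clr(X) @ psi.T                                     # balances

design = np.eye(3)[tissue]                             # one indicator per tissue
beta, *_ = np.linalg.lstsq(design, W, rcond=None)      # LS mean balances by tissue
centers = ilr_inverse(beta, psi)                       # joint back-transformation (%)
arith = np.vstack([X[tissue == t].mean(axis=0) for t in range(3)])  # raw means
```

Comparing `centers` with the raw-percentage means `arith` mirrors, on toy data, the comparison between compositional and standard estimates reported for Table 4.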

VARIATION WITHIN TISSUE AND MUSCLE
The variation matrices and the total variances (Equation (4.4)) were calculated for each muscle and backfat location. The total variance of the composition of IMF in GM was 0.57. After adjusting for batch (i.e., centering by batch), the total variance decreased to 0.32. This indicates that around one half of the variability of the muscle FA composition is due to common environmental effects in a batch. The adjusted total variance was higher for IMF in SM (0.97) than in GM, LD, and LT, which were very similar to each other (0.27-0.32) and to SFLD (0.37). The total variance for SFGM was much lower (0.10). In general, the log ratios involving C18:1 displayed the lowest variances (0.01-0.33) in all cases. Interestingly, the log ratios involving C20:4 showed the highest relative variability in all cases (0.02-0.73), except for IMF in SM and SFLD, where C20:0 was the most variable FA. Nonetheless, the high variability of C20:0 could be due to the relatively large analytical error associated with its low content and to the replaced zeros. The variability of C20:4 is partly due to the variance of the phospholipid fraction in the IMF content, which, as shown in Section 5.4, is not neutral with respect to IMF content. Overall, the variation of FA composition in pork is low. The largest element of the variation matrix of IMF in GM was 0.48, and the maximum across tissues was 1.13 for SM. These values are, for example, 10-fold and 4-fold lower, respectively, than those reported by Daunis-i-Estadella, Barceló-Vidal, and Buccianti (2006) for geological compositional data, the area of expertise where compositional data techniques have been most applied.
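The reduction of total variance after centering by batch can be read as a within/between decomposition in clr space. The sketch below uses hypothetical two-batch data and population variances (ddof=0, chosen so that the decomposition is exact); `total_variance` computes Equation (4.4) as the sum of clr variances, which is algebraically equivalent.

```python
import numpy as np

def clr(X):
    L = np.log(np.asarray(X, dtype=float))
    return L - L.mean(axis=-1, keepdims=True)

def total_variance(X):
    """Equation (4.4) computed as the sum of clr variances (ddof=0)."""
    return np.var(clr(X), axis=0).sum()

rng = np.random.default_rng(6)
batch = np.repeat([0, 1], 100)
alpha = np.array([[30, 10, 5, 2], [20, 15, 8, 2]])   # hypothetical batch profiles
X = np.vstack([rng.dirichlet(a, size=100) for a in alpha]) * 100
Z = clr(X)

# Centering by batch in the simplex = subtracting batch means in clr space
Zc = Z.copy()
for b in (0, 1):
    Zc[batch == b] -= Z[batch == b].mean(axis=0)

tv_total = total_variance(X)
tv_within = np.var(Zc, axis=0).sum()
between = sum((batch == b).mean()
              * np.sum((Z[batch == b].mean(axis=0) - Z.mean(axis=0)) ** 2)
              for b in (0, 1))
share_batch = 1 - tv_within / tv_total   # fraction attributable to batch
```

In the pork data the analogous quantity for GM is 1 − 0.32/0.57, i.e., somewhat less than one half of the total variance.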

REGRESSION ON INTRAMUSCULAR FAT CONTENT
Results in Section 5.2 support that fat content influences fat composition (Wood et al. 2008; Bosch et al. 2012). This relationship can be assessed by performing a compositional regression analysis of FA composition on IMF content (Aitchison 1986, Chapter 7; Egozcue and Pawlowsky-Glahn 2011; Egozcue et al. 2012). The 109 samples of GM in batch 1 were used for this purpose. The 10 balances described in Table 3 were regressed on IMF content (JMP 8 software, SAS Institute Inc., Cary, NC), and the results were compared with the simple regression of the raw FA percentages on IMF content. The vectors of estimated intercepts ($\mathbf{i}^{*}$) and slopes ($\mathbf{s}^{*}$) in the ilr setting were back-transformed to the simplex as $\mathbf{i} = \mathcal{C}(\exp(\mathbf{i}^{*}\boldsymbol{\Psi}))$ and $\mathbf{s} = \mathcal{C}(\exp(\mathbf{s}^{*}\boldsymbol{\Psi}))$. Then, the FA composition $\mathbf{x}$ at a given IMF content can be predicted either in the simplex, with $\mathbf{x} = \mathbf{i} \oplus (\mathrm{IMF} \odot \mathbf{s})$, or in real space, with $\mathbf{w} = \operatorname{ilr}(\mathbf{i}) + \mathrm{IMF} \times \operatorname{ilr}(\mathbf{s}) = \mathbf{i}^{*} + \mathrm{IMF} \times \mathbf{s}^{*}$, and then back-transforming $\mathbf{w}$ to $\mathbf{x} = \mathcal{C}(\exp(\mathbf{w}\boldsymbol{\Psi}))$.
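The two prediction routes can be checked against each other in a short sketch. The data are simulated (hypothetical IMF values, ilr intercepts, slopes, and noise; 3 balances instead of 10), and the balances are fitted by ordinary least squares; the sketch only illustrates that predicting in the simplex and in real space give the same composition.

```python
import numpy as np

def contrast_matrix(signs):
    signs = np.asarray(signs, dtype=float)
    psi = np.zeros_like(signs)
    for k, row in enumerate(signs):
        r, s = np.sum(row > 0), np.sum(row < 0)
        coef = np.sqrt(r * s / (r + s))
        psi[k, row > 0] = coef / r
        psi[k, row < 0] = -coef / s
    return psi

def closure(x, kappa=100.0):
    x = np.asarray(x, dtype=float)
    return kappa * x / x.sum(axis=-1, keepdims=True)

def perturb(x, y):
    return closure(np.asarray(x) * np.asarray(y))

def power(a, x):
    return closure(np.asarray(x, dtype=float) ** a)

def ilr_inverse(w, psi):
    return closure(np.exp(np.asarray(w, dtype=float) @ psi))

rng = np.random.default_rng(7)
psi = contrast_matrix([[+1, +1, -1, -1], [+1, -1, 0, 0], [0, 0, +1, -1]])
imf = rng.uniform(5, 30, size=109)                  # hypothetical IMF, % dry matter
true_i = np.array([0.8, 0.3, -0.2])                 # hypothetical ilr intercepts
true_s = np.array([-0.03, 0.01, 0.0])               # hypothetical ilr slopes
W = true_i + np.outer(imf, true_s) + rng.normal(0, 0.05, size=(109, 3))

A = np.column_stack([np.ones_like(imf), imf])       # intercept + IMF
coef, *_ = np.linalg.lstsq(A, W, rcond=None)
i_star, s_star = coef                               # ilr-space estimates
i_simplex = ilr_inverse(i_star, psi)                # back-transformed intercept
s_simplex = ilr_inverse(s_star, psi)                # back-transformed slope

imf_new = 12.0
pred_simplex = perturb(i_simplex, power(imf_new, s_simplex))   # i ⊕ (IMF ⊙ s)
pred_real = ilr_inverse(i_star + imf_new * s_star, psi)        # C(exp(w Psi))
```

Both routes agree because exp turns the sum in ilr space into the product used by perturbation and powering, and the closures cancel.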
The balances most influenced by IMF content were balances 1 and 8 (R² = 0.23 and 0.20, respectively). The R² associated with the other balances was lower than 0.08. Balance 1 was built to represent the ratio PUFA vs. SFA + MUFA, while balance 8 was associated with the ratio n-6 vs. n-3 PUFA (i.e., C18:2 + C20:2 + C20:4 vs. C18:3). This is consistent with the results discussed in Section 5.1, where PUFA and, particularly, C20:4, which is more abundant in phospholipids, decrease as IMF content increases. Similar results were found for raw percentages, with C18:2 and C20:4 showing the highest R² (0.34 and 0.14, respectively). The relationship between FA and IMF content is displayed in Figure 4. For simplicity, only three FA are displayed, although the analyses were done using the whole 11-FA composition. A relevant difference between compositional and standard regression is that, in the latter, the predicted values can be nonsensical at extreme values of the covariate. Thus, at high IMF contents, negative percentages are predicted for C18:2 (IMF > 65 %) and C20:4 (IMF > 35 %). This does not happen in the compositional analysis. The back-transformed regressions of the 10 balances on IMF content were nonlinear and asymptotically bounded, with predicted values always lying within the [0, 100] range. However, within the expected range of values for IMF, from 5 % to 30 % on a dry matter basis (equivalent to approximately 1 % to 10 % of fresh meat), the compositional regression is almost linear, overlapping with the standard regression. Predicted values, even using validation samples from other batches, were almost identical under the two approaches, so that within the expected range of IMF values the standard regression led to results similar to those of the compositional analysis. A similar conclusion is reached in models other than the regression used here, which is deliberately simple for illustrative purposes.
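The boundedness of compositional predictions can be contrasted with raw-percentage extrapolation on a toy example. The sketch assumes a hypothetical 3-part composition whose clr coordinates are exactly linear in IMF over 5 % to 30 %; both regressions are then extrapolated to IMF = 100, far outside the observed range.

```python
import numpy as np

def contrast_matrix(signs):
    signs = np.asarray(signs, dtype=float)
    psi = np.zeros_like(signs)
    for k, row in enumerate(signs):
        r, s = np.sum(row > 0), np.sum(row < 0)
        coef = np.sqrt(r * s / (r + s))
        psi[k, row > 0] = coef / r
        psi[k, row < 0] = -coef / s
    return psi

def clr(X):
    L = np.log(np.asarray(X, dtype=float))
    return L - L.mean(axis=-1, keepdims=True)

def ilr_inverse(w, psi):
    e = np.exp(np.asarray(w, dtype=float) @ psi)
    return 100 * e / e.sum(axis=-1, keepdims=True)

t = np.linspace(5, 30, 26)                          # hypothetical IMF values, % DM
Z = np.column_stack([0.05 * t, np.zeros_like(t), -0.05 * t])
X = 100 * np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)   # part 3 decreases

psi = contrast_matrix([[+1, +1, -1], [+1, -1, 0]])
A = np.column_stack([np.ones_like(t), t])
raw_coef, *_ = np.linalg.lstsq(A, X, rcond=None)              # standard regression
bal_coef, *_ = np.linalg.lstsq(A, clr(X) @ psi.T, rcond=None) # regression on balances

x_far = np.array([1.0, 100.0])                      # IMF far outside the data
raw_pred = x_far @ raw_coef                         # can fall below 0 %
comp_pred = ilr_inverse(x_far @ bal_coef, psi)      # always within (0, 100) %
```

The raw-percentage line predicts a negative value for the third part, whereas the back-transformed balance regression approaches zero asymptotically, as described for C18:2 and C20:4.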

CORRELATIONS AMONG ENZYMATIC INDICES
The correlations between balances for GM are given in Table 5. The balances described in Table 3 were established in accordance with known metabolic pathways for FA synthesis in pigs (Figure 1). Because these pathways are regulated by specific enzymes, the balances can be thought of in terms of enzymatic activity. The first balance can be interpreted as a polyunsaturation index (PUFA vs. SFA + MUFA), which separates the PUFA pathway from the SFA and MUFA pathways. Balances 2 to 7 are associated with SFA and MUFA metabolism: balances 2, 3, 4, and 7 can be interpreted as indexes of elongase activity, and balances 5 and 6 as indexes of Δ9 desaturase activity. Note that, although they are intended to represent different elongation or desaturation steps, in general they are not ratios between single products and substrates. For instance, balance 3 accounts not only for the elongation of C16:0 to C18:0, but also for the amount of C16:0 that has alternatively been desaturated to C16:1 and the amount of C18:0 further transformed into C20:0, C18:1, and C20:1. The balances can be an interesting alternative to elementary indexes between only two FA because they also include further or alternative products derived from the same substrate (Figure 1). However, because they are based on a sequential binary partition, some balances cannot include all the desired FA (e.g., balance 6 does not include C20:0, which can be elongated from C18:0). As expected, all the elongase balances were positively correlated with one another, as were the two desaturation indexes. Interestingly, however, the correlations between the desaturation and the elongase indexes were negative. The polyunsaturation index was negatively correlated with elongase activity but positively correlated with Δ9 desaturase activity. Balances 8, 9, and 10 are associated with PUFA metabolism. Balance 8 represents the ratio between n-6 and n-3 FA, which is known to play a crucial role in the nutritional quality of fat (Schmid 2010).
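How a balance generalizes an elementary two-FA index can be seen from the balance formula $\sqrt{\tfrac{rs}{r+s}}\,\ln\frac{g(\text{numerator parts})}{g(\text{denominator parts})}$, where $g$ is the geometric mean and $r$, $s$ are the group sizes. The sketch below uses an invented sample and an assumed grouping loosely inspired by the description of balance 3 (elongation products of C16:0 vs. C16:0 and C16:1); the actual partition is the one given in Table 3. With one FA per group the formula reduces to the scaled log ratio of two FA, and, because only log ratios enter, the result does not depend on whether the percentages were closed to 100.

```python
import math


def balance(x, num, den):
    """Balance between two groups of parts of a composition `x` (a dict).

    b = sqrt(r*s/(r+s)) * ln(g(num) / g(den)), with g() the geometric mean
    and r, s the number of parts in each group.
    """
    r, s = len(num), len(den)
    mean_log_num = sum(math.log(x[k]) for k in num) / r
    mean_log_den = sum(math.log(x[k]) for k in den) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_num - mean_log_den)


# Hypothetical FA percentages for one sample (made up; other FA omitted).
sample = {"C16:0": 23.5, "C16:1": 3.4, "C18:0": 12.8, "C18:1": 44.0,
          "C20:0": 0.2, "C20:1": 0.9, "C18:2": 11.5}

# Elongase-type index with an assumed grouping: downstream products of C16:0
# elongation vs. C16:0 and its alternative desaturation product.
elong = balance(sample, ["C18:0", "C20:0", "C18:1", "C20:1"], ["C16:0", "C16:1"])

# Elementary index for comparison: one FA per group, i.e. ln(C18:0/C16:0)/sqrt(2).
simple = balance(sample, ["C18:0"], ["C16:0"])
```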
The positive correlation between balance 1 and balance 8 indicates that the n-6/n-3 ratio increased with polyunsaturation. Balance 9 reflects the overall efficiency of biosynthesizing C20:4 from C18:2 through either of the two pathways, while balance 10 only accounts for the intermediate elongation step from C18:2 to C20:2 carried out in one of the two pathways (Figure 1). The positive correlations of balance 9 with balances 1 and 8 confirmed that the percentage of C20:4 increases with PUFA and with the n-6/n-3 ratio. A correct interpretation of the balances may help to gain new insight into FA metabolism. Note that in this example we used only the D − 1 balances described in Table 3, which were derived from a single sequential binary partition. More log ratios could be calculated and added, as discussed in Section 4.2.3. For example, the correlation between the log ratios of C18:1/C18:0 and MUFA/SFA (Equations (4.8) and (4.9)) was 0.70.
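Correlating an additional log ratio with those already in use takes only a few lines. The sketch below assumes five invented samples restricted to four FA and takes MUFA and SFA as sums of their parts; Equations (4.8) and (4.9) give the definitions actually used, which may differ, and the toy values are not expected to reproduce the reported correlation of 0.70.

```python
import math

# Hypothetical FA percentages for five samples (made up for illustration).
samples = [
    {"C16:0": 24.0, "C18:0": 13.0, "C16:1": 3.5, "C18:1": 45.0},
    {"C16:0": 25.0, "C18:0": 14.0, "C16:1": 3.0, "C18:1": 42.0},
    {"C16:0": 23.0, "C18:0": 11.0, "C16:1": 4.0, "C18:1": 48.0},
    {"C16:0": 26.0, "C18:0": 15.0, "C16:1": 2.8, "C18:1": 40.0},
    {"C16:0": 24.5, "C18:0": 12.0, "C16:1": 3.6, "C18:1": 46.0},
]
SFA = ["C16:0", "C18:0"]   # assumed membership, reduced to two FA
MUFA = ["C16:1", "C18:1"]  # assumed membership, reduced to two FA


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


# Log ratio between two single FA, and between two groups (sums of parts).
lr_single = [math.log(s["C18:1"] / s["C18:0"]) for s in samples]
lr_group = [math.log(sum(s[k] for k in MUFA) / sum(s[k] for k in SFA)) for s in samples]
r = pearson(lr_single, lr_group)
```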

CONCLUSIONS
Fatty acid compositions, which are compositional data by nature, should be statistically treated as such. There are two complementary approaches to analyzing compositional data: operating in the simplex, or using log ratios to operate in the real space. The ilr transformation allows geometric elements in the simplex to be handled straightforwardly with standard statistical procedures. Nonetheless, for the case study considered here, the inferences drawn from compositional analysis did not substantively differ from those obtained using standard statistical techniques on raw data. The low variability of FA composition across pork fat depots may explain why the standard approach, although methodologically inconsistent, is robust enough for practical purposes. This is likely to apply to other unprocessed raw food products, where natural variability is subject to homeostatic biological constraints. Results showed that IMF and SF behave differently in terms of FA composition, with IMF showing more SFA, MUFA, and C20:4, and that FA composition differs among muscles, with SFA and MUFA increasing with IMF. Compositional analysis proved useful for correctly interpreting the correlation structure among FA components. Choosing an appropriate set of balances may help not only to avoid spurious results but also to better address the biological mechanisms involved in FA deposition. Careful attention is recommended in cases of higher expected variability, such as when comparing differentiated processed products, where compositional analysis may lead to markedly different results from the standard approach.