Comparison of 3D scan matching techniques for autonomous robot navigation in urban and agricultural environments

Abstract. The global navigation satellite system (GNSS) is the standard solution for solving the localization problem in outdoor environments, but its signal may be lost when driving in dense urban areas or in the presence of heavy vegetation or overhanging canopies. Hence, there is a need for alternative or complementary localization methods for autonomous driving. In recent years, exteroceptive sensors have gained much attention due to significant improvements in accuracy and cost-effectiveness, especially for 3D range sensors. By registering two successive 3D scans, a process known as scan matching, it is possible to estimate the pose of a vehicle. This work aims to provide an in-depth analysis and comparison of state-of-the-art 3D scan matching approaches as a solution to the localization problem of autonomous vehicles. Eight techniques (deterministic and probabilistic) are investigated: the iterative closest point (with three different embodiments), the normal distribution transform, coherent point drift, the Gaussian mixture model, the support vector-parametrized Gaussian mixture, and a particle filter implementation. They are demonstrated in long path trials in both urban and agricultural environments and compared in terms of accuracy and consistency. On the one hand, most of the techniques can be successfully used in urban scenarios, with the probabilistic approaches showing the best accuracy. On the other hand, agricultural settings have proved more challenging, with significant errors even in short distance trials due to the presence of featureless natural objects. The results and discussion of this work provide a guide for selecting the most suitable method and encourage improvements that address the identified limitations.


Introduction
More critically, in underground mining sites, there is a complete absence of the GNSS signal. Additionally, the costs associated with GNSS antennas with centimeter (or even millimeter) accuracy are usually considerably higher than those of standard portable receivers. Therefore, alternative solutions to the localization problem are necessary for replacing or complementing GNSS-based approaches.
To this aim, several simultaneous localization and mapping (SLAM) methods have been proposed in recent years (see Refs. 5 and 6 and the references therein). Nevertheless, the fact that the GNSS error is absolute still represents an advantage over SLAM-based or similar approaches, in which the error may accumulate over time or the accuracy depends on loop closure (see Ref. 5). For example, in dead-reckoning localization, which is particularly useful for short-path navigation, the error comes from different sources (e.g., wheel slippage, misalignment, and terrain perturbations), causing unbounded growth over time. 7,8 Similarly, inertial navigation systems suffer from integration drift: small errors in the measurement of acceleration and angular velocity are integrated into progressively larger errors in velocity and position, 9 making inertial navigation challenging to use over long paths. 10,11 Range sensors, such as light detection and ranging (LiDAR), have been brought to the attention of autonomous vehicle development, mainly for their decreasing costs and high accuracy. 12-14 LiDAR point cloud information can be used for localization by means of scan matching techniques. 15 These techniques aim to estimate the rigid motion transformation that maximizes the overlap between two frames obtained at different times. Scan matching techniques have been widely used as a localization system with either 2D (3 DOF) 16,17 or 3D (6 DOF) 18,19 laser scanners, providing suitable solutions for indoor environments when assuming objects of only polygonal shapes. 14,17 However, when applying scan matching to unstructured environments, the performance of the registration techniques degrades. 20 For example, a comparison of scan matching techniques on real-world data sets showed the limitations of several scan matching methods when applied to unstructured environments. 21
To improve registration in a specific environment, works such as Refs. 22 and 23 proposed the extraction of specific patterns (normals from the point cloud or reflectance information from the LiDAR readings). However, these variations are not expected to work in all environments because normals are not always available or interpretable and reflectance varies significantly between environments. Figure 1 shows the two cases examined in this work: an urban and an agricultural setting, with their corresponding point clouds obtained using a LiDAR. The urban scene has distinguishable planar surfaces, such as walls, windows, and cars. In contrast, the agricultural setting has irregular point clouds on the trees of the orchard. Figure 2 shows the scan matching process for vehicle localization, assuming that the sensor is mounted on the vehicle (which is the case in this work). The key limitation of applying scan matching to estimate the pose of the vehicle lies in the dependence on the previous state: once a position and orientation are computed, the vehicle does not revisit the place, and it cannot recover from possible registration errors. 8 Typically, a loop closure approach is used to mitigate this error in SLAM algorithms. In this work, we consider the scan matching techniques a self-positioning system without the need to close the loop; therefore, cumulative errors are expected.
Registration algorithms follow deterministic or probabilistic data association approaches. The former are a particular case of probabilistic methods in which the uncertainty is zero. For example, the iterative closest point (ICP) 19 and the polar scan matching 24 find correspondences among points using the Euclidean distance in Cartesian and polar coordinates, respectively. Similarly, the iterative dual correspondence 25 incorporates a matching range point to define correspondences and to improve data association.
On the other hand, probabilistic approaches take into account the uncertainties in the sensor measurements to implement a maximum likelihood estimator (MLE). However, when a probabilistic approach is used with the wrong parameters, the modeled uncertainty can produce worse results than deterministic approaches. Some examples of these approaches are the generalized ICP (GICP) 26 and the normal distribution transform (NDT). 27 The latter describes the point cloud by a set of local probability density functions using a voxel-based structure. This, however, is one of its main disadvantages because there is no validated method for selecting the right voxel size. When the cell is large, the computational time is low but the accuracy decreases; in contrast, when the voxel is small, the accuracy increases but at a high computational cost. Variations of the NDT include a pre-processing stage, 28,29 in which the objects in the scene are grouped according to their similarities. For example, in Ref. 28, edges and planes acquired from the scene are used to differentiate objects. Further, in Ref. 29, the differentiation is improved by incorporating the POINTNET++ network, 30 which applies semantic segmentation to the scene. After the label assignment, the NDT is performed individually on the objects sharing each label, and the rigid transformation is obtained by minimizing the sum of all rigid transformations.

Fig. 2 Scan matching takes as input two scans obtained at different times, which are aligned using a registration algorithm whose output H(θ) represents the rigid transformation matrix associated with the sensor displacement; the latter is then associated with the vehicle motion. 15 The position X_k of the sensor (and therefore of the vehicle), where the suffix k stands for the sampling time, is obtained by concatenating the transformation matrices, causing the localization error to be cumulative.
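The pose concatenation just described can be sketched in a few lines (a minimal numpy illustration; the `rigid` helper and its planar parametrization are hypothetical, introduced only for this example):

```python
import numpy as np

def pose_from_transforms(transforms):
    """Concatenate per-step rigid transformations H_1..H_k into a pose.

    Each H is a 4x4 homogeneous matrix mapping frame k to frame k-1,
    so the absolute pose X_k is the running product; any registration
    error therefore accumulates along the trajectory.
    """
    X = np.eye(4)
    for H in transforms:
        X = X @ H
    return X

def rigid(yaw, tx, ty):
    """Hypothetical helper: a planar rigid transform as a 4x4 matrix."""
    c, s = np.cos(yaw), np.sin(yaw)
    H = np.eye(4)
    H[:2, :2] = [[c, -s], [s, c]]
    H[0, 3], H[1, 3] = tx, ty
    return H

# Two identical steps, each 1 m forward with a 90-deg left turn:
# the vehicle ends at (1, 1), facing backwards along x.
X2 = pose_from_transforms([rigid(np.pi / 2, 1.0, 0.0),
                           rigid(np.pi / 2, 1.0, 0.0)])
```

The same product structure holds for the full 6-DOF case used in this work.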
Regarding probabilistic approaches, a Gaussian mixture model (GMM) representation can also be used for registration, as shown in Ref. 31. The GMM approach can deal with noise and outliers to some extent, 32 unlike the previously mentioned approaches (NDT and GICP). Some variations of GMM registration rely on a pre-processing stage, as is the case for the support vector-parametrized Gaussian mixture (SVR). 33 Other probabilistic registration algorithms are based on filtering theory: they can use prior information in a maximum a posteriori (MAP) sense. 34 Registration methods that use the MAP estimator often rely on Bayesian filters, such as a Kalman or Bingham filter. 35 A significant disadvantage of these filtering-based methods is the requirement to tune several parameters, which can be counterintuitive.
This work aims to examine the different registration algorithms existing in the literature and show how suitable they are for addressing the positioning problem for reliable autonomous navigation in two specific and yet different environments: urban and agricultural, assuming that exteroceptive sensors are mounted on the vehicles.
Although several scan matching techniques have been proposed during the past decade, we try to cover some of the most representative deterministic and probabilistic approaches in this work. To this aim, we have selected a variety of open-source approaches (see Appendix) that have been widely used by the scientific community because of their easy implementation. We begin by describing the well-known ICP and its point-to-point and point-to-plane variants. Then, we describe the GICP, the NDT, the GMM, the coherent point drift (CPD), the SVR, and the particle filter registration (PF) algorithms. We analyze the performance of these techniques in different navigation trials. The comparison of all of the algorithms is made with real data from urban and agricultural environments under real field conditions, acquired by two vehicles (a car and an autonomous agricultural platform, respectively) with a 3D LiDAR sensor, thus offering in-depth insight into such techniques with long-path field results.

Mathematical Background
Let S and T be a source point cloud and a target point cloud, respectively. The goal is to find the rigid transformation matrix that aligns S with T, as shown in Fig. 3. For this purpose, a transformation matrix H with parameters θ = [x, y, z, roll, pitch, yaw]^T is applied to the source point cloud as H(S, θ). Such a transformation is obtained through an iterative procedure governed by a maximum number of iterations, I_max, and an error threshold, ε. The inputs are the two point clouds, and the aim is to find a correspondence between them and minimize their discrepancy. The process is repeated until the per-residual error among points, e, is less than ε or the maximum number of iterations, I_max, has been reached. Figure 3 shows the registration procedure for obtaining the transformation matrix H, with rotation R and translation t_r. The procedure takes as input the source, S = {s_i}, i = 1, ..., M, S ⊂ R³, and the target, T = {t_j}, j = 1, ..., N, T ⊂ R³, where M and N are the numbers of points in the source and target point clouds, respectively. The first step is to find the correspondence between the two point clouds; then, an optimization procedure finds the registration parameters θ. The procedure iterates until the registration error e is less than ε or I_max is reached. This applies to all scan matching techniques analyzed here.
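The generic loop of Fig. 3 can be sketched as follows (a toy illustration restricted to a translation-only transformation for brevity; it is not one of the eight algorithms compared here):

```python
import numpy as np

def register(source, target, eps=1e-6, i_max=100):
    """Toy version of the iterative registration loop of Fig. 3,
    limited to a pure translation so each step has a trivial solver."""
    S = source.copy()
    theta = np.zeros(3)                      # translation parameters only
    e = np.inf
    for _ in range(i_max):
        # step 1: correspondence -- nearest target point per source point
        d2 = ((S[:, None, :] - target[None, :, :]) ** 2).sum(-1)
        Y = target[d2.argmin(axis=1)]
        # step 2: optimization -- best translation for the current pairs
        t = (Y - S).mean(axis=0)
        S = S + t
        theta = theta + t
        e = np.linalg.norm(Y - S, axis=1).mean()  # per-residual error
        if e < eps:                               # stop at eps or I_max
            break
    return theta, e

# cube corners displaced by a small, known offset
T = np.array([[x, y, z] for x in (0., 1.) for y in (0., 1.) for z in (0., 1.)])
offset = np.array([0.05, -0.03, 0.02])
theta, e = register(T - offset, T)
```

With an offset small relative to the point spacing, the nearest-neighbor correspondences are exact and the loop recovers the offset in one iteration.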

Iterative Closest Point and Its Variants
The ICP considers the raw data from both point clouds. Different metrics can be used as a distance between the point sets, e.g., point-to-point and point-to-plane. 21,36,37 The point-to-point method minimizes the sum of squared distances between each corresponding pair of points. On the other hand, the point-to-plane method minimizes the distance between the points and the tangential planes at the corresponding nearest points. The correspondences between H(S, θ) and T are computed using the nearest-neighbor criterion, Y_j = ClosestPoint(H(S, θ), T), with the Euclidean distance as metric. Further, the transformation matrix is obtained with the parameters that minimize the L2 error, as shown in Eq. (1):

θ* = arg min_θ Σ_{i=1}^{M} ((H(s_i; θ) − y_i)^T η_i)²,   (1)

where η_i is the surface normal of the neighborhood of each point and is used in the point-to-plane registration; in the point-to-point case, the residual reduces to the plain difference H(s_i; θ) − y_i. The minimization of Eq. (1) is done using a least-squares approach. 19 When the two sets of points are far away from each other, the nearest-neighbor point does not, in general, correspond to the same point on the target point cloud, especially when using the Euclidean distance. 38 Other approaches use different distance criteria, for example, the Mahalanobis distance 34,39 or the most-likely criterion, 34,40 or assign a weight to each correspondence. 41,42 In the latter case, points farther apart receive lower weights than points with close neighbors. However, it is still not possible to obtain precise correspondences even after reaching convergence. 43 In Ref. 26, a generalization of the ICP, named GICP, is presented. The GICP incorporates a probabilistic framework in which a covariance matrix is associated with each point from both point clouds. Such an approach considers that s_i ∼ N(μ_i^s, Σ_i^s) and t_i ∼ N(μ_i^t, Σ_i^t) are drawn from independent Gaussian distributions, and the correspondences are computed using the Euclidean distance.
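For already-established point-to-point correspondences, the least-squares step of Eq. (1) has a closed-form SVD (Kabsch) solution; the sketch below assumes the pairing Y has been computed beforehand:

```python
import numpy as np

def best_rigid(S, Y):
    """Closed-form least-squares rigid alignment of paired points
    (Kabsch/SVD), as used inside each point-to-point ICP iteration."""
    mu_s, mu_y = S.mean(axis=0), Y.mean(axis=0)
    H = (S - mu_s).T @ (Y - mu_y)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # diag(1, 1, det(V U^T)) guards against reflections
    C = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
    R = Vt.T @ C @ U.T
    t_r = mu_y - R @ mu_s
    return R, t_r

# recover a known rotation (30 deg about z) and translation
th = np.pi / 6
R_true = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th),  np.cos(th), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])
S = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.],
              [0., 0., 1.], [1., 1., 1.]])
R_hat, t_hat = best_rigid(S, S @ R_true.T + t_true)
```

On noise-free, non-degenerate data the true transformation is recovered exactly up to floating-point precision.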
For an arbitrary rigid transformation H, the residual of a correspondence pair is d_i^(H) = t_i − H s_i, which, under the Gaussian assumptions above, is distributed as d_i^(H) ∼ N(0, Σ_i^t + H Σ_i^s H^T). The maximum likelihood estimate of the transformation can then be simplified to

θ* = arg min_θ Σ_i d_i^(H)T (Σ_i^t + H Σ_i^s H^T)^(−1) d_i^(H).

Coherent Point Drift
The CPD is highly suitable for accurate point cloud registration. 44 However, its computational complexity is high, which is a problem for large-scale point clouds. The CPD casts the registration as a GMM problem, where S provides the GMM centroids and T contains the data points generated by the GMM. The probability density function is described as

p(t) = w (1/N) + (1 − w) Σ_{m=1}^{M} (1/M) p(t|m),

where p(t|m) = N(t; H(s_m; θ), σ² I) and w is the weight of the uniform (outlier) distribution. This approach uses expectation maximization to find the parameters of the rigid transformation. The expectation step can be seen as the correspondence matching, based on the posterior probability that a data point was generated by a given GMM centroid, as described in Eq. (2):

P(m|t_n) = exp(−‖t_n − H(s_m; θ)‖² / (2σ²)) / [Σ_{k=1}^{M} exp(−‖t_n − H(s_k; θ)‖² / (2σ²)) + c],   (2)

with c = (2πσ²)^{3/2} (w / (1 − w)) (M / N). The new parameter values are found by minimizing the expectation of the complete negative log-likelihood function, as shown in Eq. (3):

Q(R, t_r, σ²) = (1 / (2σ²)) Σ_{m,n} P(m|t_n) ‖t_n − (R s_m + t_r)‖² + (3 N_P / 2) ln σ²,   (3)

whose minimization yields the rotation in closed form as R = U C V^T, where U and V come from a singular value decomposition of the weighted cross-covariance and C = diag(1, ..., 1, det(U V^T)).
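The E-step posterior of Eq. (2) can be sketched as follows (a sketch following the standard CPD formulation with isotropic covariance σ²I; the function name is ours, and the transformed source points are assumed to be passed in directly):

```python
import numpy as np

def cpd_posteriors(T_pts, S_pts, sigma2, w):
    """E-step of CPD: posterior P(m | t_n) that target point t_n was
    generated by the m-th (already transformed) GMM centroid, with a
    uniform outlier component of weight w folded into the constant c."""
    M, N = len(S_pts), len(T_pts)
    d2 = ((T_pts[:, None, :] - S_pts[None, :, :]) ** 2).sum(-1)  # N x M
    G = np.exp(-d2 / (2.0 * sigma2))
    c = (2.0 * np.pi * sigma2) ** 1.5 * (w / (1.0 - w)) * (M / N)
    return G / (G.sum(axis=1, keepdims=True) + c)  # rows: target points

pts = np.array([[0., 0., 0.], [5., 0., 0.]])
P = cpd_posteriors(pts, pts, sigma2=0.01, w=0.1)  # near-identity posteriors
```

Each row sums to at most one; the missing mass is the probability that the point is an outlier.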

Normal Distribution Transform
The NDT first divides T into voxels and fits a normal distribution to each of them, with mean μ_i and covariance Σ_i. The goal is to find the transformation parameters that maximize the likelihood, p̃, that the points of S lie on T. For this purpose, each point of H(S; θ) is associated with the voxel of T in which it falls, and its likelihood is modeled as

p̃(x) = c_1 exp(−(x − μ_i)^T Σ_i^{−1} (x − μ_i) / 2) + c_2,

where c_1 and c_2 are constant values related to the size of the cell. The NDT score function is defined as s(θ) = −Σ p̃(H(S; θ)). Newton optimization is used to optimize s(θ), solving the equation H Δθ = −g at each step, where H is the Hessian matrix and g is the gradient vector of the score. It is worth mentioning that the NDT requires high computing power and that its performance is directly related to the size of the cell. 43
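Building the voxelized normal distributions of the NDT can be sketched as follows (a minimal version; practical implementations also regularize near-singular covariances):

```python
import numpy as np

def ndt_cells(points, cell_size):
    """Group target points into voxels of side cell_size and fit a
    mean and covariance per voxel (the NDT map-building step)."""
    cells = {}
    keys = np.floor(points / cell_size).astype(int)
    for key, p in zip(map(tuple, keys), points):
        cells.setdefault(key, []).append(p)
    stats = {}
    for key, pts in cells.items():
        pts = np.asarray(pts)
        if len(pts) >= 3:            # need enough points to fit a Gaussian
            stats[key] = (pts.mean(axis=0), np.cov(pts.T))
    return stats

pts = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [0.3, 0.1, 0.2],
                [1.5, 1.5, 1.5], [1.6, 1.4, 1.5], [1.4, 1.6, 1.6]])
stats = ndt_cells(pts, cell_size=1.0)   # two occupied voxels
```

The cell_size argument is exactly the voxel-size trade-off discussed above: fewer, larger cells are cheaper to evaluate but smear out local structure.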

Gaussian Mixture Model Registration
The GMM (see Ref. 45) considers that the probability density function of a general Gaussian mixture is defined as p(x) = Σ_{i=1}^{k} w_i N(x; μ_i, Σ_i), where w_i is the weight of the i-th Gaussian component. The input point clouds are represented by GMM_S = Σ_{i=1}^{M} w_i N(μ_i, Σ_i) and GMM_T = Σ_{j=1}^{N} w_j N(μ_j, Σ_j). As can be seen, the number of Gaussian components equals the number of points in the cloud, and all components are equally weighted. In addition, for each component, the mean vector is given by the spatial location of the corresponding point, and all components share the same covariance. An optimization problem can then be posed to find the transformation H(θ) that minimizes the L2 distance between the two GMMs:

θ* = arg min_θ ∫ [GMM_T(x) − GMM_{H(S;θ)}(x)]² dx.

Such an optimization can guarantee only a local minimum, and the result depends on the Gaussian smoothness; the parameter σ can therefore be tuned (annealed) to widen or narrow the search area. In Ref. 33, a variation of GMM registration (SVR) is presented, which adds a pre-processing stage to the GMM registration: a one-class support vector machine with a Gaussian radial basis function kernel, as shown in Eq. (6):

f(x) = Σ_{i=1}^{l} α_i exp(−γ ‖x − x_i‖²) − ρ,   (6)

where γ is the Gaussian kernel width, x_i is a training point, α_i is its weight, x is the input vector, ρ is the bias, and l is the number of training samples. The output of the support vector machine is a sparse subset of the data points, which is later used to perform registration with a GMM representation.
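For the special case of equally weighted components sharing an isotropic covariance σ²I, the L2 distance between the two GMMs has a closed form, because the integral of a product of two Gaussians is itself a Gaussian evaluation. A sketch under that assumption:

```python
import numpy as np

def gmm_l2(A, B, sigma2):
    """Closed-form L2 distance between two point-set GMMs with equal
    weights and a shared isotropic covariance sigma2 * I (a sketch of
    the objective minimized by GMM/SVR-style registration).

    Uses: integral of N(x; p, s*I) N(x; q, s*I) dx = N(p; q, 2s*I).
    """
    def cross(P, Q):
        d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
        k = (4.0 * np.pi * sigma2) ** -1.5 * np.exp(-d2 / (4.0 * sigma2))
        return k.mean()   # equal component weights 1/m and 1/n
    return cross(A, A) - 2.0 * cross(A, B) + cross(B, B)

A = np.array([[0., 0., 0.], [1., 0., 0.]])
B = A + np.array([0.5, 0., 0.])
d_same = gmm_l2(A, A, sigma2=0.5)   # identical clouds -> zero distance
d_diff = gmm_l2(A, B, sigma2=0.5)   # shifted cloud -> positive distance
```

A registration would minimize this quantity over the transformation applied to one of the clouds; larger sigma2 smooths the objective at the cost of precision.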

Particle Filter Registration
The PF, for scan registration purposes, is adapted to perform a variation of the ICP. 47 The registration procedure considers a state-space model x(k) = θ(k) and the observation space given by z(k) = θ_m(k). First, N_p particles {x^i, i = 1, ..., N_p} are initialized, and the transformed point set is selected as the measurement. The filter then iterates as follows. First, a motion error for each time k is computed from the predicted and measured state at the previous time, as in Eq. (7):

Δ_k^i = z_{k−1} − x_{k−1}^i.   (7)

Second, N_p particles are drawn from the proposal density presented in Eq. (8):

x_k^i ∼ K(x_{k−1}^i + Δ_k^i, Σ_{k−1}^i),   (8)

where Σ_{k−1}^i is the covariance of the motion error and K is a Gaussian function. The next step is to minimize E = Σ_{j=1}^{n} ‖T_j − R S_j − t_r‖² for L iterations, which is the same objective function presented for the ICP case in Sec. 2.1. Weights are updated according to w_k^i = w_{k−1}^i p(z_k | x_k). 48 Once the weights are updated, the cumulative distribution function is built, and the particles are re-sampled. 48 Finally, the parameters that minimize the objective function are selected. As can be seen, particle filtering involves additional steps beyond the basic ICP. A significant disadvantage of this method is the tuning of several parameters, such as the number of particles, N_p, the number of iterations, L, and those associated with the Gaussian function, K.
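The weight-driven re-sampling step can be sketched with systematic resampling from the cumulative distribution of the weights (one common choice; Ref. 48 may use a different scheme):

```python
import numpy as np

def pf_resample(weights, rng):
    """Systematic resampling: build the cumulative distribution of the
    particle weights and draw one stratified sample per particle slot."""
    n = len(weights)
    cdf = np.cumsum(weights / weights.sum())
    u = (rng.random() + np.arange(n)) / n   # one draw per 1/n stratum
    return np.searchsorted(cdf, u)          # indices of surviving particles

rng = np.random.default_rng(0)
idx_uniform = pf_resample(np.ones(3), rng)             # keeps one copy of each
idx_peaked = pf_resample(np.array([0., 1., 0.]), rng)  # clones particle 1
```

Degenerate weight distributions (all mass on one particle) collapse the particle set onto that particle, which is why the number of particles N_p matters for robustness.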

Experimental Data Sets
For the urban scan matching evaluation, we considered the publicly available Ford Dataset, 49 which was generated using a Ford F-250 vehicle (© Ford Motor Company, Dearborn, Michigan) equipped with a Velodyne HDL-64E laser scanner (Velodyne LIDAR Inc., San Jose, California) and an Applanix POS-LV 420 INS with a Trimble GPS (Trimble Inc., Sunnyvale, California) used for ground-truth data. The LiDAR was mounted horizontally, and the data were captured with the laser spinning at 10 Hz. A single raw LiDAR scan contains ∼80,000 points (acquisition speed of ∼800,000 points/s).
For the scan matching assessment in agricultural environments, we generated our own dataset. These data were acquired in a commercial Fuji apple orchard (Malus domestica Borkh. cv. Fuji) located in Mollerussa, Catalonia, Spain (41°36'48.5"N, 0°51'41.7"E). Trees in the selected orchard were trained in a tall spindle system with a maximum canopy height of 3.5 to 4 m, a width of 1 to 1.5 m, and a tree spacing of 4 × 1 m. Data were acquired on July 29, 2019, when the trees were at BBCH (Biologische Bundesanstalt, Bundessortenamt und Chemische Industrie) growth stage 75, 50 with fruits at about half their final size.
The autonomous platform shown in Fig. 4 was used to acquire the data in the agricultural field. The platform consisted of an aluminum structure mounted on a continuous track composed of two rubber belts moved by two electrical AC motors. These motors were powered by a petrol-engine generator and controlled by two variable-frequency drives, which were used to control the speed and the direction of advance. The platform was equipped with a LiDAR sensor and a GNSS-RTK receiver mounted on a 2-m-high vertical mast firmly fixed to the front of the platform.
The LiDAR sensor was a Puck VLP-16 (Velodyne LIDAR Inc., San Jose, California), which provides a three-dimensional point cloud per scan by means of 16 laser beams (905-nm wavelength), with a maximum range of 100 m and an accuracy of ±0.03 m. This sensor was mounted in a vertical position at a height of 1.8 m, which corresponds to half the maximum height of the studied trees. Mounting the LiDAR sensor vertically is a common practice for the geometric characterization of vegetation because it gives a higher vertical resolution. 51,52 The scanning frequency was set to 10 Hz, corresponding to a vertical angular resolution of 0.2 deg; the GNSS time signal was only used to obtain the UTC of each LiDAR point. The RTK-GNSS system was a GPS1200+ (Leica Geosystems AG, Heerbrugg, Switzerland), which obtains absolute coordinates at a frequency of 20 Hz with an error of 0.01/0.02 m (horizontal/vertical). The GNSS rover antenna was mounted on the top position, at a height of 2 m. Each sensor was connected to a rugged GETAC V110 laptop (Getac Technology Corporation, Baoshan, Taiwan) with a 64-bit operating system, 8 GB of RAM, and an Intel Core i7-7600U 2.70-GHz processor. LiDAR data were acquired using the VeloView 3.5 software (Velodyne LIDAR Inc., San Jose, California), while GNSS data were acquired using a self-developed LabVIEW (National Instruments, Austin, Texas) program that stores the receiver coordinates and the UTC time (synchronized with the LiDAR) of each positioning measurement. The scanning was performed by driving the platform at a constant velocity of 0.5 m s−1 through five consecutive orchard alleyways (dirt road soil), each 250 m long. The generated dataset (AgLiMatch dataset) has been made publicly available at Ref. 53.

Experimental Results
To assess the different scan matching techniques, a frame-to-frame registration on both datasets was performed, considering the raw point cloud data acquired by the LiDAR sensor. The metrics that we considered were the total root mean square error, RMSE, and the translational, e_t, and rotational, e_r, errors against the ground truth. The RMSE describes the total root mean square error per point j after registration:

RMSE = sqrt( (1/N) Σ_{j=1}^{N} ‖T_j − (R S_j + t_r)‖² ),   (9)

where T, S, R, and t_r denote the target point cloud, the source point cloud, and the rotation and translation transformations, respectively. As we are interested in both the translation and the rotation, we projected the 6D distribution onto the translation and rotation errors. 21,54 Considering the ground-truth transformation matrix, H_g, and the corresponding registration solution, H, we can define the error ΔT as

ΔT = H_g^{−1} H,   (10)

with its translation error, e_t, defined as the Euclidean norm of the translation vector Δt of ΔT, 21,54

e_t = ‖Δt‖,   (11)

and its rotation error, e_r, defined as the geodesic distance of the rotation matrix ΔR of ΔT, 21,54

e_r = arccos((trace(ΔR) − 1)/2).   (12)
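The translation and rotation errors described above can be computed directly from the two homogeneous matrices (a sketch; we take ΔT = H_g⁻¹ H and use the geodesic distance arccos((tr ΔR − 1)/2) on the rotation part):

```python
import numpy as np

def pose_errors(H, H_g):
    """Translation error e_t and rotation error e_r of a registration H
    against the ground truth H_g (both 4x4 homogeneous matrices)."""
    dT = np.linalg.inv(H_g) @ H
    e_t = np.linalg.norm(dT[:3, 3])                     # Euclidean norm
    cos_r = np.clip((np.trace(dT[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    e_r = np.arccos(cos_r)                              # geodesic distance
    return e_t, e_r

# a registration off by 0.1 rad about z and a (3, 4, 0) m translation
th = 0.1
H = np.eye(4)
H[:2, :2] = [[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]]
H[:2, 3] = [3.0, 4.0]
e_t, e_r = pose_errors(H, np.eye(4))
```

The clip call protects arccos against floating-point trace values marginally outside [-1, 1].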

Parameter Selection
As described in Sec. 2, several parameters should be considered for the registration approaches.
Finding an appropriate combination of parameters can vary according to each application. 55-60 Some parameters were fixed in this work, and others were selected by analyzing the trade-off between speed and accuracy. Parameters such as the maximum number of iterations and the error threshold were set to 100 and 0.001, respectively. The parameters analyzed for each algorithm are described next. For the ICP in its point-to-point and point-to-plane versions, we analyzed the use of different percentages of paired points with the Euclidean distance, which is a common practice to add robustness to the registration procedure. 61 In the point-to-plane case, we used 10 points to compute the normals. For the GICP, we analyzed the number of points used to compute the covariance matrix. For the NDT, we evaluated different voxel sizes. For the CPD, we analyzed different weights, w, of the uniform distribution. For the PF implementation, we considered different numbers of particles. For the GMM and the SVR, a much more involved process is needed to tune the whole set of parameters; therefore, we kept the parameters assigned in their original implementations. For evaluating the parameters mentioned above, we considered two sub-sampled frames 62 of an urban scenario with a known transformation matrix. Figure 5 shows the algorithms' speed against their translation, rotation, and root mean square errors. Regarding the ICP point-to-plane version, we selected 90% of paired points because the translation, rotation, and RMSE errors decrease when the percentage of paired points increases. Similar behavior was obtained with the ICP point-to-point version, but the lowest error and time were obtained with 95% of paired points. For the GICP, it is known that the average time to convergence and the errors increase when more points are considered in the covariance matrix; therefore, we selected the twenty closest points to construct the covariance matrix.
For the NDT, we prioritized accuracy over time; as can be seen, the minimum error was obtained with a voxel of one meter. For the CPD, we selected a weight for the uniform distribution equal to w = 0.5 because minimal variation was observed across all of the evaluated weights. For the PF, it can be seen that ninety particles obtain the lowest error and computational time. Figure 6 shows the consistency analysis for the overall pose estimation and frame-to-frame registration. For a better understanding of the estimation results, we labeled the trajectory scan numbers of five paths, named A, B, C, D, and E.

Urban Dataset
To evaluate the pose estimation, we followed the guidelines presented in Ref. 63 to perform consistency tests. Figure 6 shows the consistency for the x and y coordinates over the complete experiment. The results show that the errors of the ICP point-to-point, the ICP point-to-plane, the GICP, the NDT, and the CPD exceed twice their standard deviation less than 5% of the time, which suggests that such approaches could be used for fairly long distances. Nevertheless, the GMM, SVR, and PF registrations show inconsistency in several parts of the road. The SVR, for example, is consistent when the experiments start but becomes inconsistent close to the turn onto path A. The GMM and the PF registration, on the other hand, do not show consistency at all. Figure 7 shows a qualitative representation of the pose estimation for the Ford Dataset. For each algorithm, the estimated path is shown with the XY map projection of the LiDAR data and the ground-truth path obtained from the GNSS readings. The results show that accurate positioning can be obtained with some registration algorithms, such as the GICP and the NDT. Regarding the GMM, the results were not satisfactory due to the lack of motion detected, which is consistent with the experiments of Ref. 31. On the other hand, the SVR lost its direction in the second turn, as did the PF at the beginning of the experiments.

Agricultural Dataset
To evaluate the performance of the scan matching techniques in an agricultural scenario, five different tracks from the agricultural dataset were considered: two short-path distances, two medium-path distances, and one long-path distance. The path followed by the vehicle can be seen in Fig. 8, where S_pk, k = 1, ..., 5, denotes the starting point of each experiment.
In contrast to the urban scenario, the performance of the scan matching techniques degrades significantly with distance and maneuvers. As can be observed in Fig. 8, although the tested trajectories were straight lines, all predicted trajectories, using the different scan matching techniques, present significant misalignments in scale and rotation. The poor performance of this estimation was mainly due to the penetration of the LiDAR beams into the vegetation, which produces a high-entropy point cloud with information from leaves, fruits, branches, and ground, among others. Additionally, the scanned scene is non-static due to the movement of leaves under windy/outdoor conditions. Figures 9 and 10 show a consistency analysis for the position estimation and the frame-to-frame registration for each track experiment. For the short-path experiments, it can be seen that the error of most of the approaches stays under the standard deviation; however, the error reaches values of 50 m in each axis, and the standard deviation grows continuously. The long-path experiment shows similar behavior, but the error increases up to 100 m. Finally, the frame-to-frame consistency analysis shows that the translation becomes inconsistent in different parts of the followed path. It has to be noted that the error growth is unbounded.

Evaluation of Results
For evaluating the error distribution in both scenarios, we followed the guidelines described in Refs. 21 and 54. To do so, we analyzed the median and the quantiles of the recall-accuracy threshold plots, 21,54 which compare the cumulative probability of the translational and rotational errors against the error magnitude. The quantiles are defined as A50 (i.e., the median), A75, and A95 and correspond to the probabilities of 0.5, 0.75, and 0.95 of the error distributions, respectively. An advantage of analyzing the quantiles is a straightforward interpretation of the precision and accuracy of the registration procedure. 21 If the difference between the quantiles is small, then the solution is precise. Alternatively, a solution is more accurate if the error in the quantiles is closer to zero.

Fig. 8 Results from the agricultural environment. On top, the paths followed by the vehicle; S_p denotes the starting point of each path. There are two short-path trials, starting at S_p2 and S_p4. The long-path experiment starts at S_p5. The medium-path trials are labelled in red and blue, respectively, while the short paths are depicted in cyan; the long path is shown in purple. In yellow is the rest of the point cloud obtained during the trials. On the bottom, the estimated path according to each registration technique is shown.

Figures 11 and 12 show the cumulative probability of errors for the urban and agricultural settings, respectively. Such figures present the proportion of outcomes that lie beneath a given error. For the urban scenario, it is notable that most of the approaches have similar cumulative distributions for the translation and rotation errors. Only two approaches have a significantly different response: the GMM and the PF. The former seems to outperform the other approaches; however, this outcome is misleading because, as shown in Fig. 7, the GMM does not perceive any movement.
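The A50/A75/A95 statistics used throughout this section can be computed directly from the per-frame error samples:

```python
import numpy as np

def accuracy_quantiles(errors):
    """A50/A75/A95 statistics: the error value below which 50%, 75%,
    and 95% of the registrations fall. A small A95-A50 gap indicates a
    precise method; an A50 close to zero indicates an accurate one."""
    return tuple(np.quantile(errors, [0.50, 0.75, 0.95]))

# toy per-frame translation errors (meters)
a50, a75, a95 = accuracy_quantiles(np.array([1.0, 2.0, 3.0, 4.0, 5.0]))
```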
The maximum translation estimated by the GMM approach was close to 10^−3 m, which indicates that the cumulative probability of the translational error underestimates the registration error. On the other hand, when analyzing the rotational error, it can be seen that the PF and the GMM obtain the worst results, with a high probability of failure.
For the agricultural scenario, some differences in the responses of the algorithms can be seen. It is worth mentioning that the cumulative probability of the translation error showed small values compared with the urban dataset; however, it must be noted that the experiments were acquired with the vehicle at a constant and rather low velocity, unlike the urban case. As can be seen in Fig. 12(a), the SVR and the PF have a higher probability of failure in translation. The cumulative probability of the rotational error shows a higher probability of errors for all of the approaches, with the worst estimates given by the GMM and the SVR. It is worth mentioning that the variation in the heading of the vehicle between consecutive frames is smaller than in the urban environment, as the vehicle traverses the orchard almost in a straight line. Table 1 summarizes the results for the quantiles in translation and rotation for the urban and agricultural settings. For the translation case, it can be seen that the ICP variations and the NDT are the most precise algorithms for both datasets because their differences between the A95 and the A50 are similar and smaller than those of the other approaches. Meanwhile, the lowest precision in translation was obtained with the SVR and the PF approaches. To analyze the accuracy, consider the error value of the A50 statistic. The CPD showed the highest accuracy in translation for the urban (0.53 m) and agricultural (0.04 m) scenarios, and the lowest accuracy was obtained with the PF approach (0.67 m for the urban and 0.06 m for the agricultural). When analyzing the rotation error, it can be seen that, for the urban environment, the GMM and the PF have a lower precision than the other approaches. Similarly, in the agricultural case, the GMM and the SVR present higher error values in rotation. Regarding the accuracy, the ICP variations and the NDT obtain similar accuracies for the urban and agricultural settings.
Based on the previous results and the calibration procedure developed in Sec. 4.1, Table 2 summarizes the results for precision, accuracy, parameter sensitivity, and computational complexity. We established the sensitivity and complexity based on the results of Fig. 5. As previously described, some approaches (the ICP variants and the NDT) require little tuning of their parameters, but others may need a more refined process to obtain accurate results. For example, the GMM and the SVR contain several parameters with a significant influence on the estimation, making the calibration procedure non-trivial. The computational complexity presented in Table 2 is a key factor in applications with real-time constraints or when the available processing power is limited. For precision and accuracy, we evaluated the A50 and A95 statistics in translation and rotation. First, let us analyze the results for the urban scenario. It is shown that the ICP point-to-point, ICP point-to-plane, GICP, and NDT obtain the highest precision, followed by the CPD; those with the lowest precision are the GMM, SVR, and PF. Both the SVR and PF obtain the highest A50 error and, thus, the lowest accuracy. The GMM, on the other hand, yields a medium accuracy with a relatively small error in translation, but with the third-highest value in the rotation error. A high accuracy was obtained with the ICP point-to-point, ICP point-to-plane, GICP, NDT, and CPD. Based on the results obtained, we consider that robust (with outlier rejection) ICP point-to-plane should be the first go-to method for any registration problem related to urban environments. Meanwhile, the agricultural environment was considerably more challenging for the presented approaches. As shown in Figs. 8 and 12(b), all of the algorithms have a high probability of failure in rotation.
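The point-to-plane update at the core of the recommended robust ICP can be sketched as a single linearised least-squares step. The sketch below assumes known correspondences and surface normals and uses the small-angle approximation; it is a pure-NumPy illustration, not the implementation benchmarked in this work:

```python
import numpy as np

def point_to_plane_step(src, dst, normals):
    """One linearised point-to-plane ICP update (small-angle assumption).
    src, dst: (N, 3) corresponding points; normals: (N, 3) unit normals at dst.
    Solves min_x sum(((R(x) src + t(x) - dst) . n)^2), x = [rx ry rz tx ty tz]."""
    A = np.hstack([np.cross(src, normals), normals])          # (N, 6) Jacobian
    b = np.einsum("ij,ij->i", dst - src, normals)             # (N,) residuals
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    rx, ry, rz, tx, ty, tz = x
    R = np.array([[1, -rz, ry], [rz, 1, -rx], [-ry, rx, 1]])  # small-angle rotation
    return R, np.array([tx, ty, tz])

# Toy problem: recover a 0.1 m shift along x from exact correspondences
rng = np.random.default_rng(1)
dst = rng.uniform(-1, 1, (200, 3))
normals = rng.normal(size=(200, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
src = dst - np.array([0.1, 0.0, 0.0])
R, t = point_to_plane_step(src, dst, normals)
```

In a full ICP loop, this step alternates with data association (and outlier gating, for the robust variant) until convergence.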
These results suggest that the scan matching techniques should not be applied to the raw data in such highly unstructured environments; instead, key points of the orchards should be used. However, acquiring key points from a 3D point cloud representation alone is not an easy task in this type of environment. An alternative, for example, is to fuse the information of a LiDAR scanner with other exteroceptive sensors such as RGB cameras, obtaining the key points in the RGB domain and projecting them onto the 3D point cloud representation. This, of course, will be subject to the limitations of the camera used.
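As an illustration of this camera-to-cloud idea, the sketch below lifts 2-D image key points into the 3-D camera frame with a pinhole model; the intrinsic matrix and depth values are hypothetical, and in practice the depths would come from the LiDAR cloud projected into the image:

```python
import numpy as np

def backproject_keypoints(kps_uv, depth, K):
    """Lift 2-D image key points into the 3-D camera frame (pinhole model).
    kps_uv: (N, 2) pixel coordinates; depth: (N,) metric depths along the
    optical axis (e.g. from the projected LiDAR cloud); K: 3x3 intrinsics."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = kps_uv[:, 0], kps_uv[:, 1]
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=1)

# Hypothetical intrinsics: 700 px focal length, 640x480 principal point
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
pts = backproject_keypoints(np.array([[320.0, 240.0], [390.0, 240.0]]),
                            np.array([2.0, 2.0]), K)
```

The resulting 3-D key points could then be fed to any of the registration algorithms in place of the raw cloud, subject to camera calibration and lighting limitations.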

Lessons Learned
The following are the lessons learned while experimenting with scan matching techniques in urban and agricultural scenarios.
• The performance of probabilistic approaches relies on the tuning of several parameters with a direct effect on the estimation of the pose of the vehicle. In this work, we used a speed-versus-accuracy plot for tuning some of those parameters. However, it is important to consider the trade-off between accuracy and computational time for large point cloud datasets. For example, for the NDT, we prioritized accuracy over computational time, but this would not be a good choice on an embedded system with limited computational power. Other approaches, such as GMM and SVR, require prior adjustment of their parameters to avoid inconsistency in each application, resembling an ad hoc solution.
• The results obtained from the urban dataset show that accurate position estimation can be achieved over relatively long paths during straight driving. Nevertheless, when the vehicle turns, errors appear in the frame-to-frame registration. Such errors become more evident when the vehicle stops and objects move around it. This outcome highlights one of the main limitations of using scan matching techniques as the sole localization strategy.
• In agricultural settings, the major problem was found in the rotation estimation. It was shown that all of the scan matching techniques have a high probability of failure in rotation (even for short paths), mainly due to the unstructured shape of these scenarios. An alternative would be a pre-processing step that extracts key points from the raw 3D data. This, however, adds a level of complexity beyond the scope of this research. Commonly, key points rely on geometric primitives existing in the scene, such as curvatures64 or planes.65 However, in dense orchards, one region appears much the same as any other, which causes a lack of visual signatures and calls for more specialized and sophisticated key-point extraction algorithms.
• Computing time can play a significant role in online applications with real-time constraints or on systems with limited processing power. Therefore, the time to convergence could limit the use of some approaches; however, a general evaluation of the time to convergence of all of the algorithms is not easy because several factors have a direct influence, such as the hardware used, the programming language, and the amount of parallelism, among others, and such an analysis goes beyond the scope of this paper. As recommended in Ref. 21, time should be considered only a qualitative measure.
• Association among points is a key issue for scan registration. In this context, the experiments show that data association is more accurate when the LiDAR beams collide with continuous and solid surfaces, and it becomes more complex when the LiDAR beams penetrate the objects, as in the agricultural scenario. When the association fails, the optimization for parameter estimation has a higher probability of failure, as shown in Figs. 11 and 12.
• The velocity of the mobile platform directly impacts the overlap between two consecutive scans: as the travel velocity increases, the overlap decreases. Future research should be devoted to better understanding the overlapping region of influence and its effect on the accuracy of the scan matching approaches.
• Future work should include the assessment of other sensors for scan matching purposes, thus enriching the information managed by the approach. Additionally, the influence of 3D sensor performance (accuracy, precision, and resolution) on scan matching accuracy should be further investigated.
• Finally, as shown in Figs. 7 and 8, scan matching as a localization strategy for autonomous vehicles should be used as a technique complementary to GNSS antennas (or another absolute localization system) because it is susceptible to falling into inconsistency, thus jeopardizing the autonomy and safety of the vehicle (and its possible passengers in urban applications).
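The speed-versus-accuracy tuning mentioned in the first lesson can be organised as a simple parameter sweep that records runtime and mean error for each candidate value. The sketch below uses a toy registration stand-in (not any of the benchmarked algorithms) purely to show the structure of such a sweep:

```python
import time
import numpy as np

def sweep_parameter(register, scan_pairs, values):
    """Grid-sweep one algorithm parameter and record (value, runtime, mean
    error) triples, the raw material of a speed-versus-accuracy plot.
    `register(pair, value)` must return a translation error in metres."""
    results = []
    for v in values:
        t0 = time.perf_counter()
        errs = [register(pair, v) for pair in scan_pairs]
        results.append((v, time.perf_counter() - t0, float(np.mean(errs))))
    return results

# Toy stand-in: a "registration" whose error shrinks as the parameter grows
toy = lambda pair, v: 1.0 / v
table = sweep_parameter(toy, scan_pairs=[None] * 5, values=[1.0, 2.0, 4.0])
```

Plotting runtime against mean error for each value then makes the trade-off explicit, so a parameter can be chosen to match the computational budget of the target platform.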
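The gated nearest-neighbour association referred to in the data-association lesson can be sketched as follows. This is a brute-force NumPy version for clarity (a k-d tree would be used in practice), with a hypothetical 0.5 m rejection gate:

```python
import numpy as np

def associate(src, dst, max_dist=0.5):
    """Nearest-neighbour data association with a distance gate, i.e. the
    outlier-rejection step used by robust ICP variants. Returns index pairs
    (i, j) with ||src[i] - dst[j]|| below max_dist."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)  # (N, M) squared dists
    j = d2.argmin(axis=1)                                    # nearest dst per src
    keep = np.sqrt(d2[np.arange(len(src)), j]) < max_dist    # gate outliers
    return np.arange(len(src))[keep], j[keep]

dst = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0]])
src = dst + 0.05                      # slightly shifted copy of the target
src = np.vstack([src, [10.0, 0, 0]])  # one spurious point, gated out
i, j = associate(src, dst)
```

When beams penetrate vegetation, many points have no true counterpart in the other scan, so the gate rejects a large fraction of pairs and the subsequent optimization becomes ill-conditioned, consistent with the failures observed in the agricultural scenario.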

Conclusions
In this work, we focused on the localization of autonomous vehicles based on the registration of 3D point clouds acquired at different times from a LiDAR sensor. Eight different scan matching algorithms that represent the current state of the art in the field were investigated, namely, ICP in its point-to-point and point-to-plane variants, GICP, NDT, GMM, CPD, SVR, and PF. The choice among these algorithms generally depends on several important characteristics such as accuracy, computational complexity, and convergence rate, each of which depends, in turn, on the application of interest. To the best of our knowledge, a general discussion of each of the above methods is not available in the literature. The algorithms were tested using 3D data from two types of outdoor environments: urban and agricultural. Results show that the performance of most registration algorithms heavily depends on the data used and thus on the environment itself. The agricultural setting proved to be more challenging due to natural objects with less-structured features than urban scenarios. The results presented herein are intended to encourage researchers and developers to build improvements on the identified limitations by developing new scan matching systems based on more robust algorithms and more accurate 3D sensors.

Appendix: Source Codes
The datasets used in this work can be found at:
• Ford Dataset (Urban).49
• Lleida Dataset (Agriculture).53
In the following, the repositories of the source codes used in this tutorial are given. Please refer to Sec. 2 for the mathematical background and to Sec. 3 for the design parameters adopted in each algorithm.