Fitting Cylinder to Point Cloud Data
Béla Paláncz, Árpád Somogyi, Rehány Nikolet and Tamás Lovas
Department of Photogrammetry and Geoinformatics, Budapest University of Technology and Economics, 1521 Budapest, Hungary
e-publication, Wolfram Research, Wolfram Information Center, http://library.wolfram.com/infocenter/MathSource/8950 (2014)
Abstract A new robust parameter estimation method that takes the real model error distribution into account is presented for large, noisy point data sets. The maximum likelihood technique is employed to compute the model parameters, assuming that the distribution of the model errors is a Gaussian mixture corresponding to the inlier and outlier data points. The maximization is carried out by a local method, with initial guess values computed via a numerical Gröbner basis. After a parameter estimation based on an initially assumed distribution, the real errors are determined and the corresponding Gaussian mixture is identified via the expectation maximization algorithm. The iteration converges when the error distribution becomes stationary. The method is illustrated by identifying tree stems from ground-based Lidar data.
Keywords parameter estimation, cylinder, point cloud, Gröbner basis, outliers, maximum likelihood, Gaussian mixture, expectation maximization, Lidar.
1 - Introduction Fitting real circular cylinders is an important problem in representing the geometry of man-made structures such as industrial plants, Vosselman et al (2004), in deformation analysis of tunnels, Stal et al (2012), in detecting and monitoring deformation in deposition holes, Carrea et al (2014), in positioning femur pieces for surgical fracture reduction, Winkelbach et al (2003), in estimating tree stems, Khameneh (2013), and so on. Since planes and cylinders compose up to 85% of all objects in industrial scenes, research in 3D reconstruction and modeling - see CAD-CAM applications - has largely focused on these two important geometric primitives, see Petitjean (2002). In general, five parameters define a cylinder: four for the axis and one for the radius, so at least five points are required to determine them. It goes without saying that in special cases, such as a cylinder parallel to an axis or to a plane, fewer parameters are enough, see Beder and Förstner (2006). Given more than five points, one has to find the five cylinder parameters such that the sum of the squared distances of the data points from the cylinder surface is minimal in the least squares sense. Basically, two approaches can be followed:
◦ find the direction vector of the cylinder axis and transform the data points so that the axis is perpendicular to the x-y plane; then the remaining three parameters of the cylinder oriented parallel to the z axis - two shifting parameters and the radius - can be computed, Beder and Förstner (2006);
◦ compute all five parameters via the least squares method employing a local optimization technique, e.g. Lukacs et al (1998), Lichtblau (2007).
There are different methods to find the orientation of the cylinder axis: ◦ one may use Principal Component Analysis (PCA),
Cylinder_Article_nb_Last_08_21.nb
◦ considering the general form of a cylinder as a second order surface, the direction vector can be partially extracted and computed via linear least squares, see Khameneh (2013),
◦ the PCA method can be modified so that, instead of single points, local neighborhoods of randomly selected points are employed, Ruiz et al (2013),
◦ one may employ the Hough transformation, see Su and Bethel (2010), Rabbani and van den Heuvel (2005).
All of these methods use least squares, assuming that the model error has a normal Gaussian distribution with zero mean. In this study we consider a realistic model error distribution, applying the maximum likelihood technique for parameter estimation, where this distribution is represented by a Gaussian mixture of the inlier and outlier errors and is identified by the expectation maximization algorithm. For the geometric modeling of a general cylinder, a vector algebraic approach is followed, see Lichtblau (2007).
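To illustrate the PCA idea, here is a minimal sketch in Python (our illustration, not code from the paper; numpy and the synthetic data are assumptions). The axis direction is estimated as the principal component of largest variance, which works when the cylinder is long relative to its radius:

```python
import numpy as np

def pca_axis(points):
    """Estimate the cylinder axis direction as the principal
    component of largest variance of the point cloud."""
    centered = points - points.mean(axis=0)
    # right singular vectors = eigenvectors of the covariance matrix
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

# synthetic cylinder: axis (0, 0, 1), radius 0.2, length 4, mild noise
rng = np.random.default_rng(0)
t = rng.uniform(-2, 2, 1000)
u = rng.uniform(0, 2 * np.pi, 1000)
pts = np.c_[0.2 * np.cos(u), 0.2 * np.sin(u), t]
pts += 0.005 * rng.normal(size=pts.shape)

axis = pca_axis(pts)  # close to (0, 0, +/-1)
```

Note that PCA only recovers the axis direction; the shift parameters and the radius still have to be estimated afterwards.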
2 - Vector algebraic definition Given a cylinder with axis line L in ℛ3, we may parametrize it using five model parameters (a, b, c, d, r), where r stands for the radius of the cylinder. Let P(x, y, z) be a point of the cylinder and let the direction vector of the axis line L be vec = {1, a, c}. Let L' be the line through the origin of the coordinate system parallel to L; the translation vector is offset = {0, b, d}. This translation takes P to P' and L to L'. We project P' onto L' and denote the orthogonal residual vector by perp: the projection of P' onto L' is subtracted from P'. For the norm of perp one can then write

perp^2 - r^2 = 0

Let us apply this definition to set up the equation of the cylinder. Considering a point

Clear[a, b, c, d, r]
P = {x, y, z};

the locus of this point on the cylinder having model parameters (a, b, c, d, r) can be computed as follows, see Fig. 1. The direction vector of the axis line is

vec = {1, a, c};

The vector translating the axis line to pass through the origin is

offset = {0, b, d};

The function computing perp carries out the projection onto the translated axis line L' and subtracts it from the translated locus vector:

perp[vec1_, vec_, offset_] := vec1 - offset - Projection[vec1 - offset, vec, Dot]
Applying it to point P

perp[P, vec, offset] // Simplify

{x - (x + a (-b + y) + c (-d + z))/(1 + a^2 + c^2),
 -b + y - a (x + a (-b + y) + c (-d + z))/(1 + a^2 + c^2),
 -d + z - c (x + a (-b + y) + c (-d + z))/(1 + a^2 + c^2)}
Fig. 1 Explanation of the perp function
Clearing denominators and formulating the equation for perp

vector = Numerator[Together[%.% - r^2]]

b^2 + b^2 c^2 - 2 a b c d + d^2 + a^2 d^2 - r^2 - a^2 r^2 - c^2 r^2 + 2 a b x + 2 c d x + a^2 x^2 + c^2 x^2 - 2 b y - 2 b c^2 y + 2 a c d y - 2 a x y + y^2 + c^2 y^2 + 2 a b c z - 2 d z - 2 a^2 d z - 2 c x z - 2 a c y z + z^2 + a^2 z^2
This is practically the implicit equation of the cylinder with model parameters (a, b, c, d, r). It is important to realize that in creating this equation the algebraic error definition, perp^2 - r^2, has been used.
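The identity behind this step can be checked numerically. The following Python sketch (our illustration; numpy assumed) evaluates the expanded implicit polynomial at a test point and compares it with (1 + a^2 + c^2)(perp.perp - r^2), the quantity before the denominator was cleared:

```python
import numpy as np

def perp(p, vec, offset):
    """Residual vector from the translated axis to the translated point,
    per the paper's perp construction."""
    q = np.asarray(p, float) - np.asarray(offset, float)
    v = np.asarray(vec, float)
    return q - (q @ v) / (v @ v) * v

def implicit(p, a, b, c, d, r):
    """Expanded implicit form of the cylinder from the text."""
    x, y, z = p
    return (b**2 + b**2*c**2 - 2*a*b*c*d + d**2 + a**2*d**2
            - r**2 - a**2*r**2 - c**2*r**2
            + 2*a*b*x + 2*c*d*x + a**2*x**2 + c**2*x**2
            - 2*b*y - 2*b*c**2*y + 2*a*c*d*y - 2*a*x*y
            + y**2 + c**2*y**2 + 2*a*b*c*z - 2*d*z - 2*a**2*d*z
            - 2*c*x*z - 2*a*c*y*z + z**2 + a**2*z**2)

a, b, c, d, r = 0.3, -1.0, 0.5, 2.0, 0.7
vec, offset = [1, a, c], [0, b, d]
p = [1.2, -0.4, 2.5]             # an arbitrary test point
lhs = implicit(p, a, b, c, d, r)
w = perp(p, vec, offset)
rhs = (1 + a**2 + c**2) * (w @ w - r**2)   # cleared denominator
```

The two expressions agree to machine precision, confirming that vector is perp.perp - r^2 with the denominator 1 + a^2 + c^2 cleared.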
3 - Parametrized form of the cylinder equation In order to employ these model parameters in the parametric equation of the cylinder, let us develop the usual parametric form x = x(u, v), y = y(u, v) and z = z(u, v) with scaling parameters (u, v) and model parameters (a, b, c, d, r). The locus of point P is obtained as the sum of a vector along L' plus a vector of length r perpendicular to L. Let v be a parameter along the axis L'. Then the projection of P(u, v) onto L is the vector

μ = v vec + offset
All vectors perpendicular to L are spanned by any independent pair. We can obtain an orthonormal pair {w1, w2} in the standard way by finding the null space of the matrix whose single row is the axial direction vector vec, and then using Gram-Schmidt to orthogonalize that pair. Using the parameter u, the vector of length r perpendicular to L can be written as

ρ = r cos(u) w1 + r sin(u) w2
Then the locus vector of a general point of the cylinder is, see Fig. 2,
λ = μ + ρ
Fig. 2 Explanation of the general locus vector λ
Let us carry out this computation step by step in symbolic form employing Mathematica:

pair = NullSpace[{vec}];
{w1, w2} = Orthogonalize[pair, Dot];
Then the parametric equation of a general circular cylinder using the parameters (a, b, c, d, r) is

parametric = Evaluate[v vec + offset + r Cos[u] w1 + r Sin[u] w2] // FullSimplify

{v - c r Cos[u]/Sqrt[1 + c^2] - a r Sin[u]/((1 + c^2) Sqrt[1 + a^2/(1 + c^2)]),
 b + a v + r Sin[u]/Sqrt[1 + a^2/(1 + c^2)],
 d + c v + r Cos[u]/Sqrt[1 + c^2] - a c r Sin[u]/((1 + c^2) Sqrt[1 + a^2/(1 + c^2)])}

where u and v are the scaling parameters.
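As a quick check of the parametric form, the following Python sketch (our illustration; numpy assumed) builds a surface point from (a, b, c, d, r, u, v) with an orthonormal pair perpendicular to vec and verifies that its distance from the axis equals r. The basis pair below is a valid choice but not necessarily the same one Orthogonalize returns:

```python
import numpy as np

def cylinder_point(a, b, c, d, r, u, v):
    """Point on the cylinder: v*vec + offset + r*cos(u)*w1 + r*sin(u)*w2,
    with {w1, w2} an orthonormal basis of the plane perpendicular to vec."""
    vec = np.array([1.0, a, c])
    offset = np.array([0.0, b, d])
    w1 = np.array([-c, 0.0, 1.0]) / np.sqrt(1 + c**2)   # w1 . vec == 0
    w2 = np.cross(vec / np.linalg.norm(vec), w1)        # unit, perp to both
    return v * vec + offset + r * np.cos(u) * w1 + r * np.sin(u) * w2

a, b, c, d, r = 0.3, -1.0, 0.5, 2.0, 0.7
p = cylinder_point(a, b, c, d, r, u=1.1, v=0.4)

# distance of the generated point from the axis must equal the radius
vec = np.array([1.0, a, c])
q = p - np.array([0.0, b, d])
perp_vec = q - (q @ vec) / (vec @ vec) * vec
dist = float(np.linalg.norm(perp_vec))
```

By construction dist equals r to machine precision, so the parametric and implicit definitions describe the same surface.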
4 - Implicit equation from the parametric one One can also develop the implicit equation from the parametric one. The problem practically means eliminating u and v - or, avoiding the trigonometric expressions, eliminating sin(u), cos(u) and v - from the algebraic system symbolically,
x = v - c r cos(u)/Sqrt[1 + c^2] - a r sin(u)/((1 + c^2) Sqrt[1 + a^2/(1 + c^2)])
y = b + a v + r sin(u)/Sqrt[1 + a^2/(1 + c^2)]
z = d + c v + r cos(u)/Sqrt[1 + c^2] - a c r sin(u)/((1 + c^2) Sqrt[1 + a^2/(1 + c^2)])

and

sin^2(u) + cos^2(u) = 1
In order to carry out this computation in Mathematica, for practical reasons let sin(u) = U and cos(u) = V. Then our system is

polys = Append[v vec + offset + r V w1 + r U w2 - {x, y, z}, U^2 + V^2 - 1] // Simplify

{v - c r V/Sqrt[1 + c^2] - a r U/(Sqrt[1 + c^2] Sqrt[1 + a^2 + c^2]) - x,
 b + a v + Sqrt[1 + c^2] r U/Sqrt[1 + a^2 + c^2] - y,
 d + c v + r V/Sqrt[1 + c^2] - a c r U/(Sqrt[1 + c^2] Sqrt[1 + a^2 + c^2]) - z,
 -1 + U^2 + V^2}
Let us clear denominators

ee = Numerator[MapAll[Together, polys]] // Simplify

The resulting polynomials (equivalent to the previous system up to the nonzero denominators) are

{-a r U - c Sqrt[1 + a^2 + c^2] r V + Sqrt[1 + c^2] Sqrt[1 + a^2 + c^2] (v - x),
 Sqrt[1 + c^2] r U + Sqrt[1 + a^2 + c^2] (b + a v - y),
 -a c r U + Sqrt[1 + a^2 + c^2] r V + Sqrt[1 + c^2] Sqrt[1 + a^2 + c^2] (d + c v - z),
 -1 + U^2 + V^2}

The remaining radicals are normalized with PowerExpand,

ff = Numerator[Together[PowerExpand[ee]]] // Simplify

which yields the same system in a form that GroebnerBasis can process.
Applying a Gröbner basis to eliminate U, V and v, the implicit form is

implicit = First[GroebnerBasis[ff, {x, y, z}, {v, U, V}, Sort -> True, MonomialOrder -> EliminationOrder, CoefficientDomain -> RationalFunctions]] // Expand

b^2 + b^2 c^2 - 2 a b c d + d^2 + a^2 d^2 - r^2 - a^2 r^2 - c^2 r^2 + 2 a b x + 2 c d x + a^2 x^2 + c^2 x^2 - 2 b y - 2 b c^2 y + 2 a c d y - 2 a x y + y^2 + c^2 y^2 + 2 a b c z - 2 d z - 2 a^2 d z - 2 c x z - 2 a c y z + z^2 + a^2 z^2
This is the same equation that was computed in Section 2.

vector - implicit // Simplify

0
5 - Computing model parameters in the determined case Let us consider five triples of points

points5 = Table[{xi, yi, zi}, {i, 1, 5}]

{{x1, y1, z1}, {x2, y2, z2}, {x3, y3, z3}, {x4, y4, z4}, {x5, y5, z5}}
Substituting these points into the implicit form and setting r^2 = rsqr, we get

eqs = Map[implicit /. {x -> #[[1]], y -> #[[2]], z -> #[[3]]} &, points5] /. r^2 -> rsqr

a list of five polynomials, the i-th being

b^2 + b^2 c^2 - 2 a b c d + d^2 + a^2 d^2 - rsqr - a^2 rsqr - c^2 rsqr + 2 a b xi + 2 c d xi + a^2 xi^2 + c^2 xi^2 - 2 b yi - 2 b c^2 yi + 2 a c d yi - 2 a xi yi + yi^2 + c^2 yi^2 + 2 a b c zi - 2 d zi - 2 a^2 d zi - 2 c xi zi - 2 a c yi zi + zi^2 + a^2 zi^2
This is a polynomial system for the parameters based on the algebraic error definition. In order to create the determined system for the geometric error model, let us consider the geometric error, namely

Δi = ‖perpi‖ - r
which can be applied to the points5 data:

eqs = Map[Numerator[Together[Sqrt[#.#] - r]] &, Map[perp[#, vec, offset] &, points5]]

yielding, for i = 1, ..., 5, the equations

-r + Sqrt[(xi - (xi + a (-b + yi) + c (-d + zi))/(1 + a^2 + c^2))^2 + (-b + yi - a (xi + a (-b + yi) + c (-d + zi))/(1 + a^2 + c^2))^2 + (-d + zi - c (xi + a (-b + yi) + c (-d + zi))/(1 + a^2 + c^2))^2]
6 - Computing model parameters in the overdetermined case Let us consider the algebraic error model first. This means minimizing the residual of the implicit form

G(a, b, c, d, r) = Σ_{i=1}^{n} Δi^2

where the algebraic error is, see implicit or vector,

Δi = b^2 + b^2 c^2 - 2 a b c d + d^2 + a^2 d^2 - rsqr - a^2 rsqr - c^2 rsqr + 2 a b xi + 2 c d xi + a^2 xi^2 + c^2 xi^2 - 2 b yi - 2 b c^2 yi + 2 a c d yi - 2 a xi yi + yi^2 + c^2 yi^2 + 2 a b c zi - 2 d zi - 2 a^2 d zi - 2 c xi zi - 2 a c yi zi + zi^2 + a^2 zi^2
while the geometric error is

Δi = Map[Numerator[Together[Sqrt[#.#] - r]] &, Map[perp[#, vec, offset] &, {{xi, yi, zi}}]]

that is,

Δi = -r + Sqrt[(xi - (xi + a (-b + yi) + c (-d + zi))/(1 + a^2 + c^2))^2 + (-b + yi - a (xi + a (-b + yi) + c (-d + zi))/(1 + a^2 + c^2))^2 + (-d + zi - c (xi + a (-b + yi) + c (-d + zi))/(1 + a^2 + c^2))^2]
In order to carry out this minimization via local optimization - since it is much faster than global optimization - we may solve the determined system (n = 5) for randomly selected data points to get initial guess values.
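As an illustration of this overdetermined fit, the sketch below (our illustration, not the paper's code; numpy and scipy as well as the synthetic data are assumptions) minimizes the sum of squared geometric errors Δi = ‖perpi‖ - r with a generic local least-squares solver, started near the true parameters:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, pts):
    """Geometric error Delta_i = |perp_i| - r for each point."""
    a, b, c, d, r = params
    vec = np.array([1.0, a, c])
    q = pts - np.array([0.0, b, d])
    proj = np.outer(q @ vec / (vec @ vec), vec)
    return np.linalg.norm(q - proj, axis=1) - r

# synthetic cylinder with mild Gaussian noise
rng = np.random.default_rng(1)
true = np.array([0.2, -1.0, 0.1, 2.0, 0.5])     # (a, b, c, d, r)
a, b, c, d, r = true
vec = np.array([1.0, a, c])
w1 = np.array([-c, 0.0, 1.0]) / np.sqrt(1 + c**2)
w2 = np.cross(vec / np.linalg.norm(vec), w1)
u = rng.uniform(0, 2 * np.pi, 500)
v = rng.uniform(-1, 1, 500)
pts = (v[:, None] * vec + np.array([0.0, b, d])
       + r * np.cos(u)[:, None] * w1 + r * np.sin(u)[:, None] * w2
       + 0.01 * rng.normal(size=(500, 3)))

fit = least_squares(residuals, x0=true + 0.05, args=(pts,))
```

In the paper the initial guess x0 comes from the numerical Gröbner basis solution of the determined system rather than from a perturbed ground truth.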
7 - Application to estimation of tree stem diameter Outdoor laser scanning measurements were carried out in the backyard of the Budapest University of Technology and Economics, see Fig. 3/a. In order to obtain a simple fitting problem rather than a segmentation one, the test object, the lower part of a tree stem, was preselected by segmentation, see Fig. 3/b.
Fig. 3/a/b Test environment (left) and test object (right)
The experiment has been carried out with a Faro Focus 3D terrestrial laser scanner, see Fig.4.
Fig. 4 Faro Focus 3D scanner
The scanning parameters were set to ½ resolution, which equals a point spacing of 3 mm at 10 m. The test data set was cropped from the point cloud; moreover, further resampling was applied in order to reduce the data size. The final data set is composed of 16 434 points in ASCII format, and only the x, y, z coordinates were kept (no intensity values). Let us load the measured data:

XYZ = Import["G:\\Bfa.dat"];
Length[XYZ]

16 434

p1 = ListPointPlot3D[XYZ, PlotStyle -> {Green, Directive[Tiny]}, BoxRatios -> {1, 1, 1.5}]
Fig. 5 The point cloud of the data
dataP = XYZ;
We shall compute the parameters from the geometric error model employing local minimization of the sum of the squared residuals. To do that, we compute the initial guess values from 5 randomly chosen data points using the algebraic error model, since the resulting polynomial system can be solved easily via a numerical Gröbner basis. In the determined case the number of real solutions can be an even number (0, 2, 4 or 6). According to our numerical experience, to ensure reliable initial values one needs a set of 5 points that provides at least 4 real solutions.

SeedRandom[123 456]
dataPR = RandomSample[dataP, 5]

{{1.5402, -12.2018, 140.461}, {1.5536, -12.1603, 141.285}, {1.4512, -12.319, 141.43}, {1.6294, -12.3794, 140.952}, {1.6821, -12.2508, 140.86}}
The equations are

perps = Map[perp[#, vec, offset] &, dataPR]

giving for each sample point its perp vector; for the first point,

{1.5402 - (1.5402 + a (-12.2018 - b) + c (140.461 - d))/(1 + a^2 + c^2),
 -12.2018 - b - a (1.5402 + a (-12.2018 - b) + c (140.461 - d))/(1 + a^2 + c^2),
 140.461 - d - c (1.5402 + a (-12.2018 - b) + c (140.461 - d))/(1 + a^2 + c^2)}

and similarly for the other four points.
Employing integer coefficients in order to avoid round-off error
exprs = Map[Numerator[Together[Rationalize[#.#, 0] - rsqr]] &, perps]

The first of the five resulting integer-coefficient polynomials is

248 477 205 553 + 469 830 309 a + 246 645 809 213 a^2 + 305 045 000 b + 38 505 000 a b + 12 500 000 b^2 - 5 408 450 805 c + 42 846 925 745 a c + 3 511 525 000 a b c + 1 890 701 741 c^2 + 305 045 000 b c^2 + 12 500 000 b^2 c^2 - 3 511 525 000 d - 3 511 525 000 a^2 d + 38 505 000 c d - 305 045 000 a c d - 25 000 000 a b c d + 12 500 000 d^2 + 12 500 000 a^2 d^2 - 12 500 000 rsqr - 12 500 000 a^2 rsqr - 12 500 000 c^2 rsqr

and the remaining four are analogous.
Solving the system via a numerical Gröbner basis

sol5 = NSolve[exprs, {a, b, c, d, rsqr}]

{{a -> -0.745866, b -> -11.1109, c -> -22.1205, d -> 175.724, rsqr -> 0.0124972},
 {a -> -0.674735, b -> -11.2742, c -> -3.34268, d -> 145.884, rsqr -> 0.0316293},
 {a -> -0.0533048 + 0.957057 I, b -> -12.1959 - 1.50844 I, c -> 0.372737 + 0.0412819 I, d -> 140.563 - 0.0688756 I, rsqr -> 0.097419 - 0.00128866 I},
 {a -> -0.0533048 - 0.957057 I, b -> -12.1959 + 1.50844 I, c -> 0.372737 - 0.0412819 I, d -> 140.563 + 0.0688756 I, rsqr -> 0.097419 + 0.00128866 I},
 {a -> 2.43015, b -> -14.9782, c -> -1.54214, d -> 142.734, rsqr -> 0.268222},
 {a -> -2.03283, b -> -9.11152, c -> 5.59169, d -> 132.397, rsqr -> 0.0299727}}
The real solutions

solnsR = Select[sol5, Im[#[[1, 2]]] == 0 &]

{{a -> -0.745866, b -> -11.1109, c -> -22.1205, d -> 175.724, rsqr -> 0.0124972},
 {a -> -0.674735, b -> -11.2742, c -> -3.34268, d -> 145.884, rsqr -> 0.0316293},
 {a -> 2.43015, b -> -14.9782, c -> -1.54214, d -> 142.734, rsqr -> 0.268222},
 {a -> -2.03283, b -> -9.11152, c -> 5.59169, d -> 132.397, rsqr -> 0.0299727}}
We create the objective function with the geometric error model.

Clear[vec, offset, a, b, c, d, r]
Employing all of the points

vec = {1, a, c}; offset = {0, b, d};
perps = Map[perp[#, vec, offset] &, dataP];
Applying the geometric error model

exprs = Map[Numerator[Together[Sqrt[#.#] - r]] &, perps];
Then the objective function

obj = Apply[Plus, Map[#^2 &, exprs]];

Converting rsqr to the radius r,

SolnsR = Map[Join[Drop[#, -1], {r -> Sqrt[Last[#][[2]]]}] &, solnsR]

{{a -> -0.745866, b -> -11.1109, c -> -22.1205, d -> 175.724, r -> 0.111791},
 {a -> -0.674735, b -> -11.2742, c -> -3.34268, d -> 145.884, r -> 0.177846},
 {a -> 2.43015, b -> -14.9782, c -> -1.54214, d -> 142.734, r -> 0.517901},
 {a -> -2.03283, b -> -9.11152, c -> 5.59169, d -> 132.397, r -> 0.173126}}
From these solutions we can select the one which gives the smallest residual

objs = Map[obj /. # &, SolnsR]

{773.458, 35.6256, 7816.1, 5356.02}

best = SolnsR[[Position[objs, Min[objs]] // Flatten // First]]

{a -> -0.674735, b -> -11.2742, c -> -3.34268, d -> 145.884, r -> 0.177846}
This solution will be used as the initial guess for the local minimization:

initDataG = Map[{#[[1]], #[[2]]} &, best]

{{a, -0.674735}, {b, -11.2742}, {c, -3.34268}, {d, 145.884}, {r, 0.177846}}

AbsoluteTiming[sol = FindMinimum[obj, initDataG];]

{46.956083, Null}
The result is

sol

{25.041, {a -> -0.604342, b -> -11.3773, c -> -3.68704, d -> 146.479, r -> 0.178783}}

p2 = ParametricPlot3D[(parametric /. sol[[2]]), {v, 1.3, 2.}, {u, 0, 2 Pi}, PlotStyle -> Directive[Opacity[0.6], Yellow], Mesh -> False];
Show[{p1, p2}]
Fig. 6 The fitted cylinder via geometrical error model
The distribution of the model error

error = exprs /. sol[[2]];
p5 = Histogram[error, Automatic, "PDF"]
Fig. 7 Histogram of the model error of the standard approach assuming N(0, σ)
Mean[error]

2.08444 × 10^-16

Min[error]

-0.0813939

Max[error]

0.224278
StandardDeviation[error] 0.0390362
It is clear from Fig. 7 that the assumption for the model error is not true. Therefore one should employ a likelihood function for the parameter estimation. The real model error distribution, as well as the model parameters, can be computed in an iterative way. We can consider the distribution shown in Fig. 7 as a first guess for this iteration. In order to handle an empirical distribution in the likelihood function there are at least three ways:
◦ Gaussian kernel functions can be applied to the values of the different bins,
◦ a known type of distribution can be employed to approximate the empirical histogram,
◦ the empirical distribution can be considered a Gaussian mixture of errors corresponding to inliers and outliers.
Here we considered the third approach and employed the expectation maximization algorithm to compute the parameters of the component distributions.
8 - Expectation Maximization Let us consider a two-component Gaussian mixture represented by the mixture model in the following form,

N12(x) = η1 N(μ1, σ1, x) + η2 N(μ2, σ2, x)

where

N(μi, σi, x) = exp(-(x - μi)^2/(2 σi^2)) / (Sqrt[2 π] σi),  i = 1, 2

and the ηi's are the membership weights constrained by

η1 + η2 = 1
We are looking for the parameter values (μ1, σ1) and (μ2, σ2). The log-likelihood function in case of N samples is

log ℒ(x, θ) = Σ_{i=1}^{N} log(N12(xi, θ)) = Σ_{i=1}^{N} log(η1 N(μ1, σ1, xi) + η2 N(μ2, σ2, xi))
where θ = (μ1, σ1, μ2, σ2) collects the parameters of the normal densities. The direct maximization of this function is problematic because of the sum of terms inside the logarithm. In order to solve this problem, let us introduce the following alternative log-likelihood function

log ℒ(x, θ, Δ) = Σ_{i=1}^{N} [(1 - Δi) log N(μ1, σ1, xi) + Δi log N(μ2, σ2, xi)] + Σ_{i=1}^{N} [(1 - Δi) log η1 + Δi log η2]

Here the Δi's are considered unobserved latent variables taking values 0 or 1. If xi belongs to the first component then Δi = 0, so

log ℒ(x, θ, Δ) = Σ_{i∈N1(Δ)} log N(μ1, σ1, xi) + N1 log η1

otherwise xi belongs to the second component, Δi = 1, therefore

log ℒ(x, θ, Δ) = Σ_{i∈N2(Δ)} log N(μ2, σ2, xi) + N2 log η2
where N1 and N2 are the numbers of elements of the mixture which belong to the first and to the second component, respectively. Since the values of the Δi's are actually unknown, we proceed in an iterative fashion, substituting for each Δi its expected value

ξi(θ) = E(Δi | θ, x) = Pr(Δi = 1 | θ, x) = η2 N(μ2, σ2, xi) / ((1 - η2) N(μ1, σ1, xi) + η2 N(μ2, σ2, xi))
This expression is also called the responsibility of component two for observation i. The procedure, called the EM algorithm for a two-component Gaussian mixture, is the following:
● Take an initial guess for the parameters θ = (μ1, σ1, μ2, σ2) and for η2.
● Expectation step: compute the responsibilities

ξi = η2 N(μ2, σ2, xi) / ((1 - η2) N(μ1, σ1, xi) + η2 N(μ2, σ2, xi)),  for i = 1, 2, ..., N
● Maximization step: compute the weighted means and variances for the two components,

μ1 = Σ_{i=1}^{N} (1 - ξi) xi / Σ_{i=1}^{N} (1 - ξi)

σ1^2 = Σ_{i=1}^{N} (1 - ξi) (xi - μ1)^2 / Σ_{i=1}^{N} (1 - ξi)

μ2 = Σ_{i=1}^{N} ξi xi / Σ_{i=1}^{N} ξi

σ2^2 = Σ_{i=1}^{N} ξi (xi - μ2)^2 / Σ_{i=1}^{N} ξi

and the mixing probability

η2 = Σ_{i=1}^{N} ξi / N
● Iterate these steps until convergence.
This algorithm is implemented in Mathematica, see Fox et al (2013), as a Wolfram Demonstrations project. The code has been modified and applied here. In the next section we illustrate how this function works.
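The E- and M-steps above can also be sketched directly in Python (our illustration; numpy and the synthetic contaminated sample are assumptions):

```python
import numpy as np

def em_two_gaussians(x, theta0, n_iter=100):
    """EM for a two-component 1D Gaussian mixture.
    theta0 = (mu1, s1, mu2, s2, eta2); returns the updated parameters."""
    mu1, s1, mu2, s2, eta2 = theta0
    x = np.asarray(x, float)
    norm = lambda m, s: np.exp(-(x - m)**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)
    for _ in range(n_iter):
        # E-step: responsibilities of component two
        num = eta2 * norm(mu2, s2)
        xi = num / ((1 - eta2) * norm(mu1, s1) + num)
        # M-step: weighted means, standard deviations, mixing probability
        mu1 = np.sum((1 - xi) * x) / np.sum(1 - xi)
        s1 = np.sqrt(np.sum((1 - xi) * (x - mu1)**2) / np.sum(1 - xi))
        mu2 = np.sum(xi * x) / np.sum(xi)
        s2 = np.sqrt(np.sum(xi * (x - mu2)**2) / np.sum(xi))
        eta2 = xi.mean()
    return mu1, s1, mu2, s2, eta2

# 900 "inlier" errors near 0 plus 100 "outlier" errors shifted to 0.1
rng = np.random.default_rng(2)
data = np.r_[rng.normal(0.0, 0.02, 900), rng.normal(0.1, 0.05, 100)]
mu1, s1, mu2, s2, eta2 = em_two_gaussians(data, (0.1, 0.05, 0.0, 0.025, 0.5))
```

With this labeling, component two plays the role of the inliers and its weight eta2 converges toward the inlier fraction of the sample.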
ExpectationMaximization[samples_, niter_, init_] :=
 Module[{sim, data, updates},
  sim = samples;
  problists[θ_, y_] := Block[{probs, totalprobs},
    probs = Table[θ[[3, i]]*Map[PDF[Apply[NormalDistribution, θ[[i]]], #] &, y], {i, 2}];
    totalprobs = Total[probs];
    Map[#/totalprobs &, probs]];
  pi[j_, p_] := Mean[p[[j]]];
  emu[sim_, j_, p_] := Total[sim*p[[j]]]/Total[p[[j]]];
  emstd[sim_, j_, u_List, p_] := Total[(sim - u[[j]])^2*p[[j]]]/Total[p[[j]]];
  em[params_, sim_] := Module[{theprobs, mus, vars, probs},
    theprobs = problists[params, sim];
    mus = Map[emu[sim, #, theprobs] &, {1, 2}];
    vars = Map[emstd[sim, #, params[[1 ;; 2, 1]], theprobs] &, {1, 2}];
    probs = Map[pi[#, theprobs] &, {1, 2}];
    data = theprobs;
    Append[Transpose[{mus, vars^.5}], probs]];
  updates = NestList[em[#, sim] &, init, niter];
  {data, updates}]
First we should compute the parameters of the two Gaussians via the EM algorithm. Considering the histogram of the model errors, Fig. 7, the guess values for the two Gaussians are
{Δ, param} = ExpectationMaximization[error, 100, {{0.1, 0.05}, {0, 0.025}, {0.5, 0.5}}];
The resulting parameter values are

{{μ1, σ1}, {μ2, σ2}, {η1, η2}} = Last[param]

{{0.0813029, 0.0535765}, {-0.00940588, 0.0226878}, {0.103693, 0.896307}}
These can be displayed in table form

Grid[{{Panel[TableForm[param[[-1, 1 ;; 2]], TableHeadings -> {param[[-1, 3]], {"μ", "σ"}}]]}}]

  η          μ            σ
  0.103693   0.0813029    0.0535765
  0.896307   -0.00940588  0.0226878

Table 1. Parameters of the Gaussian mixture after the zeroth iteration
Fig. 8 shows the density functions of the two components.

p3 = Plot[{param[[-1, 3, 1]]*PDF[Apply[NormalDistribution, param[[-1, 1]]], x], param[[-1, 3, 2]]*PDF[Apply[NormalDistribution, param[[-1, 2]]], x]}, {x, -0.1, 0.2}, PlotStyle -> {{Red}, {Blue}}, PerformanceGoal -> "Speed", PlotRange -> All]

Fig 8. The PDF of the two components
p4 = Plot[{param[[-1, 3, 1]]*PDF[Apply[NormalDistribution, param[[-1, 1]]], x] + param[[-1, 3, 2]]*PDF[Apply[NormalDistribution, param[[-1, 2]]], x]}, {x, -0.1, 0.2}, PlotStyle -> {Green}, PerformanceGoal -> "Speed", PlotRange -> All]

Fig 9. The joint PDF of the mixture

Show[{p5, p3, p4}]

Fig 10. The PDF of the two components and the normalized histogram of the data
The following figures show the convergence of the different parameters
s = Map[Flatten[#] &, param] // Transpose;
Show[{ListPlot[{s[[1]], s[[3]]}, Joined -> True, PlotRange -> All, AxesLabel -> {"number of iterations", "μ1, μ2"}], ListPlot[{s[[1]], s[[3]]}, PlotStyle -> PointSize[0.01]]}]

Fig 11. The convergence of the means of the two components
Show[{ListPlot[{s[[2]], s[[4]]}, Joined -> True, PlotRange -> All, AxesLabel -> {"number of iterations", "σ1, σ2"}], ListPlot[{s[[2]], s[[4]]}, PlotStyle -> PointSize[0.01]]}]

Fig 12. The convergence of the standard deviations
Show[{ListPlot[{s[[5]], s[[6]]}, Joined -> True, PlotRange -> All, AxesLabel -> {"number of iterations", "η1, η2"}], ListPlot[{s[[5]], s[[6]]}, PlotStyle -> PointSize[0.01]]}]

Fig 13. The convergence of the membership weights
Now we separate the mixture of the samples into two clusters: a cluster of outliers and a cluster of inliers. The membership values of the first cluster

Short[Δ[[1]], 10]
{0.163639, 0.00848004, 0.0116534, 1., 0.00851251, 0.0156639, 0.00894375, 0.00945738, <<16418>>, 1., 1., 1., 1., 1., 1., 0.999997, 0.999995}
The membership values of the second cluster

Short[Δ[[2]], 10]

{0.836361, 0.99152, 0.988347, 7.93708 × 10^-8, 0.991487, 0.984336, 0.991056, 0.990543, <<16419>>, 5.63356 × 10^-11, 4.26408 × 10^-11, 1.24908 × 10^-10, 1.40798 × 10^-11, 2.28746 × 10^-8, 3.32431 × 10^-6, 5.16318 × 10^-6}
In order to get a Boolean (crisp) clustering, let us round the membership values

S1 = Round[Δ[[1]]];
S2 = Round[Δ[[2]]];
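This rounding step can be sketched as follows (our illustration with made-up membership values; numpy assumed):

```python
import numpy as np

# hypothetical responsibilities of the outlier component: 90 points
# clearly inliers (values below 0.3), 10 clearly outliers (above 0.7)
rng = np.random.default_rng(4)
resp_out = np.r_[rng.uniform(0.0, 0.3, 90), rng.uniform(0.7, 1.0, 10)]

# crisp (Boolean) clustering by rounding the membership values
is_outlier = np.round(resp_out).astype(bool)
n_out = int(is_outlier.sum())    # points assigned to the outlier cluster
n_in = int((~is_outlier).sum())  # points assigned to the inlier cluster
```

Rounding at 0.5 is equivalent to assigning each point to the component with the larger responsibility.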
The elements of the first cluster (outliers) are those data points whose rounded membership value in S1 equals 1

XYZOut = Map[dataP[[#]] &, Position[S1, 1] // Flatten];
The number of these elements (outliers) Length[XYZOut] 1318
Similarly, the elements of the second cluster (inliers) are those whose rounded membership value in S2 equals 1

XYZIn = Map[dataP[[#]] &, Position[S2, 1] // Flatten];
and Length[XYZIn] 15 116
which is the number of the inliers. Let us display the outliers and the inliers

pOut = ListPointPlot3D[XYZOut, PlotStyle -> Red];
pIn = ListPointPlot3D[XYZIn, PlotStyle -> Blue];
Show[{p1, pIn, pOut}]
Fig 14. The inliers (blue) and outliers (red) as a first guess, resulting from standard minimization of the total residual
9 - Maximum likelihood for Gaussian mixture The likelihood function for a two-component Gaussian mixture can be written as

log ℒ(x, θ) = Σ_{i∈N1} log N(μ1, σ1, xi) + Σ_{i∈N2} log N(μ2, σ2, xi) + N1 log η1 + N2 log η2

where the likelihood function for one of the components can be developed as follows. We have seen that the geometric error is

Δi = -r + Sqrt[(xi - (xi + a (-b + yi) + c (-d + zi))/(1 + a^2 + c^2))^2 + (-b + yi - a (xi + a (-b + yi) + c (-d + zi))/(1 + a^2 + c^2))^2 + (-d + zi - c (xi + a (-b + yi) + c (-d + zi))/(1 + a^2 + c^2))^2]
The probability density function for a single Gaussian is

f(x) = exp(-(x - μ)^2/(2 σ^2)) / (Sqrt[2 π] σ)

Substituting the expression of the geometric error (x -> Δi),

pdf = exp(-(Δi - μ)^2/(2 σ^2)) / (Sqrt[2 π] σ)

with Δi as above.
We apply a Mathematica function developed by Rose and Smith (2000) entitled SuperLog. This function uses pattern-matching code that enhances Mathematica's ability to simplify expressions involving the natural logarithm of a product of algebraic terms. The likelihood function is then
ℒ = ∏_{i=1..N} pdfᵢ = ∏_{i=1..N} (1/(√(2π) σ)) e^(-(Δᵢ - μ)²/(2σ²))
Considering its logarithm, which should be maximized,

Logℒ = Log[ℒ] = Σ_{i=1..N} ( -Log[√(2π) σ] - (Δᵢ - μ)²/(2σ²) )

SuperLog expands this into a very long expression: a sum of terms in the parameters a, b, c, d, r, μ and σ multiplying the data sums Σxᵢ, Σyᵢ, Σzᵢ, Σxᵢ², Σyᵢ², Σxᵢyᵢ, Σxᵢzᵢ and so on, with denominators 2σ² and (1 + a² + c²)σ², together with the -Log[σ] and Log[2π] terms. The full expansion is omitted here.
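Numerically, Logℒ for the mixture need not be assembled symbolically at all: for a fixed inlier/outlier partition it is just the sum of Gaussian log-densities over the two clusters plus the weight terms. A minimal Python sketch (names are ours, not the notebook's):

```python
import math

def gauss_logpdf(x, mu, sigma):
    # log of the single-Gaussian density used above
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - (x - mu) ** 2 / (2 * sigma ** 2))

def mixture_loglik(err_out, err_in, mu1, s1, mu2, s2, eta1, eta2):
    """Log-likelihood of the two-component mixture for a fixed partition:
    err_out are the outlier residuals, err_in the inlier residuals."""
    ll = sum(gauss_logpdf(e, mu1, s1) for e in err_out)
    ll += sum(gauss_logpdf(e, mu2, s2) for e in err_in)
    ll += len(err_out) * math.log(eta1) + len(err_in) * math.log(eta2)
    return ll
```

This is exactly the structure of the LL = LL1 + LL2 + N1 Log(η1) + N2 Log(η2) objective built below with LogLikelihood.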
It is a rather complicated expression, but fortunately Mathematica has a built-in function to compute it numerically. Now we can create the corresponding likelihood functions for the identified outliers and inliers. dataP1 = XYZOut; dataP2 = XYZIn;
Then the residuals should be computed,
perps1 = Map[perp[#, vec, offset] &, dataP1];
ϵ1g = Map[Numerator[Together[Sqrt[#.#] - r]] &, perps1];
perps2 = Map[perp[#, vec, offset] &, dataP2];
ϵ2g = Map[Numerator[Together[Sqrt[#.#] - r]] &, perps2];
The likelihood functions can be developed by Mathematica, LL1 = LogLikelihood[ NormalDistribution[Last[param][[1, 1]], Last[param][[1, 2]]], ϵ1g] // N; LL2 = LogLikelihood[ NormalDistribution[Last[param][[2, 1]], Last[param][[2, 2]]], ϵ2g] // N;
Then the likelihood function for the mixture, which should be maximized LL = LL1 + LL2 + Length[dataP1] Log[Last[param][[3, 1]]] + Length[dataP2] Log[Last[param][[3, 2]]];
In order to carry out the local maximization, the result of the standard parameter estimation (see above) can be used as the initial guess sol[[2]] {a → -0.604342, b → -11.3773, c → -3.68704, d → 146.479, r → 0.178783} initDataG = Map[{#[[1]], #[[2]]} &, sol[[2]]] {{a, -0.604342}, {b, -11.3773}, {c, -3.68704}, {d, 146.479}, {r, 0.178783}} AbsoluteTiming[solLL = FindMaximum[LL, initDataG, Method → "PrincipalAxis"];] {8.143214, Null}
The result solLL {34 746.5, {a → -0.643829, b → -11.3064, c → -3.60796, d → 146.386, r → 0.175172}}
Now, let us compute the model error distribution
perps = Map[perp[#, vec, offset] &, dataP];
ϵg = Map[Numerator[Together[Sqrt[#.#] - r]] &, perps];
error = ϵg /. solLL[[2]]; p5 = Histogram[error, Automatic, "PDF"]
Fig.15 Histogram of the model error employing maximum likelihood method
This means that, starting from the model error distribution of Fig. 7 and employing the maximum likelihood method for estimating the parameters, we arrive at the model error distribution of Fig. 15. If the two distributions are the same, then the parameter estimation is correct. To answer this question, let us apply EM to this latter distribution.
{Δ, param} = ExpectationMaximization[error, 100, Last[param]];
The result of the parameter values {{μ1, σ1}, {μ2, σ2}, {η1, η2}} = Last[param] {{0.0826151, 0.0596793}, {- 0.0111856, 0.018304}, {0.118863, 0.881137}}
These can be displayed in a table form Grid[ {{Panel[TableForm[param[[- 1, 1 ;; 2]], TableHeadings {param[[- 1, 3]], {"μ", "σ"}}]]}}]
η          μ            σ
0.118863   0.0826151    0.0596793
0.881137   -0.0111856   0.018304
Table 2. Parameters of the Gaussian mixture after zero iterations
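The ExpectationMaximization function used throughout is the Fox et al. (2013)-style implementation; as an illustration of what it does, a bare-bones EM for a two-component 1-D Gaussian mixture can be written in Python under the same parameterization {{μ1, σ1}, {μ2, σ2}, {η1, η2}} (our own re-implementation, not the notebook's code):

```python
import math

def em_two_gaussians(data, iters, init):
    """Minimal EM for a two-component 1-D Gaussian mixture, mirroring
    the call ExpectationMaximization[error, n, init] with
    init = [[mu1, s1], [mu2, s2], [eta1, eta2]].  Illustrative only."""
    (mu1, s1), (mu2, s2), (e1, e2) = init
    n = len(data)
    for _ in range(iters):
        # E-step: membership (responsibility) of component 1 per sample
        g1 = [e1 / s1 * math.exp(-(x - mu1) ** 2 / (2 * s1 ** 2)) for x in data]
        g2 = [e2 / s2 * math.exp(-(x - mu2) ** 2 / (2 * s2 ** 2)) for x in data]
        w = [p / (p + q) for p, q in zip(g1, g2)]
        # M-step: weighted means, standard deviations and weights
        n1 = sum(w)
        n2 = n - n1
        mu1 = sum(wi * x for wi, x in zip(w, data)) / n1
        mu2 = sum((1 - wi) * x for wi, x in zip(w, data)) / n2
        s1 = math.sqrt(sum(wi * (x - mu1) ** 2 for wi, x in zip(w, data)) / n1)
        s2 = math.sqrt(sum((1 - wi) * (x - mu2) ** 2 for wi, x in zip(w, data)) / n2)
        e1, e2 = n1 / n, n2 / n
    return w, [[mu1, s1], [mu2, s2], [e1, e2]]

# two well-separated clusters around 5 and 0
w, params = em_two_gaussians([0.0, 0.1, -0.1, 5.0, 5.1, 4.9], 30,
                             [[5.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
```

The returned memberships w play the role of Δ[[1]] in the notebook, and params the role of Last[param].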
The density functions of the two components are shown in Fig. 16. p3 = Plot[{param[[-1, 3, 1]] * PDF[Apply[NormalDistribution, param[[-1, 1]]], x], param[[-1, 3, 2]] * PDF[Apply[NormalDistribution, param[[-1, 2]]], x]}, {x, -0.1, 0.2}, PlotStyle → {{Red}, {Blue}}, PerformanceGoal → "Speed", PlotRange → All]
Fig 16. The PDF of the two components
p4 = Plot[{param[[-1, 3, 1]] * PDF[Apply[NormalDistribution, param[[-1, 1]]], x] + param[[-1, 3, 2]] * PDF[Apply[NormalDistribution, param[[-1, 2]]], x]}, {x, -0.1, 0.2}, PlotStyle → {Green}, PerformanceGoal → "Speed", PlotRange → All]
Fig 17. The joint PDF of the mixture
Show[{p5, p3, p4}]
Fig 18. The PDF of the two components and the normalized histogram of the data
s = Map[Flatten[#] &, param] // Transpose;
Show[{ListPlot[{s[[1]], s[[3]]}, Joined → True, PlotRange → All, AxesLabel → {"number of iterations", "μ1, μ2"}], ListPlot[{s[[1]], s[[3]]}, PlotStyle → PointSize[0.01]]}]
Fig 19. The convergence of the means of the two components
Show[{ListPlot[{s[[2]], s[[4]]}, Joined → True, PlotRange → All, AxesLabel → {"number of iterations", "σ1, σ2"}], ListPlot[{s[[2]], s[[4]]}, PlotStyle → PointSize[0.01]]}]
Fig 20. The convergence of the standard deviations
Show[{ListPlot[{s[[5]], s[[6]]}, Joined → True, PlotRange → All, AxesLabel → {"number of iterations", "α1, α2"}], ListPlot[{s[[5]], s[[6]]}, PlotStyle → PointSize[0.01]]}]
Fig 21. The convergence of the membership weights
Now, we separate the mixture of the samples into two clusters: cluster of outliers and cluster of inliers. Membership values of the first cluster Short[Δ[[1]], 10] {0.0825756, 0.0117538, 0.0342882, 1., 0.0125805, 0.0147983, 0.0104937, 0.0196103, 0.0308094, 0.0161561, 0.0126288, 16 412, 0.99999, 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.}
Membership values of the second cluster Short[Δ[[2]], 10] 0.917424, 0.988246, 0.965712, 1.78221 × 10-11, 0.98742, 0.985202, 0.989506, 16 420, 2.3251 × 10-21, 8.37015 × 10-24, 4.22025 × 10-21, 1.97601 × 10-24, 1.26222 × 10-17, 1.2341 × 10-13, 2.78243 × 10-13
In order to get Boolean (crisp) clustering let us round the membership values S1 = Round[Δ[[1]]]; S2 = Round[Δ[[2]]];
The elements in the first cluster (outliers) are the corresponding elements of those having value 1 in the set containing the rounded member values (S1) XYZOut = Map[dataP[[#]] &, Position[S1, 1] // Flatten];
The number of these elements (outliers) Length[XYZOut] 1577
Similarly, the elements in the second cluster (inliers) are the corresponding elements of those having value 1 in the set containing the rounded member values (S2) XYZIn = Map[dataP[[#]] &, Position[S2, 1] // Flatten];
and Length[XYZIn] 14 857
which is the number of the inliers. Let us display the outliers and the inliers pOut = ListPointPlot3D[XYZOut, PlotStyle → Red]; pIn = ListPointPlot3D[XYZIn, PlotStyle → Blue]; Show[{p1, pIn, pOut}]
Fig 22. The inliers (blue) and outliers (red) resulting from the maximum likelihood estimation
Since the two distributions are different, we need to carry out a new iteration step.
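The stopping rule of this outer iteration — repeat until the error distribution becomes stationary — can be made concrete with a simple parameter-change test. A hypothetical Python sketch (the tolerance and function name are our choices):

```python
def stationary(prev, curr, tol=1e-3):
    """True when the mixture parameters (mu1, s1, mu2, s2, eta1, eta2)
    no longer move between two successive outer iterations."""
    return max(abs(p - c) for p, c in zip(prev, curr)) < tol

# Outer loop, schematically (each step as in the notebook):
#   1. maximize LL over (a, b, c, d, r), warm-started from the last fit;
#   2. recompute all residuals with the new cylinder parameters;
#   3. re-identify the Gaussian mixture with EM;
#   4. re-cluster the points into inliers and outliers;
# and stop once stationary(previous_mixture, current_mixture) holds.
```

In the notebook the test is done by inspection of Tables 3 and 4; the sketch above is merely its automated counterpart.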
10 - Iteration process
Second Iteration
dataP1 = XYZOut; dataP2 = XYZIn;
solLL[[2]]
{a → -0.643829, b → -11.3064, c → -3.60796, d → 146.386, r → 0.175172}
initDataG = Map[{#[[1]], #[[2]]} &, solLL[[2]]]
{{a, -0.643829}, {b, -11.3064}, {c, -3.60796}, {d, 146.386}, {r, 0.175172}}
perps1 = Map[perp[#, vec, offset] &, dataP1];
ϵ1g = Map[Numerator[Together[Sqrt[#.#] - r]] &, perps1];
perps2 = Map[perp[#, vec, offset] &, dataP2];
ϵ2g = Map[Numerator[Together[Sqrt[#.#] - r]] &, perps2];
LL1 = LogLikelihood[NormalDistribution[Last[param][[1, 1]], Last[param][[1, 2]]], ϵ1g] // N;
LL2 = LogLikelihood[NormalDistribution[Last[param][[2, 1]], Last[param][[2, 2]]], ϵ2g] // N;
LL = LL1 + LL2 + Length[dataP1] Log[Last[param][[3, 1]]] + Length[dataP2] Log[Last[param][[3, 2]]];
AbsoluteTiming[solLL = FindMaximum[LL, initDataG, Method → "PrincipalAxis"];]
{4.1496073, Null}
solLL
{35 587.3, {a → -0.655632, b → -11.2872, c → -3.57826, d → 146.343, r → 0.174872}}
perps = Map[perp[#, vec, offset] &, dataP];
ϵg = Map[Numerator[Together[Sqrt[#.#] - r]] &, perps];
error = ϵg /. solLL[[2]];
p5 = Histogram[error, Automatic, "PDF"]
{Δ, param} = ExpectationMaximization[error, 100, Last[param]];
The result of the parameter values {{μ1, σ1}, {μ2, σ2}, {η1, η2}} = Last[param] {{0.0813938, 0.0631309}, {- 0.0117629, 0.0176321}, {0.125038, 0.874962}}
These can be displayed in a table form Grid[ {{Panel[TableForm[param[[- 1, 1 ;; 2]], TableHeadings {param[[- 1, 3]], {"μ", "σ"}}]]}}]
η          μ            σ
0.125038   0.0813938    0.0631309
0.874962   -0.0117629   0.0176321
Now, we separate the mixture of the samples into two clusters: cluster of outliers and cluster of inliers. Membership values of the first cluster Short[Δ[[1]], 10] {0.0644796, 0.0144548, 0.0522669, 1., 0.0157726, 0.0155338, 0.012263, 0.0274401, 0.0475602, 0.0217465, 0.0160341, 0.0130666, 1., 0.0204233, 0.0124045, 0.0151212, 0.0166262, 16 400, 1., 1., 1., 1., 1., 1., 0.999999, 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.}
Membership values of the second cluster Short[Δ[[2]], 10] 0.93552, 0.985545, 0.947733, 4.68161 × 10-12, 0.984227, 0.984466, 0.987737, 0.97256, 0.95244, 0.978253, 0.983966, 0.986933, 1.09774 × 10-11, 16 408, 1.0728 × 10-9 , 3.31276 × 10-9 , 5.28607 × 10-7 , 3.26642 × 10-24, 2.30586 × 10-23, 3.94453 × 10-20, 1.01705 × 10-24, 3.54345 × 10-28 , 1.5008 × 10-24, 8.03577 × 10-29, 7.77316 × 10-21, 2.65401 × 10-16, 6.62002 × 10-16
In order to get Boolean (crisp) clustering let us round the membership values S1 = Round[Δ[[1]]]; S2 = Round[Δ[[2]]];
The elements in the first cluster (outliers) are the corresponding elements of those having value 1 in the set containing the rounded member values (S1) XYZOut = Map[dataP[[#]] &, Position[S1, 1] // Flatten];
The number of these elements (outliers) Length[XYZOut] 1641
Similarly, the elements in the second cluster (inliers) are the corresponding elements of those having value 1 in the set containing the rounded member values (S2) XYZIn = Map[dataP[[#]] &, Position[S2, 1] // Flatten];
and Length[XYZIn] 14 793
which is the number of the inliers. Let us display the outliers and the inliers pOut = ListPointPlot3D[XYZOut, PlotStyle → Red]; pIn = ListPointPlot3D[XYZIn, PlotStyle → Blue]; Show[{p1, pIn, pOut}]
Third Iteration
dataP1 = XYZOut; dataP2 = XYZIn;
solLL[[2]]
{a → -0.655632, b → -11.2872, c → -3.57826, d → 146.343, r → 0.174872}
initDataG = Map[{#[[1]], #[[2]]} &, solLL[[2]]]
{{a, -0.655632}, {b, -11.2872}, {c, -3.57826}, {d, 146.343}, {r, 0.174872}}
perps1 = Map[perp[#, vec, offset] &, dataP1];
ϵ1g = Map[Numerator[Together[Sqrt[#.#] - r]] &, perps1];
perps2 = Map[perp[#, vec, offset] &, dataP2];
ϵ2g = Map[Numerator[Together[Sqrt[#.#] - r]] &, perps2];
LL1 = LogLikelihood[NormalDistribution[Last[param][[1, 1]], Last[param][[1, 2]]], ϵ1g] // N;
LL2 = LogLikelihood[NormalDistribution[Last[param][[2, 1]], Last[param][[2, 2]]], ϵ2g] // N;
LL = LL1 + LL2 + Length[dataP1] Log[Last[param][[3, 1]]] + Length[dataP2] Log[Last[param][[3, 2]]];
AbsoluteTiming[solLL = FindMaximum[LL, initDataG, Method → "PrincipalAxis"];]
{5.5224097, Null}
solLL
{35 596.8, {a → -0.661954, b → -11.2767, c → -3.57789, d → 146.344, r → 0.175006}}
perps = Map[perp[#, vec, offset] &, dataP];
ϵg = Map[Numerator[Together[Sqrt[#.#] - r]] &, perps];
error = ϵg /. solLL[[2]]; p5 = Histogram[error, Automatic, "PDF"]
{Δ, param} = ExpectationMaximization[error, 100, Last[param]];
The result of the parameter values {{μ1, σ1}, {μ2, σ2}, {η1, η2}} = Last[param] {{0.0837141, 0.062688}, {- 0.0120969, 0.0176298}, {0.122936, 0.877064}}
These can be displayed in a table form Grid[ {{Panel[TableForm[param[[- 1, 1 ;; 2]], TableHeadings {param[[- 1, 3]], {"μ", "σ"}}]]}}]
η          μ            σ
0.122936   0.0837141    0.062688
0.877064   -0.0120969   0.0176298
Now, we separate the mixture of the samples into two clusters: cluster of outliers and cluster of inliers. Membership values of the first cluster Short[Δ[[1]], 10]
{0.0513136, 0.0140639, 0.0574218, 1., 0.0155604, 0.0129873, 0.0113733, 0.0288436, 0.0522719, 0.0223699, 0.0159017, 0.0124254, 1., 0.0208635, 0.0115949, 0.0149015, 0.0165609, 0.0120249, 16 399, 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.}
Membership values of the second cluster Short[Δ[[2]], 10] 0.948686, 0.985936, 0.942578, 6.72198 × 10-12, 0.98444, 0.987013, 0.988627, 0.971156, 0.947728, 0.97763, 0.984098, 0.987575, 1.57813 × 10-11, 16 408, 4.79868 × 10-10, 1.5154 × 10-9 , 2.90584 × 10-7 , 9.74239 × 10-25, 7.05604 × 10-24, 1.3216 × 10-20, 3.0343 × 10-25, 6.35491 × 10-29, 4.10623 × 10-25, 1.43191 × 10-29, 2.03899 × 10-21, 8.12355 × 10-17, 2.03604 × 10-16
In order to get Boolean (crisp) clustering let us round the membership values S1 = Round[Δ[[1]]]; S2 = Round[Δ[[2]]];
The elements in the first cluster (outliers) are the corresponding elements of those having value 1 in the set containing the rounded member values (S1) XYZOut = Map[dataP[[#]] &, Position[S1, 1] // Flatten];
The number of these elements (outliers) Length[XYZOut] 1637
Similarly, the elements in the second cluster (inliers) are the corresponding elements of those having value 1 in the set containing the rounded member values (S2) XYZIn = Map[dataP[[#]] &, Position[S2, 1] // Flatten];
and Length[XYZIn] 14 797
which is the number of the inliers. Let us display the outliers and the inliers pOut = ListPointPlot3D[XYZOut, PlotStyle → Red]; pIn = ListPointPlot3D[XYZIn, PlotStyle → Blue]; Show[{p1, pIn, pOut}]
We consider this the last iteration step. The parametric equation of the fitted cylinder is parametric /. solLL[[2]] {v + 0.168546 Cos[u] + 0.00826369 Sin[u], -11.2767 - 0.661954 v + 0.172292 Sin[u], 146.344 - 3.57789 v + 0.0471078 Cos[u] - 0.0295665 Sin[u]}
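The parametric form above is an instance of offset + v·axis + r(cos u · e1 + sin u · e2), with e1, e2 an orthonormal basis of the plane perpendicular to the axis. A Python sketch rebuilding such a parameterization from (a, b, c, d, r) (our own construction of the perpendicular basis, so the phase of u may differ from the notebook's output):

```python
import math

def cylinder_point(u, v, a, b, c, d, r):
    """Point on the cylinder surface: v runs along the (unnormalized)
    axis (1, a, c) through (0, b, d), as in the notebook's parametric
    form; u goes around the circumference."""
    n = math.sqrt(1 + a * a + c * c)
    ax = (1 / n, a / n, c / n)                 # unit axis direction
    m = math.sqrt(ax[0] ** 2 + ax[1] ** 2)
    e1 = (-ax[1] / m, ax[0] / m, 0.0)          # unit vector with e1 . ax = 0
    e2 = (ax[1] * e1[2] - ax[2] * e1[1],       # e2 = ax x e1
          ax[2] * e1[0] - ax[0] * e1[2],
          ax[0] * e1[1] - ax[1] * e1[0])
    off = (0.0, b, d)
    axis = (1.0, a, c)
    return tuple(off[k] + v * axis[k]
                 + r * (math.cos(u) * e1[k] + math.sin(u) * e2[k])
                 for k in range(3))
```

Every generated point lies at perpendicular distance exactly r from the axis, which is the defining property of the fitted surface.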
Let us visualize it with the inlier and outlier data points p2 = ParametricPlot3D[(parametric /. solLL[[2]]), {v, 1.3, 2.}, {u, 0, 2 Pi}, PlotStyle → Directive[Opacity[0.6], Yellow], Mesh → False]; Show[{p1, p2, pIn, pOut}]
Fig 23. The inliers (blue) and outliers (red) resulting from the maximum likelihood technique
The results of the 3 iteration steps can be seen in Tables 3 and 4. Table 3 The computed cylinder parameters
Iteration    a         b          c         d         r
0           -0.6043   -11.3773   -3.6870   146.479   0.1788
1           -0.6438   -11.3064   -3.6080   146.386   0.1752
2           -0.6556   -11.2872   -3.5783   146.343   0.1749
3           -0.6619   -11.2767   -3.5779   146.344   0.1750
Table 4 Parameters of the components of the Gaussian mixture
Iteration    μ1       μ2       σ1       σ2       η1       η2       N1      N2
0            0.0813   -0.010   0.0536   0.0227   0.1037   0.8962   1318    15 116
1            0.0826   -0.011   0.0597   0.0183   0.1188   0.8811   1577    14 857
2            0.0814   -0.012   0.0631   0.0176   0.1250   0.8749   1641    14 793
3            0.0837   -0.012   0.0627   0.0176   0.1229   0.8771   1637    14 797
11 - Application to leafy tree In the example above there were only ∼10% outliers. Now let us consider another example where the ratio of outliers is much higher, nearly 40%. This situation is closer to a segmentation problem than to a simple fitting problem. Here the test object is the same tree, but with foliage, see Fig. 24.
Let us load the data
XYZ = Import["M:\\Cfa_sub.dat"];
Length[XYZ]
91 089
p1 = ListPointPlot3D[XYZ, PlotStyle → {Green, Directive[Tiny]}, BoxRatios → {1, 1, 1.5}]
Fig. 24 The point cloud data
The first guess is the result we obtained without foliage. dataP = XYZ; best = solLL; best = {a → -0.6619, b → -11.2767, c → -3.57789, d → 146.344, r → 0.1750} {a → -0.6619, b → -11.2767, c → -3.57789, d → 146.344, r → 0.175} p2 = ParametricPlot3D[(parametric /. best), {v, 0.6, 1.95}, {u, 0, 2 Pi}, PlotStyle → Directive[Opacity[0.6], Yellow], Mesh → False];
Show[{p1, p2}, PlotRange All]
Fig. 25 The first estimation of the cylinder fitting to leafy tree
Applying the geometric error model, the model error distribution of this first approximation can be computed. Clear[vec, offset, a, b, c, d, r]
Employing all of the points
vec = {1, a, c}; offset = {0, b, d};
perps = Map[perp[#, vec, offset] &, dataP];
exprs = Map[Numerator[Together[Sqrt[#.#] - r]] &, perps];
error = exprs /. best; p5 = Histogram[error, Automatic, "PDF"]
Fig. 26 The model error distribution of the first estimation
Now let us identify the components of the Gaussian mixture
{Δ, param} = ExpectationMaximization[error, 100, {{0.6, 0.15}, {0, 0.1}, {0.5, 0.5}}];
The result of the parameter values {{μ1, σ1}, {μ2, σ2}, {η1, η2}} = Last[param] {{0.341165, 0.275897}, {- 0.0109171, 0.0204849}, {0.463995, 0.536005}}
These can be displayed in a table form Grid[ {{Panel[TableForm[param[[- 1, 1 ;; 2]], TableHeadings {param[[- 1, 3]], {"μ", "σ"}}]]}}]
η          μ            σ
0.463995   0.341165     0.275897
0.536005   -0.0109171   0.0204849
Table 5. Parameters of the Gaussian mixture after zero iterations
The density functions of the two components: p3 = Plot[{param[[-1, 3, 1]] * PDF[Apply[NormalDistribution, param[[-1, 1]]], x], param[[-1, 3, 2]] * PDF[Apply[NormalDistribution, param[[-1, 2]]], x]}, {x, -0.2, 1}, PlotStyle → {{Red}, {Blue}}, PerformanceGoal → "Speed", PlotRange → All]
Fig 27. The PDF of the two components
p4 = Plot[{param[[-1, 3, 1]] * PDF[Apply[NormalDistribution, param[[-1, 1]]], x] + param[[-1, 3, 2]] * PDF[Apply[NormalDistribution, param[[-1, 2]]], x]}, {x, -0.2, 1}, PlotStyle → {Green}, PerformanceGoal → "Speed", PlotRange → All]
Fig 28. The joint PDF of the mixture
Show[{p5, p3, p4}]
Fig 29. The PDF of the two components and the normalized histogram of the data
s = Map[Flatten[#] &, param] // Transpose;
Show[{ListPlot[{s[[1]], s[[3]]}, Joined → True, PlotRange → All, AxesLabel → {"number of iterations", "μ1, μ2"}], ListPlot[{s[[1]], s[[3]]}, PlotStyle → PointSize[0.01]]}]
Fig 30. The convergence of the means of the two components
Show[{ListPlot[{s[[2]], s[[4]]}, Joined → True, PlotRange → All, AxesLabel → {"number of iterations", "σ1, σ2"}], ListPlot[{s[[2]], s[[4]]}, PlotStyle → PointSize[0.01]]}]
Fig 31. The convergence of the standard deviations
Show[{ListPlot[{s[[5]], s[[6]]}, Joined → True, PlotRange → All, AxesLabel → {"number of iterations", "α1, α2"}], ListPlot[{s[[5]], s[[6]]}, PlotStyle → PointSize[0.01]]}]
Fig 32. The convergence of the membership weights
Now, we separate the mixture of the samples into two clusters: cluster of outliers and cluster of inliers. Membership values of the first cluster Short[Δ[[1]], 10] {1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 91 009, 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.}
Membership values of the second cluster Short[Δ[[2]], 10] 2.168186518114028× 10-569, 5.715327153955006× 10-563, 4.439837476761972× 10-375, 6.094575596255934× 10-374, 1.526341682623959× 10-585, 2.869284522501804× 10-372, 91 078, 9.51406 × 10-40, 6.5817 × 10-32, 8.02675 × 10-35, 1.28721 × 10-30, 1.99815 × 10-38
In order to get Boolean (crisp) clustering let us round the membership values S1 = Round[Δ[[1]]]; S2 = Round[Δ[[2]]];
The elements in the first cluster (outliers) are the corresponding elements of those having value 1 in the set containing the rounded member values (S1) XYZOut = Map[dataP[[#]] &, Position[S1, 1] // Flatten];
The number of these elements (outliers) Length[XYZOut] 39 982
Similarly, the elements in the second cluster (inliers) are the corresponding elements of those having value 1 in the set containing the rounded member values (S2) XYZIn = Map[dataP[[#]] &, Position[S2, 1] // Flatten];
and Length[XYZIn] 51 107
which is the number of the inliers. Let us display the outliers and the inliers pOut = ListPointPlot3D[XYZOut, PlotStyle → Red]; pIn = ListPointPlot3D[XYZIn, PlotStyle → Blue]; Show[{p1, pIn, pOut}]
Fig. 33 The inliers (blue) and outliers (red) resulting from the first approximation
Now let us compute the first iteration employing the maximum likelihood method
dataP1 = XYZOut; dataP2 = XYZIn;
initDataG = Map[{#[[1]], #[[2]]} &, best]
{{a, -0.6619}, {b, -11.2767}, {c, -3.57789}, {d, 146.344}, {r, 0.175}}
perps1 = Map[perp[#, vec, offset] &, dataP1];
ϵ1g = Map[Numerator[Together[Sqrt[#.#] - r]] &, perps1];
perps2 = Map[perp[#, vec, offset] &, dataP2];
ϵ2g = Map[Numerator[Together[Sqrt[#.#] - r]] &, perps2];
LL1 = LogLikelihood[NormalDistribution[Last[param][[1, 1]], Last[param][[1, 2]]], ϵ1g] // N;
LL2 = LogLikelihood[NormalDistribution[Last[param][[2, 1]], Last[param][[2, 2]]], ϵ2g] // N;
LL = LL1 + LL2 + Length[dataP1] Log[Last[param][[3, 1]]] + Length[dataP2] Log[Last[param][[3, 2]]];
AbsoluteTiming[solLL = FindMaximum[LL, initDataG, Method → "PrincipalAxis"];]
{31.6212555, Null}
solLL
{59 925.1, {a → -0.660978, b → -11.28, c → -3.60862, d → 146.398, r → 0.17533}}
perps = Map[perp[#, vec, offset] &, dataP];
ϵg = Map[Numerator[Together[Sqrt[#.#] - r]] &, perps];
error = ϵg /. solLL[[2]]; p5 = Histogram[error, Automatic, "PDF"]
Fig 34. Histogram of the model error after the first iteration
Now the parameters of the Gaussian mixture
{Δ, param} = ExpectationMaximization[error, 100, Last[param]];
The result of the parameter values {{μ1, σ1}, {μ2, σ2}, {η1, η2}} = Last[param] {{0.343552, 0.275066}, {- 0.0109826, 0.0207809}, {0.458853, 0.541147}}
These can be displayed in a table form Grid[ {{Panel[TableForm[param[[- 1, 1 ;; 2]], TableHeadings {param[[- 1, 3]], {"μ", "σ"}}]]}}]
η          μ            σ
0.458853   0.343552     0.275066
0.541147   -0.0109826   0.0207809
Table 6. Parameters of the Gaussian mixture after the first iteration
The density functions of the two components: p3 = Plot[{param[[-1, 3, 1]] * PDF[Apply[NormalDistribution, param[[-1, 1]]], x], param[[-1, 3, 2]] * PDF[Apply[NormalDistribution, param[[-1, 2]]], x]}, {x, -0.2, 1}, PlotStyle → {{Red}, {Blue}}, PerformanceGoal → "Speed", PlotRange → All]
Fig 35. The PDF of the two components
p4 = Plot[{param[[-1, 3, 1]] * PDF[Apply[NormalDistribution, param[[-1, 1]]], x] + param[[-1, 3, 2]] * PDF[Apply[NormalDistribution, param[[-1, 2]]], x]}, {x, -0.2, 1}, PlotStyle → {Green}, PerformanceGoal → "Speed", PlotRange → All]
Fig 36. The joint PDF of the mixture
Show[{p5, p3, p4}]
Fig 37 The PDF of the two components and the normalized histogram of the data
s = Map[Flatten[#] &, param] // Transpose;
Show[{ListPlot[{s[[1]], s[[3]]}, Joined → True, PlotRange → All, AxesLabel → {"number of iterations", "μ1, μ2"}], ListPlot[{s[[1]], s[[3]]}, PlotStyle → PointSize[0.01]]}]
Fig 38. The convergence of the means of the two components
Show[{ListPlot[{s[[2]], s[[4]]}, Joined → True, PlotRange → All, AxesLabel → {"number of iterations", "σ1, σ2"}], ListPlot[{s[[2]], s[[4]]}, PlotStyle → PointSize[0.01]]}]
Fig 39. The convergence of the standard deviations
Show[{ListPlot[{s[[5]], s[[6]]}, Joined → True, PlotRange → All, AxesLabel → {"number of iterations", "α1, α2"}], ListPlot[{s[[5]], s[[6]]}, PlotStyle → PointSize[0.01]]}]
Fig 40. The convergence of the membership weights
Now, we separate the mixture of the samples into two clusters: cluster of outliers and cluster of inliers. Membership values of the first cluster Short[Δ[[1]], 10] {1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 90 969, 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.}
Membership values of the second cluster Short[Δ[[2]], 10] 5.197294751685767× 10-563, 9.62586576863179× 10-557, 2.440525376542210× 10-371, 3.138647064118679× 10-370, 7.696701760704474× 10-579, 1.319339597606332× 10-368, 91 077, 5.26593 × 10-38, 1.12943 × 10-38, 4.7876 × 10-31, 6.62412 × 10-34, 7.8142 × 10-30, 1.95385 × 10-37
In order to get Boolean (crisp) clustering let us round the membership values S1 = Round[Δ[[1]]]; S2 = Round[Δ[[2]]];
The elements in the first cluster (outliers) are the corresponding elements of those having value 1 in the set containing the rounded member values (S1) XYZOut = Map[dataP[[#]] &, Position[S1, 1] // Flatten];
The number of these elements (outliers) Length[XYZOut] 39 404
Similarly, the elements in the second cluster (inliers) are the corresponding elements of those having value 1 in the set containing the rounded member values (S2) XYZIn = Map[dataP[[#]] &, Position[S2, 1] // Flatten];
and Length[XYZIn] 51 685
which is the number of the inliers. Let us display the outliers and the inliers pOut = ListPointPlot3D[XYZOut, PlotStyle → Red]; pIn = ListPointPlot3D[XYZIn, PlotStyle → Blue];
Show[{p1, pIn, pOut}]
Fig. 34 The inliers (blue) and outliers (red) resulting from the first iteration
The inlier points, Show[{pIn}, BoxRatios → {1, 1, 2}]
Fig. 35 The inliers
We consider this the last iteration step. The parametric equation of the fitted cylinder is parametric /. solLL[[2]] {v + 0.168963 Cos[u] + 0.00813893 Sin[u], -11.28 - 0.660978 v + 0.172661 Sin[u], 146.398 - 3.60862 v + 0.0468219 Cos[u] - 0.0293703 Sin[u]} p2 = ParametricPlot3D[(parametric /. solLL[[2]]), {v, 0.6, 1.95}, {u, 0, 2 Pi}, PlotStyle → Directive[Opacity[0.6], Yellow], Mesh → False];
Show[{pIn, p2}, BoxRatios → {1, 1, 2}]
Fig. 36 The inliers (blue) and the fitted cylinder
Show[{p1, p2, pIn, pOut}]
Fig. 37 The inliers (blue) and outliers (red) and the fitted cylinder Table 7 The computed cylinder parameters
Iteration    a         b          c         d         r
0           -0.6619   -11.2767   -3.5779   146.344   0.1750
1           -0.6611   -11.2791   -3.6086   146.398   0.1753
Table 8 Parameters of the components of the Gaussian mixture
Iteration    μ1       μ2       σ1       σ2       η1       η2       N1       N2
0            0.3412   -0.011   0.2759   0.0205   0.4640   0.5360   39 982   51 107
1            0.3435   -0.010   0.2751   0.0208   0.4588   0.5418   39 404   51 685
Conclusions The results of our computation illustrate that in the case of noisy measurements, the frequently assumed N(0, σ) model error distribution does not hold. Therefore, to estimate the model parameters, iteration is necessary with a proper representation of the non-Gaussian distribution. In this study the expectation maximization algorithm was employed. In the case of general cylinder geometry, the 5 model parameters can be computed via local maximization of the likelihood function. The implicit equation based on the algebraic error definition, solved easily via Groebner basis, can provide a proper initial guess value for the maximization problem.
References
Beder C and Förstner W (2006) Direct Solutions for Computing Cylinders from Minimal Sets of 3D Points, Computer Vision - ECCV 2006, Lecture Notes in Computer Science Vol. 3951, pp. 135-146
Carrea D, Jaboyedoff M and Derron M-H (2014) Feasibility Study of Point Cloud Data from Test Deposition Holes for Deformation Analysis, Working Report 2014-01, Université de Lausanne (UNIL)
Fox A, Smith J, Ford R and Doe J (2013) Expectation Maximization for Gaussian Mixture Distributions, Wolfram Demonstrations Project
Khameneh M (2013) Tree Detection and Species Identification using LiDAR Data, MSc Thesis, KTH Royal Institute of Technology, Stockholm
Lichtblau D (2007) Cylinders Through Five Points: Computational Algebra and Geometry, Automated Deduction in Geometry, Lecture Notes in Computer Science Vol. 4869, pp. 80-97
Lukacs G, Martin R and Marshall D (1998) Faithful Least-Squares Fitting of Spheres, Cylinders, Cones and Tori for Reliable Segmentation, in Burkhardt H and Neumann B (eds) Computer Vision - ECCV'98 Vol. I, LNCS 1406, pp. 671-686, Springer-Verlag
Petitjean S (2002) A Survey of Methods for Recovering Quadrics in Triangle Meshes, ACM Computing Surveys, Vol. 34, No. 2, pp. 211-262
Ruiz O, Arroyave S and Acosta D (2013) Fitting of Analytic Surfaces to Noisy Point Clouds, American Journal of Computational Mathematics, 3, pp. 18-26
Rose C and Smith D (2000) Symbolic Maximum Likelihood Estimation with Mathematica, The Statistician 49, pp. 229-240
Stal C, Nuttens T, Constales D, Schotte K, De Backer H and De Wulf A (2012) Automatic Filtering of Terrestrial Laser Scanner Data from Cylindrical Tunnels, TS09D - Laser Scanners III, 5812, FIG Working Week 2012, Rome, Italy, 6-10 May 2012
Su Y-T and Bethel J (2010) Detection and Robust Estimation of Cylinder Features in Point Clouds, ASPRS 2010 Annual Conference, San Diego, California, April 26-30
Vosselman G, Gorte B, Sithole G and Rabbani T (2004) Recognizing Structure in Laser Scanner Point Clouds, Int. Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. 46, pp. 33-38
Winkelbach S, Westphal R and Goesling T (2003) Pose Estimation of Cylinder Fragments for Semi-Automatic Bone Fracture Reduction, in Michaelis B and Krell G (eds) Pattern Recognition (DAGM 2003), Lecture Notes in Computer Science 2781, pp. 566-573, Springer-Verlag