Simplex dispersion ordering Toolbox for Matlab: User Reference Guide
Guillermo Ayala, Miguel López ∗
January 16, 2008
Contents
1 Simplex dispersion ordering
2 Hausdorff distance
3 Evaluation of simplex dispersion ordering
4 Hypothesis testing

Abstract
This document contains the code and a short description of the software used in [2].
1 Simplex dispersion ordering
The class of nonempty compact subsets of $\mathbb{R}^d$ will be denoted by $\mathcal{K}$, and $\mathcal{K}_c$ will denote the subclass of nonempty compact convex subsets of $\mathbb{R}^d$. For $B, C \in \mathcal{K}$, the Minkowski addition is $B + C = \{b + c : b \in B, c \in C\}$ and the product by a scalar is $\lambda B = \{\lambda b : b \in B\}$ with $\lambda \in \mathbb{R}$. The Euclidean norm will be denoted by $\|\cdot\|$, and the inner product in $\mathbb{R}^d$ by $\langle \cdot, \cdot \rangle$. The Hausdorff distance between $B, C \in \mathcal{K}$ is defined as

$$d_H(B, C) = \max\left\{ \sup_{b \in B} \inf_{c \in C} \|b - c\|, \; \sup_{c \in C} \inf_{b \in B} \|b - c\| \right\}.$$
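As a small worked instance of this definition (an illustration only, with the two intervals chosen for the example and $d = 1$), the two directed terms can differ, which is why the maximum of both is taken:

```latex
\[
B = [0,1], \qquad C = [0,3] \subset \mathbb{R},
\]
\[
\sup_{b \in B} \inf_{c \in C} |b - c| = 0, \qquad
\sup_{c \in C} \inf_{b \in B} |b - c| = |3 - 1| = 2,
\]
\[
d_H(B, C) = \max\{0, 2\} = 2 .
\]
```

The first term vanishes because $B \subset C$; the second is attained at $c = 3$, whose nearest point of $B$ is $b = 1$.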
Then, $(\mathcal{K}, d_H)$ is a complete and separable metric space, and $(\mathcal{K}_c, d_H)$ is a closed subspace (see [3, 5]). We will denote by $\mathrm{co}(\cdot)$ the convex hull mapping, i.e. $\mathrm{co}(\cdot) : \mathcal{K} \to \mathcal{K}_c$, with $\mathrm{co}(B)$ the convex hull of $B \in \mathcal{K}$. It is a continuous mapping with respect to the Hausdorff distance because $d_H(\mathrm{co}(B), \mathrm{co}(C)) \le d_H(B, C)$ for all $B, C \in \mathcal{K}$.

If $B \in \mathcal{K}$ and $x \in \mathbb{R}^d$, then $d(x, B) = \inf_{b \in B} \|x - b\|$ is the distance from $x$ to $B$. Observe that such an infimum is in fact a minimum because $B$ is compact, and for $B \in \mathcal{K}_c$ such a minimum is attained at a unique point.

Let $S^{d-1}$ be the unit sphere in $\mathbb{R}^d$, that is, $S^{d-1} = \{u \in \mathbb{R}^d : \|u\| = 1\}$. We will denote by $C(S^{d-1})$ the class of real-valued continuous functions on $S^{d-1}$, and by $\|\cdot\|_\infty$ the associated supremum norm, so that $\|f\|_\infty = \sup_{u \in S^{d-1}} |f(u)|$ for $f \in C(S^{d-1})$. Given $C \in \mathcal{K}_c$, $s_C$ will stand for the support function of the set $C$, that is, $s_C : S^{d-1} \to \mathbb{R}$ with $s_C(u) = \sup_{c \in C} \langle c, u \rangle$. Note that $s_C \in C(S^{d-1})$ for all $C \in \mathcal{K}_c$.

∗ Universidad de Valencia and Universidad de Oviedo.
The class $\mathcal{K}_c$ can be embedded isometrically into the space $C(S^{d-1})$, when we consider on $\mathcal{K}_c$ the Hausdorff distance and on $C(S^{d-1})$ the supremum norm (see [1]). The embedding is given by the mapping $j : \mathcal{K}_c \to C(S^{d-1})$, with $j(C) = s_C$. It holds that, for all $B, C \in \mathcal{K}_c$,

$$d_H(B, C) = \|s_B - s_C\|_\infty. \qquad (1)$$

Let $(\Omega, \mathcal{A})$ be a measurable space. A mapping $W : \Omega \to \mathcal{K}$ is said to be a random set if it is measurable with respect to $\mathcal{A}$ and the Borel $\sigma$-field induced by the topology generated by the Hausdorff distance on $\mathcal{K}$ ([5, 6, 7]). If $Z_1$ and $Z_2$ are random sets on the same measurable space, the mapping $d_H(Z_1, Z_2)$ is measurable when we consider on $\mathbb{R}$ the usual Borel $\sigma$-field ([4]).

Throughout the paper, if $X$ is an $\mathbb{R}^d$-valued random vector, $X_1, \dots, X_{d+1}, X'_1, \dots, X'_{d+1}$ will denote independent random vectors distributed as $X$ and defined on the same probability space. It is well known that the random simplex $S_X = \mathrm{co}\{X_1, \dots, X_{d+1}\}$ is a random set. Similarly, we will have $S_{X'} = \mathrm{co}\{X'_1, \dots, X'_{d+1}\}$. Let us give the main definition.

Definition 1 (Simplex dispersion ordering) Let $X, Y$ be $\mathbb{R}^d$-valued random vectors. We will say that $X$ is less dispersive than $Y$ in the simplex dispersion ordering if $d_H(S_X, S_{X'}) \preceq_{st} d_H(S_Y, S_{Y'})$, where $\preceq_{st}$ denotes the usual stochastic order. It will be denoted by $X \preceq_{sx} Y$. The notation $X \sim_{sx} Y$ will mean that $X \preceq_{sx} Y$ and $Y \preceq_{sx} X$ hold simultaneously.
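Equation (1) can be checked numerically for convex polygons in the plane. The sketch below is an illustration and not part of the toolbox. It assumes d = 2, replaces the supremum over the unit circle by a maximum over a finite grid of directions (so in general it approximates the supremum from below), and uses the fact that the support function of the convex hull of finitely many points is the maximum of the inner products over those points. The names B, C, U, sB and sC are chosen for the example.

```matlab
% Vertices of the unit square and of the same square shifted by (1,0).
B = [0 0; 1 0; 1 1; 0 1];
C = [1 0; 2 0; 2 1; 1 1];

% Grid of unit directions u = (cos(theta), sin(theta)), stored as columns.
theta = linspace(0, 2*pi, 361);
U = [cos(theta); sin(theta)];

% Support functions of co(B) and co(C) evaluated on the grid:
% s(u) is the largest inner product <p,u> over the points p of the set.
sB = max(B * U, [], 1);
sC = max(C * U, [], 1);

% Grid approximation of the supremum norm in equation (1).
d = max(abs(sB - sC));
```

Here C is a translate of B by the vector (1, 0), so the Hausdorff distance between the two squares is 1, and the grid contains the direction (1, 0) at which the supremum is attained.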
2 Hausdorff distance
The Hausdorff distance has been calculated using the following code by Hany Farid (http://www.cs.dartmouth.edu/farid/):

%%%
%%% HAUSDORFF: Compute the Hausdorff distance between two point clusters
%%% in an arbitrary dimensional vector space.
%%% H(A,B) = max(h(A,B),h(B,A)), where
%%% h(A,B) = max(min(d(a,b))), for all a in A, b in B,
%%% where d(a,b) is a L2 norm.
%%% dist = hausdorff( A, B )
%%% A: the rows of this matrix correspond to points in the first cluster
%%% B: the rows of this matrix correspond to points in the second cluster
%%% A and B may have different number of rows, but must have the
%%% same number of columns (i.e., dimensionality)
%%% Hany Farid; Image Science Group; Dartmouth College
%%% 10.4.06
%%%

function [dist] = hausdorff( A, B )

if( size(A,2) ~= size(B,2) )
   fprintf( 'WARNING: dimensionality must be the same\n' );
   dist = [];
   return;
end

dist = max( compute_dist(A,B), compute_dist(B,A) );

%%%
%%% Compute distance
%%%
function [dist] = compute_dist( A, B )

m = size(A,1);
n = size(B,1);
dim = size(A,2);
for k = 1 : m
   C = ones(n,1) * A(k,:);
   D = (C-B) .* (C-B);
   D = sqrt( D * ones(dim,1) );
   dist(k) = min(D);
end
dist = max(dist);
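The definition H(A,B) = max(h(A,B), h(B,A)) quoted in the header of hausdorff can be traced on a tiny example. The sketch below is an illustrative rewrite, not part of the toolbox: it builds the full matrix of pairwise Euclidean distances and reads both directed distances from it. The data and the names D, hAB and hBA are assumptions made for the example.

```matlab
% Two small point clusters in the plane (one point per row).
A = [0 0; 1 0];
B = [0 0; 0 2];

% D(i,j) is the Euclidean distance between A(i,:) and B(j,:).
m = size(A,1);
n = size(B,1);
D = zeros(m,n);
for i = 1:m
   for j = 1:n
      D(i,j) = norm(A(i,:) - B(j,:));
   end
end

% Directed distances: for each point take the nearest point of the
% other cluster, then keep the worst case.
hAB = max(min(D, [], 2));   % from A to B
hBA = max(min(D, [], 1));   % from B to A

% Symmetric Hausdorff distance between the two clusters.
dist = max(hAB, hBA);
```

In this example the point (0,2) of B lies at distance 2 from its nearest point of A, while every point of A is within distance 1 of B, so the two directed distances differ and the larger one determines the result.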
3 Evaluation of simplex dispersion ordering
The functions used to evaluate the simplex dispersion ordering are dh2 for the two-dimensional case and dhn for the n-dimensional case.

function resultado = dh2(x,y)
% Calculates the Hausdorff distance between the convex hulls
% of the two-dimensional point sets x and y
% Each row of x and y corresponds to one point
% The distance is evaluated on the hull vertices returned by convhull

indx = convhull(x(:,1),x(:,2));
indy = convhull(y(:,1),y(:,2));
resultado = hausdorff(x(indx,:),y(indy,:));

function resultado = dhn(x,y)
% Calculates the Hausdorff distance between the convex hulls
% of the n-dimensional point sets x and y
% Each row of x and y corresponds to one point
% The distance is evaluated on the hull vertices returned by convhulln

[dd0 dd1] = size(x);

% Facets of the hull of x: each row of indx lists the points of one facet
indx = convhulln(x);
[indx1 indx2] = size(indx);
% Stack the columns of indx into a single column of point indices
indxcol = indx(:,1);
for i=2:dd1
   indxcol = vertcat(indxcol,indx(:,i));
end
% Keep each index that appears at least once (the hull vertices)
xt = tabulate(indxcol);
x0 = xt(xt(:,2)>0,1);

% Same steps for y (x and y must have the same number of columns)
indy = convhulln(y);
[indy1 indy2] = size(indy);
indycol = indy(:,1);
for i=2:dd1
   indycol = vertcat(indycol,indy(:,i));
end
yt = tabulate(indycol);
y0 = yt(yt(:,2)>0,1);

resultado = hausdorff(x(x0,:),y(y0,:));
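In dhn, the vertcat loop followed by tabulate collects the distinct point indices that appear in the facet list returned by convhulln. A shorter way to obtain the same index set is unique applied to the flattened facet list. This is offered as a possible alternative under that reading of the code, not as the authors' implementation; the data and the name hullpts are assumptions made for the example.

```matlab
% Five points in the plane: the corners of the unit square and its center.
x = [0 0; 1 0; 1 1; 0 1; 0.5 0.5];

% Each row of indx holds the point indices of one facet of the hull
% (in the plane, a facet is an edge).
indx = convhulln(x);

% Distinct indices of the hull vertices, sorted in ascending order.
x0 = unique(indx(:));

% Coordinates of the hull vertices; the center (row 5) is left out.
hullpts = x(x0,:);
```

Because unique does not belong to the Statistics Toolbox, this variant would also remove the dependence of that step on tabulate.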
4 Hypothesis testing
The functions used to test the null hypothesis of the same distribution using the simplex dispersion are bsx00 for the bootstrap procedure and bsx1 for the paired procedure.
function [pwilcoxon hwilcoxon estwilcoxon hks pks estks] = bsx00(x,y,nsamples)
%Bootstrap testing of the simplex dispersion ordering
%Input: x and y are the data corresponding to each random vector
%       (one observation per row); nsamples is the number of resamples

dhx = zeros(1,nsamples);
dhy = zeros(1,nsamples);

[rx cx]=size(x);
[ry cy]=size(y);
ix = 1:rx;
iy = 1:ry;
%Number of points per simplex (dimension + 1)
nn = cx + 1;
for i=1:nsamples
   %Two disjoint sets of nn rows of x, one per simplex
   ixa1 = randsample(ix,nn);
   lix = ones(1,rx);
   lix(ixa1)=0;
   ixa2 = randsample(ix(logical(lix)),nn);
   xa1 = x(ixa1,:);
   xa2 = x(ixa2,:);

   %Two disjoint sets of nn rows of y, one per simplex
   iya1 = randsample(iy,nn);
   liy = ones(1,ry);
   liy(iya1)=0;
   iya2 = randsample(iy(logical(liy)),nn);
   ya1 = y(iya1,:);
   ya2 = y(iya2,:);

   %Hausdorff distance between the two simplices of each sample
   dhx(i) = dhn(xa1,xa2);
   dhy(i) = dhn(ya1,ya2);
end

%Wilcoxon test
[pwilcoxon,hwilcoxon,estwilcoxon]=ranksum(dhx,dhy);
%Kolmogorov-Smirnov test
[hks,pks,estks] = kstest2(dhx,dhy);
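Inside the loop of bsx00, the two index sets are drawn without replacement and are disjoint by construction: the second set is sampled from the rows not used by the first. A possible shorthand, shown here only as a sketch and not as the authors' code, is to take the first 2*nn entries of a random permutation and split them in two. The values of rx and cx below are hypothetical, and the sketch needs rx >= 2*nn, which the original two-step sampling requires as well.

```matlab
rx = 20;          % hypothetical number of observations (rows of x)
cx = 2;           % hypothetical dimension (columns of x)
nn = cx + 1;      % points per simplex

% 2*nn distinct row indices drawn uniformly at random from 1..rx.
p = randperm(rx, 2*nn);

% First half for one simplex, second half for the other.
ixa1 = p(1:nn);
ixa2 = p(nn+1:2*nn);
```

Both constructions draw 2*nn distinct rows and assign nn of them to each simplex, so the split permutation follows the same sampling scheme as the two calls to randsample.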

function [pwilcoxon pks] = bsx1(x,y)
%Testing of the simplex dispersion ordering
%Input: x and y are the data corresponding to each random vector

[rx cx]=size(x);
[ry cy]=size(y);

%Number of points per simplex (dimension + 1)
nn = cx + 1;

%Number of disjoint simplices available in x and in y
nsamplesx = floor(rx/nn);
nsamplesy = floor(ry/nn);

%Distances between consecutive pairs of simplices built from x
dhx =[ ];
for i=0:2:(nsamplesx-2)
   xa1 = x((i*nn+1):((i+1)*nn),:);
   xa2 = x(((i+1)*nn+1):((i+2)*nn),:);
   dhx = [dhx dhn(xa1,xa2)];
end

%Distances between consecutive pairs of simplices built from y
dhy =[ ];
for i=0:2:(nsamplesy-2)
   ya1 = y((i*nn+1):((i+1)*nn),:);
   ya2 = y(((i+1)*nn+1):((i+2)*nn),:);
   dhy = [dhy dhn(ya1,ya2)];
end

%Wilcoxon and Kolmogorov-Smirnov tests
[pwilcoxon,hwilcoxon,estwilcoxon]=ranksum(dhx,dhy);
[hks,pks,estks] = kstest2(dhx,dhy);
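
The index arithmetic in bsx1 splits the rows into consecutive blocks of nn rows and pairs block 1 with block 2, block 3 with block 4, and so on. The sketch below only lists the row indices that each pass of the loop would use; the sizes rx and nn and the name pairs are hypothetical values chosen for the example.

```matlab
rx = 10;                    % hypothetical number of rows of x
nn = 3;                     % hypothetical points per simplex
nblocks = floor(rx/nn);     % number of complete blocks of nn rows

% One row of 'pairs' per loop pass: rows of the first simplex
% followed by rows of the second simplex.
pairs = [];
for i = 0:2:(nblocks-2)
   first  = (i*nn+1):((i+1)*nn);
   second = ((i+1)*nn+1):((i+2)*nn);
   pairs = [pairs; first second];
end
```

With these sizes there are three complete blocks, so a single pair is formed from rows 1 to 6, and rows 7 to 10 are left unused.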

References
[1] Z. Artstein and R.A. Vitale. A strong law of large numbers for random compact sets. Annals of Probability, 3:879–882, 1975.
[2] G. Ayala and M. López. Simplex dispersion ordering and its application to the evaluation of a human corneal endothelium. Submitted, 2008.
[3] G. Debreu. Integration of correspondences. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume II, pages 351–372, Berkeley, 1967. University of California Press.
[4] F. Hiai and H. Umegaki. Integrals, conditional expectations, and martingales of multivalued functions. Journal of Multivariate Analysis, 7:149–182, 1977.
[5] G. Matheron. Random sets and Integral Geometry. Wiley, London, 1975.
[6] I. Molchanov. Theory of Random Sets. Probability and its Applications. Springer-Verlag, London, 2005.
[7] D. Stoyan. Random sets: models and statistics. International Statistical Review, 66:1–27, 1998.