Simplex dispersion ordering Toolbox for Matlab: User Reference Guide



Guillermo Ayala and Miguel López
Universidad de Valencia and Universidad de Oviedo

January 16, 2008

Contents

1 Simplex dispersion ordering
2 Hausdorff distance
3 Evaluation of simplex dispersion ordering
4 Hypothesis testing

Abstract

This document contains the code and a short description of the software used in [2].

1 Simplex dispersion ordering

The class of nonempty compact subsets of R^d will be denoted by K, and K_c will denote the subclass of nonempty compact convex subsets of R^d. For B, C ∈ K we have the Minkowski addition B + C = {b + c : b ∈ B, c ∈ C} and the product by a scalar, λB = {λb : b ∈ B} with λ ∈ R. The Euclidean norm will be denoted by ‖·‖, and the inner product in R^d will be represented by ⟨·,·⟩. The Hausdorff distance between B, C ∈ K is defined as

    d_H(B, C) = max { sup_{b∈B} inf_{c∈C} ‖b − c‖ , sup_{c∈C} inf_{b∈B} ‖b − c‖ }.
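As a simple illustration, take B = [0, 1] × {0} and C = [0, 2] × {0} in R^2. Every point of B belongs to C, so sup_{b∈B} inf_{c∈C} ‖b − c‖ = 0, while the point (2, 0) ∈ C is at distance 1 from B, so sup_{c∈C} inf_{b∈B} ‖b − c‖ = 1 and therefore d_H(B, C) = 1.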

Then (K, d_H) is a complete and separable metric space, and (K_c, d_H) is a closed subspace (see [3, 5]). We will denote by co(·) the convex hull mapping, i.e. the mapping co(·) : K → K_c, with co(B) the convex hull of B ∈ K. It is a continuous mapping when we consider the Hausdorff distance, because d_H(co(B), co(C)) ≤ d_H(B, C) for all B, C ∈ K. If B ∈ K and x ∈ R^d, then d(x, B) = inf_{b∈B} ‖x − b‖ is the distance from x to B. Observe that such an infimum is in fact a minimum because B ∈ K, and for B ∈ K_c such a minimum is attained at a unique point. Let S^{d−1} be the unit sphere in R^d, that is, S^{d−1} = {u ∈ R^d : ‖u‖ = 1}. We will denote by C(S^{d−1}) the class of real-valued continuous functions on S^{d−1}, and by ‖·‖_∞ the associated supremum norm, so ‖f‖_∞ = sup_{u∈S^{d−1}} |f(u)| with f ∈ C(S^{d−1}). Given C ∈ K_c, s_C will stand for the support function of the set C, that is, s_C : S^{d−1} → R with s_C(u) = sup_{c∈C} ⟨c, u⟩. Note that s_C ∈ C(S^{d−1}) for all C ∈ K_c.
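For instance, if C = {c_0} is a singleton then s_C(u) = ⟨c_0, u⟩, while for the closed unit ball C = {x ∈ R^d : ‖x‖ ≤ 1} one has s_C(u) = 1 for every u ∈ S^{d−1}.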

The class K_c can be embedded isometrically into the space C(S^{d−1}) when we consider on K_c the Hausdorff distance and on C(S^{d−1}) the supremum norm (see [1]). The embedding is given by means of the mapping j : K_c → C(S^{d−1}), with j(C) = s_C. It holds that

    d_H(B, C) = ‖s_B − s_C‖_∞.    (1)

Let (Ω, A) be a measurable space. A mapping W : Ω → K is said to be a random set if it is measurable with respect to A and the Borel σ-field induced by the topology generated by the Hausdorff distance on K ([5, 7, 6]). If Z_1 and Z_2 are random sets on the same measurable space, the mapping d_H(Z_1, Z_2) is measurable when we consider on R the usual Borel σ-field ([4]). Throughout the paper, if X is an R^d-valued random vector, X_1, ..., X_{d+1}, X'_1, ..., X'_{d+1} will denote independent random vectors distributed as X and defined on the same probability space. It is well known that the random simplex S_X = co({X_1, ..., X_{d+1}}) is a random set. Similarly, we will have S_{X'} = co({X'_1, ..., X'_{d+1}}). Let us give the main definition.

Definition 1 (Simplex dispersion ordering) Let X, Y be R^d-valued random vectors. We will say that X is less dispersive than Y in the simplex dispersion ordering if

    d_H(S_X, S_{X'}) ≤_st d_H(S_Y, S_{Y'}).

It will be denoted by X ≤_sx Y. The notation X ∼_sx Y will mean that X ≤_sx Y and Y ≤_sx X hold simultaneously.
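As an illustration (this sketch is not part of the toolbox, and the distribution and variable names are chosen only for the example), a single realization of d_H(S_X, S_{X'}) can be simulated for a bivariate standard normal X by drawing two independent samples of d + 1 points and applying the function dhn of Section 3:

% Illustrative sketch: one realization of dH(SX, SX') for a bivariate
% standard normal X, using the toolbox function dhn (Section 3).
d  = 2;                 % dimension
x1 = randn(d+1,d);      % rows are X1,...,X_{d+1}, the vertices of SX
x2 = randn(d+1,d);      % rows are X'1,...,X'_{d+1}, the vertices of SX'
r  = dhn(x1,x2);        % Hausdorff distance between the two random simplices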

2 Hausdorff distance

The Hausdorff distance has been calculated using the following code by Hany Farid, http://www.cs.dartmouth.edu/farid/

%%%
%%% HAUSDORFF: Compute the Hausdorff distance between two point clusters
%%% in an arbitrary dimensional vector space.
%%% H(A,B) = max(h(A,B),h(B,A)), where
%%% h(A,B) = max(min(d(a,b))), for all a in A, b in B,
%%% where d(a,b) is a L2 norm.
%%% dist = hausdorff( A, B )
%%% A: the rows of this matrix correspond to points in the first cluster
%%% B: the rows of this matrix correspond to points in the second cluster
%%% A and B may have different number of rows, but must have the
%%% same number of columns (i.e., dimensionality)
%%% Hany Farid; Image Science Group; Dartmouth College
%%% 10.4.06
%%%

function [dist] = hausdorff( A, B )

if( size(A,2) ~= size(B,2) )
    fprintf( 'WARNING: dimensionality must be the same\n' );
    dist = [];
    return;
end

dist = max( compute_dist(A,B), compute_dist(B,A) );

%%%
%%% Compute distance
%%%
function [dist] = compute_dist( A, B )

m   = size(A,1);
n   = size(B,1);
dim = size(A,2);

for k = 1 : m
    C = ones(n,1) * A(k,:);
    D = (C-B) .* (C-B);
    D = sqrt( D * ones(dim,1) );
    dist(k) = min(D);
end
dist = max(dist);
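A possible call of hausdorff (the point clouds below are only illustrative):

% Illustrative usage: Hausdorff distance between two point clusters in R^3.
A   = rand(100,3);        % 100 points in the first cluster
B   = rand(80,3) + 0.5;   % 80 points in the second, shifted cluster
dAB = hausdorff(A,B);     % scalar Hausdorff distance between the clusters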

3 Evaluation of simplex dispersion ordering

The functions used to evaluate the simplex dispersion ordering are dh2 for the two-dimensional case and dhn for the n-dimensional case.

function resultado = dh2(x,y)
% Calculates the Hausdorff distance between the convex hulls
% of the two-dimensional point sets x and y
% Each row of x and y corresponds to one point

indx = convhull(x(:,1),x(:,2));
indy = convhull(y(:,1),y(:,2));
resultado = hausdorff(x(indx,:),y(indy,:));

function resultado = dhn(x,y)
% Calculates the Hausdorff distance between the convex hulls
% of the n-dimensional point sets x and y
% Each row of x and y corresponds to one point

[dd0 dd1] = size(x);

% Collect the indices of the vertices of the convex hull of x:
% convhulln returns one facet per row, so stack its columns and keep
% every index that appears at least once
indx = convhulln(x);
[indx1 indx2] = size(indx);
indxcol = indx(:,1);
for i=2:dd1
    indxcol = vertcat(indxcol,indx(:,i));
end
xt = tabulate(indxcol);
x0 = xt(xt(:,2)>0,1);

% The same for y
indy = convhulln(y);
[indy1 indy2] = size(indy);
indycol = indy(:,1);
for i=2:dd1
    indycol = vertcat(indycol,indy(:,i));
end
yt = tabulate(indycol);
y0 = yt(yt(:,2)>0,1);

resultado = hausdorff(x(x0,:),y(y0,:));
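For planar data, dh2 and dhn should return the same value, since both reduce each point set to the vertices of its convex hull before calling hausdorff. The data below are only illustrative (tabulate belongs to the Statistics Toolbox):

% Illustrative usage of dh2 and dhn on two-dimensional data.
x  = randn(50,2);          % first point set
y  = 2 * randn(50,2);      % second, more dispersed point set
r2 = dh2(x,y);             % two-dimensional version
rn = dhn(x,y);             % n-dimensional version; r2 and rn coincide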

4 Hypothesis testing

The functions used to test the null hypothesis of equal distributions by means of the simplex dispersion ordering are bsx00 for the bootstrap procedure and bsx1 for the paired procedure.


function [pwilcoxon hwilcoxon estwilcoxon hks pks estks] = bsx00(x,y,nsamples)
% Bootstrap testing of the simplex dispersion ordering
% Input: x and y are the data corresponding to each random vector

dhx = zeros(1,nsamples);
dhy = zeros(1,nsamples);

[rx cx] = size(x);
[ry cy] = size(y);
ix = 1:rx;
iy = 1:ry;
nn = cx + 1;              % d+1 points are needed to build a simplex in R^d
for i=1:nsamples
    % Two disjoint subsamples of d+1 rows of x, and their Hausdorff distance
    ixa1 = randsample(ix,nn);
    lix  = ones(1,rx);
    lix(ixa1) = 0;
    ixa2 = randsample(ix(logical(lix)),nn);
    xa1  = x(ixa1,:);
    xa2  = x(ixa2,:);

    % The same for y
    iya1 = randsample(iy,nn);
    liy  = ones(1,ry);
    liy(iya1) = 0;
    iya2 = randsample(iy(logical(liy)),nn);
    ya1  = y(iya1,:);
    ya2  = y(iya2,:);

    dhx(i) = dhn(xa1,xa2);
    dhy(i) = dhn(ya1,ya2);
end

% Wilcoxon test
[pwilcoxon,hwilcoxon,estwilcoxon] = ranksum(dhx,dhy);
% Kolmogorov-Smirnov test
[hks,pks,estks] = kstest2(dhx,dhy);
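A possible call of bsx00 (simulated data, chosen only for illustration; randsample, ranksum and kstest2 belong to the Statistics Toolbox):

% Illustrative usage: bootstrap comparison of a sample from X with a
% sample from a more dispersed Y, using 500 bootstrap replications.
x = randn(200,2);
y = 3 * randn(200,2);
[pw,hw,estw,hk,pk,estk] = bsx00(x,y,500);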

function [pwilcoxon pks] = bsx1(x,y)
% Testing of the simplex dispersion ordering (paired procedure)
% Input: x and y are the data corresponding to each random vector

[rx cx] = size(x);
[ry cy] = size(y);

nn = cx + 1;              % d+1 points are needed to build a simplex in R^d

% Number of disjoint blocks of d+1 rows available in x and in y
nsamplesx = floor(rx/nn);
nsamplesy = floor(ry/nn);

% Hausdorff distances between consecutive, non-overlapping pairs of blocks of x
dhx = [];
for i=0:2:(nsamplesx-2)
    xa1 = x((i*nn+1):((i+1)*nn),:);
    xa2 = x(((i+1)*nn+1):((i+2)*nn),:);
    dhx = [dhx dhn(xa1,xa2)];
end

% The same for y
dhy = [];
for i=0:2:(nsamplesy-2)
    ya1 = y((i*nn+1):((i+1)*nn),:);
    ya2 = y(((i+1)*nn+1):((i+2)*nn),:);
    dhy = [dhy dhn(ya1,ya2)];
end

[pwilcoxon,hwilcoxon,estwilcoxon] = ranksum(dhx,dhy);
[hks,pks,estks] = kstest2(dhx,dhy);
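The paired procedure can be called on the same kind of data (again, the data are only illustrative):

% Illustrative usage of the paired procedure: each disjoint block of
% cx+1 consecutive rows is used exactly once.
x = randn(200,2);
y = 3 * randn(200,2);
[pw,pk] = bsx1(x,y);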

References

[1] Z. Artstein and R. A. Vitale. A strong law of large numbers for random compact sets. Annals of Probability, 3:879-882, 1975.
[2] G. Ayala and M. López. Simplex dispersion ordering and its application to the evaluation of a human corneal endothelium. Submitted, 2008.
[3] G. Debreu. Integration of correspondences. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume II, pages 351-372, Berkeley, 1967. University of California Press.
[4] F. Hiai and H. Umegaki. Integrals, conditional expectations, and martingales of multivalued functions. Journal of Multivariate Analysis, 7:149-182, 1977.
[5] G. Matheron. Random Sets and Integral Geometry. Wiley, London, 1975.
[6] I. Molchanov. Theory of Random Sets. Probability and its Applications. Springer-Verlag, London, 2005.
[7] D. Stoyan. Random sets: models and statistics. International Statistical Review, 66:1-27, 1998.
