An algorithm and tool for computing exact conditional probabilities of ...

Psychology Science, Volume 47, 2005 (3/4), p. 391-400

An algorithm and tool for computing exact conditional probabilities of configuration frequencies

MANFRED BEIER¹

Abstract
Traditionally, the exact conditional probability of a configuration frequency is calculated with methods based on Fisher's well-known formula for 2 × 2 contingency tables or its extensions to higher-dimensional tables. I present here a different, combinatorial approach that scales much better as the number of variables increases and is, in principle, independent of the number of categories.

Key words: Exact conditional probability, multidimensional contingency table, configural frequency analysis (CFA)

¹ Dipl.-Ing. Manfred Beier, Institut für Humangenetik und Anthropologie, Heinrich-Heine-Universität Düsseldorf, Universitätsstr. 1, D-40225 Düsseldorf, Germany; E-mail: [email protected]

M. Beier


1. Algorithm

Please note that in what follows, the term "matrix" does not refer to a contingency table. It refers to the raw data matrix from which a contingency table can be constructed. Let $A = (a_{ij})$ be a matrix with $m$ rows (observations, sample size) and $n$ columns (variables, dimension of a contingency table). What is the probability of finding $k$ or more row vectors identical to $W = (w_j)$ (configurations, patterns) in the matrix if the components (attributes) of the column vectors are arranged in random order? Let $F = (f_j)$ be a vector holding the frequency of $w_j$ in each column $j = 1, \dots, n$ (the corresponding marginal sums of a contingency table). For the first component of the first row, $a_{11}$, the probability of matching $w_1$ is:

$$P(a_{11} = w_1;\ m, f_1) = \frac{f_1}{m}, \qquad \text{for } a_{12}\text{: } P(a_{12} = w_2;\ m, f_2) = \frac{f_2}{m}, \quad \text{etc.} \tag{1}$$

Assuming independence of the columns, the probability that all cells of the first row equal $W = (w_j)$ is:

$$P\big((a_{11},\dots,a_{1n}) = W;\ m, F\big) = \prod_{j=1}^{n} \frac{f_j}{m}. \tag{2}$$

The probability that at least the first $k$ rows are filled with $W$ (whatever the remaining rows contain) is:

$$P\big((a_{11},\dots,a_{1n}) = \dots = (a_{k1},\dots,a_{kn}) = W;\ m, F\big) = \prod_{j=1}^{n} \frac{f_j}{m}\cdot\frac{f_j-1}{m-1}\cdots\frac{f_j-k+1}{m-k+1} = \prod_{j=1}^{n} \frac{\binom{f_j}{k}}{\binom{m}{k}} = P_0. \tag{3}$$
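Eq. (3) can be checked numerically on a tiny matrix. The Python sketch below is an illustration and not the author's R implementation. The function names and the 0/1 coding (1 for the attribute $w_j$, 0 for any other category) are assumptions made for the example. The sketch evaluates both closed forms of Eq. (3) with exact fractions. It then compares them with a brute-force count over all arrangements of the columns, which is feasible only for very small $m$.

```python
from fractions import Fraction
from itertools import permutations, product
from math import comb, prod


def p0_falling(m, f, k):
    """Eq. (3), middle form: prod_j f_j/m * (f_j-1)/(m-1) * ... * (f_j-k+1)/(m-k+1)."""
    result = Fraction(1)
    for fj in f:
        for i in range(k):
            result *= Fraction(fj - i, m - i)
    return result


def p0_binomial(m, f, k):
    """Eq. (3), right-hand form: prod_j C(f_j, k) / C(m, k)."""
    return Fraction(prod(comb(fj, k) for fj in f), comb(m, k) ** len(f))


def p0_brute_force(m, f, k):
    """Share of all column arrangements in which the first k rows all equal W.

    Column j holds f_j ones (the attribute w_j) and m - f_j zeros. Under a random
    permutation every distinct arrangement of a column is equally likely."""
    columns = [set(permutations([1] * fj + [0] * (m - fj))) for fj in f]
    hits = total = 0
    for table in product(*columns):  # one arrangement per column, columns independent
        total += 1
        if all(col[i] == 1 for col in table for i in range(k)):
            hits += 1
    return Fraction(hits, total)


m, f, k = 5, (3, 4, 2), 2
print(p0_falling(m, f, k), p0_binomial(m, f, k), p0_brute_force(m, f, k))
```

For $m = 5$, $F = (3, 4, 2)$ and $k = 2$, all three functions return 9/500. With $k = 1$ the same functions reproduce Eq. (2).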

The maximum possible number of occurrences of $W$ is limited by $f_{\min}$, i.e. the smallest component of $F$.² Consequently, for $k = f_{\min}$ the formula above gives the probability that exactly the first $k$ rows contain $W$, with no further occurrence in the remaining matrix. The probability of a matrix $A_k$ with $k$ vectors $W$ in arbitrary rows can therefore be computed by summing the probabilities of all $\binom{m}{k}$ ways to choose $k$ rows out of $m$. Each of these probabilities equals $P_0$:

$$P(A_k;\ k = f_{\min}, m, F) = \binom{m}{k}\, P_0. \tag{4}$$

² The minimum possible number is given by $\max\{0,\ \sum_j f_j - m(n-1)\}$.

In the case of $n = 2$ with $f_1 = f_{\min} = k$:

$$\binom{m}{k} \prod_{j=1}^{2} \frac{\binom{f_j}{k}}{\binom{m}{k}} = \binom{m}{f_{\min}} \cdot \frac{\binom{f_{\min}}{f_{\min}}}{\binom{m}{f_{\min}}} \cdot \frac{\binom{f_2}{f_{\min}}}{\binom{m}{f_{\min}}} = \frac{\binom{f_2}{f_{\min}}}{\binom{m}{f_{\min}}}, \tag{5}$$


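this is equivalent to the p-value given by the hypergeometric distribution of a 2 × 2 contingency table. It is shown here with Fisher's formula for $R_1 = f_1 = f_{\min} = k$ and $C_1 = f_2$ (hence $C_2 = m - f_2$):

$$
\begin{array}{cc|c}
O_{11} = k = f_{\min} & O_{12} = 0 & R_1 = f_1 = f_{\min} \\
O_{21} = f_2 - f_{\min} & O_{22} = C_2 & R_2 = m - f_{\min} \\ \hline
C_1 = f_2 & C_2 & m
\end{array}
$$

$$\frac{R_1!\,R_2!\,C_1!\,C_2!}{m!\,O_{11}!\,O_{12}!\,O_{21}!\,O_{22}!} = \frac{f_{\min}!\,(m-f_{\min})!\,f_2!\,C_2!}{m!\,f_{\min}!\,0!\,(f_2-f_{\min})!\,C_2!} = \frac{f_2!/(f_2-f_{\min})!}{m!/(m-f_{\min})!} = \frac{\binom{f_2}{f_{\min}}}{\binom{m}{f_{\min}}}. \tag{6}$$

$O_{11} = R_1$ is the largest value this cell can take under the given margins. The one-sided p-value therefore consists of this single table.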
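The equivalence of Eqs. (4), (5) and (6) for two columns can be verified with a few lines of Python. This is a hedged sketch: the function names are hypothetical, and the example values $m = 10$, $f_1 = 3$, $f_2 = 6$ are chosen arbitrarily. `p_at_fmin` evaluates Eq. (4) for $k = f_{\min}$. `fisher_table_probability` applies Fisher's formula to a 2 × 2 table given its four cells.

```python
from fractions import Fraction
from math import comb, factorial, prod


def p_at_fmin(m, f):
    """Eq. (4): probability of exactly k = f_min rows equal to W, i.e. C(m, k) * P0."""
    k = min(f)
    return comb(m, k) * Fraction(prod(comb(fj, k) for fj in f), comb(m, k) ** len(f))


def fisher_table_probability(o11, o12, o21, o22):
    """Fisher's formula R1! R2! C1! C2! / (m! O11! O12! O21! O22!) for one 2 x 2 table."""
    r1, r2 = o11 + o12, o21 + o22
    c1, c2 = o11 + o21, o12 + o22
    m = r1 + r2
    numerator = factorial(r1) * factorial(r2) * factorial(c1) * factorial(c2)
    denominator = (factorial(m) * factorial(o11) * factorial(o12)
                   * factorial(o21) * factorial(o22))
    return Fraction(numerator, denominator)


m, f1, f2 = 10, 3, 6                  # f1 = f_min = k
cells = (f1, 0, f2 - f1, m - f2)      # O11, O12, O21, O22 of the 2 x 2 table
print(p_at_fmin(m, (f1, f2)))                 # Eq. (4)
print(fisher_table_probability(*cells))       # Eq. (6), left-hand side
print(Fraction(comb(f2, f1), comb(m, f1)))    # Eqs. (5) and (6), right-hand side
```

For these margins all three printed values are 1/6.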

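For $k < f_{\min}$, the probability of exactly $k$ row vectors matching $W$ is built from two factors. The first is the probability that $k$ given rows match $W$. The second is the probability that none of the other rows does. This product is again summed over all $\binom{m}{k}$ choices of rows. Once $k$ rows match, the remaining rows form an $(m-k) \times n$ matrix, again in random order, with frequencies $f_j - k$. Here $P(\{A_1,\dots\};\dots)$ denotes the probability of at least one match:

$$P(A_k;\ k < f_{\min}, m, F) = \binom{m}{k}\, P_0 \left(1 - P\big(\{A_1,\dots,A_{f_{\min}-k}\};\ m-k,\ (f_1-k,\dots,f_n-k)\big)\right). \tag{7}$$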

This leads to the following recursive formula for the probability of $k$ or more row vectors $W$:

$$\begin{aligned}
P\big(\{A_k,\dots,A_{f_{\min}}\};\ m, F\big) ={}& \frac{\prod_{j=1}^{n}\binom{f_j}{k}}{\binom{m}{k}^{n-1}} \left(1 - P\big(\{A_1,\dots,A_{f_{\min}-k}\};\ m-k,\ (f_1-k,\dots,f_n-k)\big)\right) \\
&+ P\big(\{A_{k+1},\dots,A_{f_{\min}}\};\ m, F\big).
\end{aligned} \tag{8}$$

The first factor equals $\binom{m}{k}P_0$ from Eq. (7). The recursion stops at $k = f_{\min}$. There the set $\{A_1,\dots,A_0\}$ is empty and the trailing term vanishes, so Eq. (8) reduces to Eq. (4).

At first sight this formula seems computationally infeasible: the number of p-values that have to be computed grows exponentially with the distance between $k$ and $f_{\min}$. However, the partial results $1 - P(\{A_1,\dots,A_{f_{\min}-i}\};\ m-i,\ (f_1-i,\dots,f_n-i))$ can be stored and reused for all $i = k,\dots,f_{\min}-1$. This reduces the number of interim values to $\sum_{1 \le i \le f_{\min}-k+1} i$, so they grow only quadratically. An algorithm is given below in the form of an implementation in the programming language R (www.R-project.org). It uses the "data cache" just mentioned (p1array). A second array (p0) allows the first factor of the formula to be calculated using the recurrence relation

$$\frac{\prod_{j=1}^{n}\binom{f_j}{k}}{\binom{m}{k}^{n-1}} = \frac{\prod_{j=1}^{n}\binom{f_j}{k-1}}{\binom{m}{k-1}^{n-1}} \cdot \frac{\prod_{j=1}^{n}(f_j-k+1)}{k\,(m-k+1)^{n-1}}. \tag{9}$$
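The following Python sketch shows one way to organise the computation described above. It is not a translation of the author's R function, whose listing is not reproduced here, and its names are hypothetical. The dictionary `one_minus_q` plays the role of the cache of $1 - P$ values (p1array). Each product term is built from its predecessor with the recurrence of Eq. (9) instead of being recomputed from binomial coefficients. The minimum number of matches, $\max\{0, \sum_j f_j - m(n-1)\}$, is used only as a shortcut. Exact fractions keep results comparable without rounding. For large tables, floating-point or log-space arithmetic may be preferable.

```python
from fractions import Fraction
from math import prod


def prob_at_least(k, m, f):
    """P(k or more of the m rows equal W), Eq. (8), for target frequencies f = (f_1, ..., f_n)."""
    n, f_min = len(f), min(f)
    if k > f_min:                                  # more than f_min matches is impossible
        return Fraction(0)
    if k <= max(0, sum(f) - m * (n - 1)):          # at least this many rows always match
        return Fraction(1)

    def terms(s):
        """T(s, u) = prod_j C(f_j - s, u) / C(m - s, u)^(n - 1) for u = 0..f_min - s,
        i.e. the first factor of Eq. (8) for the matrix reduced by s rows,
        each obtained from T(s, u - 1) by the recurrence of Eq. (9)."""
        ms, fs = m - s, [fj - s for fj in f]
        t = [Fraction(1)]
        for u in range(1, f_min - s + 1):
            t.append(t[-1] * Fraction(prod(fj - u + 1 for fj in fs),
                                      u * (ms - u + 1) ** (n - 1)))
        return t

    # one_minus_q[s] = 1 - P(at least one match; m - s, (f_1 - s, ..., f_n - s)).
    # With s = f_min no further match is possible, so the cached value is 1.
    one_minus_q = {f_min: Fraction(1)}
    for s in range(f_min - 1, k - 1, -1):          # fill the cache from the top down
        t = terms(s)
        q = sum((t[u] * one_minus_q[s + u] for u in range(1, f_min - s + 1)), Fraction(0))
        one_minus_q[s] = 1 - q

    t0 = terms(0)                                  # Eq. (8) unrolled over u = k..f_min
    return sum((t0[u] * one_minus_q[u] for u in range(k, f_min + 1)), Fraction(0))


print(prob_at_least(2, 10, (3, 6)))       # two columns
print(prob_at_least(1, 3, (2, 2, 2)))     # three columns
```

For $F = (3, 6)$ and $m = 10$ it returns 2/3 for $k = 2$, which matches the hypergeometric tail $1/2 + 1/6$. For three columns with $F = (2, 2, 2)$, $m = 3$ and $k = 1$ it returns 7/9. Each cached value needs $f_{\min}-s$ products, which gives the quadratic growth noted above.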


exact.p
