Maximum Likelihood Estimation of Parameters in Linkage. Comparative Numerical Iterative Methods

Rodica SOBOLU 1), Dana PUSTA 2), Ioana POP 1)

1) Faculty of Horticulture, University of Agricultural Sciences and Veterinary Medicine Cluj-Napoca, 3-5 Manastur Street, 400372, Romania, Tel: +0040264-593792
2) Faculty of Veterinary Medicine, University of Agricultural Sciences and Veterinary Medicine Cluj-Napoca, 3-5 Manastur Street, 400372, Romania, Tel: +0040264-593792

*) Corresponding author, e-mail: [email protected], [email protected]

Bulletin UASVM Horticulture 72(2) / 2015
Print ISSN 1843-5254, Electronic ISSN 1843-5394
DOI: 10.15835/buasvmcn-hort:11418
Abstract

Taking into account the advantages of the Maximum Likelihood Method (the most precise estimation) and the statistical properties of MLEs (unbiasedness, consistency, efficiency, invariance, asymptotic normality), this paper presents maximum likelihood estimation in the context of estimating the recombination fraction r in linkage analysis. The Maximum Likelihood Method follows several steps: specify the likelihood function; take derivatives of the likelihood with respect to the parameters; set the derivatives equal to zero; and finally obtain a likelihood equation whose maximization provides the most precise estimate of the recombination fraction. When no closed-form solution of the likelihood equation exists, it is generally solved by iterative procedures. In this work we compare two iterative optimization methods useful in computing the MLE of the recombination fraction: the Newton-Raphson method and Fisher's Method of Scoring. We implemented both methods in Maple and illustrated them with an example: the estimation of the recombination fraction in Morgan's (1909) experiment on fruit flies. The Maple code for the two methods applied to the Morgan example is given in the appendix. We cannot guarantee which of the two presented methods yields the global maximum.

Keywords: recombination fraction, MLE estimation, linkage analysis.
Introduction

Most traits of biological and agricultural importance are controlled by interacting genes, each with a small effect, and by environmental factors. Linkage analysis is a powerful tool for studying the structure of the genome and for detecting, within pedigrees, the location of genes responsible for some quantitative trait of interest. The genes are identified by mapping their positions relative to known marker loci. The degree of linkage between any two markers is expressed in terms of the recombination fraction r. Maximum likelihood estimation (MLE) is a statistical inference procedure that provides precise estimators of parameters. Fisher first published the method of maximum likelihood in 1912 in terms of "inverse probability" and introduced the term "likelihood" in 1921. The full development of the method appears in his articles of 1922 and 1925. The MLE is the value of the parameter(s) that maximizes the likelihood of the observed sample, i.e. makes the observed data most likely to occur. We present MLE in the context of optimization methods specialized for statistics and we discuss some numerical methods useful for solving statistical optimization problems. Devore et al. (2012) present some properties of maximum likelihood estimators. In this study we take certain known properties of these estimators and adapt them to the context of numerical
optimization methods for the likelihood function. We assume only one unknown parameter θ.

2. Preliminaries
2.1 Linkage and recombination fraction

Linkage is the tendency for genes to be inherited together because they are located near one another on the same chromosome. The degree of linkage is described in terms of the recombination fraction r between two loci. The value of r quantifies the amount of linkage between two genes,

r = R / N,

where R is the number of recombinant gametes and N is the total number of gametes in a sample of meioses. The recombination fraction is biologically meaningful when r ∈ [0, 0.5]: the value r = 0 indicates complete linkage and the value r = 0.5 indicates independence of the two loci. A.H. Sturtevant (1913) first used the recombination fraction to order, i.e. map, genes along fruit fly chromosomes.

2.2 Likelihood-Estimation

The likelihood function is a theoretical concept that underlies many statistical inferences made in genetics. It is the probability of the observed data, considered as a function of the parameter. Let y = (y_1, y_2, ..., y_n) be an i.i.d. sample and θ a scalar parameter or a parameter vector θ = (θ_1, θ_2, ..., θ_k), k ≤ n. The likelihood function is

L(θ | y) = f(y | θ) = ∏_{i=1}^{n} f(y_i | θ)   (1),

where f(y | θ) is the sample density function. Assume that L(θ | y) is twice differentiable with respect to θ. The maximum likelihood estimation (MLE) method chooses the estimated parameter θ as the value that maximizes the log-likelihood function

log L(θ | y) = ∑_{i=1}^{n} log f(y_i | θ)   (2).
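The paper's worked example is implemented in Maple (see the appendix); purely as an illustrative sketch, the log-likelihood (2) for a binomial model of recombination can be written and maximized in Python. The counts R = 18 and N = 100 below are hypothetical, not Morgan's actual data.

```python
import math

def log_likelihood(r, R, N):
    """Binomial log-likelihood (2) for recombination fraction r:
    R recombinant gametes out of N (the constant binomial term is dropped)."""
    return R * math.log(r) + (N - R) * math.log(1 - r)

# Hypothetical counts (not Morgan's data): 18 recombinants in 100 gametes.
R, N = 18, 100

# Restrict r to the biologically meaningful range (0, 0.5) and
# locate the maximizer on a fine grid.
grid = [i / 1000 for i in range(1, 500)]
r_hat = max(grid, key=lambda r: log_likelihood(r, R, N))
print(r_hat)   # prints 0.18, the sample proportion R/N
```

The grid search is only a check: for this one-parameter model the maximizer is the sample proportion R/N, which is exactly what the iterative methods of Section 2.3 recover when no closed form is available.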
The MLE estimator is the value θ̂ such that log L(θ̂ | y) ≥ log L(θ | y) for all θ. The MLE method generates an equation whose maximization provides the most precise estimate. In what follows we consider Fisher's score function

S(θ) = ∂ log L(θ | y) / ∂θ   (3).

If θ is a parameter vector, then S(θ) is the score vector of first partial derivatives, one for each component of θ. The first derivative of the score function multiplied by -1 is called the observed information,

I(θ) = − ∂S(θ) / ∂θ   (4).

The quantity I(θ) measures the sharpness of the peak of the likelihood function around its maximum. The expected information is defined as

J(θ) = Var(S(θ)) = E[I(θ)]   (5).

Note that for large i.i.d. samples it holds approximately that θ̂ ~ N(θ, J(θ)⁻¹). The score function has some interesting statistical properties: its expectation with respect to y is equal to zero,

E[S(θ)] = 0   (6),

and the inverse of the information evaluated at the MLE is its approximate variance,

Var(θ̂) ≈ J(θ̂)⁻¹   (7).
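The score and information quantities above can be checked numerically for the binomial recombination model. The sketch below (in Python rather than the paper's Maple, with a hypothetical r and N) averages the score and the observed information over the exact binomial distribution of R, confirming property (6) and the identity E[I(θ)] = J(θ) from (5).

```python
import math

def score(r, R, N):
    """Score (3) for the binomial model: d/dr of R*log(r) + (N-R)*log(1-r)."""
    return R / r - (N - R) / (1 - r)

def observed_info(r, R, N):
    """Observed information (4): minus the derivative of the score."""
    return R / r**2 + (N - R) / (1 - r)**2

# Hypothetical setup (not Morgan's data): true r = 0.2, N = 50 gametes.
r_true, N = 0.2, 50
pmf = lambda R: math.comb(N, R) * r_true**R * (1 - r_true)**(N - R)

# Expectations taken over the exact binomial distribution of R.
mean_score = sum(pmf(R) * score(r_true, R, N) for R in range(N + 1))
mean_info = sum(pmf(R) * observed_info(r_true, R, N) for R in range(N + 1))

print(abs(mean_score) < 1e-9)                               # True: E[S(r)] = 0, property (6)
print(abs(mean_info - N / (r_true * (1 - r_true))) < 1e-6)  # True: E[I(r)] = J(r) = N/(r(1-r))
```

For this model the expected information has the closed form J(r) = N / (r(1 − r)), which is the quantity Fisher's Method of Scoring substitutes for the observed information.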
2.3 Numerical Methods for solving equations

Numerical analysis combines computer science and mathematics in order to create and implement algorithms for solving equations or
systems of equations that are not linear. There are many iterative methods that approximate the solutions of such equations: Newton's method, the bisection method, the secant method. Newton's method is widely used for function optimization. It provides an iterative formula for finding a maximum of log L(θ), and thus for solving the equation S(θ) = 0 (10), when no direct solution exists. Newton's method (also known as the Newton-Raphson method) was proposed by one of the most influential scientists of all time, Isaac Newton (1642-1727). Generally, this method iteratively obtains better approximations to the roots of an equation of the type f(x) = 0 (11), where f is a real-valued function. Newton's method uses
Taylor's theorem to approximate equation (11). We present below Newton's method in one variable (following Ram 2010; Micula et al., 2008): given a function f defined over the real numbers and its first derivative f′, we start with an initial guess x_0 for a root of f. A better approximation x_1 is given by the recurrence

x_1 = x_0 − f(x_0) / f′(x_0)   (12),

where (x_1, 0) is the intersection with the x-axis of the line tangent to f at the point (x_0, f(x_0)). The process is repeated as

x_{k+1} = x_k − f(x_k) / f′(x_k)   (13)

until the increment becomes small, i.e.

|x_{k+1} − x_k| < ε   (14).

The iterative process is shown graphically in Figure 1. Let a be the actual root of equation (11), so f(a) = 0. A sequence of iterates x_k that converges to a has order of convergence p > 1 if

lim_{k→∞} |x_{k+1} − a| / |x_k − a|^p = c, where 0 < c < ∞.
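Since the paper's Maple implementation is confined to the appendix, the iteration (13) with stopping rule (14) can be sketched as follows, here in Python and applied to the score equation S(r) = 0 of the binomial recombination model. The counts R = 18 and N = 100 are hypothetical, not Morgan's actual data.

```python
def newton(f, fprime, x0, eps=1e-10, max_iter=50):
    """Newton-Raphson iteration (13), stopping when |x_{k+1} - x_k| < eps (14)."""
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) < eps:
            return x_new
        x = x_new
    return x

# Hypothetical counts (not Morgan's data): 18 recombinants out of 100 gametes.
R, N = 18, 100
S  = lambda r: R / r - (N - R) / (1 - r)          # score function (3)
dS = lambda r: -R / r**2 - (N - R) / (1 - r)**2   # derivative of the score, = -I(r)

r_hat = newton(S, dS, x0=0.25)
print(round(r_hat, 6))   # prints 0.18, the closed-form MLE R/N
```

Replacing the observed information I(r) in the denominator with the expected information J(r) = N / (r(1 − r)) gives Fisher's Method of Scoring; for this particular one-parameter binomial model, a single scoring step from any interior starting value already lands exactly on R/N, although in general the two methods differ in their iteration paths.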