Statistical methods in LHC data analysis — 3. Hypothesis testing
Setting limits for the Higgs boson search (1/2)
Luca Lista, INFN Napoli
Contents
• Hypothesis testing
• Upper limit concept
• Treatment of background
• Bayesian limits
• Unified Feldman-Cousins approach
• Modified frequentist approach (CLs method)
• Profile likelihood
• Nuisance parameters and Cousins-Highland approach

IN2P3 School of Statistics 2012
Hypothesis testing
• The problem from the point of view of a physicist:
  – A data sample is characterized by n variables, (x1, …, xn), with different distributions for the two possible processes: signal and background
  – Given a measurement (= event) of the n discriminating variables, identify (discriminate) the event as coming from signal or background
• Clearly, the identification sometimes gives the correct answer, sometimes the wrong one
• Properties of a discriminator:
  – Selection efficiency: probability to correctly identify signal events
  – Misidentification probability: probability to misidentify a background event as signal
  – Purity: fraction of signal in a positively identified sample
    • Depends on the signal and background composition! It is not a property of the discriminator alone
  – Fake rate: fraction of background in a positively identified sample, = 1 − purity
Terminology for statisticians
• Statisticians’ terminology is usually less natural for physics applications than the previous slide’s, but is intended for more general applicability
• H0 = null hypothesis
  – E.g.: a sample contains only background; a particle is a pion; etc.
• H1 = alternative hypothesis
  – E.g.: a sample contains background + signal; a particle is a muon; etc.
• α = significance level: probability to reject H1 if true (error of the first kind), i.e. assuming H1
  – α = 1 − selection efficiency
• β = probability to reject H0 if true (error of the second kind), i.e. assuming H0
  – β = misidentification probability
Cut analysis
• Cut on one (or more…) variable(s):
  – If x ≤ xcut ⇒ signal
  – Else, if x > xcut ⇒ background
[Figure: signal and background distributions in x, with the cut at xcut separating the selection efficiency (1−α) from the mis-id probability (β)]
Efficiency vs mis-id
• Varying the cut, both the efficiency and the mis-id probability change
[Figure: efficiency (0 to 1) plotted versus mis-id probability (0 to 1), the curve traced out as the cut is varied]
Neyman-Pearson lemma
• Fixing the signal efficiency (1−α), a selection based on the likelihood ratio gives the lowest possible mis-id probability (β):
  λ(x) = L(x|H1) / L(x|H0) > kα
• If we can’t use the likelihood ratio, we can choose other discriminators, or “test statistics”:
  – A test statistic is any function of x (like λ(x)) that allows one to discriminate between the two hypotheses
• Neural networks and boosted decision trees are examples of discriminators that may closely approximate the Neyman-Pearson limit
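As a toy illustration of the lemma (not from the slides), take two unit-width Gaussian hypotheses; a minimal Python sketch shows that in this case the likelihood ratio λ(x) is monotonic in x, so cutting on λ is equivalent to cutting on x itself:

```python
import math

def gaussian(x, mu, sigma=1.0):
    # Normal PDF with mean mu and width sigma
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def lr(x, mu_sig=2.0, mu_bkg=0.0):
    """Likelihood ratio lambda(x) = L(x|H1)/L(x|H0) for two unit Gaussians.
    For unit widths lambda(x) = exp(mu_sig*x - mu_sig**2/2): increasing in x,
    so a cut on lambda(x) is the same selection as a cut on x."""
    return gaussian(x, mu_sig) / gaussian(x, mu_bkg)

print(lr(1.0) < lr(2.0) < lr(3.0))   # True: monotonic in x
```

The means mu_sig = 2 and mu_bkg = 0 are arbitrary illustrative choices.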
Claiming a discovery
• If we measure a new signal sufficiently inconsistent with the background-only hypothesis, we can claim the discovery of a new physics phenomenon
• The magnitude of the new effect is measured in terms of significance
• Statistical significance = probability to observe an “incompatibility” equal to or greater than the observed one under the hypothesis of a pure background fluctuation
• The definition of “incompatibility” will be detailed better later
  – Requires the choice of a test statistic
  – May involve, for instance, counting signal events over background, discriminating variables (e.g. a reconstructed particle mass peaking over a more or less flat background), the output of a multivariate discriminator, etc.
Significance
• Statistical significance = probability p (p-value) to observe an “incompatibility” ≥ the observed one in the background-only hypothesis
• It is often preferred to quote an “nσ” significance Z, defined by:
  p = ∫Z∞ (1/√(2π)) e^(−x²/2) dx = 1 − Φ(Z), or: Z = Φ⁻¹(1 − p)
  in ROOT: Z = −TMath::NormQuantile(p)
• Usually, in the literature:
  – If the significance is > 3 (“3σ”) one claims “evidence of”
  – If the significance is > 5 (“5σ”) one claims “observation” (discovery!)
    • probability of background fluctuation p < 2.87×10−7
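The p-value ↔ Z conversion can be sketched in plain Python (standard library only), mirroring what ROOT’s TMath::NormQuantile provides; the bisection and function name here are mine, not part of ROOT:

```python
import math

def z_from_p(p):
    """One-sided Gaussian significance Z for a given p-value, i.e. the
    solution of p = 0.5*erfc(Z/sqrt(2)); equivalent to -TMath::NormQuantile(p)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):                # the tail probability decreases with Z
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2.0)) > p:
            lo = mid                    # tail still too large: Z must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(z_from_p(0.05), 2))      # 1.64: the 95% C.L. exclusion threshold
print(round(z_from_p(2.87e-7), 2))   # 5.0: the "5 sigma" discovery threshold
```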
Discovery and scientific method
• From Cowan et al., EPJC 71 (2011) 1554:
“It should be emphasized that in an actual scientific context, rejecting the background-only hypothesis in a statistical sense is only part of discovering a new phenomenon. One’s degree of belief that a new process is present will depend in general on other factors as well, such as the plausibility of the new signal hypothesis and the degree to which it can describe the data. Here, however, we only consider the task of determining the p-value of the background-only hypothesis; if it is found below a specified threshold, we regard this as ‘discovery’.”
• Complementary role of the frequentist and Bayesian approaches
Excluding a signal hypothesis
• For the purpose of excluding a signal hypothesis, a milder requirement is usually applied to the p-value:
  – p < 0.05 (i.e.: 95% confidence level), corresponding to Z = 1.64
  – p < 0.10 (i.e.: 90% confidence level), corresponding to Z = 1.28
• p in this case is the probability of a signal under-fluctuation
  – The null hypothesis is inverted w.r.t. discovery
• Discovering a new signal usually requires more stringent evidence than excluding it!
Upper limits
• Experiments do not always lead to discoveries, unfortunately
• What conclusion can be derived if the significance of the observed signal is not sufficiently high?
• One possible definition of the upper limit:
  – “the largest value of the signal s for which the probability of a signal under-fluctuation equal to what has been observed, or less, is more than a given level α (usually 10% or 5%)”
  – Upper limit @ CL = max. s such that p ≥ α = 1 − CL
  – Similar to a confidence interval around a central value, but the interval is fully asymmetric in the upper-limit case
• Other approaches are possible:
  – Bayesian limits: the upper extreme of an interval [0, s] over which the posterior probability is 1 − α
    • It’s a different definition!
  – Unified Feldman-Cousins frequentist limits
  – CLs (modified frequentist approach)
Frequentist vs Bayesian intervals
• The two main approaches address different questions
• Frequentist confidence interval:
  – The probability that the random interval [µ1, µ2] contains the fixed, true µ is 1 − α
  – The interval [µ1, µ2] is a random interval, determined from the experimental outcome
  – Choosing the interval requires an ordering principle (fully asymmetric, central, unified)
• Bayesian confidence (credible) interval:
  – The posterior probability (degree of belief) that µ lies in the interval [µ1, µ2] is equal to 1 − α
Event counting
• Simplest and frequently used example
• Assume we have a signal process on top of a background process. Both the signal and background counts follow Poissonian distributions:
  P(ns; s) = e^−s s^ns / ns!,   P(nb; b) = e^−b b^nb / nb!
• If we can’t discriminate signal from background events, we measure the total number of selected events:
  – n = ns + nb
• The distribution of n is again Poissonian, with mean s + b:
  P(n; s + b) = e^−(s+b) (s + b)^n / n!
Demonstration: Poisson ⊗ Binomial
[Derivation, involving the Poissonian and binomial distributions, that the total observed count is again Poissonian; equations not reproduced here]
Sorry for being pedantic: we will revise this derivation later on…!
Simplest case: zero events observed
• If we observe zero events we can state that:
  – No background events have been observed (nb = 0)
  – No signal events have been observed (ns = 0)
• Further simplification: let’s assume that the expected background b is negligible: b ≅ 0
• The probability to observe n = ns events, expecting s, is given by the Poisson distribution: p = P(n; s) = e^−s s^n / n!
  – For n = ns = 0 we have: p = P(0; s) = e^−s
• We can set an upper limit on the expected signal yield s by excluding the values of s for which p = e^−s < α, i.e. keeping those with p = e^−s ≥ α = 1 − CL
• So: s^up = −ln(α). For α = 5% or 10%:
  – s ≤ 2.9957 @ 95% C.L.
  – s ≤ 2.3026 @ 90% C.L.
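The zero-event limit above reduces to a one-liner; a small Python sketch (the function name is mine):

```python
import math

def zero_event_upper_limit(cl):
    """Upper limit on the expected signal s when n = 0 events are observed
    and b ~ 0: exclude all s with exp(-s) < alpha, so s_up = -ln(alpha)."""
    alpha = 1.0 - cl
    return -math.log(alpha)

print(f"s <= {zero_event_upper_limit(0.90):.4f} @ 90% C.L.")  # 2.3026
print(f"s <= {zero_event_upper_limit(0.95):.4f} @ 95% C.L.")  # 2.9957
```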
Bayesian inference of a Poissonian
• Posterior probability, assuming the prior to be π(s):
  P(s|n) = P(n|s) π(s) / ∫ P(n|s′) π(s′) ds′
• If π(s) is uniform, the denominator is:
  ∫0∞ e^−s s^n / n! ds = 1
• We have:
  P(s|n) = e^−s s^n / n!
• Most probable value: s = n
  … but this is somewhat arbitrary, since it is metric-dependent!
Bayesian interpretation
• The posterior PDF for s with n = 0, assuming a uniform prior, is:
  f(s|0) = e^−s
• The cumulative is:
  F(s|0) = ∫0^s e^−s′ ds′ = 1 − e^−s
• In particular, requiring P(s > s^up) = e^−s^up = α gives, by chance, a result identical to the previous example: s^up = −ln(α)
[Figure: f(s|0) = e^−s for s between 0 and 3, with the α = 5% tail above s^up shaded]
• But the interpretation is very different!
Upper limit with event counting
• From the PDG, in the case of no background:
“It happens that the upper limit from [the central Neyman interval] coincides numerically with the Bayesian upper limit for a Poisson parameter, using a uniform prior p.d.f. for ν.”
• More details on Neyman limits in the next slides…
Upper limits with background
• Let’s start with the Bayesian approach, which has an easier treatment
• A uniform prior, π(s) = 1, from 0 to ∞ simplifies the computation:
  P(s|n) = P(n|s + b) / ∫0∞ P(n|s′ + b) ds′
• Where, for a fixed b, requiring 1 − α = ∫0^s^up P(s|n) ds gives (Helene’s formula):
  α = e^−s^up [Σm=0..n (s^up + b)^m / m!] / [Σm=0..n b^m / m!]
• The limit s^up can be obtained by inverting this equation numerically
• The special case b = 0, n = 0 gives the previous result, s^up = −ln(α)
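The Bayesian (Helene) limit with background is easy to invert numerically; a stdlib-only Python sketch (function name mine, formula as in Helene, NIM A 212 (1983) 319):

```python
import math

def helene_upper_limit(n, b, cl=0.90):
    """Bayesian upper limit for Poisson counting with expected background b
    and a uniform prior on s: solve for s_up in
    1 - cl = exp(-s) * sum_{m<=n} (s+b)**m/m!  /  sum_{m<=n} b**m/m!"""
    def tail(s):
        num = sum((s + b) ** m / math.factorial(m) for m in range(n + 1))
        den = sum(b ** m / math.factorial(m) for m in range(n + 1))
        return math.exp(-s) * num / den
    alpha, lo, hi = 1.0 - cl, 0.0, 100.0
    for _ in range(200):                # tail(s) decreases monotonically in s
        mid = 0.5 * (lo + hi)
        if tail(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(helene_upper_limit(0, 0.0), 3))  # 2.303: recovers -ln(0.10)
print(round(helene_upper_limit(0, 3.0), 3))  # 2.303: independent of b for n = 0
```

Note the n = 0 case: the limit does not depend on b, a feature contrasted later with the Feldman-Cousins behavior.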
Upper limits with background (cont.)
• Graphical view (due to O. Helene, 1983)
  O. Helene, Nucl. Instr. and Meth. A 212 (1983) 319
[Figure: the upper limit s^up shown graphically as a function of the observed counts n and the expected background b]
• Remember: this is the Bayesian approach
Poissonian background uncertainty
• Some analytical derivations reach a high level of complexity
• Assume we estimate the background from sidebands, applying scaling factors α and β:
  – ŝ = n − b̂ = n − α n_sb + β n_cb
  – sb = “side band”, cb = “corner band”
• The upper limit on s with a CL = δ can become a rather involved expression, with the physical integration bound to be taken into account:
  K.K. Gan, et al., Nucl. Instr. and Meth. A 412 (1998) 475
• Numerical treatment is required in many cases!
Problems of Bayesian limits
• Bayesian inference, as well as Bayesian limits, requires the choice of a prior distribution
• This makes estimates somewhat subjective (some Bayesian supporters use the term “inter-subjective”)
• The choices frequently adopted in physics are not unique:
  – A uniform PDF as a function of the signal strength?
  – A uniform PDF as a function of the Higgs boson mass?
• In some cases the results do not depend strongly on the assumed prior
  – But this usually happens when the statistical sample is sufficiently large, which is often not the case for upper limits
Choosing the prior PDF
• If the prior PDF is uniform in one choice of variable (“metric”), it won’t be uniform after a coordinate transformation
• Given a prior PDF in a random variable, there is always a transformation that makes the PDF uniform
• The problem is: choose the metric in which the PDF is uniform
• Harold Jeffreys’ prior: choose the prior that is invariant under parameter transformation, related to the Fisher information (metric-invariant!):
  π(θ) ∝ √I(θ), with I(θ) = −E[∂² ln L(x|θ) / ∂θ²]
• Some common cases:
  – Poissonian mean: π(s) ∝ 1/√s
  – Poissonian mean with background b: π(s) ∝ 1/√(s + b)
  – Gaussian mean: π(µ) ∝ 1 (uniform)
  – Gaussian r.m.s.: π(σ) ∝ 1/σ
  – Binomial parameter: π(ε) ∝ 1/√(ε(1 − ε))
• Problematic with more than one dimension!
  Demonstration on Wikipedia: see “Jeffreys prior”
Zech’s “frequentist” interpretation
• A proposed attempt to derive Helene’s formula under a frequentist approach
• Restrict the probability to the observed condition that the number of background events does not exceed the number of observed events:
“In an experiment where b background events are expected and n events are found, P(nb; b) no longer corresponds to our improved knowledge of the background distributions. Since nb can only take the numbers nb ≤ n, it has to be renormalized to the new range of nb.”
  G. Zech, Nucl. Instr. and Meth. A 277 (1989) 608-610
• Leads to a result identical to the Bayesian approach!
• Zech’s attempted frequentist derivation was criticized by Highland: it does not ensure proper coverage
• Often used in a “pragmatic” way, and recommended for some time by the PDG
Zech’s derivation references
• Bayesian solution first proposed by O. Helene
  – O. Helene, Nucl. Instr. and Meth. A 212 (1983) 319, “Upper limit of peak area” (Bayesian)
• Attempt to derive the same conclusion with a frequentist approach
  – G. Zech, Nucl. Instr. and Meth. A 277 (1989) 608-610, “Upper limits in experiments with background or measurement errors”
• Frequentist validity criticized by Highland
  – V.L. Highland, Nucl. Instr. and Meth. A 398 (1989) 429-430, “Comment on ‘Upper limits in experiments with background or measurement errors’ [Nucl. Instr. and Meth. A 277 (1989) 608–610]”
• Zech agreed that his derivation is not rigorously frequentist
  – G. Zech, Nucl. Instr. and Meth. A 398 (1989) 431-433, “Reply to ‘Comment on “Upper limits in experiments with background or measurement errors” [Nucl. Instr. and Meth. A 277 (1989) 608–610]’”
• Cousins’ overview and summary of the controversy
  – Workshop on Confidence Limits, 27-28 March 2000, Fermilab
Jerzy Neyman’s confidence intervals
• Scan the unknown parameter θ over its range
• Given θ, compute the interval [x1, x2] that contains x with a probability CL = 1 − α
• An ordering rule is needed!
  – Central interval? Asymmetric? Other?
• Invert the confidence belt, and find the interval [θ1, θ2] for a given experimental outcome x
• A fraction 1 − α of the experiments will produce an x such that the corresponding interval [θ1, θ2] contains the true value of θ (coverage probability)
• Note that the random variables are [θ1, θ2], not θ
  From the PDG statistics review — RooStats::NeymanConstruction
Ordering rule
• For a fixed θ = θ0, different choices of interval in x giving the same probability 1 − α are possible
[Figure: two choices for f(x|θ0) — a central interval, with probability α/2 left in each tail, and the upper-limit choice, with the full probability α in a single tail]
Feldman-Cousins ordering
• Take the contour of the likelihood ratio that encloses a probability 1 − α:
  Rµ = {x : L(x|µ) / L(x|µbest) > kα}
• Motivation discussed in the next slides
[Figure: f(x|θ0) and the ratio f(x|θ0)/f(x|θbest(x)), with the acceptance region containing probability 1 − α]
  RooStats::FeldmanCousins
  Gary J. Feldman, Robert D. Cousins, Phys. Rev. D 57 (1998) 3873-3889
Frequentist upper limits
• Upper limit construction from the inversion of the Neyman belt with asymmetric intervals
• Building a confidence interval on the observable x = n, with θ = s:
  – The observable x is the number of events n for counting experiments
  – The final confidence interval must be fully asymmetric if we want to compute upper limits:
    • s ∈ [s1, s2] ⇒ s ∈ [0, s^up], i.e. 0 ≤ s ≤ s^up
    • Upper limit = right-most edge of the asymmetric interval
  – Hence, we need a correspondingly asymmetric interval on n:
    • n ∈ [n1, n2] ⇒ n ∈ [nmin, ∞]
• Poissonian distributions involve discrete values, so the coverage can’t be satisfied exactly: produce the smallest overcoverage
  – Use the smallest interval that has at least the desired C.L.:
    P(s ∈ [0, s^up]) ≥ CL = 1 − α ⇔ P(n ∈ [nmin, ∞]) = 1 − p ≥ CL = 1 − α
[Figure: Poissonian P(n; s) for s = 4, b = 0, with the regions 1 − p ≥ 1 − α and p ≤ α indicated]
A concrete Poissonian example
• Poissonian counting, b = 0
• Compute the p.d.f. P(n; s), varying s
  – asymmetric for a Poissonian
[Figure: P(n; s) for s = 4, b = 0, with 1 − p = 90.84% contained in the interval [nobs, ∞]]
• Determine the probability 1 − p corresponding to the interval [nobs, ∞] (here p = 9.16%)
• The limit s^up is the maximum s such that 1 − p is greater than or equal to the CL
  – Lower-limit choice of the belt: p < α ⇒ excluded; p ≥ α ⇒ allowed
• In the case nobs = 0 the simple formula holds: s^up = −ln(α)
• What we did intuitively reflects Neyman’s construction. By chance, it is identical to the Bayesian result
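The belt inversion above amounts to finding the largest s whose Poissonian leaves at least probability α at or below nobs; a stdlib-only Python sketch (function names mine):

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for N ~ Poisson(mu)."""
    return math.exp(-mu) * sum(mu ** m / math.factorial(m) for m in range(n + 1))

def frequentist_upper_limit(n_obs, b, cl=0.90):
    """Classical (fully asymmetric Neyman) upper limit on s for a counting
    experiment: the largest s with P(n <= n_obs; s + b) >= 1 - cl."""
    alpha, lo, hi = 1.0 - cl, 0.0, 100.0
    for _ in range(200):                 # the CDF decreases monotonically in s
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + b) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(frequentist_upper_limit(0, 0.0), 3))           # 2.303 @ 90% C.L.
print(round(frequentist_upper_limit(0, 0.0, cl=0.95), 3))  # 2.996 @ 95% C.L.
```

For b = 0 this reproduces, number by number, the Bayesian limits of the previous slides, illustrating the coincidence noted in the text.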
“Flip-flopping”
• When should one quote a central value or an upper limit?
• E.g.:
  – “Quote a 90% C.L. upper limit if the significance of the measurement is below 3σ; quote a central value otherwise”
  – Upper limit ↔ central interval decided according to the observed data
  – This produces incorrect coverage!
• The Feldman-Cousins interval ordering guarantees the correct coverage
“Flip-flopping” with a Gaussian PDF
• Assume a Gaussian with fixed width, σ = 1:
  – Central 90% interval: µ = x ± 1.64485 (5% in each tail)
  – 90% upper limit: µ < x + 1.28155 (10% in a single tail)
• Switching between the two prescriptions according to the observed x (e.g. at x = 3) spoils the Neyman construction:
  – The coverage is 85% for low µ!
[Figure: the confidence belt in the (x, µ) plane, showing the central-interval and upper-limit branches and the region of 85% coverage]
  Gary J. Feldman, Robert D. Cousins, Phys. Rev. D 57 (1998) 3873-3889
Feldman-Cousins approach
• Define the acceptance range such that:
  P(x|µ) / P(x|µbest(x)) > kα, with µbest = max(x, 0), i.e. µbest = x for x ≥ 0
• The resulting belt passes smoothly from usual errors, to asymmetric errors, to upper limits as x decreases
• A numerical solution can be found
[Figure: the Feldman-Cousins belt for the Gaussian case, with the regions of usual errors, asymmetric errors, and upper limits]
  Gary J. Feldman, Robert D. Cousins, Phys. Rev. D 57 (1998) 3873-3889
Binomial confidence interval
• Using a proper Neyman belt inversion, e.g. the Feldman-Cousins method (but also Clopper-Pearson), avoids odd problems, like null errors when estimating efficiencies equal to 0 or 1, which would occur using the central-limit formula:
  ε̂ ± √(ε̂(1 − ε̂)/N), with ε̂ = k/N
• Smooth transition from quasi-central intervals to upper/lower limits!
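A stdlib-only Python sketch of the Clopper-Pearson interval named above (function names mine); it inverts the binomial belt by bisection and, unlike the Gaussian formula, gives a non-null interval even for k = 0:

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, m) * p ** m * (1.0 - p) ** (n - m) for m in range(k + 1))

def clopper_pearson(k, n, cl=0.6827):
    """Central Clopper-Pearson interval for an efficiency eps = k/n."""
    alpha = 1.0 - cl

    def invert(cdf, goal):
        # cdf(p) is monotonically decreasing in p: bisect cdf(p) = goal
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if cdf(mid) > goal:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    lower = 0.0 if k == 0 else invert(lambda p: binom_cdf(k - 1, n, p), 1.0 - alpha / 2)
    upper = 1.0 if k == n else invert(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

lo, hi = clopper_pearson(0, 10)        # zero passing events out of 10
print(f"eff in [{lo:.3f}, {hi:.3f}]")  # non-null interval despite eff_hat = 0
```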
Feldman-Cousins: Poissonian case
• Purely frequentist ordering based on the likelihood ratio
[Figure: the Feldman-Cousins confidence belt for b = 3, 90% C.L.]
• The belt depends on b, of course
  G. Feldman, R. Cousins, Phys. Rev. D 57 (1998) 3873
Upper limits with Feldman-Cousins
• 90% C.L. upper limits as a function of the expected background b
• Note that the curve for n = 0 decreases with b, while the result of the Bayesian calculation is independent of b, staying at 2.3
• F&C reply: frequentist intervals do not express P(µ|x)!
  G. Feldman, R. Cousins, Phys. Rev. D 57 (1998) 3873
A close-up
• Note the “ripple” structure due to the discrete nature of Poissonian statistics
  C. Giunti, Phys. Rev. D 59 (1999) 053001
Limits in case of no background
• From the PDG: “unified” (i.e. Feldman-Cousins) limits for Poissonian counting in the case of no background are larger than the Bayesian limits
Pros and cons of the F&C approach
• Pros:
  – Avoids problems with physical boundaries on parameters
  – Never returns an empty confidence interval
  – Does not incur flip-flop problems
  – Ensures proper statistical coverage
• Cons:
  – Constructing the confidence intervals is complicated, requires numerical algorithms, and very often CPU-intensive toy Monte Carlo generation
  – Systematic uncertainties are not easy to incorporate
  – Peculiar features with a small number of events
  – In the case of zero observed events, it gives better limits for experiments that expect higher background
From the PDG review…
“The intervals constructed according to the unified procedure for a Poisson variable n consisting of signal and background have the property that for n = 0 observed events, the upper limit decreases for increasing expected background. This is counter-intuitive, since it is known that if n = 0 for the experiment in question, then no background was observed, and therefore one may argue that the expected background should not be relevant. The extent to which one should regard this feature as a drawback is a subject of some controversy.”
Problems of frequentist methods
• The presence of background may introduce problems in interpreting the meaning of upper limits
• A statistical under-fluctuation of the background may lead to the exclusion of even a vanishing signal at 95% C.L.
  – An unphysical estimated “negative” signal?
• Such a limit “tends to say more about the probability of observing a similar or stronger exclusion in future experiments with the same expected signal and background than about the non-existence of the signal itself” [*]
• What we should conclude is just that there is not sufficient information to discriminate the b and s+b hypotheses
• Adding channels that have low signal sensitivity may produce upper limits that are severely worse than without adding those channels [*]
[*] A. L. Read, “Modified frequentist analysis of search results (the CLs method)”, 1st Workshop on Confidence Limits, CERN, 2000
Modified frequentist method: CLs
• Method developed for the Higgs limit at LEP-II
• Using the likelihood ratio as test statistic:
  Q = L(x|s + b) / L(x|b)
• Confidence levels estimator (→ different from Feldman-Cousins):
  CLs = CLs+b / CLb
  – Gives over-coverage w.r.t. the classical limit (CLs > CLs+b: conservative)
  – Similarities with a Bayesian C.L.
• Identical to the Bayesian limit for Poissonian counting!
• An “approximation to the confidence in the signal hypothesis one might have obtained if the experiment had been performed in the complete absence of background”
• No problem when adding channels with low discrimination
CLs with toy experiments • The actual CLb and CLs+b are computed in practice from toy Monte Carlo experiments
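A toy-MC sketch for the counting case, in stdlib Python (function names mine). For Poisson counting the likelihood ratio Q is monotonic in n, so ordering toys by Q is the same as ordering them by n, and CLs+b, CLb reduce to tail fractions of the toy counts:

```python
import math, random

def poisson_rvs(mu, rng):
    """Poisson random number via Knuth's method (fine for small means)."""
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p < L:
            return k
        k += 1

def cls_counting(n_obs, s, b, n_toys=200_000, seed=1):
    """CLs for a counting experiment, estimated from toy Monte Carlo:
    CLs+b = P(n <= n_obs | s+b), CLb = P(n <= n_obs | b), CLs = ratio."""
    rng = random.Random(seed)
    cl_sb = sum(poisson_rvs(s + b, rng) <= n_obs for _ in range(n_toys)) / n_toys
    cl_b = sum(poisson_rvs(b, rng) <= n_obs for _ in range(n_toys)) / n_toys
    return cl_sb / cl_b, cl_sb, cl_b

cls, cl_sb, cl_b = cls_counting(n_obs=0, s=2.3, b=3.0)
print(f"{cls:.3f}")  # ~0.100 = exp(-2.3): matches the Bayesian limit, as stated
```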
Main CLs features
• CLs+b: probability to obtain a result less compatible with the signal than the observed result, assuming the signal (s+b) hypothesis
• CLb: probability to obtain a result less compatible with the signal than the observed one in the background-only hypothesis
• If the two distributions are very well separated, then for background-like data 1 − CLb will be very small ⇒ CLb ~ 1 and CLs ~ CLs+b, i.e. the ordinary p-value of the s+b hypothesis
[Figure: well-separated −2 ln(Q) distributions expected for s+b and for b: 1 − CLb ~ 0, CLs+b ~ CLs]
• If the two distributions are very close, then 1 − CLb will be large ⇒ CLb small, preventing CLs from becoming very small
[Figure: overlapping −2 ln(Q) distributions expected for s+b and for b: CLs+b < CLs]
• Requiring CLs < α thus prevents rejecting the signal where there is little sensitivity
Observations on the CLs method
• “A specific modification of a purely classical statistical analysis is used to avoid excluding or discovering signals which the search is in fact not sensitive to”
• “The use of CLs is a conscious decision not to insist on the frequentist concept of full coverage (to guarantee that the confidence interval doesn’t include the true value of the parameter in a fixed fraction of experiments).”
• “Confidence intervals obtained in this manner do not have the same interpretation as traditional frequentist confidence intervals nor as Bayesian credible intervals”
  A. L. Read, “Modified frequentist analysis of search results (the CLs method)”, 1st Workshop on Confidence Limits, CERN, 2000
General likelihood definition
• The exact definition of the likelihood function depends on the data model “format”. In general it depends on the signal strength µ and on nuisance parameters θ, constrained by PDFs that are typically Gaussian, log-normal, or flat
• Binned case (histogram), schematically:
  L(n1, …, nN; µ, θ) = Πi Poisson(ni; µ si(θ) + bi(θ)) × Πk pk(θk)
• Unbinned case (signal/background PDFs), schematically: an extended likelihood built from the event-level signal and background PDFs, fs(xi; θ) and fb(xi; θ), weighted by the corresponding yields
Nuisance parameters
• So-called “nuisance parameters” are unknown parameters that are not interesting for the measurement
  – E.g.: detector resolution, uncertainty in backgrounds, background shape modeling, other systematic uncertainties, etc.
• Two main possible approaches:
  – Add the nuisance parameters, together with the interesting unknowns, to your likelihood model
    • But the model becomes more complex!
    • Easier to incorporate in a fit than in upper limits
  – “Integrate them away” (→ Bayesian)
Nuisance parameters in the Bayesian approach
• No particular treatment is needed:
  P(θ, ν|x) ∝ L(x|θ, ν) π(θ, ν)
• P(θ|x) is obtained as the marginal PDF, “integrating out” ν:
  P(θ|x) = ∫ P(θ, ν|x) dν
How to compute the posterior PDF
• Perform analytical integration
  – Feasible in very few cases
• Use numerical integration (RooStats::BayesianCalculator)
  – May be CPU intensive
• Markov Chain Monte Carlo
  – Samples the parameter space efficiently using a random walk heading towards the regions of higher probability
  – Metropolis algorithm to sample according to a PDF f(x):
    1. Start from a random point, xi, in the parameter space
    2. Generate a proposal point xp in the vicinity of xi
    3. If f(xp) > f(xi), accept as next point xi+1 = xp; else, accept only with probability p = f(xp) / f(xi)
    4. Repeat from point 2
  – Convergence criteria and step size must be defined (RooStats::MCMCCalculator)
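The four Metropolis steps above can be sketched in a few lines of stdlib Python (function names and the example posterior are mine, not RooStats). As an illustration, we sample the Poisson posterior with a uniform prior, f(s) ∝ e^−s s^n for n = 3 observed events, whose mean is n + 1 = 4:

```python
import math, random

def metropolis(logf, x0, step, n, seed=42):
    """1-D Metropolis sampler of a PDF f given as log f (up to a constant):
    propose x' ~ Uniform(x - step, x + step), accept with min(1, f(x')/f(x))."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n):
        xp = x + rng.uniform(-step, step)
        if rng.random() < math.exp(min(0.0, logf(xp) - logf(x))):
            x = xp                  # accept the proposal
        chain.append(x)             # on rejection the old point is repeated
    return chain

# log posterior for a Poisson mean, n = 3 observed, uniform prior on s > 0
logf = lambda s: (-s + 3.0 * math.log(s)) if s > 0 else -math.inf
chain = metropolis(logf, x0=3.0, step=2.0, n=100_000)
burn = chain[10_000:]                       # discard the burn-in
print(round(sum(burn) / len(burn), 1))      # ~4.0, the posterior mean
```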
Nuisance parameters, frequentist approach
• Introduce a complementary dataset to constrain the nuisance parameters ν (e.g.: calibration data, control samples, …)
• Formulate the statistical problem in terms of both the main data sample (x) and the control sample (y)
• Use the likelihood method in more than one dimension
  – May be CPU intensive
• Usually leads to results that are very similar to the hybrid Cousins-Highland method (→ next slide)
Cousins-Highland hybrid approach
• No fully solid general approach exists to incorporate nuisance parameters within a frequentist framework
• Hybrid approach proposed by Cousins and Highland:
  – Integrate (“marginalize”) the likelihood function over the nuisance parameters (Nucl. Instr. Meth. A 320 (1992) 331-335)
• Called “hybrid” because some Bayesian reasoning is implicit in the integration: it “seems to be acceptable to many pragmatic frequentists” (G. Zech, Eur. Phys. J. C 4 (2002) 12)
  – Bayesian integration of the PDF, then the likelihood is used in a frequentist way
• Numerical studies with toy Monte Carlo showed that the fully frequentist calculation gives very similar results in many cases (RooStats::HybridCalculator)
Profile likelihood
• A different test statistic w.r.t. L(s+b)/L(b):
  λ(µ) = L(x|µ, θ̂̂(µ)) / L(x|µ̂, θ̂)
  – Numerator: fix µ, fit θ; denominator: fit both µ and θ
• µ is usually the “signal strength” (i.e.: σ/σSM) in the case of a Higgs search, instead of the number of signal events (s)
• λ(µ) ~ 1 for data compatible with the signal, λ(µ) << 1 for data incompatible with it; the profile likelihood is broadened by the nuisance parameters θ (loss of information)
• Nice asymptotic property: the distribution of qµ = −2 ln λ(µ) tends to a χ² distribution with one degree of freedom (one parameter of interest = µ)
• Different “flavors” of test statistics exist, e.g. to deal with unphysical µ < 0, …
  RooStats::ProfileLikelihoodCalculator
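A minimal stdlib-Python sketch of qµ = −2 ln λ(µ) for a counting experiment with known background (function names mine; with no nuisance parameters the “profiling” step is trivial, which keeps the example short):

```python
import math

def q_mu(mu, n, s0, b):
    """Profile-likelihood test statistic q_mu = -2 ln lambda(mu) for a
    counting experiment n ~ Poisson(mu*s0 + b) with b known (no nuisances)."""
    def nll(m):
        # -ln L up to an m-independent constant
        nu = m * s0 + b
        return nu - n * math.log(nu)
    mu_hat = max(0.0, (n - b) / s0)   # ML estimate, clipped to the physical region
    return 2.0 * (nll(mu) - nll(mu_hat))

q = q_mu(1.0, n=10, s0=10.0, b=5.0)
z = math.sqrt(q)   # asymptotically q ~ chi2(1), so sqrt(q) reads as a Gaussian Z
print(round(q, 3))  # 1.891
```

Here q vanishes by construction at µ = µ̂ and grows as µ moves away from it, as expected for a likelihood-ratio statistic.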
CLs with profile likelihood at LHC
• Use the profile likelihood as test statistic, then evaluate CLs
• A constraint on µ̂ ensures that upward fluctuations of the data, such that µ̂ > µ, are not considered as evidence against the signal hypothesis, namely a signal with strength µ
• Agreed estimator between ATLAS and CMS for the Higgs search: the profile-likelihood ratio with the constraint 0 ≤ µ̂ ≤ µ
  – ATL-PHYS-PUB-2011-11
LEP, Tevatron, LHC Higgs limits
Asymptotic approximations
• The constraint imposed on the profile likelihood distorts its distribution from Wilks’ [*] asymptotic form, so the distribution no longer tends to a χ²; the modified asymptotic form can still be written in closed form in terms of qµ,A, the test statistic evaluated on the Asimov set (→ next slide)
• Approximations are a valuable way to perform the computation quickly
• More details on asymptotic approximations:
  – Cowan, Cranmer, Gross, Vitells, arXiv:1007.1727, EPJC 71 (2011) 1554
  – [*] S.S. Wilks, “The large-sample distribution of the likelihood ratio for testing composite hypotheses”, Ann. Math. Stat. 9, 60–62 (1938)
  – A. Wald, “Tests of statistical hypotheses concerning several parameters when the number of observations is large”, Trans. Am. Math. Soc. 54(3), 426–482 (1943)
Asimov [*] sets
• Approximate evaluation of expected (median) limits avoiding CPU-intensive generation of toy Monte Carlo samples, by using a single “representative set”
• Replace each bin of the observable distribution (e.g.: the reconstructed Higgs mass spectrum) by its expectation value
• Set the nuisance parameters to their nominal values
• The approximation is valid in the asymptotic limit
• The median significance can be approximated with the square root of the test statistic evaluated on the Asimov set:
  med[Z|µ] ≅ √qµ,A
• Uncertainty bands on expected upper limits can also be evaluated using Asimov sets, avoiding large toy MC extractions
• The mathematical validity and approximations of this approach are discussed by Cowan et al. [**]
  [*] Asimov, “Franchise”, in Isaac Asimov: The Complete Stories, vol. 1 (Broadway Books, New York, 1990)
  [**] Cowan, Cranmer, Gross, Vitells, arXiv:1007.1727, EPJC 71 (2011) 1554
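For the simple counting case, the Asimov recipe (set n to its expectation s + b) gives the closed-form median discovery significance derived in Cowan et al., EPJC 71 (2011) 1554; a one-function Python sketch (function name mine):

```python
import math

def asimov_significance(s, b):
    """Median discovery significance for counting, from the Asimov data set
    n = s + b:  Z_A = sqrt(2*((s+b)*ln(1 + s/b) - s)).
    For s << b this reduces to the familiar approximation s/sqrt(b)."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

print(round(asimov_significance(10.0, 100.0), 2))  # 0.98, close to s/sqrt(b) = 1
print(round(asimov_significance(50.0, 100.0), 2))  # 4.65
```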
Look-Elsewhere Effect (LEE)
• Imagine you search for a signal peak over a background distribution that is spread over a wide range
• You could either:
  – Know which mass to look at, e.g.: search for a rare decay of a known particle, like Bs→µµ
  – Search for a peak at an unknown mass value, as for the Higgs boson
• In the former case it’s easy to compute the peak significance:
  – Evaluate the test statistic for µ = 0 (background only) on your observed data sample
  – Evaluate the p-value according to the expected distribution of the test statistic under the background-only hypothesis, and possibly convert it to the area of a Gaussian tail
LEE (cont.)
• In case you search for a peak at an unknown mass, the previous p-value has only a “local” meaning:
  – Probability to find a background fluctuation at least as large as your signal at a fixed mass value
  – Different w.r.t. the (global) probability to find a background fluctuation at least as large as your signal at any mass value
  – Using the “local” p-value in place of the “global” one would overestimate the significance
• The chance that an over-fluctuation occurs at at least one mass value increases with the searched range
• Magnitude of the effect: roughly proportional to the ratio of the search range to the resolution
  – Better resolution = less chance to have more events compatible with the same mass value
• Possible approach: let also the mass m fluctuate in the test statistic fit
Estimating the LEE
• The effect can be evaluated with brute-force toy Monte Carlo
  – Run N experiments with background only, find the largest “local” significance over the entire search range, and use its distribution to determine the “global” significance
  – Requires very large toy Monte Carlo samples: need to go down to ~10−7 (5σ: p = 2.87×10−7)
• Approximate evaluation based on the local p-value times correction factors (“trial factors”; Gross and Vitells, EPJC 70 (2010) 525-530, arXiv:1005.1891), in the asymptotic limit (Wilks’ theorem):
  p_global ≲ p_local + ⟨Nu⟩
• ⟨Nu⟩ is the average number of up-crossings of the likelihood-ratio scan at level u; it can be evaluated at some lower reference level u0 with toy MC and then scaled:
  ⟨Nu⟩ = ⟨Nu0⟩ e^−(u − u0)/2
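The Gross-Vitells scaling above is trivial to apply once ⟨N⟩ has been counted at a low reference level; a stdlib-Python sketch, where the value n_up_ref = 8 is purely an illustrative assumption (in practice it would come from a small toy MC sample):

```python
import math

def p_global(z_local, n_up_ref, u_ref=0.0):
    """Gross-Vitells trial-factor bound: p_global <~ p_local + <N(u)>,
    where <N(u)> is the mean number of up-crossings of the likelihood-ratio
    scan at level u = z_local**2, scaled from a cheap reference level u_ref:
    <N(u)> = <N(u_ref)> * exp(-(u - u_ref)/2)   (chi2 with 1 d.o.f.)."""
    u = z_local ** 2
    p_local = 0.5 * math.erfc(z_local / math.sqrt(2.0))  # one-sided Gaussian tail
    n_up = n_up_ref * math.exp(-(u - u_ref) / 2.0)
    return p_local + n_up

# Illustrative numbers: a 4-sigma local excess, assuming 8 up-crossings
# counted at reference level u_ref = 0
print(f"{p_global(4.0, n_up_ref=8.0):.2e}")  # ~2.7e-03, vs p_local ~ 3.2e-05
```

The example shows the typical size of the effect: a comfortable-looking local significance is strongly diluted once the trial factor is included.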