Modeling Fake Missing Transverse Energy with ...

Modeling Fake Missing Transverse Energy with Bayesian Neural Networks

Silvia Tentindo and Harrison B. Prosper
Department of Physics, Florida State University, Tallahassee, FL 32306, USA
E-mail: [email protected]

Abstract. Neural networks (NN) are universal approximators. Therefore, in principle, it should be possible to use them to model any reasonably smooth probability density such as the probability density of missing transverse energy (E/T) arising from instrumental effects, so-called fake E/T. The modeling of fake E/T is an important experimental issue in events such as Z(→ l+ l−)+jets, which is an important background in Higgs searches at the Large Hadron Collider. We describe how Bayesian neural networks (BNN) can be used to model the fake E/T using γ+jets events and how, in turn, the resulting BNN function can be used to model the fake missing transverse energy distribution in samples other than γ+jets.

1. Introduction
Missing transverse energy (E/T)¹ is an important observable in many physics analyses at the CERN Large Hadron Collider (LHC), including electroweak, top, supersymmetry, and analyses searching for exotic phenomena. In particular, missing transverse energy is an important observable in all searches for the Higgs boson in which neutrinos are present in the final state. Neutrinos, like other neutral weakly interacting particles, escape the detector, so their existence can only be inferred from an imbalance in the measured total transverse momentum, that is, in a non-null value of the E/T observable. The method proposed in this paper for modeling E/T, in particular the fake component of E/T, is meant to fulfill two goals: 1) to provide a better understanding and measurement of E/T in the low Higgs boson mass region, where the Higgs boson is now expected, and 2) to address the possibility that it may become necessary to measure the fake E/T distribution and apply it as a correction to the Monte Carlo simulation of events. The method proposed fits the observed fake E/T distribution using a Bayesian neural network (BNN) [1, 2].

2. Missing transverse energy in Higgs searches
Finding the Higgs boson is one of the principal objectives of high energy physics today. The work reported here was inspired by the ongoing Higgs searches at the LHC (see, for example, Ref. [3]).

¹ Missing transverse energy is defined as:

\[ \vec{\slashed{E}}_T \equiv - \sum_i \vec{p}_{T i} , \]

where \vec{p}_{T i} is the momentum, in the plane transverse to the beam direction, of the ith particle, charged or neutral, in an event. Because of conservation of momentum in the transverse plane, and in the absence of undetected particles, the vectorial sum of the transverse momenta must be zero.
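As an illustration of the definition of missing transverse energy (the negative vector sum of the transverse momenta of all particles in an event), the following minimal Python sketch computes E/T for a toy two-object event. The function name `missing_et` and the event values are hypothetical, chosen only for illustration; they are not taken from the analysis described here.

```python
import numpy as np

def missing_et(pt, phi):
    """Return (magnitude, azimuth) of the missing transverse energy vector.

    pt, phi: transverse momentum magnitudes (GeV) and azimuthal angles (rad)
    of all reconstructed particles in one event.
    """
    pt = np.asarray(pt, dtype=float)
    phi = np.asarray(phi, dtype=float)
    # Negative vector sum of the transverse momenta.
    mex = -np.sum(pt * np.cos(phi))
    mey = -np.sum(pt * np.sin(phi))
    return np.hypot(mex, mey), np.arctan2(mey, mex)

# Toy event (illustrative numbers): a 50 GeV photon recoiling against a jet.
# In the first case the jet is measured perfectly; in the second it is
# reconstructed at 42 GeV instead of 50 GeV.
balanced = missing_et([50.0, 50.0], [0.0, np.pi])
mismeasured = missing_et([50.0, 42.0], [0.0, np.pi])
print(balanced[0], mismeasured[0])
```

In this toy event no particle escapes detection, so any non-zero result in the second case comes purely from the mis-measured jet, which is the sense in which such missing transverse energy is called fake.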

Figure 1. (left) SM cross sections for different Higgs production processes. The dominant process is pp → H, which proceeds via gluon-gluon fusion. (right) SM Higgs decay branching ratios (BR). The channel H → ZZ has a BR of approximately 10% at 150 GeV [3].

Table 1. Examples of Higgs decay channels that contain E/T.

Channel   Higgs decay channel   W/Z boson decay channels
1         H → ZZ                Z → l, l, Z → ν, ν
2         H → WW                W → l, ν, W → l, ν
3         H → ZZ                Z → l, l, Z → b, b
4         H → ZZ                Z → l, l, Z → j, j

For completeness, we briefly review the present Higgs searches in which E/T is essential. The production cross section for the Standard Model (SM) Higgs boson, at the present energy of the LHC (7 TeV), is dominated by the gluon-gluon fusion process (see left plot of Fig. 1) and is a few pb in the Higgs boson mass window 120–150 GeV. In the same mass window, the dominant decay channels are H → WW, H → ZZ and H → b b̄. For higher Higgs boson masses, the b b̄ contribution decreases, and the WW and ZZ channels become dominant (see right plot of Fig. 1). Table 1 lists decay channels for the ZZ and WW modes. In decay channels 1 and 2, E/T is generated by the presence of neutrinos ("real" E/T) that originate from the weak decay of the W or Z gauge bosons. There could be E/T, however, even in channels where the decay produces no neutrinos, such as decay channels 3 and 4 in Table 1. This E/T, so-called "fake" E/T, is related mainly to the presence of jets, including b quark jets, and is due to several instrumental factors, such as: jet energy fluctuations, mis-measurements of momenta, misidentification of particles, un-instrumented regions of the detector, cosmic rays crossing the detector, or beam-halo particles. Jets from recoil and from fragmentation contribute to fake E/T. Processes such as multi-jet QCD or Drell-Yan, that constitute backgrounds to the Higgs signals, also produce a non-null contribution to E/T.

Figure 2. (left) Higgs channel producing true MET: H → Z(l, l)Z(ν, ν) (the signal). (right) Drell-Yan background channel producing fake MET: pp → Z(l, l) + jets (the dominant background to the signal). Energy fluctuations or mis-measurement of the jets can lead to fake MET that mimics undetected neutrinos.

The left diagram in Fig. 2 shows the Higgs decay H → Z(l, l)Z(ν, ν), in which the neutrinos are the main source of real E/T, while the diagram to the right depicts its dominant background, which is the Drell-Yan process, pp → Z(l, l) + jets, and which contains fake E/T. Figure 4 shows event displays of di-lepton events observed by the CMS Collaboration [3] that illustrate what a signal and background event might look like.

3. Monte Carlo simulations of E/T
The Monte Carlo (MC) simulation of E/T distributions is currently satisfactory. Figure 3 shows the simulated E/T distribution for H → Z(l, l)Z(ν, ν) and its backgrounds compared to the data (1 fb−1) obtained by the ATLAS Collaboration [4], taken in the first half of 2011. For this period, pile up (PU) effects² have not been an issue. In the E/T distribution of Fig. 3, the low E/T region (where the Drell-Yan background, Z+jets, is dominant) is mainly due to fake E/T, while in the high E/T region (where top, WW and ZZ backgrounds dominate and the Higgs signal is expected) the E/T is mostly real, that is, due to undetected neutrinos.

More recent analyses, however, show some discrepancy between Monte Carlo predictions and data (taken in the latter part of 2011) in the lower E/T region, due to pile up. Several attempts have been made, and are being made, to design new E/T algorithms that are more robust to pile up; but the more these effects increase, the more sophisticated the algorithms become. We could reach a level of pile up, or a beam energy, or both, where no algorithm will be adequate and an alternative approach may be needed.

² As the luminosity of the machine increases, the number of interactions per beam collision increases. This causes events to "pile up" in the detector.

Figure 3. Data/Monte Carlo comparison of E/T including the expected signal (H → Z(l, l)Z(ν, ν)) for a 400 GeV SM Higgs boson. (Courtesy ATLAS Collaboration [4].)
[Plot: events per 5 GeV versus ETmiss [GeV]; ATLAS, √s = 7 TeV, ∫L dt = 1.04 fb−1; legend: Data, Total Background, Z, ZZ/WZ/WW, Top, Other Backgrounds, Signal (mH = 400 GeV); lower inset: Data/MC.]

4. Motivation for this work
We propose a "data-driven" method, as an alternative to the development of new E/T algorithms, that is based on measuring the fake component of E/T. The fake E/T model obtained by fitting the observed fake E/T distribution (which automatically includes pile up effects) could be used to correct simulated E/T distributions. The corrected simulated E/T distribution could then be used in analyses [5] whose aim is to identify as signal candidates all events whose E/T is above some threshold. This kind of "data-driven" modeling of E/T would be intrinsically robust with respect to all possible instrumental and physical effects connected with increased pile up and beam energy, which will unavoidably affect the data taking at the LHC in the years to come.

We also note that, based on the preliminary results from the LHC, the Higgs boson mass region not yet excluded has been reduced to a window around 130 GeV. This corresponds to E/T values of about 50 GeV, where typically the fake E/T component (and pile up effects) dominate.

5. Modeling fake E/T
In order to apply our method, we need first to find in Z+jets events observables that are both well measured and correlated with the fake E/T, so that the latter can be inferred from the measurement of these observables. As noted above, the Z+jets process constitutes the dominant background to the H → ZZ → (l, l), (ν, ν) signal (Fig. 2, right), with an event yield five orders of magnitude larger than that of the signal³. Since our goal is to model the E/T in Z(l, l) + jets events (Fig. 2, right), we need to work with a sample, other than Z+jets, that is dominated by fake E/T.

Figure 4. (left) An event with large E/T, most likely due to neutrinos. (right) A 3D view of a Z+jets event. (Courtesy of CMS Collaboration [3].)

Figure 5. An event display of a photon+jets candidate event. (Courtesy of CMS Collaboration [3].)

5.1. Z+jets and photon+jets samples
Photon+jets (γ+jets) events⁴ are kinematically and topologically similar to Z+jets events. Therefore, a priori, one would expect their fake E/T distributions to be similar. We shall measure the fake E/T in a photon + jets sample for three additional reasons:

• the cross section for photon+jets is higher than that for Z+jets;
• in order to measure the fake E/T distribution with as little bias as possible, it is preferable to use a sample of events with as little contamination as possible from events with real E/T, and
• the transverse momentum (pT) of the photon, as well as that of a Z boson decaying to leptons (muons or electrons), can be measured with high precision.

Figure 5 shows a photon+jets candidate event observed by CMS [3].

³ For Higgs masses above 250 GeV, a cut of, e.g., E/T ≥ 60 GeV suffices to suppress significantly the fake component of E/T. However, for Higgs boson masses in the range 120–150 GeV, the E/T of the signal and the fake E/T overlap significantly.
⁴ For the present work, we have used a sample simulated using Pythia 6 [6] and the fast detector simulation program Delphes [7].

5.2. Strategy
The steps of our method are:

(i) choose an observable, here the photon pT, of the photon+jets system that is well measured and correlated with E/T;
(ii) measure the E/T distribution in a sample of photon+jets events by fitting the conditional probability density p(E/T|pT) to the E/T and photon pT data;
(iii) check the accuracy of the fit by computing

\[ f_\gamma(\slashed{E}_T) = \int p(\slashed{E}_T | p_T) \, f_\gamma(p_T) \, dp_T , \]

where fγ(pT) is the photon pT spectrum, and compare the predicted photon+jets E/T distribution, fγ(E/T), with the observed distribution (that is, perform a closure test), and
(iv) predict the E/T distribution in a given sample of Z+jets events, fZ(E/T), using

\[ f_Z(\slashed{E}_T) = \int p(\slashed{E}_T | p_T) \, f_Z(p_T) \, dp_T , \]

where fZ(pT) is the pT spectrum of the Z bosons.
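Steps (iii) and (iv) amount to folding one conditional density with two different pT spectra. The Python sketch below illustrates this numerically under stated assumptions: the function `cond_density` is a hypothetical stand-in for the fitted p(E/T|pT) (a Rayleigh shape whose width grows like the square root of pT, chosen only for illustration), the integral is replaced by a weighted sum over pT bins, and the "photon" and "Z-like" spectra are toy exponentials. None of these choices comes from the analysis itself.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def cond_density(met, pt):
    """Toy stand-in for the fitted p(MET | pT): a Rayleigh density whose
    scale grows like sqrt(pT). This shape is an assumption for illustration."""
    sigma = 0.8 * np.sqrt(pt)
    return met / sigma**2 * np.exp(-0.5 * (met / sigma) ** 2)

def fold(pt_centers, pt_weights, met_grid):
    """Approximate f(MET) = integral of p(MET | pT) f(pT) dpT by a weighted
    sum over pT bins; pt_weights are the normalized bin contents of f(pT)."""
    dens = cond_density(met_grid[:, None], pt_centers[None, :])
    return dens @ pt_weights

# pT binning and MET grid (GeV).
pt_edges = np.linspace(20.0, 100.0, 81)
pt_centers = 0.5 * (pt_edges[:-1] + pt_edges[1:])
met_edges = np.linspace(0.0, 100.0, 101)
met_centers = 0.5 * (met_edges[:-1] + met_edges[1:])
met_width = met_edges[1] - met_edges[0]

# Toy "photon" sample: falling pT spectrum, MET drawn from the toy density.
pt_gamma = 20.0 + rng.exponential(15.0, size=200_000)
pt_gamma = pt_gamma[pt_gamma < 100.0]
met_gamma = rng.rayleigh(0.8 * np.sqrt(pt_gamma))

# Step (iii): fold with the photon spectrum and compare with the sample.
w_gamma, _ = np.histogram(pt_gamma, bins=pt_edges)
w_gamma = w_gamma / w_gamma.sum()
f_gamma_pred = fold(pt_centers, w_gamma, met_centers)
f_gamma_obs, _ = np.histogram(met_gamma, bins=met_edges, density=True)
closure = np.max(np.abs(f_gamma_pred - f_gamma_obs))

# Step (iv): reuse the same conditional density with a harder, "Z-like" spectrum.
w_z = np.exp(-(pt_centers - 20.0) / 30.0)
w_z = w_z / w_z.sum()
f_z_pred = fold(pt_centers, w_z, met_centers)

print(f"max closure deviation: {closure:.4f}")
```

Here the closure test succeeds by construction, because the toy sample is generated from the same density that is folded. With a fitted density the size of the deviation is what the test is designed to reveal.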

Figure 6. (left) An example of the structure of a 2-input, 5-hidden node, single output neural network, showing the connections between nodes, each of which is associated with a weight. (right) Training an NN entails finding a single point w∗ in the network parameter space, nn(x) = y(x, w∗), while training a BNN entails sampling from a posterior probability density p(w|T) defined on the parameter space, where T denotes the training data. Given a sample of size N of parameter points wi, the BNN (as it is used in high energy physics) is given by the average of the NN functions y(x, wi), bnn(x) = (1/N) Σ_{i=1}^{N} y(x, wi).

6. Missing transverse energy probability density and neural networks
Before proceeding with the analysis, we describe the procedure we use to approximate the probability density p(E/T|pT), mentioned in (iii) of the previous section. By definition, the conditional probability density p(E/T|pT) is

\[ p(\slashed{E}_T | p_T) = p(\slashed{E}_T, p_T) / p(p_T) , \tag{1} \]

where p(E/T, pT) is the joint probability density of the E/T and the photon pT (displayed in Fig. 7 (right)) and p(pT) is the probability density of the photon pT. Given any known distribution of E/T, u(E/T), it proves useful to re-write Eq. (1) as

\[ p(\slashed{E}_T | p_T) = \frac{p(\slashed{E}_T, p_T)}{u(\slashed{E}_T) \, p(p_T)} \, u(\slashed{E}_T) = u(\slashed{E}_T) \left[ \frac{F(\slashed{E}_T, p_T)}{1 - F(\slashed{E}_T, p_T)} \right] , \tag{2} \]

where

\[ F(\slashed{E}_T, p_T) \equiv \frac{p(\slashed{E}_T, p_T)}{p(\slashed{E}_T, p_T) + u(\slashed{E}_T) \, p(p_T)} . \tag{3} \]
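The function F in Eq. (3) has the form that a classifier approximates when it is trained to separate two equally sized classes: events distributed according to p(E/T, pT) (target 1) and events distributed according to u(E/T) p(pT) (target 0). The Python sketch below checks Eq. (2) numerically, with a simple binned estimate F = n1/(n1 + n0) standing in for the classifier. The toy sample, the Rayleigh shape of E/T given pT, and the binning are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 400_000
met_max = 100.0  # range of the uniform reference density u(MET), in GeV

# Class 1: toy (MET, pT) pairs with MET | pT ~ Rayleigh(0.8 sqrt(pT)),
# an assumed shape used only for this illustration.
pt = rng.uniform(20.0, 60.0, size=n)
met_true = rng.rayleigh(0.8 * np.sqrt(pt))
# Class 0: the same pT values, with MET replaced by draws from u(MET).
met_unif = rng.uniform(0.0, met_max, size=n)

# Binned stand-in for the classifier output: F = n1 / (n1 + n0) per bin.
met_edges = np.linspace(0.0, met_max, 51)
pt_edges = np.linspace(20.0, 60.0, 5)
n1, _, _ = np.histogram2d(met_true, pt, bins=[met_edges, pt_edges])
n0, _, _ = np.histogram2d(met_unif, pt, bins=[met_edges, pt_edges])
F = n1 / (n1 + n0)

# Eq. (2): p(MET | pT) = u(MET) F / (1 - F), with u(MET) = 1 / met_max.
u = 1.0 / met_max
p_hat = u * F / (1.0 - F)

# Sanity checks for each pT bin: normalization over MET and mean MET.
met_centers = 0.5 * (met_edges[:-1] + met_edges[1:])
met_width = met_edges[1] - met_edges[0]
norms = p_hat.sum(axis=0) * met_width
means = (met_centers[:, None] * p_hat).sum(axis=0) * met_width
print(norms, means)
```

The recovered density is normalized in each pT bin even though F itself is not a density, which is the practical appeal of writing the conditional density in the form of Eq. (2).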

We approximate F(E/T, pT) with a Bayesian neural network, and, in this exploratory study, we choose u(E/T) to be a uniform density.

6.1. Bayesian neural networks
In order to appreciate how a Bayesian neural network (BNN) [1, 2] differs from a normal feed-forward neural network (NN) (see, for example, Ref. [8] for a pedagogical introduction), we briefly review how the latter is constructed. A neural network is simply a non-linear function of one or more variables x1, · · · , xI. The specific function we have used in this study is

\[ y(x, w) = \frac{1}{1 + \exp[-f(x, w)]} , \tag{4} \]

where

\[ f(x, w) = b + \sum_{j=1}^{H} v_j \tanh\left( a_j + \sum_{i=1}^{I} u_{ji} x_i \right) , \tag{5} \]

where I is the number of inputs and H is the number of hidden nodes. The parameters w = (b, v, a, u) are often referred to as weights. An example of a 2-input, 5-hidden node, neural network is shown in Fig. 6 (left).

Typically, a neural network is trained by minimizing a function of labeled training data. In the current study, where we use a sample of simulated photon+jets events, each event is characterized by the variables x = (E/T, pT). We divide the photon+jets sample into two classes: one which uses the original missing transverse energies, while the other uses E/T values randomly sampled from the density u(E/T), which in this study we take to be a uniform distribution in the range 0 to 100 GeV. Training a neural network (NN) means finding a single point w∗ in the space of network weights, as illustrated in Fig. 6 (sketch 1 on the right), while training a Bayesian neural network (BNN) involves sampling from a suitably defined posterior density p(w|T) [1], where T denotes the training data, as shown in Fig. 6 (sketch 2 on the right). In effect a Bayesian neural network is an average of the NN functions y(x, wi). In this work we use the Flexible Bayesian Modeling (FBM) software by Neal [1], which generates a sample of points wi by a Markov Chain Monte Carlo (MCMC) method.
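To make Eqs. (4) and (5) and the BNN average concrete, the following Python sketch evaluates the network function for a given set of weights and averages it over a sample of weight points. It is a minimal illustration, not the FBM implementation: the helper `random_weights` draws weights from a standard normal distribution purely as a placeholder for posterior draws, and the input scaling is an assumption.

```python
import numpy as np

def nn_output(x, w):
    """Eqs. (4)-(5): logistic output of a single-hidden-layer tanh network.

    x: array of shape (n_events, I); w: dict with keys
    b (scalar), v (H,), a (H,), u (H, I).
    """
    hidden = np.tanh(w["a"] + x @ w["u"].T)  # shape (n_events, H)
    f = w["b"] + hidden @ w["v"]             # Eq. (5)
    return 1.0 / (1.0 + np.exp(-f))          # Eq. (4)

def bnn_output(x, weight_sample):
    """BNN prediction: average of y(x, w_i) over a sample of weight points."""
    return np.mean([nn_output(x, w) for w in weight_sample], axis=0)

def random_weights(rng, n_inputs, n_hidden):
    """Hypothetical placeholder for one posterior draw; NOT an MCMC sample."""
    return {
        "b": rng.normal(),
        "v": rng.normal(size=n_hidden),
        "a": rng.normal(size=n_hidden),
        "u": rng.normal(size=(n_hidden, n_inputs)),
    }

rng = np.random.default_rng(seed=3)
weight_sample = [random_weights(rng, n_inputs=2, n_hidden=5) for _ in range(100)]

# Two events with x = (MET, pT) in GeV, scaled to order one before entering
# the network (the scaling factor is an assumption of this sketch).
x = np.array([[10.0, 25.0], [40.0, 30.0]]) / 100.0
F = bnn_output(x, weight_sample)
print(F)
```

With weights sampled from an actual posterior density p(w|T), the averaged output plays the role of F(E/T, pT), from which the conditional density follows as u(E/T) F/(1 − F).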

Figure 7. (left) The cosine of the azimuthal angle φ between the E/T and the jet system in the photon + jets sample. The peaks at +1 and −1 are evidence of the fake nature of the measured E/T, originating mainly from fluctuations in the jet energy measurement. (right) 2D plot of E/T vs pT showing the correlation between these two observables. The pT of the photon has the advantage of being measured with very high precision.
[Plots: (left) distribution of cos ∆φ(MET, jets); (right) photon pT [GeV/c] versus MET [GeV].]

7. Results
We begin with a brief note about the characteristics of the photon+jets sample, in particular the pT of the photon, the jet system, and the E/T.

Figure 7 (left) shows, for (simulated) photon + jets events, the distribution of the cosine of the azimuthal angle between the E/T vector and the pT vector of the jet system. This figure exemplifies the fake nature of E/T, typical for the photon + jets process. The peaks at +1 and −1 are evidence that the E/T vector and the vector sum of the transverse momenta of the jets are correlated, specifically they tend to be aligned either parallel or antiparallel to each other. For example, the −1 component originates from (under)fluctuations in the jet energy measurement. In the absence of fluctuations, the E/T should be zero.

The results of our exploratory study are as follows.

(i) Figure 7 (right) shows the correlation between the photon pT and E/T for simulated photon+jets events. It is a basic requirement of our method that the photon and the jet system be correlated in some way. In the transverse plane, and in the absence of undetected particles and of instrumental effects, their transverse momentum vectors should be back-to-back. Moreover, because of conservation of momentum, we expect the magnitude of the pT vector of the jet system to be correlated with the magnitude of the pT vector of the photon. It is therefore plausible that the pT of the photon is correlated with the E/T in this sample.
(ii) Figure 8 shows the probability distribution p(E/T|pT) obtained from fitting the BNN to the E/T and photon pT data in a photon + jets Monte Carlo sample. The function p(E/T|pT) is a continuous, smooth function of its arguments, but for simplicity the figure shows its form for two different values of the pT of the photon, 20 GeV/c (left) and 30 GeV/c (right). The BNN curve (red line) is compared to the MC points (black dots) for E/T at the same fixed values of the photon pT. The bigger errors in the right plot are due to the rapidly decreasing statistics of photon events at higher photon pT values.

Figure 8. For two different photon pT bins around 20 GeV/c (left) and 30 GeV/c (right), the BNN-fitted distribution p(E/T|pT) is compared to the (simulated) distribution for the same pT bins.
[Plots: p(MET | pT) versus MET [GeV], from 0 to 100 GeV.]

Figure 9. Results for the closure test on BNN training: BNN p(E/T) distribution, integrated over all values of photon pT, compared to the simulated Monte Carlo p(E/T).
[Plot: p(MET) versus MET [GeV], from 0 to 100 GeV.]

(iii) Figure 9 shows the results of the closure test, in which the BNN-fitted density p(E/T|pT), integrated over the photon pT spectrum, is compared to the E/T distribution. The preliminary results of the closure test look promising, especially in view of the fact that we have yet to optimize either the training parameters of the BNN method itself or the choice of observables.

8. Summary and conclusions
As the LHC luminosity and beam energy increase, we expect that the simulation of E/T could become harder. We proposed a method to extract the fake E/T spectrum from photon+jets data and approximate it with a Bayesian neural network. The method could be useful in modeling the fake E/T for Z+jets events, which are the dominant background in the Higgs search in the channel H → Z(l, l)Z(ν, ν).

Acknowledgments
We are thankful to the organizers of this very interesting, inspiring and enjoyable conference.

References
[1] R. M. Neal, Bayesian Learning for Neural Networks, Springer-Verlag, New York, 1996.
[2] P. C. Bhat and H. B. Prosper, Bayesian neural networks, in Statistical Problems in Particle Physics, Astrophysics and Cosmology, Imperial College Press, Editors L. Lyons and M. Ünel, 2005.
[3] CMS Physics Results. https://twiki.cern.ch/twiki/bin/view/CMSPublic/PhysicsResults.
[4] G. Aad et al., ATLAS Collaboration, CERN-PH-EP-2011-142, arXiv:1109.3357; submitted to Physical Review Letters.
[5] CMS Collaboration, Search for the Higgs in the H → Z(l, l)Z(ν, ν) channel in pp collisions at 7 TeV, CMS-PAS-HIG-11-016.
[6] T. Sjostrand, S. Mrenna and P. Z. Skands, Pythia 6, JHEP 0605, 026 (2006), hep-ph/0603175.
[7] S. Ovyn, X. Rouby and V. Lemaitre, Delphes, arXiv:0903.2225.
[8] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, New York, 2007.
