Exercise 2: Data-driven background estimate - sideband fit

Although the Monte Carlo prediction for the background looks reasonable, we can estimate the background normalisation from data by determining a scale factor (α) from a fit to the level of background in a signal-free region (a side-band). This gives you a more accurate prediction for the background in the region where a signal is actually present.

In the most general terms the combined signal + background mass distribution as a function of the 4-lepton invariant mass (m4l) is parametrised as:

\(f_{\rm total} = \mu \cdot f_{\rm Higgs} + \alpha \cdot f_{\rm SM}\)

Here \(\mu\) and \(\alpha\) are the scale factors applied to the number of Higgs and SM background events, respectively, predicted by the histograms.

Code you could use from the skeleton code:

SideBandFit()

Exercise 2a)

Perform a likelihood fit to the side-band region 150 ≤ m4l ≤ 400 GeV to find the optimal scale factor for the background and its uncertainty (α ± ∆α). Compute and plot −2ln(L) as a function of α, with the likelihood given by:

\(-2\,{\rm Log(Lik)} = -2 \cdot \sum_{\rm bins} {\rm Log}\left( {\rm Poisson}(N_{\rm observed} | \lambda_{\rm expected}) \right)\)

In this formula \(\lambda_{\rm expected}\) is the background prediction in each bin (which you can take from the histogram) multiplied by \(\alpha\).
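The scan described above could be sketched as follows. This is a minimal illustration, not the skeleton's SideBandFit(): the bin contents in `expected` and `observed` are made-up numbers standing in for the side-band bins of the background and data histograms, and the function names, the scan range and the step count are assumptions. The uncertainty is read off as the interval in which −2ln(L) rises by 1 above its minimum.

```python
import math

def neg2_log_likelihood(alpha, observed, expected):
    """-2 ln L for independent Poisson bins with mean alpha * expected."""
    total = 0.0
    for n, b in zip(observed, expected):
        lam = alpha * b
        if lam <= 0.0:
            continue  # assumption: bins without a prediction are skipped
        # ln Poisson(n | lam) = n ln(lam) - lam - ln(n!)
        total += n * math.log(lam) - lam - math.lgamma(n + 1)
    return -2.0 * total

def scan_alpha(observed, expected, lo=0.5, hi=2.0, steps=1501):
    """Scan alpha on a grid; return the best value and its lower/upper errors."""
    alphas = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    nll = [neg2_log_likelihood(a, observed, expected) for a in alphas]
    i_min = min(range(steps), key=nll.__getitem__)
    # Grid points where -2 ln L is within 1 unit of the minimum
    inside = [a for a, v in zip(alphas, nll) if v - nll[i_min] <= 1.0]
    best = alphas[i_min]
    return best, best - inside[0], inside[-1] - best

# Hypothetical side-band bin contents (NOT taken from the real histograms)
expected = [10.0, 8.0, 6.0, 4.0, 2.0]   # background MC prediction per bin
observed = [12, 9, 7, 4, 3]             # data counts per bin

alpha_hat, err_lo, err_hi = scan_alpha(observed, expected)
print(f"alpha = {alpha_hat:.3f} -{err_lo:.3f} +{err_hi:.3f}")
```

For a single overall scale factor the minimum can be cross-checked analytically: it sits at the ratio of the total observed to the total expected number of events in the side-band.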

Exercise 2b)

Use the best estimate of the background scale factor and its uncertainty to predict the level of background (b ± ∆b) in a 10 GeV window around 125 GeV. You can also use the optimal window that you found in Exercise 1.
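As a sketch of the propagation, assume the unscaled Monte Carlo background yield in the chosen window is \(b_{\rm MC}\) (a symbol introduced here for illustration) and that the statistical uncertainty of the Monte Carlo histogram itself can be neglected. Then the scale factor and its uncertainty carry over linearly:

\(b = \alpha \cdot b_{\rm MC}, \qquad \Delta b = \Delta\alpha \cdot b_{\rm MC}\)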

Impact of the uncertainty on the background on the expected significance in our counting experiment

The uncertainty on the background affects the significances we computed in Exercise 1: the expected number of events is no longer described by a single Poisson distribution with mean \(b\), because the central value itself has an uncertainty. To find the p-values you need the distribution of the number of expected events, which you can build from simulated pseudo-experiments, called toy experiments. For each toy experiment (\(i\)), follow this procedure:

Get a value for the mean of this particular experiment by drawing a random number from Gauss(\(b \pm \Delta b\)) \(\rightarrow \lambda_i\).
Draw a random number from a Poisson with mean \(\lambda_i \rightarrow n_i\).

Do this many times (10,000 toy experiments for example, and the more the better). When you have the distributions for SM and SM+Higgs you can use the same techniques as in Exercise 1 to compute the expected significance.
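The toy procedure above could be sketched like this, using only the Python standard library. The values of `b`, `db` and `s` are hypothetical placeholders, not results of the exercise, and all function names are made up for illustration. Two further assumptions are made: negative Gaussian draws for the mean are clipped to zero, and the signal expectation is added without an uncertainty of its own. The Poisson sampler uses Knuth's multiplication method, which is adequate for the small means of a counting experiment.

```python
import math
import random
import statistics
from statistics import NormalDist

def draw_poisson(lam, rng):
    """Poisson random number via Knuth's method (fine for small means)."""
    if lam <= 0.0:
        return 0
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def toy_counts(b, db, n_toys, rng, signal=0.0):
    """Event counts from toys with a Gaussian-smeared background mean."""
    counts = []
    for _ in range(n_toys):
        lam = max(rng.gauss(b, db), 0.0)  # step 1: mean for this toy (clipped at 0)
        counts.append(draw_poisson(lam + signal, rng))  # step 2: observed count
    return counts

def p_value(bkg_counts, n_obs):
    """Fraction of background-only toys with at least n_obs events."""
    return sum(1 for n in bkg_counts if n >= n_obs) / len(bkg_counts)

def significance(p):
    """One-sided Gaussian significance Z for a p-value."""
    if p <= 0.0:
        return float("inf")   # no toy reached n_obs: more toys are needed
    if p >= 1.0:
        return float("-inf")
    return NormalDist().inv_cdf(1.0 - p)

rng = random.Random(12345)    # fixed seed so the sketch is reproducible
b, db, s = 5.0, 0.5, 5.0      # hypothetical background, its uncertainty, signal
n_toys = 10000

bkg_toys = toy_counts(b, db, n_toys, rng)
sig_toys = toy_counts(b, db, n_toys, rng, signal=s)

n_median = int(statistics.median(sig_toys))  # typical SM+Higgs outcome
p_exp = p_value(bkg_toys, n_median)
z_exp = significance(p_exp)
print(f"median n = {n_median}, expected p = {p_exp:.4f}, Z = {z_exp:.2f}")
```

For the observed significance, the same `p_value` call can be made with the number of events seen in data in place of the median of the SM+Higgs toys.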

Exercise 2c)

Compute the expected and observed significance using this new background estimate. Compare to those in Exercise 1 and discuss the differences.