
Chapter 1: a counting experiment

In the first part of the walkthrough exercise we will study a few basic concepts related to a counting experiment, i.e. how to quantify the compatibility of a number of observed events with the expectations from two hypotheses: in our case the Standard Model (null) hypothesis and the alternative hypothesis in which we assume a Higgs boson to be present.

For this we need to introduce a few concepts: the Poisson distribution, p-values and (expected and observed) significances. We will do this in a compact form; for more details we refer you to the slides and the many (online) lectures and books. To illustrate the concepts we will use an example in which we expect 10 events from the Standard Model, 5 extra events if the Higgs boson exists, and we have observed 12 events in data.

1. The Poisson distribution

The Poisson distribution is a discrete probability distribution that describes the probability to observe a certain number of events (n) given a well-defined predicted mean number of events (\(\lambda\)). It is the limit of the binomial distribution when the success probability \(p\) tends to zero and the number of trials goes to infinity. It appears in many places in particle physics.

To be concrete, the probability to observe n events when \(\lambda\) are expected is given by:

\(\mathrm{Poisson}(n\,|\,\lambda) = \frac{\lambda^{n}e^{-\lambda}}{n!}\)

In this expression n is an integer and \(\lambda\) a real number.
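
As a quick numerical illustration, this probability is easy to evaluate; below is a minimal sketch (we use scipy here, a choice of ours, but any tool with a Poisson probability mass function will do):

```python
from scipy.stats import poisson

lam = 10                       # mean number of expected events (Standard Model)
for n in (8, 10, 12, 15):
    # poisson.pmf(n, lam) is Poisson(n | lambda)
    print(f"Poisson({n:2d} | {lam}) = {poisson.pmf(n, lam):.4f}")
```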

Example: 10 events expected from the Standard Model and 5 extra from Higgs

The figure below shows the probability distribution for the number of observed events when a mean of 10 events is expected, Poisson(n|10), together with the corresponding distribution for the alternative hypothesis in which the Higgs boson is present, Poisson(n|15).

Figure 1: probability distributions for the two hypotheses

The Poisson distribution has some interesting properties. When \(\lambda\) is large the distribution tends to a normal distribution, but for smaller \(\lambda\) its asymmetric nature is more pronounced. Some properties of Poisson(n | \(\lambda\)):

  • The standard deviation is \(\sqrt{\lambda}\). This is the famous \(\sqrt{n}\) uncertainty that is often (not entirely correctly) assigned to a data point when n events have been observed.

  • If \(\lambda\) is an integer, the probabilities to observe \(\lambda\) and \(\lambda-1\) events are equal.

  • The number of events with the highest probability to be observed is the first integer smaller than \(\lambda\), so if you expect 4.99 events it is still more likely to observe 4 events than 5.

It is a good investment to spend some time studying a few example distributions. If you expect, for example, 5 events on average, what are the probabilities to observe 4 and 6 events, respectively? And imagine now that you observe 5 events in the data: is it then more likely that the 'real' underlying model (\(\lambda\)) predicted 4 events on average, or 6?
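
If you want to check your answers, a short sketch along the same lines (again using scipy, our choice of tool):

```python
from scipy.stats import poisson

# Expect 5 events on average: probabilities to observe 4 or 6 events
print(poisson.pmf(4, 5))   # ~ 0.175
print(poisson.pmf(6, 5))   # ~ 0.146

# Observe 5 events: which expectation makes that observation more probable?
print(poisson.pmf(5, 4))   # ~ 0.156
print(poisson.pmf(5, 6))   # ~ 0.161
```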

We will use the Poisson probability extensively in this tutorial. A Poisson distribution describes the expected number of events in a specific range of the 4-lepton mass distribution, and we will use it to quantify the compatibility of the data with specific hypotheses. In later exercises we will use it to quantify the compatibility of the data in a bin with a prediction that in turn depends on the parameters of our model.

2. P-values (expected and observed)

The p-value is a measure of the (in)compatibility of a given number of events with the null hypothesis, i.e. the SM prediction. More precisely: the p-value is defined as the probability to observe an excess as large as the number of observed/expected events (or even larger) under the SM hypothesis.

Note that you can not only compute an observed p-value (by integrating the SM Poisson distribution upwards from the number of events observed in data), but you can also define an expected p-value and the corresponding expected significance. In that case you compute what the p-value would be if you observed exactly as many events in data as you expect on average from the alternative hypothesis; in our case that would mean 15 events (10 SM + 5 Higgs). The expected significance is a measure of the 'strength' of the analysis, i.e. how well the Higgs signal can, on average, be separated from the Standard Model, and people use it to optimise their analysis. And so will we in the exercises.

Example: 10 events expected, 12 observed, 5 from the Higgs: observed and expected p-values

If we expect 10 events from the Standard Model and observe 12 events we can use the Poisson distribution to compute the corresponding p-value:

p-value (observed) = \(\int_{12}^{\infty} Poisson(n|10) dn\) = 0.303

The expected p-value is computed by assuming we observe 15 events:

p-value (expected) = \(\int_{15}^{\infty} Poisson(n|10) dn\) = 0.083

Figure: probability distributions for the two hypotheses
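
These numbers are straightforward to reproduce; a minimal sketch using scipy's Poisson survival function (our choice of tool):

```python
from scipy.stats import poisson

b, s, n_obs = 10, 5, 12

# p-value = P(n >= threshold | b); poisson.sf(k, mu) returns P(n > k),
# so pass k = threshold - 1
p_obs = poisson.sf(n_obs - 1, b)    # observed:  P(n >= 12 | 10) ~ 0.303
p_exp = poisson.sf(b + s - 1, b)    # expected:  P(n >= 15 | 10) ~ 0.083
print(p_obs, p_exp)
```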

The smaller the p-value, the smaller the compatibility with the hypothesis under consideration. If the p-value drops below a pre-defined threshold we say that the number of observed events is no longer compatible with the SM hypothesis and we can reject it: a discovery! We will say a bit more about this later, as a similar procedure, with much less strict requirements, is used to reject alternative hypotheses: in our case the presence of a Higgs boson in the data.

An important remark, which we cannot repeat often enough, is that a p-value is not the probability that the null hypothesis is true or false. It is a measure of the compatibility of the observed data with the null hypothesis.

3. From p-value to significance

Rather than the p-value itself, we often use a related quantity called the significance. This is the value Z (expressed in numbers of standard deviations σ) for which the integral of a standard normal distribution from Z to infinity equals the p-value. This allows us to quantify an excess (or deficit) with respect to a specific hypothesis either as a p-value or as a significance in terms of a number of sigma.

Example: from p-value to significance using the expected p-value in our example

Looking at the example mentioned before, we see that the expected p-value of 0.083 corresponds to a significance of 1.38 sigma.

Figure: probability distributions for the two hypotheses
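
The conversion itself is just the inverse of the one-sided Gaussian tail probability; a minimal sketch (scipy again, our choice):

```python
from scipy.stats import norm, poisson

p_exp = poisson.sf(14, 10)   # expected p-value, P(n >= 15 | 10) ~ 0.083
z = norm.isf(p_exp)          # one-sided Gaussian significance, ~ 1.4 sigma
print(f"p = {p_exp:.3f}  ->  Z = {z:.2f} sigma")
```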

With the expected number of signal and background events given by s and b respectively, note that the difference between the two hypotheses is given by s (s+b minus b-only) and that the uncertainty on the background-only expectation is \(\sqrt{b}\). Assuming the Poisson distribution is approximately normal, an estimate of the significance is given by

Significance \(\sim \frac{s}{\sqrt{b}}\).
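
To see how good this rule of thumb is for our small numbers, a quick comparison (a sketch, using scipy as before):

```python
import math
from scipy.stats import norm, poisson

b, s = 10, 5
approx = s / math.sqrt(b)                    # rule of thumb, ~ 1.58
exact = norm.isf(poisson.sf(b + s - 1, b))   # ~ 1.38, from the expected p-value
print(f"s/sqrt(b) = {approx:.2f},  expected significance = {exact:.2f}")
```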

Note that the smaller the p-value (the larger the significance), the smaller the compatibility with the hypothesis under consideration, and if the p-value drops below a pre-defined threshold we say that the number of observed events is no longer compatible with the Standard Model hypothesis and we can claim a discovery. It is clear that to reject the Standard Model we require very strong evidence. Indeed, we set the threshold at an excess that would be expected to occur by chance about once in 3.5 million times under the null hypothesis. This is the famous 5σ significance that is generally accepted as the "discovery" threshold in particle physics.

4. Effect of increasing the data set (increase in luminosity)

If the luminosity increases, the difference between the two hypotheses becomes more pronounced when expressed in terms of the uncertainty on the background. As an example we give the expected significance for our example as well as for a scenario with a four times larger data set.

Figure: probability distributions for the two hypotheses

With four times more luminosity the expected significance indeed increases, but since it is still below the 5 sigma threshold we do not expect to be able to claim a discovery.

expected p-value = \(\int_{n_{\rm exp} = 60}^{\infty} Poisson(n | 40) dn\) = 0.0019 (2.9 sigma)
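
The same calculation for the larger data set, as a sketch (scipy, our choice):

```python
from scipy.stats import norm, poisson

b, s, k = 10, 5, 4                           # k = luminosity scale factor
p_exp = poisson.sf(k * (b + s) - 1, k * b)   # P(n >= 60 | 40)
print(f"p = {p_exp:.4f}  ->  Z = {norm.isf(p_exp):.1f} sigma")   # ~ 0.0019, ~ 2.9 sigma
```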

5. Discoveries and rejecting alternative hypotheses

Up to now we have talked about ‘incompatibility with the SM hypothesis’ and computed p-values and significances to quantify that (in)compatibility. We can use similar arguments to construct an (in)compatibility with an alternative hypothesis, in our case the existence of a Higgs boson.

To reject an alternative hypothesis we define a quantity similar to the p-value: the probability to observe a deficit as large as the number of observed/expected events (or even smaller) under the alternative (Higgs) hypothesis. For an alternative hypothesis (typically a new particle or a specific coupling scale factor) we are less strict, and rejection is typically done at a level of (in)compatibility of 5%. As a result we claim to reject the alternative hypothesis at 95% confidence level. This basically says that we do not believe the hypothesis that the Higgs boson was actually there and that we simply observed a downward fluctuation.

Figure: probability distributions for the two hypotheses

Observed incompatibility = \(\int_{0}^{n_{\rm obs}} Poisson(n | 15) dn\) = 26.8%.

In our case the value is 26.8%, well above the 5% threshold: the observed number of events is still compatible with the signal-plus-background (Higgs) hypothesis. This means we cannot reject the Higgs hypothesis.
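
In code this is a one-line calculation; a small sketch using scipy (our choice of tool):

```python
from scipy.stats import poisson

b, s, n_obs = 10, 5, 12
p_sb = poisson.cdf(n_obs, b + s)   # P(n <= 12 | 15), ~ 0.268
print(f"compatibility with the Higgs hypothesis: {p_sb:.3f}")   # above 0.05: not excluded
```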

In many analyses we do not just give a binary yes/no answer, but test hypotheses in which the Higgs cross-section is larger than that predicted by the SM. This allows different theoretical alternatives that predict different cross sections to be tested simultaneously.

As an example, the plot below shows a model in which the Higgs cross section is scaled by a factor of two relative to the SM prediction. In that scenario the (in)compatibility is 3.9%, which means we do not consider the observed number of events (12) to be compatible with (a downward fluctuation of) the new model, and we reject it at 95% confidence level.
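
As an illustration of such a scan, the sketch below repeats the calculation for a few signal scale factors mu (the factor-two model above corresponds to mu = 2); the names and values of mu tested are our own choice:

```python
from scipy.stats import poisson

b, s, n_obs = 10, 5, 12

# Test signal-plus-background hypotheses with the signal scaled by mu
for mu in (1.0, 1.5, 2.0, 3.0):
    p = poisson.cdf(n_obs, b + mu * s)          # P(n <= 12 | 10 + 5*mu)
    verdict = "rejected at 95% CL" if p < 0.05 else "not rejected"
    print(f"mu = {mu:.1f}: p = {p:.3f}  ->  {verdict}")
```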