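Let $M_0 = \{P_\theta\}_{\theta \in \Theta_0}$ and $M_1 = \{P_\theta\}_{\theta \in \Theta_1}$ be two families of models, indexed by a parameter $\theta$,
and let $Y = (y_1, \dots, y_n)$ be a time series instance. We consider the following statements:
$H_0$: The sample $Y$ was drawn from a model in $M_0$.
$H_1$: The sample $Y$ was drawn from a model in $M_1$.
In order to decide between those statements, we consider their likelihood ratio.
Definition
We define the likelihood ratio to be
$$\Lambda(Y) = \frac{\sup_{\theta \in \Theta_1} P_\theta[Y]}{\sup_{\theta \in \Theta_0} P_\theta[Y]}$$
or, in the continuous case,
$$\Lambda(Y) = \frac{\sup_{\theta \in \Theta_1} p_\theta(Y)}{\sup_{\theta \in \Theta_0} p_\theta(Y)},$$
where $p_\theta$ denotes the probability density of the model $P_\theta$.
We define the log-likelihood ratio as $l(Y) = \log \Lambda(Y)$.
For a given confidence level $c$ we accept the hypothesis $H_1$ if
$$l(Y) > c.$$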
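The definition can be tried out numerically. The following is a minimal sketch, assuming Gaussian models with independent samples and finite families given as lists of `(mu, sigma)` pairs, so that the supremum becomes a plain maximum. The function names, the sample values and the threshold `c = 1.0` are illustrative assumptions, not part of the original post.

```python
import math

def gaussian_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) at the point y."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)

def log_likelihood(ys, mu, sigma):
    """Log-likelihood of independent samples ys under N(mu, sigma^2)."""
    return sum(gaussian_logpdf(y, mu, sigma) for y in ys)

def log_likelihood_ratio(ys, family_0, family_1):
    """Log of (best likelihood in family_1) / (best likelihood in family_0).

    Each family is a finite list of (mu, sigma) pairs (an assumption made
    here so that the supremum is a maximum over finitely many models).
    """
    best_0 = max(log_likelihood(ys, mu, sigma) for mu, sigma in family_0)
    best_1 = max(log_likelihood(ys, mu, sigma) for mu, sigma in family_1)
    return best_1 - best_0

# Illustrative data and families (hypothetical values).
ys = [0.9, 1.2, 0.7, 1.1]
family_0 = [(0.0, 1.0)]
family_1 = [(0.5, 1.0), (1.0, 1.0), (1.5, 1.0)]
c = 1.0  # assumed threshold

l = log_likelihood_ratio(ys, family_0, family_1)
accept_h1 = l > c
print("log-likelihood ratio:", l)
print("Hypothesis H_1 accepted:", accept_h1)
```

Working with log-likelihoods rather than raw likelihoods avoids numerical underflow for longer time series, since products of many small densities become sums.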
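Example
Take $\mu_0 < \mu_1$, a time series $Y = (y_1, \dots, y_n)$ and $\sigma > 0$.
To test the hypothesis
$H_0$: The instance was drawn independently from $N(\mu_0, \sigma^2)$
$H_1$: The instance was drawn independently from $N(\mu_1, \sigma^2)$
we calculate the log-likelihood ratio as:
$$l(Y) = \sum_{i=1}^n \frac{(y_i - \mu_0)^2 - (y_i - \mu_1)^2}{2\sigma^2} = \frac{2\delta}{\sigma^2} \sum_{i=1}^n (y_i - \bar\mu) = \frac{2\delta n}{\sigma^2} (\bar y - \bar\mu),$$
where $\bar\mu = (\mu_0 + \mu_1)/2$ is the average of $\mu_0, \mu_1$, the variable
$\delta = (\mu_1 - \mu_0)/2$ is half the distance between $\mu_0$ and $\mu_1$, and
$\bar y = \frac{1}{n} \sum_{i=1}^n y_i$ is the sample mean.
For the second step we used the following simple identity:
$$(y - \mu_0)^2 - (y - \mu_1)^2 = 4\delta (y - \bar\mu).$$
Note that $l(Y) = 0$ if $\bar y = \bar\mu$,
and $l(Y) > 0$ if $\bar y > \bar\mu$, i.e. $\bar y$ is closer to $\mu_1$
than to $\mu_0$.
We accept the hypothesis $H_1$ if $l(Y) > c$, which is equivalent
to:
$$\bar y > \bar\mu + \frac{c \sigma^2}{2 \delta n}.$$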
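As a sanity check of this example, the sketch below computes the log-likelihood ratio for two Gaussian models with equal variance in two ways: directly as a sum of squared-deviation differences, and via the closed form in terms of the sample mean. It assumes the closed form $\frac{2\delta n}{\sigma^2}(\bar y - \bar\mu)$ with $\bar\mu$ the midpoint of the two means and $\delta$ half their distance. The sample values, the means and the threshold `c` are made up for illustration.

```python
import math

def llr_direct(ys, mu0, mu1, sigma):
    """Sum over samples of ((y - mu0)^2 - (y - mu1)^2) / (2 sigma^2)."""
    return sum(((y - mu0) ** 2 - (y - mu1) ** 2) / (2 * sigma**2) for y in ys)

def llr_closed_form(ys, mu0, mu1, sigma):
    """Closed form: 2 * delta * n / sigma^2 * (sample mean - midpoint)."""
    n = len(ys)
    mu_bar = (mu0 + mu1) / 2   # midpoint of the two means
    delta = (mu1 - mu0) / 2    # half the distance between the means
    y_bar = sum(ys) / n        # sample mean
    return 2 * delta * n / sigma**2 * (y_bar - mu_bar)

# Illustrative values (hypothetical).
ys = [0.3, -0.2, 0.8, 0.1, 0.5]
mu0, mu1, sigma, c = 0.0, 1.0, 1.0, 1.0

l = llr_direct(ys, mu0, mu1, sigma)
l2 = llr_closed_form(ys, mu0, mu1, sigma)
print("l2 == l ?", math.isclose(l, l2))
print("Hypothesis H_1 accepted:", l > c)
```

Here the sample mean (0.3) lies below the midpoint (0.5), so the log-likelihood ratio is negative and $H_1$ is not accepted for any non-negative threshold.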
Figure: Log-likelihood ratio for the plotted time series $Y$.
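Test for changes in mean
For given $\mu_0 < \mu_1$ and $\sigma > 0$ we consider the hypotheses:
$H_0$: Constant mean model: $y_i \sim N(\mu_0, \sigma^2)$ for all $i = 1, \dots, n$.
$H_1$: Change in mean at a time $t \in \{1, \dots, n+1\}$: $y_i \sim N(\mu_0, \sigma^2)$ for $i < t$ and $y_i \sim N(\mu_1, \sigma^2)$ for $i \geq t$. Here $t = n+1$ means that no change has happened within the sample.
For an instance $Y = (y_1, \dots, y_n)$ and a fixed change time $t$ we calculate the log-likelihood ratio to be
$$l_t(Y) = \sum_{i=t}^n \frac{(y_i - \mu_0)^2 - (y_i - \mu_1)^2}{2\sigma^2}$$
and using the notation $\bar\mu = (\mu_0 + \mu_1)/2$ and $\delta = (\mu_1 - \mu_0)/2$ from the last example we get
$$l_t(Y) = \frac{2\delta}{\sigma^2} \sum_{i=t}^n (y_i - \bar\mu).$$
We introduce the notation $S_a^b = \frac{2\delta}{\sigma^2} \sum_{i=a}^b (y_i - \bar\mu)$, with $S_a^b = 0$ for $a > b$, so
that we can write
$$l_t(Y) = S_t^n = S_1^n - S_1^{t-1}.$$
The total log-likelihood ratio is computed by explicit maximization:
$$l(Y) = \max_{1 \leq t \leq n+1} l_t(Y) = S_1^n - \min_{1 \leq t \leq n+1} S_1^{t-1}.$$
Note that $l(Y) \geq 0$, since $l_{n+1}(Y) = S_{n+1}^n = 0$.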
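The explicit maximization can be sketched in a few lines. The code below assumes the change-in-mean setting with known means `mu0 < mu1` and standard deviation `sigma`, per-sample increments $s_i = \frac{2\delta}{\sigma^2}(y_i - \bar\mu)$, and change times $t$ ranging over $1, \dots, n+1$, where $t = n+1$ stands for "no change" and contributes the value 0. The function name and the data are illustrative assumptions.

```python
def change_in_mean_llr(ys, mu0, mu1, sigma):
    """Return (l, t_hat): the maximal tail sum and the 1-based change time
    that attains it. t_hat = len(ys) + 1 means 'no change detected'."""
    mu_bar = (mu0 + mu1) / 2
    delta = (mu1 - mu0) / 2
    # Per-sample log-likelihood ratio increments.
    s = [2 * delta / sigma**2 * (y - mu_bar) for y in ys]
    n = len(ys)
    best, best_t = 0.0, n + 1  # empty tail: no change within the sample
    tail = 0.0
    # Accumulate tail sums s_t + ... + s_n for t = n, n-1, ..., 1.
    for t in range(n, 0, -1):
        tail += s[t - 1]
        if tail > best:
            best, best_t = tail, t
    return best, best_t

# Illustrative series whose mean jumps from about 0 to about 1 at sample 4.
ys = [0.1, -0.3, 0.2, 1.1, 0.9, 1.3]
l, t_hat = change_in_mean_llr(ys, mu0=0.0, mu1=1.0, sigma=1.0)
print("log-likelihood ratio:", l)
print("estimated change time:", t_hat)
```

Accumulating the tail sums from the end of the series keeps the cost linear in the length of the series, instead of recomputing each sum from scratch.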
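Online Variant
It turns out that there is a simple recursion, which allows us to
compute the log-likelihood ratio $l(Y[n+1])$ for an instance $Y[n+1] = (y_1, \dots, y_{n+1})$
of length $n+1$ from the knowledge of $l(Y[n])$
for the instance $Y[n] = (y_1, \dots, y_n)$ of length $n$.
Indeed, writing $s_i = \frac{2\delta}{\sigma^2}(y_i - \bar\mu)$ and $S_1^k = \sum_{i=1}^k s_i$ (with $S_1^0 = 0$), we have
$$l(Y[n+1]) = S_1^{n+1} - \min_{0 \leq k \leq n+1} S_1^k.$$
Note that $S_1^{n+1} = S_1^n + s_{n+1}$.
For the minimum term we have
$$\min_{0 \leq k \leq n+1} S_1^k = \min\left(m_n, S_1^{n+1}\right), \quad \text{where } m_n = \min_{0 \leq k \leq n} S_1^k.$$
In case $S_1^{n+1} \geq m_n$ we have $l(Y[n+1]) = S_1^{n+1} - m_n = l(Y[n]) + s_{n+1}$, and in case
$S_1^{n+1} < m_n$ we have $l(Y[n+1]) = 0$, while $l(Y[n]) + s_{n+1} < 0$.
Since we always have $l(Y[n+1]) \geq 0$, we get the total recursion:
$$l(Y[n+1]) = \max\left(l(Y[n]) + s_{n+1}, 0\right).$$
Set $g_n = l(Y[n])$, then we get the recursion:
$$g_{n+1} = \max(g_n + s_{n+1}, 0), \quad g_0 = 0,$$
which is known as the CUSUM method.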
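A minimal sketch of the online recursion, assuming the one-sided form $g_{n+1} = \max(g_n + s_{n+1}, 0)$ with $g_0 = 0$ and increments $s_i = \frac{2\delta}{\sigma^2}(y_i - \bar\mu)$, is given below. It also compares the result with a brute-force maximization over all change times on every prefix. The function names, the data and the alarm threshold `c = 1.5` are assumptions chosen for illustration.

```python
def cusum(ys, mu0, mu1, sigma):
    """Return the list g_1, ..., g_n of CUSUM statistics for the series ys."""
    mu_bar = (mu0 + mu1) / 2
    delta = (mu1 - mu0) / 2
    g, gs = 0.0, []
    for y in ys:
        s = 2 * delta / sigma**2 * (y - mu_bar)  # increment for this sample
        g = max(g + s, 0.0)                      # one-sided CUSUM update
        gs.append(g)
    return gs

def offline_llr(ys, mu0, mu1, sigma):
    """Brute force: maximum over all tail sums, including the empty tail."""
    mu_bar = (mu0 + mu1) / 2
    delta = (mu1 - mu0) / 2
    s = [2 * delta / sigma**2 * (y - mu_bar) for y in ys]
    return max([0.0] + [sum(s[t:]) for t in range(len(s))])

# Illustrative series and threshold (hypothetical values).
ys = [0.1, -0.3, 0.2, 1.1, 0.9, 1.3]
c = 1.5
gs = cusum(ys, mu0=0.0, mu1=1.0, sigma=1.0)

# First sample index (1-based) at which the statistic exceeds the threshold.
alarm = next((n for n, g in enumerate(gs, start=1) if g > c), None)
print("CUSUM statistics:", gs)
print("first alarm at sample:", alarm)
```

The recursion needs constant memory and constant work per new sample, whereas the brute-force version recomputes every tail sum and is only useful as a cross-check.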