Thursday, 23 May 2013

Fitting the noise

One of the new additions to PyTrA is Markov Chain Monte Carlo model checking. To begin I will go through Bayesian statistics, coding this up in python, using the pymc library and comparing this with normal fitting techniques.

Bayesian Statistics

The technique relies on Baye's theorem. Most people know about simple probability theorem. All probabilities must add to one and you can model simple systems using this. However, what if we have some additional evidence either in the form of data or some prior knowledge about the model? We then can use Baye's theorem to describe this. Consider some data $D$ and a model $\theta$. The probability of the model given the data $P(\theta|D)$ is given by:

  • $P(\theta|D)$ is the Posterior distribution the probablity of the model parameter given the data which is what we want to find out.
  • $P(D|\theta)$ is the Likelihood estimate the probability of the data given the parameter or the compatibility of the data with the given parameters.
  • $P(\theta)$ is the Prior distribution the probability of the parameters apart from the data (for example we know that the pulse width of the pulse is not going to be negative).
  • $P(D)$ is the probability of the data and is the normalising factor. As we consider ratios of probability densities this will cancel out using MCMC.
One of the applications of Bayesian inference is the scientific method. Baye's theorem guides the certainty of the models based on new observations and experiments.

You could try and integrate over all possible values for the parameters in your model, however with multidimensional data this no longer becomes possible within a reasonable timescale. We can use Monte Carlo integration which uses random numbers to form and estimate for the density of the parameters. Briefly, a Markov Chain is a time series which depends only on the previous step.

Metropolis and Hastings came up with a method to perform Monte Carlo integration using a Markov Chain to get $P(D|\theta)$.
  1. Start with an initial point $\theta^{0}$.
  2. Pick a new point $\theta^{*}$  according to some jumping distribution $q(\theta^{1},\theta^{2})$.
  3. Calculate the ratio of the density after and before (acceptance ratio $\alpha$). $\alpha = \frac{p(\theta^{*})}{p(\theta^{t-1})}$.
  4. If the jump increases the density $\alpha$ greater than one accept candidate point otherwise choose a random number $u$ between 0 and 1. If $alpha$ greater than $u$, accept the candidate point.
  5. Return to step 2 until the chain converges.
Hastings generalised the acceptance ratio to take into account unsymmetrical transitions using a transition probability function $q(\theta^{1},\theta^{2})$.

Define our model

We with try and model the noise of the data and the parameters. Let us choose the exponential decay as an example $Aexp(-t/\tau)$. If we subtract the model from the data we assume that data is described by a Gaussian or normal distribution
$P(t; \sigma, \tau, A|D) \propto P(D|t; \sigma, \tau, A)P(t; \sigma, \tau, A)$ 
$P(D|t; \sigma, \tau, A)=Norm(\mu=(\sum_{i}A exp(-t_{i}/\tau)-D_{i}),\sigma^{2})$
The priors $P(\sigma, \tau, A)$ we will take as uniform distribution  for the parameters that they are positive. For non-linear regression it is best to use a uniform distribution that does not limit the chain but uses prior knowledge responsibly.

Code it up in Python

Initially just a simple MCMC sampler
import numpy as np
import matplotlib.pylab as plt
import numpy.random as ran
#Setting the random generator seed
def decay(t,tau,A):
    return A*np.exp(-(t)/tau)
def loggaussdis(d,f,sig):
    return np.sum(-(d-f)**2/sig**2)
def randchange(var,d):
    return (var+ran.uniform(-d,d))
#Generate test data
size = 1000
tdata = np.linspace(0, 100, size)
tau = 20
A = 0.03
sig = 0.002
Adata = decay(tdata,tau,A)+ran.normal(0,sig,size)
#Fitting using leastsquares fitting
#Monte Carlo scheme
simsize = 50000 #Iterations
tau_d = 0.5 #magnitude of change for tau
A_d = 0.01 #magnitude of change for A
tau = 19 #Initial value for tau
A = 0.025 #Initial value for A
#Results array for saving values
Res = np.ones((simsize,3))
for i in range(simsize):
    #Calculate density of current point
    dist1 = loggaussdis(Adata,decay(tdata,tau,A),sig)
    #Change parameters
    tau1 = randchange(tau,tau_d)
    A1 = randchange(A,A_d)
    #Calculate new density
    dist2 = loggaussdis(Adata,decay(tdata,tau1,A1),sig)
    #Due to densities being Gaussian this becomes subtraction
    #This is our alpha
    logB = dist2-dist1
    #enforce the prior that both must be possitive.
    if tau1<=0 or A1<=0:
        tau = tau
        A = A
        #If log B greater than 0 then accept change
        if logB>=0:
            tau = tau1
            A = A1
            # if closer to one then more likely to be a good step
            if ran.rand()<=np.exp(logB):
                tau = tau1
                A = A1
    #print to results matrix
    Res[i,0] = logB
    Res[i,1] = tau
    Res[i,2] = A
#Plotting of results
step = np.linspace(0, simsize, simsize)
plt.title("sample data")
Data of the exponential decay with overlaid noise

 MCMC simulation: bottom plots are the values of the parameters at the different iteration steps. The top graphs are the histograms of the distribution of the parameters.

This is made easier by the pymc library for python. We can also try and sample sigma and fit the noise as well as the model parameters.
Code for pymc
#Implementing it in pymc
import pymc
#sigma value for error out into dictionary
sig = pymc.Uniform('sig', 0.0, 0.005, value=0.0018)
d['sig'] = sig
A = pymc.Uniform('A', 0.02, 0.04, value= 0.025)
d['A'] = A
tau = pymc.Uniform('tau', 10.0, 30.0, value= 20.0)
d['tau'] = tau
#model defined as deterministic
def f(tdata=tdata,tau=tau,A=A):
    return A*np.exp(-(tdata)/tau)
#likelihood Normal centred on the data with an error 1/sig^2.
d['y'] = pymc.Normal('y', mu=f, tau=1/sig/sig, value=Adata, observed=True)
R = pymc.MCMC(d)    #  build the model

You then get plots of the different parameters.
 Plots for chain of A: top left if the value of the parameter at different iteration steps, bottom left autocorrelation plot, right is the histogram for the parameters.
Plots for the chain sigma the error of the noise we are modelling.
 Plots for the chain tau of the time decay of the exponenital.

Now we can define a number of iterations to discard the burn in number. The autocorrelation is also large this can be reduced by tuning the acceptance criteria or by removing every nth number of traces. We will use burn=1000 and thin=5 so we will discard the first 1000 iterations and only use every fifth trace.
Using the method stats() you can get the mean and standard deviation.
print 'A','mean =',R.stats()['A']['mean'],'stdev =',R.stats()['A']['standard deviation']
print 'sig','mean =',R.stats()['sig']['mean'],'stdev =',R.stats()['sig']['standard deviation']
print 'tau','mean =',R.stats()['tau']['mean'],'stdev =',R.stats()['tau']['standard deviation']
A mean = 0.0299488396915 stdev = 0.000293780592428
sig mean = 0.00202956207822 stdev = 4.62119652347e-05
tau mean = 19.7427378282 stdev = 0.274207478109

Let us compare this with a simple least squares fit which you can use to deterministically minimise the $\chi^{2}$.
import scipy
vals, cov = scipy.optimize.curve_fit(decay, tdata, Adata)
print 'tau','mean =',vals[0],'stdev =',cov[0,0]
print 'A','mean =',vals[1],'stdev =',cov[1,1]
tau mean = 19.7517453814 stdev = 0.0724630777926
A mean = 0.0299249150895 stdev = 8.25426324691e-08

The MCMC method gave values very close to the true value.  It was also able to recover the noise from the data, not only its values but also its structure.


Gamerman, D. and Lopes, H. F., "Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference," 2006, NW, Chapman & Hall/CRC

No comments:

Post a Comment