Bayesian Statistics
The technique relies on Bayes' theorem. Most people know the basic rules of probability: all probabilities must sum to one, and you can model simple systems using these rules alone. However, what if we have some additional evidence, either in the form of data or some prior knowledge about the model? We can then use Bayes' theorem to describe this. Consider some data $D$ and a model parameter $\theta$. The probability of the model given the data, $P(\theta|D)$, is given by:
$P(\theta|D)=\frac{P(D|\theta)P(\theta)}{P(D)}$
- $P(\theta|D)$ is the posterior distribution: the probability of the model parameters given the data, which is what we want to find out.
- $P(D|\theta)$ is the likelihood: the probability of the data given the parameters, i.e. the compatibility of the data with the given parameters.
- $P(\theta)$ is the prior distribution: the probability of the parameters independent of the data (for example, we may know that a pulse width cannot be negative).
- $P(D)$ is the probability of the data and acts as the normalising factor. Because MCMC only ever considers ratios of probability densities, this factor cancels out.
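As a small worked example, Bayes' theorem can be evaluated numerically on a grid of parameter values; this is a minimal sketch, and the coin-bias data here are hypothetical, chosen only for illustration.

import numpy as np

#Hypothetical data: infer the bias p of a coin after 7 heads in 10 flips
p = np.linspace(0, 1, 101)         #grid of candidate parameter values
prior = np.ones_like(p)/len(p)     #flat prior P(theta)
likelihood = p**7*(1 - p)**3       #P(D|theta), binomial up to a constant
posterior = likelihood*prior       #numerator of Bayes' theorem
posterior /= posterior.sum()       #dividing by P(D) normalises to one
print(p[np.argmax(posterior)])     #posterior mode, 0.7 as expected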
You could try to integrate over all possible values of the parameters in your model; however, with multidimensional data this is no longer possible within a reasonable timescale. Instead we can use Monte Carlo integration, which uses random numbers to form an estimate of the density of the parameters. Briefly, a Markov chain is a time series in which each step depends only on the previous one.
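To see Monte Carlo integration in isolation, here is a minimal sketch; the integrand $e^{-x}$ is my own choice for illustration. The mean of the integrand evaluated at uniformly random points estimates the integral over the domain.

import numpy as np
import numpy.random as ran

ran.seed(0)
x = ran.uniform(0, 1, 100000)    #random points over the domain [0, 1]
estimate = np.mean(np.exp(-x))   #Monte Carlo estimate of the integral
print(estimate)                  #~0.632, the exact value is 1 - exp(-1)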
Metropolis and Hastings came up with a method to perform Monte Carlo integration using a Markov chain whose stationary distribution is the posterior $P(\theta|D)$.
- Start with an initial point $\theta^{0}$.
- Pick a new candidate point $\theta^{*}$ according to some jumping distribution $q(\theta^{*}|\theta^{t-1})$.
- Calculate the ratio of the density at the candidate and current points, the acceptance ratio $\alpha = \frac{p(\theta^{*})}{p(\theta^{t-1})}$.
- If the jump increases the density ($\alpha \geq 1$), accept the candidate point; otherwise draw a random number $u$ between 0 and 1 and accept the candidate point if $\alpha > u$.
- Return to step 2 until the chain converges.
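As a minimal sketch of these steps, here is a Metropolis sampler for a one-dimensional density; the standard normal target and the step size are my own choices for illustration, not part of the analysis below.

import numpy as np
import numpy.random as ran

def p(x):
    return np.exp(-x**2/2)   #unnormalised target density

ran.seed(1)
theta = 0.0                  #step 1: initial point
samples = []
for i in range(10000):
    theta_star = theta + ran.uniform(-1, 1)   #step 2: symmetric jump
    alpha = p(theta_star)/p(theta)            #step 3: acceptance ratio
    if alpha >= 1 or ran.rand() < alpha:      #step 4: accept or reject
        theta = theta_star
    samples.append(theta)                     #step 5: repeat until converged
print(np.mean(samples))   #~0 for a standard normal
print(np.std(samples))    #~1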
Define our model
We will try to model both the noise and the parameters of the data. Let us choose an exponential decay as an example, $A\exp(-t/\tau)$. If we subtract the model from the data, we assume the residuals are described by a Gaussian or normal distribution:
$P(\sigma, \tau, A|D) \propto P(D|\sigma, \tau, A)P(\sigma, \tau, A)$
$P(D|\sigma, \tau, A)=\prod_{i}\mathrm{Norm}(D_{i};\ \mu=A\exp(-t_{i}/\tau),\ \sigma^{2})$
For the priors $P(\sigma, \tau, A)$ we take uniform distributions restricted to positive values of the parameters. For non-linear regression it is best to use a uniform distribution that does not constrain the chain unnecessarily but still uses prior knowledge responsibly.
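Putting this together, here is a minimal sketch of the log-posterior for this model, up to additive constants (which cancel in the acceptance ratio); the function name log_posterior is my own, and the full sampler below uses the same idea.

import numpy as np

def log_posterior(D, t, tau, A, sig):
    if tau <= 0 or A <= 0 or sig <= 0:   #uniform prior: positive values only
        return -np.inf                   #zero prior density outside support
    model = A*np.exp(-t/tau)
    #Gaussian log-likelihood of the residuals, dropping constant terms
    return np.sum(-(D - model)**2/(2*sig**2))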
Code it up in Python
Initially just a simple MCMC sampler
import numpy as np
import matplotlib.pylab as plt
import numpy.random as ran

#Setting the random generator seed
ran.seed(6041510)

#Functions
def decay(t,tau,A):
    return A*np.exp(-(t)/tau)

def loggaussdis(d,f,sig):
    #Gaussian log-likelihood up to an additive constant
    return np.sum(-(d-f)**2/(2*sig**2))

def randchange(var,d):
    return (var+ran.uniform(-d,d))

#Generate test data
size = 1000
tdata = np.linspace(0, 100, size)
tau = 20
A = 0.03
sig = 0.002
Adata = decay(tdata,tau,A)+ran.normal(0,sig,size)

#Monte Carlo scheme
simsize = 50000 #Iterations
tau_d = 0.5 #magnitude of change for tau
A_d = 0.01 #magnitude of change for A
tau = 19 #Initial value for tau
A = 0.025 #Initial value for A

#Results array for saving values
Res = np.ones((simsize,3))

for i in range(simsize):
    #Calculate density of current point
    dist1 = loggaussdis(Adata,decay(tdata,tau,A),sig)
    #Change parameters
    tau1 = randchange(tau,tau_d)
    A1 = randchange(A,A_d)
    #Calculate new density
    dist2 = loggaussdis(Adata,decay(tdata,tau1,A1),sig)
    #With Gaussian densities the log of the ratio becomes a subtraction
    #This is the log of our alpha
    logB = dist2-dist1
    #Enforce the prior that both parameters must be positive
    if tau1<=0 or A1<=0:
        pass #reject: keep the current values
    #If log alpha is greater than 0 then accept the change
    elif logB>=0:
        tau = tau1
        A = A1
    #Otherwise accept with probability alpha
    elif ran.rand()<=np.exp(logB):
        tau = tau1
        A = A1
    #Write to the results matrix
    Res[i,0] = logB
    Res[i,1] = tau
    Res[i,2] = A

#Plotting of results
step = np.linspace(0, simsize, simsize)
plt.figure()
plt.plot(tdata,Adata,'o')
plt.title("sample data")
plt.figure()
plt.subplot(221)
plt.hist(Res[:,1],50)
plt.subplot(222)
plt.hist(Res[:,2],50)
plt.subplot(223)
plt.plot(step,Res[:,1])
plt.title("tau")
plt.subplot(224)
plt.plot(step,Res[:,2])
plt.title("A")
plt.show()
You then get plots of the different parameters.
Data of the exponential decay with overlaid noise.
MCMC simulation: the bottom plots are the values of the parameters at the different iteration steps; the top graphs are histograms of the distributions of the parameters.
Now we can define a number of iterations to discard: the burn-in. The autocorrelation is also large; this can be reduced by tuning the acceptance criteria or by keeping only every nth trace (thinning). We will use burn = 1000 and thin = 5, so we discard the first 1000 iterations and keep only every fifth trace, as in the sketch below.
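Applied to the results array Res from the sampler above, burn-in and thinning reduce to simple array slicing; a minimal sketch:

burn = 1000
thin = 5
#Discard the first 1000 iterations, then keep every fifth trace
tau_chain = Res[burn::thin, 1]
A_chain = Res[burn::thin, 2]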
This is made easier by the pymc library for Python. With it we can also sample sigma, fitting the noise level as well as the model parameters.
Code for pymc
#Implementing it in pymc
import pymc

#Put the stochastic variables into a dictionary
d = {}
#sigma value for the error of the noise
sig = pymc.Uniform('sig', 0.0, 0.005, value=0.0018)
d['sig'] = sig
A = pymc.Uniform('A', 0.02, 0.04, value=0.025)
d['A'] = A
tau = pymc.Uniform('tau', 10.0, 30.0, value=20.0)
d['tau'] = tau

#Model defined as deterministic
@pymc.deterministic(plot=False)
def f(tdata=tdata, tau=tau, A=A):
    return A*np.exp(-(tdata)/tau)

#Likelihood: Normal centred on the model f with precision 1/sig^2
d['y'] = pymc.Normal('y', mu=f, tau=1/sig/sig, value=Adata, observed=True)

R = pymc.MCMC(d) #build the model
R.sample(10000)
pymc.Matplot.plot(R)
plt.show()
You then get plots of the different parameters.
Plots for the chain of A: top left is the value of the parameter at different iteration steps, bottom left is the autocorrelation plot, and on the right is the histogram of the parameter.
Plots for the chain of sigma, the error of the noise we are modelling.
Plots for the chain of tau, the time decay of the exponential.
Using the stats() method you can get the mean and standard deviation of each parameter.
print 'A','mean =',R.stats()['A']['mean'],'stdev =',R.stats()['A']['standard deviation']
print 'sig','mean =',R.stats()['sig']['mean'],'stdev =',R.stats()['sig']['standard deviation']
print 'tau','mean =',R.stats()['tau']['mean'],'stdev =',R.stats()['tau']['standard deviation']
A mean = 0.0299488396915 stdev = 0.000293780592428
sig mean = 0.00202956207822 stdev = 4.62119652347e-05
tau mean = 19.7427378282 stdev = 0.274207478109
Let us compare this with a simple least squares fit, which deterministically minimises the $\chi^{2}$.
import scipy.optimize

vals, cov = scipy.optimize.curve_fit(decay, tdata, Adata)
#The diagonal of the covariance matrix is the variance of each parameter;
#take the square root for the standard deviation
print 'tau','mean =',vals[0],'var =',cov[0,0]
print 'A','mean =',vals[1],'var =',cov[1,1]
tau mean = 19.7517453814 var = 0.0724630777926
A mean = 0.0299249150895 var = 8.25426324691e-08
The MCMC method gave values very close to the true values. It was also able to recover the noise from the data, fitting not only the model parameters but also the noise level itself.