October 20, 2018

Markov Chain Monte Carlo made easy: Gibbs sampling.

In a previous post we introduced Monte Carlo techniques and hinted at their many applications. In another post we showed a simple Markov Chain. The core of Markov Chain Monte Carlo methods is coming up with a function that makes a probabilistic choice about which state to visit next in a Markov Chain, so that, similarly to the π example, each state is visited in proportion to the target function, which in turn lets us estimate the desired parameters. Gibbs sampling is simply one method that satisfies these requirements.

The central idea in Gibbs sampling is that, instead of jumping to the next state all at once, a separate small (probabilistic) jump is made for each of the k parameters in the model, where each choice depends on the current values of all the other parameters. The algorithm is given by:


where z_1, …, z_k are the k parameters in the model, and T is the number of transitions, i.e. the number of times the model is sampled.

To sum up, Gibbs sampling walks through a k-dimensional state space. Every point in the walk is a collection of values for the random variables Z.
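As a concrete sketch of that walk, here is a minimal Gibbs sampler for a bivariate normal with correlation ρ. This target is a standard textbook illustration, not one taken from the post above; it is convenient because each conditional distribution (x given y, and y given x) is itself a normal, so every coordinate-wise jump is a single easy draw:

```python
import math
import random

random.seed(0)
rho = 0.8  # correlation of the bivariate normal target (illustrative value)

def gibbs(n_iter=20000, burn_in=2000):
    """Gibbs-sample a bivariate normal with unit variances and correlation rho."""
    x, y = 0.0, 0.0
    samples = []
    for t in range(n_iter):
        # Each conditional is normal: x | y ~ N(rho*y, 1 - rho^2), and symmetrically.
        x = random.gauss(rho * y, math.sqrt(1 - rho ** 2))
        y = random.gauss(rho * x, math.sqrt(1 - rho ** 2))
        if t >= burn_in:
            samples.append((x, y))
    return samples

samples = gibbs()
mean_x = sum(s[0] for s in samples) / len(samples)
print(f"estimated E[x] = {mean_x:.3f}")  # should be near the true mean, 0
```

Note how each iteration updates one coordinate at a time, always conditioning on the most recent value of the other: that is exactly the "separate small jump per parameter" described above.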

Weather forecast with a Markov Chain

Markov Chains are a computational tool for modelling systems made up of linked events. For example, take a simple weather forecast with three states: rainy, sunny, and cloudy. A moment's thought reveals that this system behaves differently from the coin-tossing example given in basic statistics courses: here the events are not independent of the previous state. For instance, the fact that today has been rainy may signal that tomorrow will be rainy too (especially if you live in England!). The structure of a Markov Chain for the weather system can be built by drawing dots (states) and arrows (transitions); here is an example with made-up probabilities for rainy, sunny, and cloudy:


Note that because the arrows represent probabilities, the number attached to each must lie between 0 and 1 (and the arrows stemming from any one dot must sum to 1!).

OK, now that we have written down the Markov diagram we can easily read off the probability that tomorrow is rainy given that today is rainy: 0.5. We might also ask for the probability of rain in two days given that today is cloudy. To answer that, we add up all the paths that lead from cloudy today to rainy in two days, which amounts to 0.42. This soon becomes cumbersome to calculate by hand, which is why it is so convenient to arrange the Markov Chain in a matrix P that predicts tomorrow's weather, and use matrix arithmetic to calculate the day after tomorrow's weather, given by P×P = P².


Similarly, P³ would give the probabilities for three days, and so on. "The entire future unfolds from this one matrix".

In this simple example, the successive powers of the matrix rapidly converge to a configuration in which all the rows become identical and stop changing:


There is a simple interpretation for this behaviour of the chain: if we let the system evolve long enough, the probability of a given state no longer depends on the initial state. In other words, knowing that today is rainy may offer a clue about tomorrow's weather, but it is not much help in predicting the weather a month from now. For such an extended forecast, we may as well consult the long-term averages, which are precisely the values to which the Markov Chain converges.
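The matrix arithmetic is a one-liner with NumPy. The original figure with the transition diagram is not reproduced here, so the matrix below is a hypothetical one, chosen only to be consistent with the two numbers quoted in the text (rainy→rainy = 0.5, and cloudy today → rainy in two days = 0.42):

```python
import numpy as np

# Rows/columns ordered as [rainy, sunny, cloudy]; each row sums to 1.
# These probabilities are made up to match the figures quoted in the post.
P = np.array([
    [0.5, 0.25, 0.25],   # today rainy
    [0.3, 0.40, 0.30],   # today sunny
    [0.4, 0.20, 0.40],   # today cloudy
])

P2 = P @ P                          # weather two days ahead
print(P2[2, 0])                     # P(rainy in two days | cloudy today), ~0.42

P50 = np.linalg.matrix_power(P, 50) # far future
print(P50)                          # every row has converged to the long-term averages
```

Raising P to a high power makes the convergence visible directly: all three rows of P⁵⁰ agree, i.e. the starting state no longer matters.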

It has been a pleasure to introduce Markov Chains, but if you're looking for more information check out the resource I adapted the example from: First Links in the Markov Chain (https://raichev.net/markov/misc/markov_chain.pdf).

October 18, 2018

A low–tech Monte Carlo technique to approximate π

Monte Carlo algorithms are used to solve involved integrals that have no closed-form solution. For non-mathematicians, that first sentence may already sound difficult and cumbersome. However, we should think of Monte Carlo techniques as a powerful ally for genuinely hard problems. Let's explore an easy example of a Monte Carlo technique to get familiar with it. Suppose that you'd like to estimate the value of π. Draw the following perfect square on the ground and inscribe a circle in it:


Now take a bag of rice and scatter 20 grains uniformly at random inside the square:


Now, assuming that the scattering was random, the ratio between the grains inside the circle (C) and the total grains inside the square (S) should approximate the ratio between the area of the circle and the area of the square, given by (d being the side of the square, which equals the circle's diameter):

C/S = π(d/2)^2 / d^2 = π/4

Solving for π we get:

π ~ 4C/S

Which in our example is: 4 × 15/20 = 60/20 = 3.

We have approximated the value of π to be 3, not too bad for a Monte Carlo simulation with only 20 random points.
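The same experiment scales easily in code. Here is a sketch that throws many more "grains" into a unit square with an inscribed circle of radius 0.5; the point count of 100,000 is an arbitrary choice:

```python
import random

def estimate_pi(n_points=100_000, seed=42):
    """Scatter points in the unit square; count those inside the inscribed circle."""
    random.seed(seed)
    inside = 0
    for _ in range(n_points):
        x, y = random.random(), random.random()
        # Inside the circle of radius 0.5 centred at (0.5, 0.5)?
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
            inside += 1
    return 4 * inside / n_points

print(estimate_pi())  # close to 3.14159...
```

With 100,000 points the estimate typically lands within a couple of hundredths of π, which shows why the 20-grain version can only ever be a rough (if charming) approximation.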

(The figure was adapted from https://towardsdatascience.com/a-zero-math-introduction-to-markov-chain-monte-carlo-methods-dcba889e0c50, and the text from Gibbs Sampling for the Uninitiated; visit these resources if you'd like to learn more about MCMC.)

September 25, 2018

QSP for AMR: Modelling how the drugs get into the bugs

Writing about web page https://warwick.ac.uk/fac/sci/eng/qsp-uk_network_satellite_conference

What's quantitative & systems pharmacology?

“Quantitative and Systems Pharmacology (QSP) is an emerging discipline focused on identifying and validating drug targets, understanding existing therapeutics and discovering new ones.”-Quantitative and Systems Pharmacology in the Post-genomic Era: New Approaches to Discovering Drugs and Understanding Therapeutic Mechanisms.

Antimicrobial resistance (AMR) is gaining increasing importance in healthcare settings. But what is AMR?

Antibiotics interfere with the complex "machinery" inside bacteria, for example by disrupting their metabolism, which slows their growth significantly so that they are less of a threat. Other antibiotics target DNA, preventing it from replicating and thereby stopping the bacteria from multiplying, ultimately killing them. Others simply rip the outer layer of the bacteria to shreds so that their insides spill out and they die quickly, all without harming the body's own cells.

But evolution is making things more complicated: through small random changes, a small fraction of the bacteria may find a way to protect themselves, for example by intercepting the antibiotic and modifying the molecule so that it becomes harmless, or by investing energy in pumps that eject the antibiotic before it can do damage.

Bacteria have two kinds of DNA: the chromosome and small floating parts called plasmids, through which they can exchange useful immunities. Alternatively, in a process called transformation, bacteria can harvest dead bacteria and collect DNA pieces. This even works between different bacterial species and can lead to superbugs: bacteria that are immune to multiple kinds of antibiotics. A variety of superbugs already exist in the world, and hospitals in particular are the perfect breeding grounds for them.

As a society, we have to change habits on the use of antibiotics and keep them as a last resort drug. In addition, interdisciplinary research is needed to keep developing new antibiotics.

Additional topics:

XChem: new experimental opportunities for testing theory

This team is making breakthrough discoveries in macromolecular crystallography, imaging and microscopy, biological cryo-imaging, magnetic materials, structures and surfaces, spectroscopy, and crystallography, generating high-throughput data that may accelerate the discovery of new medicines.

Moreover, they are committed to open data standards, and all 3D structures of human proteins that are elucidated are published for data analysts to test potential novel therapies in silico. Cancer-related proteins, including human protein kinases, metabolism-associated proteins, integral membrane proteins, and proteins associated with epigenetics, are the focus of the team, and more information can be found on their website.

Pharmacokinetic–Pharmacodynamic Modeling in Pediatric Drug Development, and the Importance of Standardized Scaling of Clearance

Since modelling can readily be used to extrapolate results from adults to children, thereby avoiding clinical trials in children, there is huge interest among all stakeholders in clarifying when that is an appropriate practice. In principle, extrapolation should be done whenever it is reasonable to assume that children, in comparison to adults, have a similar disease progression, response to intervention, and exposure-response relationship. However, if the exposure-response relationship is dissimilar but there is a PD measurement that can be used to predict efficacy in children, it is still possible to conduct a partial extrapolation. The decision tree below summarises the idea:

Decision tree pediatrics (E. Germovsek et al.)

The history of paediatric dosing did not begin by taking into account the complex maturation that occurs in human beings. Instead, early on, the dose was simply scaled down linearly with weight. This flawed practice led to serious adverse events such as the gray baby syndrome and kernicterus. One of the first achievements in modelling paediatric dose was the use of Body Surface Area (Crawford et al.), which dramatically improved efficacy and safety profiles. More recently, a combination of allometric weight scaling with a sigmoidal function has been proposed to describe the changes in clearance (Cl) due to age and weight:
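The equation image is not reproduced here, but the usual form of this kind of model in the paediatric PK literature (not necessarily the exact equation from the talk) combines an allometric weight term with exponent 0.75 and a Hill-type maturation term in post-menstrual age (PMA). A sketch, with illustrative parameter values:

```python
def clearance(cl_std, wt_kg, pma_weeks, tm50=47.7, hill=3.4):
    """Allometrically scaled clearance with sigmoidal maturation.

    cl_std is the typical clearance of a 70 kg adult; tm50 is the PMA (in weeks)
    at which maturation is half complete. All parameter values here are
    illustrative, not taken from the presentation.
    """
    size = (wt_kg / 70.0) ** 0.75  # allometric weight scaling
    maturation = pma_weeks ** hill / (pma_weeks ** hill + tm50 ** hill)
    return cl_std * size * maturation

# Clearance matures towards the size-scaled adult value as PMA grows.
print(clearance(10.0, 70.0, 40.0))   # early maturation: well below adult clearance
print(clearance(10.0, 70.0, 300.0))  # near-fully matured
```

The two factors capture the two effects named in the text: the power of weight handles size, and the sigmoid handles age-driven maturation.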


For extrapolation, on the other hand, we aim to use modelling techniques that account for individual variation. A prominent example is Non-Linear Mixed-Effects Modelling (NLME), where all the study data are fitted simultaneously in one model while the PK parameters are allowed to vary between individuals. This approach has become standard practice because it provides unbiased estimates through simultaneous estimation of parameter-level interindividual variability and observation-level residual variability.

August 02, 2018

Data Challenge

Writing about web page https://www.kaggle.com/competitions

Today is the day to start a Data Challenge: over five days we are going to go through an introduction to Data Science in Python, a Regression Challenge in R, and finally an introduction to Matlab on Friday. I hope this will be very enjoyable!

Challenge in Python : Data cleaning

This is presented as an introduction to Python. The first challenge is to explore the dataset; it is fairly easy, but as you know, easy things are the best way to learn and understand. After all, before painting the Mona Lisa, Leonardo da Vinci first needed to learn to draw.

The dataset that I have chosen is the Adverse Events dataset; many others are available on Kaggle, for example here: https://www.kaggle.com/rtatman/fun-beginner-friendly-datasets/

The solution consists of loading the data and using describe().

Note that describe() only summarises numeric (continuous) columns by default; if we are instead interested in, for example, categorical variables, we can use count() or value_counts().
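As a tiny sketch of that workflow, using a made-up stand-in for the dataset (the column names here are hypothetical, not the real Adverse Events columns):

```python
import pandas as pd

# Toy stand-in for the adverse-events dataset (columns are hypothetical).
df = pd.DataFrame({
    "age": [34, 51, 28, 62],
    "outcome": ["recovered", "hospitalised", "recovered", "recovered"],
})

print(df.describe())                 # summary statistics, numeric columns only
print(df["outcome"].value_counts())  # frequency table for a categorical column
```

In a real run you would replace the inline DataFrame with pd.read_csv() on the downloaded Kaggle file.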

Here you can find my solution!


Challenge in R: Regression modelling

Regression models output variables (y) as a function of input variables (x). There are many ways to model a regression, and an important family is the so-called "generalised linear models".

Three kinds of regression are:

-Linear: Prediction of a continuous variable.

-Logistic: Prediction of a categorical variable, for example a binary outcome 0, 1.

-Poisson: Prediction of a count variable.
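As a minimal example of the first kind, here is an ordinary least-squares linear fit in Python (the challenge itself is in R; the data below are synthetic, generated from y = 2x + 1 plus small noise):

```python
import numpy as np

# Synthetic data: a known line y = 2x + 1 with a little noise added.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0 + np.array([0.1, -0.1, 0.05, -0.05, 0.0])

# Fit y = a*x + b by ordinary least squares.
a, b = np.polyfit(x, y, deg=1)
print(f"slope={a:.2f}, intercept={b:.2f}")  # close to the true slope 2 and intercept 1
```

Logistic and Poisson regression follow the same spirit but swap the identity link for a logit or log link, which is exactly what makes them "generalised" linear models.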

Github repo

This links to the progress of the challenge, where I will upload all the problems as I go.

