Earlier on, we learnt about the Acceptance Rejection Sampling method as a subset of Monte Carlo methods.

The Monte Carlo method is unique as it directly makes use of random numbers inputted into the distributions to see if it fits and allowed us to compute the definite integral.

  1. However, this is costly in terms of time to generate and input numbers
    1. As the number of trials conducted (N) increases, the errors of the monte-carlo method reduce by a factor of \(\frac{1}{\sqrt{n}}\)
    2. In other words, reducing errors by a factor of 10 would require 100 times more samples, leading to an “exponential” requirement for the number of samples just to reduce ⬇️ error
  2. Additionally, each random number (from a simulation) has a variance which limits the precision of the simulated result

Variance Reduction helps to address this!

Pointer to take note!!:

  • You cant get the variance reduction for the integrand (integral of function you want to integrate), only for estimates of a function (eg. monte carlo function for estimation)

Common Random Numbers (CRN)

Imagine conducting two slightly different experiments in chemistry under similar conditions (same r.t.p and procedures) with a similar objective.

If there are any differences in output results, it will likely be due to different experimental methods!

We can implement this in Kotlin by first creating two different functions. They reflect the different “experiments” which we will sample. These functions are described in Kotlin in the S2 library as UnivariateRealFunction which are functions \(y =f(x)\) which we define that take a “real” input and outputs a “real” value.

To define the “experiements”, we override the existing evaluate method with our own custom function.

					%use s2

var f = object : AbstractUnivariateRealFunction(){
    override fun evaluate(x: Double): Double{
        var fx: Double = 2.0 - Math.sin(x)/x
        return fx

var g = object : AbstractUnivariateRealFunction(){
    override fun evaluate(x: Double): Double{
        var gx: Double = Math.exp(x*x) - 0.5
        return gx
var X1 = UniformRNG()

Next, we will then implement the Variance Reduction method using Common Random Numbers. The first one will use two independent streams of random numbers X1 and X2 (through the system h).

Meanwhile, the second implementation will be using correlated sequences of X1 (which is the same sequence used).

					var X2 = UniformRNG()

var h = object : AbstractUnivariateRealFunction(){
    override fun evaluate(x: Double): Double{
        return X2.nextDouble()

var crn1 = CommonRandomNumbers(f,g,X1,h)
var estimator1 = crn1.estimate(100_000)
println("d = %f, variance = %f".format(estimator1.mean(),estimator1.variance()))

var crn2 = CommonRandomNumbers(f, g, X1) // use X1 for both f and g
var estimator2 = crn2.estimate(100_000)
println("d = %f, variance = %f".format(estimator2.mean(),estimator2.variance()))

We run the code above. Notice how the variance for the second estimator is lower than the first, indicating that it is more accurate.

					d = 0.091600, variance = 0.227742
d = 0.092050, variance = 0.182357


The second CRN uses the same sequence of random numbers \(X_1\) and leads to better accuracy then two independent lists of random numbers \(X_1\) and \(X_2\). This is likely because both rngs are the same, leading to the outcome of the functions \(f(X_1), g(X_2)\) being more related (co-variance) and any differences are due only to the functions and not the random numbers.

function f in blue: \(2.0-\frac{\sin{x}}{x}\)

function g in purple: \(x^2-0.5\)

We can see that these two streams of numbers have \(Cov(f(X_1), g(X_2))\) > 0, as between x=0 and x=1, both functions \(g\) and \(f\) are increasing.

Now, try changing the function f or g such that their co-variance is negative, (such as using -sin(x)+2). What happens?

The main usage behind CRN comes when we have 2 functions, \(f\) and \(g\), and we aim to deduce the expectation of f-g. Think for example, a study which aims to model extreme weather patterns☁️. In order to ensure that the simulations used in the models are accurate, we aim not to evaluate each model separately, but by how easy it is for one can get differences between the outputs of the two models for the same scenario!

In order for such a result to be trustable, we need the variance \(var(f-g)=var(f)+var(g)+2cov(f,g)\) to be small. This is achieved by having \(2cov(f,g)\) to be as small as possible, where the 2 functions are unrelated.

Now: Try changing the earlier functions to fit this criteria and see what happens to the variance!

Antithetic Variables

If X and Y are independent, then :

\(var(X + Y) = var(X) + var(Y) + 2cov(X,Y)\)

Covariance of two variables in a Monte Carlo Simulation is given by:

\(Cov(X,Y) = p_x,_y \times\sigma_x\times\sigma_y\) and \(p\) lies between -1 and 1.

If P is a negative value, then the variance of X + Y is smaller. This is the fundamental idea behind Antithetic Variables.

Another idea to look at it is that we create two “samples” from one in one trial of the Monte Carlo Simulation simply by calculating the “opposite path” . Since we already know the variance of the “original or primary “path, we can have 2x no. of samples while having the same variance.

The following code computes the integral \(I=\int_0^1 \frac{1}{1+x}dx\) = \(ln 2\)

					%use s2

var f = object : AbstractUnivariateRealFunction() {
    override fun evaluate(x: Double): Double {
        var fx = 1.0 / (1.0 + x)
        return fx

var uniform = UniformRNG()

var av = AntitheticVariates(f, 

var estimator = av.estimate(1500)
println("mean = %f, variance = %f".format(
					mean = 0.693248, variance = 0.000582


Control Variates – method to reduce variance of monte carlo estimate

Given a distribution \(X\) which consists of a sequence of Random Variables \(X_i\). In order to find the expected value or “mean” of the distribution, we take samples of the distribution.

By dividing the total (summation of the each sample’s value) by the number of samples taken, giving us the average of the distribution.

\(E[X] \approx \frac{1}{n}\sum_{i=1}^{n}X_i\)

As the name suggests, we then create a control variable which relates to this distribution \(X\)
We let the term C be the name for the control variable, with a mean E(C). b is just a constant – which can take on any real number

\(Y = X-b(C-E(C))\)

The expected value (aka mean) of resultant value Y is \(E(Y)=E(X)\) because the term \(E(b(C-E(C)))=0\)  since \(C-E(C)=0\) and \(b\) is not a distribution, and hence has no mean.

Now that we know that the resulting distribution Y can take on different values while still having the same expected value as X, we can continue to tweak the values of \(C\) and \(b\) to reduce the variance of \(Y\) to be lower than that of the original distribution \(X\)

The variance for instance, of Y is

\(Var(Y) = \sigma_X^2+b^2\sigma_C^2-2bCov(X,C)\)

Based on this, we need the value of \(2bCov(X,C)\) to be large but smaller than that of \(\sigma_X^2+b^2\sigma_C^2\).

In other words:
– \(b > 0\) and
– a large value of \(Cov(X,C)\)

					//Function f which represents 1/(1+x)
var f = object : AbstractUnivariateRealFunction() {
    override fun evaluate(x: Double): Double {
        var fx = 1.0 / (1.0 + x)
        return fx
//Function g which represents known integral of (1+x)
var g = object : AbstractUnivariateRealFunction() {
    override fun evaluate(x: Double): Double {
        var gx = 1.0 + x
        return gx

var uniform = UniformRNG()

var cv = ControlVariates(f, g, 1.5, -0.4773, uniform)
var estimator = cv.estimate(1500)

println("mean = %f, variance = %f, b = %f".format(
					mean = 0.692015, variance = 0.000601, b = -0.472669


Importance Sampling

In a normal distribution, more values cluster towards the mean, and gradually fall off with increased distance from the mean. Thus values, which are closer towards the mean of the distribution help to contribute in creating the iconic “bell”-shape we recognise!

If we want to reduce the variance of such a normal distribution, we can “cheat” by sampling only values that contribute towards creating this shape, by sampling them more frequently. This is importance sampling.

Idea of Importance Sampling

Firstly, let’s say that we want to compute an approximation of an integral of the function h(x):
\(I = \int_{a}^{b}h(x)dx\)

The probability density function for the function h(x) is given by \(f_x(x)\), defined over the interval [a,b].

It is given that the expected value of a function of a random variable \(E[h(x)]\) with probability density function \(f_x\) is:

\(E[h(x)]= \int h(x)f_x(x)dx\)

  • h(x) represents the the function which we aim to calculate the expectation of.

However, the smart and intuitive part comes here. Since we find it difficult to sample from the PDF, we can make use of another function \(g\)

\(\begin{aligned} &= \int h(x)f_x(x)dx \\&= \int h(x)\frac{f_x(x)}{g(x)}g(x)dx \\ \end{aligned}\)

The “weight” which allows us to control the samples taken from distribution \(g(x)\) is \(\frac{f_x(x)}{g(x)}\).

Example: Understanding the usefulness of importance sampling for monte carlo integration 

reference:  Scratchapixel

To approximate the following integral:

There are 2 possible methods using monte carlo integration:

  1. a uniform distribution
    1. aka. shape does not change, just a straight line
    2. We can use \(y=\frac{2}{\pi}\)
  2. distribution with probability density function close to that of sin(x)
    1. aka shape of the distribution is similar to that of sin(x)
    2. We can use \(y=\frac{8x}{\pi^2}\)
    3. This is the main idea of importance sampling!!

To implement it for the uniform distribution, we conduct monte carlo approximation as shown by the following estimator:


  • where \(X_i\) is drawn from the distribution from \(i=0\) to \(i=N-1\)
  • Since N is a very large number, we may not get a low enough variance without a high enough N

To implement monte carlo integration for the second method, we can use inverse transform sampling (ITS) method. Since the second distribution is non-uniform, ITS will allow us to generate random numbers which reflect the distribution that underlies it.

ITS involves first computing the CDF of the function \(y=\frac{8x}{\pi^2}\) and then inverting it. This gives us \(\frac{\pi}{2}\sqrt{\epsilon}\) where \(\epsilon\) is the random number \(y=\frac{4x}{\pi^2}\).

Read here for full derivation.

%use s2

var rng = UniformRNG()
val PI = 3.14159

val N = 16

val num_estimates = 20
var ErrorUniform = DoubleArray(num_estimates)
var ErrorImportance = DoubleArray(num_estimates)

for (n in (0..num_estimates-1)){
    var sumUniform = 0.0
    var sumImportance = 0.0
    for (i in (0..N-1)){
        var r = rng.nextDouble()
        sumUniform += sin(r*PI*0.5)
        var xi = sqrt(r)*PI*0.5
        sumImportance += sin(xi)/((8*xi)/(PI*PI))
    sumImportance *= 1.0/N
    ErrorUniform[n] = sumUniform
    ErrorImportance[n] = sumImportance
    println("%f %f".format(sumUniform, sumImportance))
					0.967494 1.004909
0.978602 1.008780
0.995603 1.000437
0.939952 1.028096
1.203577 0.950478
1.253986 0.921405
1.132137 0.968181
0.984656 1.007120
1.163299 0.960258
0.844957 1.042825
0.804111 1.058342
1.002740 1.000375
1.071653 0.989673
0.773341 1.057234
1.092251 0.966760
0.887380 1.029002
1.086986 0.972768
0.835032 1.049370
0.869526 1.032241
1.040903 0.989092
%use lets-plot,s2

val data = mapOf<String, List<*>>(
    "x" to List(num_estimates) { i:Int-> i }+List(num_estimates) {i:Int-> i}, 
    "combined Errors" to ErrorUniform.toList()+ErrorImportance.toList(),
    "method" to List(num_estimates) { "Uniform" }+List(num_estimates) { "Importance" }, 
// construct a line plot
val plot = ggplot(data) { x = "x"; y = "combined Errors"; color="method" } + // supply data; set legends
    ggsize(500, 250) + // plot size
    geom_line() // draw a basic line plot

As we can see from the line graph above, the importance sampling method fluctuates lesser (lower variance) compared to uniform sampling, and is much closer to the true value of 1 (value when you integrate \(sin(x)\) from 0 to \(\frac{\pi}{2}\))

We can also implement importance sampling in a more simplified manner NMDEV S2. According to the function constructor, it takes in a univariate function h, w and random number g, similar to the equation above. We can make use of the estimate method to solve this problem.