In the previous chapter, every generated number had an equal chance of occurring. But what if we want some numbers to occur more frequently than others? This idea of assigning more weight to some numbers than to others is described by a probability distribution.

Probability distributions are often used in simulations of real-life phenomena!

A probability distribution is a mathematical function that gives probabilities of occurrence of different possible outcomes for an experiment.

Types of Random Variables

and functions 📈 to illustrate them

Discrete Random Variables (DRV)

  • A kind of variable that can only take on specific values, such as the integers 0 and 1.
  • Think of any classic game with a die 🎲. A die is a quick and easy way to generate random numbers where every possible integer has an equal chance of appearing!

The probability mass function (PMF) of a discrete random variable gives the probability of each possible outcome. For instance, let us create a probability table for a die roll, where x is the outcome of the roll.

\(x\) | 1 | 2 | 3 | 4 | 5 | 6
\(\Pr(X = x)\) or PMF | \(\frac{1}{6}\) | \(\frac{1}{6}\) | \(\frac{1}{6}\) | \(\frac{1}{6}\) | \(\frac{1}{6}\) | \(\frac{1}{6}\)

  • Since every outcome is equally likely in this scenario, the PMF stays the same at 1/6 for each possible outcome.
  • Notice that there are only 6 possible outcomes, so this is a discrete random variable!

Additionally, the cumulative distribution function (CDF) of a discrete random variable gives the probability that the outcome is less than or equal to a chosen value. Adding on to the table earlier, we get:

\(x\) | 1 | 2 | 3 | 4 | 5 | 6
\(\Pr(X \leq x)\) or CDF | \(\frac{1}{6}\) | \(\frac{2}{6}\) | \(\frac{3}{6}\) | \(\frac{4}{6}\) | \(\frac{5}{6}\) | \(\frac{6}{6}\)
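The two table rows can be reproduced with a few lines of Python. This is only a minimal sketch: the names `outcomes`, `pmf` and `cdf` are chosen here for illustration, and exact fractions are used so the values match the table.

```python
from fractions import Fraction
from itertools import accumulate

# Outcomes of a fair six-sided die; each has Pr(X = x) = 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in outcomes}

# The CDF is the running total of the PMF: Pr(X <= x).
cdf = dict(zip(outcomes, accumulate(pmf[x] for x in outcomes)))

for x in outcomes:
    print(x, pmf[x], cdf[x])
```

Note that `Fraction` reduces its values, so 2/6 is printed as 1/3 and 6/6 as 1.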

From a visual perspective, the CDF is useful as it allows us to easily find outliers (1) or clusters (2) of data.

For clusters (2), we can look at the valleys (flat stretches) between the steeper sections of the CDF: the steep sections are where data cluster, and the valleys separate one cluster from the next.


Continuous Random Variables (CRV) 

Another kind of variable is the continuous random variable, which can take on infinitely many values between a specified minimum and maximum.

  • E.g. the time taken to travel from one place to another is a continuous variable, as it can take on infinitely many possible durations (seconds, milliseconds, and so on).

For any continuous random variable, which can take on infinitely many values, the probability that a generated random number equals one specific value exactly is 0 (see note 1). Hence, the probability density function (PDF) of a continuous random variable is used instead to find the probability of the variable falling within a particular range of values.

This is shown as the area under the continuous variable’s probability density function, as follows:

\(\Pr(a\leq X \leq b) = \int_{a}^{b}{f_x(x) dx} \)


  • \(\int_{a}^{b}\) is the integral, or area, from \(x=a\) to \(x=b\)
  • \(dx\) means that we integrate with respect to \(x\)
  • \(f_x(x)\) is the probability density function

1  \(\Pr(X = x) = 0\) for every single value \(x\), even where the probability density function \(f_x(x)\) is positive.

Meanwhile, the Cumulative Distribution Function (CDF) of a continuous random variable is a function derived from the probability density function for a continuous variable (as shown above). The cumulative distribution function \(F_X(x)\) is defined for any random variable \(X\) as 

\(F_X(x) = \Pr(X\leq x)\)

which is the probability that any random variable \(X\) is less than or equal to \(x\)

How are the PMF & CDF related for a DRV, and the PDF & CDF for a CRV?

For a continuous random variable (CRV)
If we substitute the CDF defined above into the equation for the PDF of a CRV, we get:

\(\begin{aligned}\Pr(a\leq X \leq b)&=\Pr(X\leq b)-\Pr(X\leq a)\\&=F_X(b)-F_X(a)\end{aligned}\)

This means that the definite integral of the probability density function (for a continuous random variable) is the same as the difference between two values of the cumulative distribution function.

A visual understanding would be to look at the Probability Density Function (PDF) of a CRV. If we take the total area under the graph, it would add up to 100% or 1.

On the other hand, the Cumulative Distribution Function (CDF) gives the area under the probability density function (the line) from \(-\infty\) to \(x\), the chosen value.

Often in statistics, we want to quickly query areas under the Probability Density Function, e.g. \(\mu-\sigma\le x\le \mu+\sigma\) (within 1 standard deviation of the mean). Instead of computing the CDF at each endpoint and subtracting, we can work with the area under the PDF directly. Thus, the PDF is usually the more frequently used of the two.
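To see that both routes give the same area, here is a minimal sketch that assumes a normal distribution (the chapter does not fix a particular distribution, so this choice, the helper names and the trapezoidal integration are illustrative assumptions). It compares the integral of the PDF with the difference of two CDF values over \(\mu-\sigma\le x\le \mu+\sigma\).

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution (assumed example distribution)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Pr(X <= x) for the same normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def integrate(f, a, b, steps=10_000):
    """Approximate the area under f from a to b with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

mu, sigma = 0.0, 1.0
a, b = mu - sigma, mu + sigma

area_from_pdf = integrate(normal_pdf, a, b)       # integral of the PDF
area_from_cdf = normal_cdf(b) - normal_cdf(a)     # F_X(b) - F_X(a)
print(area_from_pdf, area_from_cdf)               # both are close to 0.6827
```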

For a DRV,
The value of \(F_X\) (aka the CDF) increases in steps (see note 2) at each possible value of \(X\), and the size of each step equals \(P(X=x)\) (aka the PMF) at that value.

2 Recall from the probability table earlier for a die 🎲 that, for a DRV, the CDF increments by 1/6, i.e. by the individual probabilities.
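As a quick worked check using the die values from the earlier table, the size of one step of the CDF equals the PMF at that value:

\(F_X(3) - F_X(2) = \frac{3}{6} - \frac{2}{6} = \frac{1}{6} = \Pr(X = 3)\)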

Wait, why is the P(X=x) called PMF and not PDF for a discrete random variable?

The Probability Mass Function (PMF) is often confused with the Probability Density Function (PDF). We will see below why the term PDF is usually not used to describe a DRV!

First, we take a look at the distribution of a continuous random variable. The unit of the y-axis is a kind of “probability per unit length”, as it is the output of the probability density function.

The area between a and b is simply the integral of the probability density function.

For a discrete random variable, the unit of the y-axis is simply a probability.

The probability between a and b is found by adding up the individual probabilities for the values between a and b, which is like integrating a density over a region.

In the distribution for our discrete random variable, the y-axis is therefore already a probability, rather than a “density” that still needs to be integrated.

We thus call this function a probability mass function instead, similar to how integrating density over a region gives a mass in physics.

The linear problem: A Thought Experiment 🤔

Sometimes, not all random values have an equal chance of being chosen.

Imagine, for instance, that we were to attempt to simulate the potential of a Covid-19 variant to spread through a city population.

Modelling complex daily interactions has to take into account multiple random numbers simulating the timings or places where different individuals interact.

Adding the effects of pandemic public health measures – 💉😷🧍▫️▫️🧍 will further adjust the distribution of probabilities for random numbers, which will affect the random numbers generated as time progresses in the simulation (see note 2).

One method to generate random number samples that reflect the underlying distribution is to use inverse transform sampling (ITS). ITS is a method to generate samples at random from any probability distribution given its cumulative distribution function.

2 For those interested, a simulation has been conducted on the spread of Covid-19 in Singapore using various ways of generating random numbers.

How ITS works

Continuing from our earlier example of simulating the spread of Covid-19, let’s model the number of people who visit the MRT on a daily basis. We can model the daily number using a Poisson distribution. The Poisson distribution expresses the probability of a given number of events occurring in a given duration if (1) the events occur at a known constant rate (e.g. frequency) and (2) each event is independent of the previous one (see note 1).

1 While the second assumption is likely to be incorrect, we will use it as a scenario for now.

Let’s visualize how ITS works!

First, we plot the equation of the Poisson distribution in blue – which is given by:

\(\Pr(X{=}k)= \frac{\lambda^k e^{-\lambda}}{k!} \)


  • \(k\) is the number of occurrences (a non-negative integer)
  • \(\lambda\) is the average number of events per interval

Next, we plot the cumulative distribution function (CDF) of the Poisson distribution in green:

\(\Pr(X \le k) = e^{-\lambda} \sum_{i=0}^{\lfloor k\rfloor} \frac{\lambda^i}{i!} \)

  • \(\lfloor k\rfloor \) is the floor function of \(k\), which denotes the greatest integer \(\le k\)
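The two Poisson formulas above translate almost directly into code. The following is a minimal sketch: the function names and the value \(\lambda = 2\) are assumptions made for illustration, not values taken from the plots.

```python
import math

def poisson_pmf(k, lam):
    """Pr(X = k) for a Poisson distribution with average rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def poisson_cdf(k, lam):
    """Pr(X <= k): sum the PMF from 0 up to floor(k)."""
    return sum(poisson_pmf(i, lam) for i in range(math.floor(k) + 1))

lam = 2.0  # assumed example rate
for k in range(6):
    print(k, round(poisson_pmf(k, lam), 4), round(poisson_cdf(k, lam), 4))
```

Because of the floor function, `poisson_cdf(2.7, lam)` returns the same value as `poisson_cdf(2, lam)`, which is why the plotted CDF is a staircase rather than a smooth curve.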

  • The CDF is not a smooth curve, as the variable takes on only integer values (desmos calculates only individual points for this function).
  • Try drawing a horizontal line (y = ?) to find the intersection for the different random numbers possible on the axis
  • There is a higher probability of getting small values (around 0 to 1) than values x > 3 (for this lambda value), as the probabilities of the larger values are very slim
    • This is shown in the image below:
In other words, each additional movement of the line y = ? produces a much larger change in the corresponding x value (which is reflected in the decrease in gradient of the CDF). 

How can we make random numbers given a certain probability using this distribution? 🤔

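We can simply “swap” X and Y in the CDF above by taking its inverse. This makes the probability the independent variable (the input) and the random number the dependent variable (the output): feeding in a uniformly random probability between 0 and 1 returns a random number that follows the distribution.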
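The “swap” can be sketched for the Poisson example as follows. This is an illustrative sketch under assumptions, not a reference implementation: the function names, \(\lambda = 2\) and the fixed seed are all chosen here. Since the Poisson CDF is a staircase, the inverse is taken as the smallest integer \(k\) whose CDF value reaches the drawn probability.

```python
import math
import random

def poisson_inverse_cdf(u, lam):
    """Smallest integer k with Pr(X <= k) >= u, for a probability u in [0, 1)."""
    k = 0
    p = math.exp(-lam)   # Pr(X = 0)
    cdf = p
    # Walk up the staircase CDF until it reaches u.
    # The p > 0 guard stops the loop if rounding keeps cdf just below u.
    while cdf < u and p > 0.0:
        k += 1
        p *= lam / k     # Pr(X = k) computed from Pr(X = k - 1)
        cdf += p
    return k

def sample_poisson(lam, rng):
    """Draw one Poisson sample by feeding a uniform number into the inverse CDF."""
    return poisson_inverse_cdf(rng.random(), lam)

rng = random.Random(0)   # fixed seed so the run is repeatable
lam = 2.0                # assumed example rate
samples = [sample_poisson(lam, rng) for _ in range(50_000)]
print(sum(samples) / len(samples))   # the sample mean should be close to lam
```

Drawing the horizontal line y = u on the CDF plot and reading off the x value where it meets the staircase is exactly what `poisson_inverse_cdf` does.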

Drawbacks of this method

However, notice that if we reduce the value of λ, the values become more clustered. To get an accurate average, we are forced to take more samples, which takes more time ⏱️ (i.e. it is computationally intensive).

Other methods: The Acceptance Rejection Sampling method

This method comes from a broader class of methods grouped under Monte Carlo simulations.

Imagine wanting to find the value of \(\pi\). We get a perfectly circular dish with radius X cm and a square dish with sides of length X cm. We lay these two dishes a distance away from each other, and randomly drop marbles of similar mass over the whole area, so that some of them land in the dishes. By the end of the experiment, we can take the value of

\(\frac{\textrm{mass of circular dish}}{\textrm{mass of square dish}}\approx\frac{\pi X^2}{X^2}=\pi \)

This example can be seen at
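The marble experiment can also be simulated. The sketch below makes several assumptions for illustration: the table is a 3X by 2X rectangle, the circular dish sits on the left, the square dish sits in the bottom-right corner, and every marble has the same mass, so counting marbles stands in for weighing the dishes.

```python
import random

def estimate_pi(n_marbles, X=1.0, seed=0):
    """Estimate pi as (marbles in circular dish) / (marbles in square dish)."""
    rng = random.Random(seed)
    in_circle = 0
    in_square = 0
    for _ in range(n_marbles):
        # Drop a marble uniformly at random on the 3X-by-2X table.
        x = rng.uniform(0.0, 3.0 * X)
        y = rng.uniform(0.0, 2.0 * X)
        if (x - X) ** 2 + (y - X) ** 2 <= X ** 2:
            # Circular dish: centre (X, X), radius X, area pi * X^2.
            in_circle += 1
        elif x >= 2.0 * X and y <= X:
            # Square dish: occupies [2X, 3X] x [0, X], area X^2.
            in_square += 1
    return in_circle / in_square

print(estimate_pi(200_000))   # approaches pi as the number of marbles grows
```

With very few marbles the square dish may stay empty and the ratio is undefined, which mirrors the real experiment: the estimate only becomes useful after many drops.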

Similarly, the Acceptance Rejection Sampling method replaces the dishes with the probability density function of a variable, and samples points uniformly within a box bounded by the maximum extent of the graph. It rejects points that fall outside the area under the curve, and returns the x-values of the points that fall under it.

Intuitively, one realises that if the distribution is highly limited or concentrated in area, there may be a lot of unwanted (rejected) values during sampling.
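A minimal sketch of acceptance rejection sampling is shown below. The target density \(f(x) = 6x(1-x)\) on \([0, 1]\), the bounding box height of 1.5 and all function names are assumptions chosen for illustration. The sketch also reports the fraction of points that were accepted, which makes the cost of the unwanted values visible.

```python
import random

def target_pdf(x):
    """Assumed example density on [0, 1]: f(x) = 6x(1 - x), with peak 1.5 at x = 0.5."""
    return 6.0 * x * (1.0 - x)

def rejection_sample(n, pdf, x_min, x_max, y_max, rng):
    """Return n accepted x-values and the fraction of tries that were accepted."""
    accepted = []
    tries = 0
    while len(accepted) < n:
        tries += 1
        # Pick a point uniformly inside the bounding box around the graph.
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(0.0, y_max)
        # Keep x only if the point lies under the curve.
        if y <= pdf(x):
            accepted.append(x)
    return accepted, n / tries

rng = random.Random(0)   # fixed seed so the run is repeatable
samples, acceptance_rate = rejection_sample(20_000, target_pdf, 0.0, 1.0, 1.5, rng)
print(sum(samples) / len(samples), acceptance_rate)
```

For this density the area under the curve is 1 and the box has area 1.5, so roughly two thirds of the points are accepted. A narrow, sharply peaked density inside the same kind of box would waste far more tries.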

We will be implementing this method to solve some problems when we sample from multivariate distributions as well as in the quiz, so stay tuned!

Main Ideas of this chapter!

  • random numbers generated from random variables fall into 2 types – discrete and continuous
  • we can use functions such as the probability mass function, the probability density function and the cumulative distribution function to analyse how these random variables are related to their distributions
  • inverse transform sampling and Monte Carlo methods (e.g. ARS) are possible ways to generate numbers that follow a certain distribution