IA Probability – Full Text | Micfong's Space

← IA Problems – Full Text

These are Zixuan’s notes for Part IA – Probability at the University of Cambridge in 2026. The notes are not endorsed by the lecturers or the University, and all errors are my own.

The latest version of this document is available at academic.micfong.space. Please direct any comments to my CRSid email or use the contact details listed on the site.

This document is typeset using Typst. All figures are created using Inkscape.

Lecture 1 · 2026-01-23

1 Basic Concepts

We shall begin by general definitions in probability theory.

1.1 Probability Spaces

Definition 1.1 (Probability Space)

Suppose is a set and is a collection of subsets of . Then we call a -algebra if

[ the whole space is in . ]
If , then [ closed under complements. ]
If is a countable collection of sets in , then [ closed under countable unions. ]

Let be a -algebra on . A function

is called a probability measure if

Countable additivity. For any countable disjoint collection of sets in , we have

We say to be the probability of the event .

We call a probability space.

Definition 1.2 (Outcomes and Events)

Suppose is a probability space.

The elements of are called outcomes, and the elements of are called events.

Remark. We talk about probabilities of events instead of outcomes.

When is countable, we take to be the power set of .

Proposition 1.3 (Properties of Probability Measures)

Suppose is a probability space. Then for any , we have

Example 1.4

Consider rolling a fair die. We have
Consider

This models the experiment of picking a uniformly random element from . We have
Consider picking balls from a bag. Suppose we have balls labelled . We pick balls at random without replacement. Then we have
Consider a well-shuffled deck of 52 cards. Then
Largest digit problem. Consider a string of random digits from . Then

Then we define

And so

Another example is the event

Then
Birthday problem. There are people. What is the probability that at least two people share a birthday?

We may assume that there are 365 days in a year, and each person’s birthday is equally likely to be any of the 365 days, independently of other people.

We have

Then

Let be the event that at least two people share a birthday. Then we have

Thus

1.2 Combinatorial Analysis

Let be a finite set with .

We want to partition into disjoint subsets with and . Consider the number of ways to do this.

Let be the number of ways to do this. Then
Let . We say that is
- strictly increasing if
- increasing if
Each strictly increasing functions can be determined by its range. Hence the number of strictly increasing functions is .

Lecture 2 · 2026-01-26

For increasing functions, define a bijection

by

Hence the number of increasing functions is .

1.3 Stirling’s Formula

Notation. Let , be 2 sequences. We write

if as .

Theorem 1.5 (Stirling's Formula)

As , we have

We shall first consider a weaker statement.

Proposition 1.6

Proof. Define

For , we shall write as the integer part of . Then

Integrating from to , we have

Dividing by , we have

Proof. [of Stirling’s Formula 1.5, non-examinable proof] For all that is twice differentiable, for all we have

This can be checked using integration by parts twice.

Take . Then we get

Therefore,

where

We have

Now, for ,

So the series converges. Let

Therefore,

We have shown that

It remains to show that . We already know that

Consider

We shall use a different method to show that

This would imply that .

Consider

Using integration by parts, we have the recurrence relation

Thus .

Similarly,

Now, if we have that as , then we have

In order to show that as , we note that

Recall that

Since is decreasing in ,

This completes the proof.

Lecture 3 · 2025-01-28

2 Axiomatic Approach

Recall our definition of probability spaces in Definition 1.1.

2.1 Probability Measure

There are several important properties of the probability measure .

Proposition 2.1 (Countable Subaddivity)

If is a collection in , i.e. , then

Proof. Define and for ,

Then is a disjoint collection in and .

because for all .

Proposition 2.2 (Continuity of Probability Measures 1)

Let where and for all . We call an increasing sequence in .

Then is an increasing sequence in and converges to .

Proof. Define , and for ,

Then is a disjoint collection, and

Now we also have

Hence

Proposition 2.3 (Continuity of Probability Measures 2)

Let where and for all . We call a decreasing sequence in .

Then is a decreasing sequence in and converges to .

Proof. Taking complements, we have that is an increasing sequence in . By Continuity of Probability Measures 1 2.2, the result follows.

2.2 Inclusion-Exclusion Formula

Proposition 2.4 (Inclusion-Exclusion Formula)

Let . Then

In general, for events , we have

Proof. We shall prove this by induction on . The base case has already been shown.

Assume that the formula holds for events. We shall show it holds for events.

By induction hypothesis,

Plugging these into the previous equation gives the result.

Lemma 2.5 (Bonferroni Inequalities)

Let . We have

Let be events in . Then for any ,

Proof. We shall prove this by induction. The base case is clear.

Suppose the result holds for events. We shall show it holds for events.

Assume that is odd. Then

By the induction hypothesis, since is odd,

Substituting these into the previous equation gives the result. The case where is even is similar.

Remark. If is a finite set and is a probability space with , and define

Take , then by the inclusion-exclusion formula on , we can get

This is the inclusion-exclusion principle in combinatorics.

Lecture 4 · 2025-01-30

Example 2.6 (Counting Surjections)

Let

We want to find .

Define

Then

Note that we have

Example 2.7 (Counting Derangements)

A derangement is a permutation with no fixed points. Let

Pick a permutation at random. Consider the probability that it is in . For , let

Then

Using Inclusion-Exclusion Formula 2.4, we have

Therefore,

Note that as , .

2.3 Independence

Definition 2.8 (Independence of Events)

Let be a probability space. Let . We say that and are independent if

We write .

Let be a collection of events in . We say that are mutually independent if for any finite subset ,

Remark. Pairwise independence does not imply mutual independence.

Example 2.9

Consider tossing a fair coin twice. Let

Define events

Then we have

Thus, are pairwise independent but not mutually independent.

Proposition 2.10

If , then .

Proof.

2.4 Conditional Probability

Definition 2.11 (Conditional Probability)

Let be a probability space. Let where . The conditional probability of given is

In particular, if , then

Proposition 2.12 (Countable Additivity for Conditional Probability)

Let is a disjoint sequence in . Then for some where ,

Proof.

Proposition 2.13 (Law of Total Probability)

Suppose is a disjoint collection if such that and for all . Then for any ,

Proof.

Proposition 2.14 (Bayes' Formula)

Consider to be a collection of disjoint events in such that and for all . Then for any where ,

Proof. By Conditional Probability 2.11,

By Law of Total Probability 2.13, we have

Also,

Plugging these into the first equation gives the result.

This formula is the basis of Bayesian statistics; if we know the probabilities of and we have a model which gives us , then we can compute the posterior probability .

Example 2.15 (False Positives for a Rare Condition)

Suppose that a rare condition affects of the population, i.e. . A test for the condition has a true positive rate and a false positive rate. [For affected individuals, the test is positive with probability , and for unaffected individuals, the test is positive with probability .]

An individual takes the test and tests positive. Consider the probability that they actually have the condition.

Define

We have

Then

Note that we can rewrite this as

Since and are both very close to , so the relavent ration is approximately

But and hence the posterior probability is still quite small.

Lecture 5 · 2026-02-02

Example 2.16

Consider the probability that a person have 2 boys given that,

They have exactly 2 children, one of whom is a boy.

Since no further information is given, we assume that all outcomes are equally likely.

Let

We want to find
They have exactly 2 children, the elder of whom is a boy.

We want to find
They have exactly 2 children, exactly one of them is a boy, who was born on a Thursday.

Let denote the event that a child is a boy born on a Thursday, and denote the event that a child is a boy not born on a Thursday. We want to find

Example 2.17 (Simpson's Paradox)

Suppose the following data is collected for applicants to a Cambridge college:

		Admitted	Rejected	Admission Rate
All Applicants	State	25	25	50%
All Applicants	Independent	28	22	56%
London Schools	State	15	22	41%
London Schools	Independent	5	8	38%
Cambridge Schools	State	10	3	77%
Cambridge Schools	Independent	23	14	62%

Note that within each school type, state school applicants have a higher admission rate than independent school applicants. However, when considering all applicants, independent school applicants have a higher admission rate than state school applicants.

This phenomenon is called confounding in statistics. It arises when we aggregate data from disparate populations.

In terms of conditional probability, let

We have

but

Note that

Now, if we (falsely) assume that , then

Nonetheless, this does not hold in our example.

2.5 Discrete Probability Distributions

Definition 2.18 (Discrete Probability Distribution)

Let be a probability space, where is countable with

If we know for all , then ,

In this case, we say that a discrete probability distribution.

We write for all .

Proposition 2.19 (Properties of Discrete Probability Distributions)

Let be a discrete probability distribution on a countable set . Then

for all .
.

Let us see some examples of discrete probability distributions.

Definition 2.20 (Bernoulli Distribution)

A Bernoulli distribution models the outcome of a single binary experiment, such as a biased coin toss.

The Bernoulli distribution with parameter is defined by

Definition 2.21 (Binomial Distribution)

A binomial distribution models the number of successes in independent Bernoulli trials, each with success probability .

The binomial distribution with parameters and is defined by

where is called the binomial distribution. Note that in the case of a coin toss,

Definition 2.22 (Multinomial Distribution)

A multinomial distribution models the number of occurrences of each outcome in independent trials, each with possible outcomes with probabilities .

The multinomial distribution with parameters and where is defined by

For example, suppose that there are boxes and we throw balls into these boxes independently, where each ball lands in box with probability . Then

Lecture 6 · 2026-02-04

Definition 2.23 (Geometric Distribution)

A geometric distribution models the number of Bernoulli trials needed to get the first success, where each trial has success probability .

The geometric distribution with parameter is defined by

Note that,

This models, for example, the number of times we need to toss a biased coin with heads probability until we get the first heads.

Definition 2.24 (Poisson Distribution)

A Poisson distribution models the number of events occurring in a fixed interval of time or space, where these events occur with a known constant mean rate and independently of the time since the last event.

The Poisson distribution with parameter is defined by

Note that

Proof. Consider the number of customers arriving at a shop in a time interval . We can discretise the time interval to with , where each interval has a small probability of a customer arriving. Then

3 Discrete Random Variables

Definition 3.1 (Random Variable)

Consider a probability space . A random variable is a function , satisfying that ,

We write and , .

Definition 3.2 (Indicator)

Let . The indicator of is a random variable defined by

Definition 3.3 (Probability Distribution Function)

Suppose is a random variable. Define the probability distribution function of to be

where .

Definition 3.4 (Multidimensional Random Variable)

is called a random variable in if

is a function such that ,

Equivalently, all are real random variables.

Definition 3.5 (Discrete Random Variable)

A random variable is called discrete if its range is countable.

Suppose it takes values in a countable set , then for every we write

We call the probability mass function of , or the distribution of . Note that ,

Recall that two events and are independent if .

Definition 3.6 (Independence)

Let be discrete random variables with values in . They are independent if for any ,

Example 3.7

Consider tossing a -coin times independently. let with , where

and is the result of the -th toss. Then

For all define random variables with . Then

Note that is a Bernoulli random variable with parameter .

Claim. are independent random variables.

Proof.

Lecture 7 · 2026-02-06

3.1 Expectation

Definition 3.8 (Expectation for Non-Negative Discrete Random Variables)

Let be a countable set and be a discrete random variable.

For , the expectation of is defined by

Alternatively, consider

Then

So the expectation of is the weighted average of the values taken by , with weights given by the probabilities of taking those values.

Example 3.9

Consder . Then ,

Then, for the expectation,

Example 3.10

Consider with . Then for all ,

Hence

Notation. Let be a random variable. Define

Then

Definition 3.11 (Expectation for General Discrete Random Variables)

Suppose is discrete. We can define and as in Definition 3.8.

If at least one of and is finite, then we can define the expectation of by

Otherwise, is not defined.

Definition 3.12 (Integrable Random Variable)

A random variable is integrable if .

Proposition 3.13

If is well-defined, then we have

Remark. We shall assume that whenever we write , it is well-defined.

Proposition 3.14 (Properties of Expectation)

If , then .
If and , then .
If , then and .
If and are random variables, then .
Let and be integrable random variables. Then
Suppose that are non-negative random variables. Then

Proof. Suppose that is countable.

For (6), we have

Example 3.15

Let . Then

Proposition 3.16

Let and consider defined by . Then is a random variable, with

Proof. Let . Then we have

Hence,

Proposition 3.17

Suppose and takes integer values. Then

Proof. Note that for any ,

Since and , we have

Taking expectation on both sides, we get

Lecture 8 · 2026-02-09

With expectation, we can form another proof of the inclusion-exclusion formula.

Proof. [of Inclusion-Exclusion Formula 2.4] Let . Then

More generally, if we have , then

Taking expectation on both sides, since , we get

3.2 Variance and Covariance

Definition 3.18 (Moment)

Let be a random variable and . We call the -th moment of , as long as it is well-defined.

Definition 3.19 (Variance)

The variance of is defined by

The variance is a measure of concentration of the distribution of around its expectation. The smaller the variance, the more concentrated the distribution is around its expectation.

Definition 3.20 (Standard Deviation)

The standard deviation of is defined by .

Proposition 3.21 (Propositions Of )

.
If , then .
If , then and .
.
, with the minimum attained at .

Proof. For (4), we have

For (5), define . Then

Hence, taking derivative with respect to , we get

Therefore, if and only if . Moreover, , so is convex and the minimum is attained at . Hence

Example 3.22

Consider . Then and

Note that

Hence,
Consider . Then and

Hence,

Definition 3.23 (Covariance)

Let and be random variables. The covariance of and is defined by

It is a measure of the dependency between and .

Proposition 3.24 (Properties of Covariance)

.
.
.
Let . Then and .
.
Let . Then .
Let be random variables. Then .

More generally, for all , we have

In particular,

Proof. For (3), we have

For (5), we have

Recall that if are discrete random variables, then they are independent iff for all ,

Proposition 3.25

If are independent, then is independent of .

Proof. We need to show that

We have

Lemma 3.26

Let and be 2 independent random variables, and . Then

Proof. Let and define . Then

Lemma 3.27

Let and be 2 independent random variables. Then

Important. The converse of the above lemma is not true.

Example 3.28

Let be independent random variables. Let

Then . Moreover,

So,

However,

Hence and are not independent, even though .

Lecture 9 · 2026-02-11

Corollary 3.29

Let be independent random variables. Then

Example 3.30

Consider random variables . Then we have as seen before.

We also know that where are independent and .

Hence, by the above corollary, we have

3.3 Inequalities

Proposition 3.31 (Markov's Inequality)

Let be a non-negative random variable and . Then

Proof. Observe that

Taking expectation on both sides, we get

Proposition 3.32 (Chebyshev's Inequality)

Let be a random variable and . Then

Proof. Note that

Proposition 3.33 (Cauchy-Schwarz Inequality)

Let and be random variables. Then

Proof. Assume that and , otherwise there is nothing to prove. Then

Assume that and , otherwise this is the trivial case. Assume WLOG that and are non-negative.

Let and consider . Then

Then

This function is minimised at where . Hence, using the fact that , we get

Remark. The equality holds when , so

Definition 3.34 (Convex Function)

A function is called convex if , , we have

Proposition 3.35 (Jensen's Inequality)

Let be a random variable and be a convex function. Then

Proof. We first need an additional claim.

Claim. Let be a convex function. Then is the supremum of all the lines below it, i.e. for all , such that

Proof. Let and . Then for some , we have

By the convexity of , we have

Hence, using the fact that , we get

Let . Then ,

So, this gives that ,

Hence, we can take and get the desired result.

Let . By the claim, such that

Then we have . Taking expectations on both sides, we get

Remark. For the equality case, let be convex with the extra property that for , , such that for all and .

We wish to find a condition of such that we have

We have . Taking expectations gives

Hence, the equality forces .

Since , we must have . By the extra property of , this is equivalent to .

Lecture 10 · 2026-02-13

Proposition 3.36 (Am-GM Inequality)

Let , then

Proof.

Claim. Let be a convex function. Then , we have

Proof. Let be a random variable with for all . Then

Let with . This is a convex function. Hence, by the above claim, we have

Rearranging gives the desired result.

3.4 Multiple Discrete Random Variables

3.4.1 Joint Distribution and Conditional Distribution

Recall that if is a discrete random variable, then the distribution of is given by .

Definition 3.37 (Joint Distrbution and Marginal Distribution)

Let be discrete random variables. Their joint distribution is defined to be

In particular,

where is called the marginal distribution of .

Definition 3.38 (Conditional Distribution)

Let and be two discrete random variables. The conditional distribution of given is defined by

By the law of total probability, we have

3.4.2 Distribution of the Sum of Random Variables

Let and be two independent random variables. We wish to find the distribution of .

Example 3.39 (Independent Poisson Random Variables)

Let and with Then

So . Hence

3.4.3 Conditional Expectation

Recall that for ,

Definition 3.40 (Conditional Expectation)

Let be a discrete random variable and with . The conditional expectation of given and event is defined by

Note that if , we recover

Proposition 3.41 (Law of Total Expectation)

Let be a discrete random variable and a partition of with for all . Then

Proof. We have

In particular, for the conditional probability of given , we have

Let since this is a function of . Consider what is meant by .

We will define as a random variable, which is a function of .

Definition 3.42 (Conditional Expectation Given a Random Variable)

Let and be discrete random variables. The conditional expectation of given is defined by

where .

Remark. The above notation can be confusing. is a random variable, which is a function of . In particular, if , then .

Bear in mind that random variables are functions. We have , and so really means .

Example 3.43

Consider tossing a -coin times independently. For , let

We wish to find . Let

So, .

Proposition 3.44

Let and be discrete random variables. Then for some constant ,

.
.

In particular, .
.

Lecture 11 · 2026-02-16

Proposition 3.45 (Tower Property)

Let and be discrete random variables. Then

Proof. We have

Taking expectation on both sides gives

Proposition 3.46

Let and be independent discrete random variables. Then

Proof. We have

Proposition 3.47

Let be independent discrete random variables. Then for any random variable ,

Proof. Let and .

Claim. is independent of .

Proof. We need to show that

We have

Then, by Proposition 3.47, we have .

By Proposition 3.46, we have . Hence, .

Proposition 3.48

Let be two random variables and . Then

Proof. We have

Corollary 3.49

Let be two random variables. Then

Example 3.50

Consider tossing a -coin times independently. For , let

We wish to find . By symmetry, for all . Hence,

So, .

3.5 Random Walks

Definition 3.51 (Random Process / Stochastic Process)

A random process or stochastic process is a sequence of random variables .

Definition 3.52 (Random Walk)

A random walk is a random process that can be expressed as

where is a deterministic constant and are independent and identically distributed (i.i.d.) random variables.

The steps of the random walk are the random variables .

3.5.1 Simple Random Walk

Definition 3.53 (Simple Random Walk)

A simple random walk is a random walk satisfying .

If , then we call it a simple symmetric random walk.

Example 3.54 (Gambler's Ruin)

Consider as a simple random walk, which is the fortune of a gambler who starts with at time and at every time step, he wins with probability and loses with probability .The game ends if he reaches or if he reaches , whichever comes first.

Notation. We will denote , and .

Let . Then and .

Then,

So we can solve the following system of equations to find for all :

Lecture 12 · 2026-02-18

For , we get a simple symmetric random walk (SSRW), in which case we have

This leads to

Hence,

Considering boundary conditions, gives . Hence, for a SSRW, we have
For , we need to try a solution of the form for some . Then

The general solution is of the form for some constants . Using the boundary conditions, we get

3.5.2 Time to Absorption

Let be a simple random walk. We are interested in the time to absorption, which is the duration of the game. Let the time to absorption be denoted by . Then

We would like to consider the expected time to absorption, i.e. .

Notation. We will denote .

By the law of total expectation, we have

Boundary conditions are .

For , try a solution of the form . Then

The general solution will be of the form . Using boundary conditions, we get
For , try as a solution. Then . Hence

Solving this gives

3.6 Probability Generating Functions

3.6.1 Introduction

Definition 3.55 (Probability Mass Function)

Let be a discrete random variable. The probability mass function of is defined by

Definition 3.56 (Probability Generating Function)

Let be a discrete random variable taking values in . The probability generating function (PGF) of is defined by

where .

Note that converges absolutely for since and . Hence the radius of convergence of is at least . Therefore, is well-defined for .

Remark. In this section, we will only consider discrete random variables taking values in .

Theorem 3.57

The PGF of a random variable uniquely determines the distribution of .

Proof. Let and be 2 probability distributions with the same PGF, i.e. for all ,

We need to show that for all . We shall show this by induction. For ,

Assume that for all . Then

Dividing both sides by and letting gives

Example 3.58

Consider . Then

Remark. Suppose that are independent random variables with PGFs

Consider the PGF of :

Example 3.59

Consider and with . Then

Hence .

Example 3.60

Consider . Then

Example 3.61

Consider . Then

Example 3.62

Consider and with . Then

Hence, .

Theorem 3.63

Let be the PGF of a random variable . Then

Proof.

Assume that . Take . Then

So is increasing in and bounded above by . Hence, .

Take . There exists such that

Then,

Hence, .
Assume that . Then such that

Then,

Hence, .

Theorem 3.64

Let be the PGF of a random variable . Then

Lecture 13 · 2026-02-20

3.6.2 Sum of a Random Number of Random Variables

Let be independent and identically distributed random variables and let be an independent random variable taking values in .

Let

Lemma 3.65

Let be the PGF of and be the PGF of .

Then the PGF of is given by

Proof.

Proof 1

We have

Proof 2

We have

Note that

Hence, we can write . Taking expectation on both sides gives

We have

3.7 Branching Processes

Let , and be the distribution of the number of offspring of the individual in the first generation. [ is the number of individuals in the -th generation.]

Each individual produces an independent number of offspring with the same distribution as .

Let be a family of independent random variables such that has the same distribution as for all . Then we can write

We are interested in the probability of extinction.

3.7.1 Generating Functions of Branching Processes

Theorem 3.66

Let be a branching process with offspring distribution . Then,

Proof. We will show this by induction. For , we have . Assume that . Then

Note that

Thus,

Taking expectation on both sides gives

Theorem 3.67

Let and . Then

Proof. We have

Note that,

Hence,

3.7.2 Extinction Probability

Let

and .

Since , is an increasing sequence that converges to , by the continuity of probability measures.

[Recall that for all implies that .]

Proposition 3.68

Let be the PGF of . Then , and

Proof. If we were given that , then since is continuous as , we have . So we just need to show that .

Proof 1

We have

Proof 2

Conditioning on , let be independent and identically distributed branching processes with the same offspring distribution as . Then

Hence,

Lecture 14 · 2026-02-23

Note that in Theorem 3.66, we have

So, in order to evaluate , consider the following cases:

If , then as , so we expect that .
If , then as , so we expect that .
If , then for all , which does not give us an intuition on .

Note that the relation always has a solution at , as shown in the following graphs:

Note that the gradient of the tangent at is . Hence, the first two cases can be justified.

Theorem 3.69

Let be a branching process with offspring distribution . Assume that .

Then the extinction probability is the minimal non-negative solution to the equation .

Moreover, iff .

Proof. We know that in Proposition 3.68.

Let be the smallest non-negative solution to . We will show that .

We will prove by induction that for all , which will imply that as , but since is the smallest non-negative solution to , we have . Hence, .

For , we have . Assume that . Then

since is increasing in . Hence, .

Now, we will show that iff .

Assume that , then , and then

Then in this case, . Then,

Because , we have , and so .

Now assume that and , we shall show that iff . Define

Then . We shall first show that can have at most one more root in . We have

because implies that there exists such that .

Hence is strictly increasing in . Therefore, can have at most one more root other than , due to Rolle’s theorem. [If not, then such that . Then by Rolle’s theorem, there exists such that . Also, such that . This contradicts the fact that is strictly increasing in .]

We therefore have two cases:

has no other root other than . Then . We need to show that . We have

This is because and . So

Hence

So , and hence .
has another root . So has to be the extinction probability . We need to show that . We have

By Rolle’s theorem, there exists such that . We also have

Since is strictly increasing in , we have . Hence, .

4 Continuous Random Variables

4.1 Probability Distribution Function

Consider a probability space . Recall that a random variable is a function

where , the set is an event in .

Recall Definition of Probability Distribution Function 3.3. Let be the probability distribution function of . We have the following properties of :

Proposition 4.1 (Properties of Probability Distribution Function)

Let be the probability distribution function of a random variable . Then, we have the following properties:

is increasing, i.e. for all .
For all with , we have
is right-continuous and always has left limits, i.e. for all , we have
.
and .

Lecture 15 · 2026-02-25

Proof.

For (2), we have

For (3), let be a decreasing sequence with . Then, we want to show that . Define

Note that , and hence by the continuity of probability measures.

Now, we have , so . Therefore, .

Since is increasing, left limits always exist, and so .

For (4), consider

Define . Then we have . Hence

Definition 4.2 (Continuous Random Variable)

A random variable is said to be continuous if its probability distribution function is continuous. Equivalently, for all ,

For the purpose of this course, we will only consider continuous random variables whose PDF is differentiable. These are also known as absolutely continuous random variables.

Definition 4.3 (Probability Density Function)

Let be a continuous random variable with probability distribution function . If is differentiable, then the probability density function of is defined as

Proposition 4.4 (Properties of Probability Density Function)

Let be a continuous random variable with probability density function . Then, we have the following properties:

for all .
.
for all . More generally, for every ,

[If is discrete, then , so the above formula can be seen as a continuous analogue of the discrete case.]

Intuitively, can be thought as being proportional to the probability that takes values around . Mathematically, for small, we have

4.2 Uniform Distribution

Definition 4.5 (Uniform Distribution)

A uniform distribution on is a continuous random variable , denoted , with probability density function

Proof of density validity. We have

Suppose . Then for every ,

Otherwise, if , then , and if , then .

Note that for , we have for all .

4.3 Exponential Distribution

Definition 4.6 (Exponential Distribution)

An exponential distribution with parameter is a continuous random variable , denoted , with probability density function

Proof of density validity. We have

Suppose . Then for every ,

Remark. Let , and with . Then, for every ,

Note that is a geometric random variable with parameter . As , .

Hence, converges in distribution to as . So, we can think of an exponential distribution as a continuous analogue of a geometric distribution.

Proposition 4.7 (Memoryless Property of the Exponential Distribution)

Let with . Let . Then

It is significant that the exponential distribution is the only continuous distribution with the memoryless property.

Theorem 4.8

Let be a positive random variable which is not identically zero or . Then has the exponential distribution iff has the memoryless property, i.e. for all ,

Proof.

[] This is shown in Memoryless Property of the Exponential Distribution 4.7.

[] Suppose that has the memoryless property. Let and

Then . For every , .

In particular, when , we have .

Let . We have

and so .

By our definition, for any ,

so . Therefore, for all . We shall extend this to .

Let , and such that and . Then

Taking , we have .

Lecture 16 · 2026-02-27

4.4 Expectation and Variance of a Continuous Random Variable

Definition 4.9 (Expectation of a Continuous Random Variable)

Let be a continuous random variable with density with . The expectation of is defined as

Let be a non-negative function on . Then, we define

For a general random variable , define and . Then, . If or , then we define

Remark. As in the discrete case, we have

Lemma 4.10

If is a continuous random variable with , then

Proof. We have

Recall that in the discrete case, we have

Lemma 4.11

If is a continuous random variable with , then for any outcome ,

Definition 4.12 (Variance of a Continuous Random Variable)

Let be a continuous random variable with . The variance of is defined as

Example 4.13

Consider . Then the density of is given by

Hence
Consider . Then for , and . Hence

4.5 Normal Distribution

4.5.1 Introduction

Definition 4.14 (Normal Distribution)

An normal distribution with parameters and is a continuous random variable , denoted , with probability density function

Proof of density validity. We have

Consider . We have

Changing to polar coordintates with and , we have

Since , we have . Hence, is a valid probability density function.

Proposition 4.15

Let with and . Then

Proof. We have

Hence, . Moreover,

Hence .

4.5.2 Linear Transformations of Normal Distributions

Theorem 4.16

Let have density , and let be a function which is strictly monotone and is differentiable. Then has density

Proof.

Suppose that is strictly increasing. Then
Suppose that is strictly decreasing. Then

Hence the result follows in either case.

Proposition 4.17 (Linear Transformations of Normal Distributions)

Let with and . Let with . Then

Proof. Define and . We have and . Hence,

So .

Remark. If , then

Example 4.18

Let , and consider

Let

Note that

Since , we have . Hence,

We have existing tables of values, so we can compute for any , in particular,

Lecture 17 · 2026-03-02

Definition 4.19 (Median)

Let be a continuous random variable. The median of , denoted by , is the value such that

Alternatively,

where is the density of .

Example 4.20

Suppose that . Then

Hence the median of is .

4.6 Multivariate Density Functions

4.6.1 Introduction

Definition 4.21 (Multivariate Density Function)

Let be a random vector. We say that has a multivariate density function if there exists a non-negative function such that for all ,

The probability distribution function of is defined as

Therefore,

More generally, for ,

Definition 4.22 (Independence of Continuous Random Variables)

Let be continuous random variables. They are independent if for all ,

Theorem 4.23

Let be a random vector with density .

Suppose are independent with densities , then for all ,
Conversely, suppose factorises as for some non-negative functions on . Then are independent with densities proportional to .

Proof.

We have

So .
We have

Note that

Hence

So, are independent with densities proportional to .

4.6.2 Marginal Density Functions

Definition 4.24 (Marginal Density Function)

Let be a random vector with density . The marginal density function of is defined as

Proof. Suppose has density . Then consider

So the density of is

4.6.3 Sum of Independent Random Variables

Suppose and are independent random variables with densities and respectively. We want to find the density of .

Recall that in the discrete case, we used the discrete convolution formula

For the continuous case, we have the following analogue of the discrete convolution formula.

Proof. We have

So the density of is given by the convolution formula

4.6.4 Conditional Density Functions

Definition 4.25 (Conditional Density Function)

Let and be two random variables with joint density and marginal densities and respectively.

The conditional density function of given is defined as

Proposition 4.26 (Law of Total Probability for Continuous Random Variables)

Let and be two random variables with joint density and marginal densities and respectively. Then, for every ,

Similarly, we can define the conditional expectation of given .

Definition 4.27 (Conditional Expectation of a Continuous Random Variable)

The conditional expectation of given is defined as , where

4.6.5 Transformation of Random Variables

Theorem 4.28

Let be a random variable with values in and density . Let be a bijection with a continuous derivative on , and

Then, the random variable has density

where is the Jacobian determinant of .

Lecture 18 · 2026-03-04

Example 4.29

Let be independent. Then, consider in polar coordinates with and . We have

We want to find the density of . We have

where

and . Hence,

where and .

Hence, and has density for , and they are independent, by Theorem 4.23 (2).

4.7 Order Statistics of a Random Sample

Definition 4.30 (Order Statistics)

Let be i.i.d. random variables with probability distribution function and density . Order them from the smallest to the largest as

Lete . Then, are called the order statistics of the random sample .

We aim to find the density of . For the minimum , we have

For the maximum , we have

In order to find with , we have

Hence,

4.7.1 Order Statistics of Exponential Distributions

Let and , where and . Let . Then

Hence, .

If are independent random variables with , then .

Now, consider be i.i.d. random variables with . Let be the order statistics of .

Let . Note that we have found the distribution of above.

Consider the joint density of . We have

Hence, with the transformation of and , we have

Hence, and they are independent, by Theorem 4.23 (2).

4.8 Moment Generating Functions

Definition 4.31 (Moment Generating Function)

Let be a random variable with density . The moment generating function (MGF) of is

whenever the integral is finite. Note that .

Theorem 4.32

The MGF uniquely determines the distribution of a random variable, provided it is defined for an open interval of values of .

Theorem 4.33

Suppose that the MGF is defined for an interval of values of , then

4.8.1 Gamma Distribution

Example 4.34 (Gamma Distribution)

For and , the gamma distribution with parameters and is a continuous random variable with density

Proof of density validity. We have

We denote . Then,

If , then .

Lecture 19 · 2026-03-06

Proposition 4.35

If are independent random variables, then

Suppose and , where , and . Consider the density of .

We aim to show this by Theorem 4.32. Consider, for ,

Hence

Suppose are i.i.d. with . Then, .

Remark. One could also define with by replacing in the density by

We say if for .

Cauchy Distribution (Non-Examinable). The Cauchy distribution is defined as the distribution with density

The MGF is

Hence, all have the same MGF. However, it is not the case that all have the same distribution. So the assumption that the MGF must be finite for an open interval of values of is necessary in Theorem 4.32.

4.8.2 MGF of the Normal Distribution

Recall that if , then

Hence, the MGF of is

Note that, the exponent is

Hence,

Let and , and . Then

Hence .

4.8.3 Multivariate Moment Generating Functions

Definition 4.36 (Multivariate Moment Generating Function)

Let be a random variable. The MGF of is defined to be

where .

Theorem 4.37

For a multivariate random variable, if the MGF is finite for an open set of values of , then it uniquely determines the distribution of the random variable.

In this case,

Proposition 4.38

Let be a random variable in . Then

iff are independent.

Proof.

[] This is a direct consequence of the definition of MGF.

[] If are independent, then factorises. If factorises, then by Theorem 4.32, are independent.

4.9 Multidimensional Gaussian Random Variables

4.9.1 Introduction

Definition 4.39 (Gaussian Random Variable)

A random variable with values in is called Gaussian (or normal) in if it can be written as

where , , .

If , then the density of is, for ,

Definition 4.40 (Gaussian Vector)

Let be a random variable. We say that is a Gaussian vector (or Gaussian in ) if for all ,

is a Gaussian random variable in .

Proposition 4.41

Let be a Gaussian vector. Let be an matrix and . Then is also a Gaussian vector.

Proof. Let . We need to show that is a Gaussian random variable in . Note that

Letting , we have

Since is a Gaussian vector, is a Gaussian random variable in . Note that is a constant. Hence, is also a Gaussian random variable in .

Definition 4.42

Define

Note that

Hence, is a symmetric matrix.

Lecture 20 · 2026-03-09

Consider the random variable for some . We have

Proposition 4.43

is a non-negative definite matrix, i.e. for any ,

Proof. Note that . Since , we have .

Consider the MGF of . We have

Note that is . So

[Recall that if , ]

We have seen that the MGF uniquely characterises the distribution if defined for an open set of values. Hence, to characterise a Gaussian vector, we only need the mean and the covariance matrix .

We have determined the MGF of a Gaussian vector purely from the definition of a Gaussian vector, and we will consider its density function later.

4.9.2 Construction of a Gaussian Random Vector

Lemma 4.44

Let be i.i.d. with . Let . Then, is a Gaussian vector.

Proof. We need to show that , is normal in .

The MGF of is

So , and hence is a Gaussian vector.

Remark. We have

We write that

Let and let be a non-negative definite matrix. We want to construct a Gaussian vector with mean and (co)variance matrix , using the standard Gaussian vector .

[Note that in the case, we can construct by letting .]

Note that we will need some form of “square root” of .

Definition 4.45

Let be a non-negative definite matrix. Consider

and is a diagonal matrix with

Then, the square root of is defined as

Note that .

Lemma 4.46

Let , be a non-negative definite matrix. Let be i.i.d. with , and let .

Let be the square root of . Then, is a Gaussian vector with mean and covariance matrix . i.e., .

Proof. is a Gaussian vector as a linear transformation of a Gaussian vector. We have

4.9.3 Density of a Gaussian Vector

Let . We want to find the density of .

[In the case, we have ]

We shall consider two cases

is positive definite, i.e. . We can write

Note that gives . Hence,

Lecture 21 · 2026-03-11

is non-negative definite, and such that .

By an orthogonal change of basis, we could assume that

Let

We can then write

Proposition 4.47

Let be a Gaussian vector. Let and .

If are independent, then is a diagonal matrix.

Proof. Since for , the matrix is diagonal due to independence.

Proposition 4.48

Let be a Gaussian vector. If is a diagonal matrix and strictly positive definite, then are independent.

Proof 1. We have

Let . We have

Since factorises, by Theorem 4.23 (2), are independent, and .

Proof 2. Consider the MGF of . We have

since we have

Hence

Since factorises into a product of functions of ‘s, by Theorem 4.32, are independent, and .

Therefore, we can conclude that if is a Gaussian vector, then are independent iff for all .

4.9.4 Bivariate Gaussian Distribution

Definition 4.49

Let be a Gaussian vector in . Let , and

Then is called a bivariate Gaussian vector with parameters and .

Proposition 4.50

Proof. By Cauchy-Schwarz inequality, the result follows.

Note that we can write the covariance matrix of as

Proposition 4.51

Let and Then the matrix

is non-negative definite.

Proof. Consider any . Then

If , the second line gives .
If , the first line gives .

Now, consider . We can write

Let . Then is a Gaussian vector since

Note that

Taking gives . Hence, and are independent. We have

5 Convergence Results and Limit Theorems

5.1 Convergence Results

Definition 5.1 (Convergence in Probability)

A sequence of random variables converges to a random variable in probability and we write

if as ,

Theorem 5.2 (Weak Law of Large Numbers)

Let be i.i.d. random variables with mean . Let . Then

i.e.

This is called the weak law of large numbers (WLLN).

Lecture 22 · 2026-03-13

Proof. Assume that . We need to show that

We have

By definition, we have . So

Definition 5.3 (Convergence Almost Surely)

Let be a sequence of random variables and be a random variable. We say that converges to almost surely (a.s.) or with probability if

We write as almost surely.

Remark. The above statement, more precisely, means that

where and .

To compare with Definition 5.1, this is equivalently saying that

Note that the quantifiers are now inside .

[It can be clearer to see the difference between convergence in probability and convergence almost surely without any shorthands. Let the underlying probability space be . Let and the sequence be random variables, where .

Convergence in probability states that ,

Convergence almost surely states that

i.e. convergence in probability cares about that as , (at each snapshot of big enough) the percentage of the population that behaves badly (away from ) shrinks to zero; whereas convergence almost surely cares about that the percentage of the population that converges to as (without leaving there) is 1. This is a much stronger condition. For a sequence to converge almost surely, the individuals can’t keep jumping away from the target infinitely often, but they could for convergence in probability.

This is indeed a challenging topic. See Wikipedia, Maths Stack Exchange for further explanation.]

Proposition 5.4

If as almost surely, then .

Proof. We need to show that as . We have

Note that . Hence

By assumption, the RHS is exactly .

Remark. In general, almost sure convergence implies convergence in probability, but not the other way around.

Theorem 5.5 (Strong Law of Large Numbers)

Let be i.i.d. random variables with mean . Let . Then

This is called the strong law of large numbers (SLLN).

Proof. [Non-examinable.]

Assume that . Set . Then and

Set . We need to show that .

The goal is to show that since this will imply that with probability 1, as .

It suffices to show that . We have

By definition, we have

where is a sum of terms of the form

for distinct . By independence and the fact that , we have . Moreover,

Therefore,

5.2 Central Limit Theorem

Let be i.i.d. random variables with mean and variance . Set . By SLLN 5.5, we expect that

Note that

Hence has expectation and variance .

Also, we have

So, if are i.i.d. with , then

The normal distribution is universal as it appears as the limit of the distribution of no matter what the distribution of is, as long as are i.i.d. with mean and variance . This is the content of the central limit theorem (CLT).

Definition 5.6 (Convergence in Distribution)

A sequence of random variables converges to in distribution as and we write

for all where is continuous.

Lecture 23 · 2026-03-16

Proposition 5.7

If , then .

Proof. Let be a continuity point of . We need to show that . Let .

We have

So we have

Also,

Hence

Therefore, letting , we have

So as required.

Theorem 5.8 (Central Limit Theorem)

Let be i.i.d. random variables with mean and variance . Set . Then

i.e. ,

Proof. We will need the following continuity property for moment generating functions.

Theorem 5.9 (Continuity Property for MGFs)

Suppose are random variables with for and is a random variable with for . Assume for some .

If as for all in , then

The proof is beyond the scope of this course.

Consider . Then

It is enough to prove the theorem for a sequence , i.i.d. with mean and variance . Set . We need to show that

Assume that such that .

Set . By the continuity property for MGFs, it suffices to show that

Note that

We want to show that

We have

Claim.

Proof. Let .

However,

where . So

where .

Then we can conclude, because

and hence

Corollary 5.10

Let be i.i.d. random variables with mean and variance . Set . Then

Example 5.11

Suppose . Then where are i.i.d. with . So

Therefore, for large ,
Suppose with . Then
We can approximate Poisson distribution with normal distribution. Suppose with . Then where are i.i.d. with . So

Therefore, for large ,

Lecture 24 · 2026-03-18

5.2.1 Sampling Error via the CLT

Suppose we hold a referendum and a proportion of individuals is inclined to write “Yes”. [Every individual writes “Yes” with probability and “No” with probability .] We wish to estimate .

Sample individuals at random (for a large ) and record their votes. Let be if the -th individual voted “Yes” and otherwise.

Let be the number of “Yes” votes. Then by SLLN 5.5,

We have .

We need to testimate with an accuracy of with probability . We now wish to consider how large should be.

By Central Limit Theorem 5.8,

where for large . Then we want to find such that

Note that for large ,

Hence

Since we have and that by standard tables, we requrie

Since , taking is sufficient.

5.3 Simulation of Random Variables

Computers can generate random numbers between to . We would like to extend this to general probability distributions beyond uniform distributions.

Example 5.12

Suppose is a random variable with probability distribution function . Suppose that we know how to simulate .

Assume that is 1-1.

Let . Then
If is not 1-1, then define its generalised inverse . Then let . We can show that as well.

Claim. iff

Proof. [] This follows by definition.

[] such that decreases to and . By the right continuity of ,

So . Therefore, if , then becuase is increasing.

Let . Then

5.3.1 Box-Muller Transform

We want to simulate independent with using two independent . Since

there is no closed form solution. However, recall Example 4.29.

Let be independent. Then let

We get

and hence has the correct density. Using the transformation in Example 4.29, we get :

5.4 Rejection Sampling

Suppose we have a random variable with density

where is the volume of .

Let be i.i.d. with .

Set . Let

We want to show that ,

Note that