← IA Problems – Full Text
These are Zixuan’s notes for Part IA – Probability at the University of Cambridge in 2026. The notes are not endorsed by the lecturers or the University, and all errors are my own.
The latest version of this document is available at academic.micfong.space. Please direct any comments to my CRSid email or use the contact details listed on the site.
This document is typeset using Typst. All figures are created using Inkscape.
1 Basic Concepts
We shall begin by general definitions in probability theory.
1.1 Probability Spaces
Suppose is a set and is a collection of subsets of . Then we call a -algebra if
-
[ the whole space is in . ]
-
If , then [ closed under complements. ]
-
If is a countable collection of sets in , then [ closed under countable unions. ]
Let be a -algebra on . A function
is called a probability measure if
-
-
Countable additivity. For any countable disjoint collection of sets in , we have
We say to be the probability of the event .
We call a probability space.
Suppose is a probability space.
The elements of are called outcomes, and the elements of are called events.
Remark. We talk about probabilities of events instead of outcomes.
When is countable, we take to be the power set of .
Suppose is a probability space. Then for any , we have
-
-
-
if
-
-
Consider rolling a fair die. We have
-
Consider
This models the experiment of picking a uniformly random element from . We have
-
Consider picking balls from a bag. Suppose we have balls labelled . We pick balls at random without replacement. Then we have
-
Consider a well-shuffled deck of 52 cards. Then
-
Largest digit problem. Consider a string of random digits from . Then
Then we define
And so
Another example is the event
Then
-
Birthday problem. There are people. What is the probability that at least two people share a birthday?
We may assume that there are 365 days in a year, and each person’s birthday is equally likely to be any of the 365 days, independently of other people.
We have
Then
Let be the event that at least two people share a birthday. Then we have
Thus
1.2 Combinatorial Analysis
-
Let be a finite set with .
We want to partition into disjoint subsets with and . Consider the number of ways to do this.
Let be the number of ways to do this. Then
-
Let . We say that is
-
strictly increasing if
-
increasing if
Each strictly increasing functions can be determined by its range. Hence the number of strictly increasing functions is .
Lecture 2 · 2026-01-26For increasing functions, define a bijection
by
Hence the number of increasing functions is .
-
1.3 Stirling’s Formula
Notation. Let , be 2 sequences. We write
if as .
As , we have
We shall first consider a weaker statement.
Proof. Define
For , we shall write as the integer part of . Then
Integrating from to , we have
Dividing by , we have
Proof. [of Stirling’s Formula 1.5, non-examinable proof] For all that is twice differentiable, for all we have
This can be checked using integration by parts twice.
Take . Then we get
Therefore,
where
We have
Now, for ,
So the series converges. Let
Therefore,
We have shown that
It remains to show that . We already know that
Consider
We shall use a different method to show that
This would imply that .
Consider
Using integration by parts, we have the recurrence relation
So
Thus .
Similarly,
Now, if we have that as , then we have
In order to show that as , we note that
Recall that
Since is decreasing in ,
So
This completes the proof.
2 Axiomatic Approach
Recall our definition of probability spaces in Definition 1.1.
2.1 Probability Measure
There are several important properties of the probability measure .
If is a collection in , i.e. , then
Proof. Define and for ,
Then is a disjoint collection in and .
So
because for all .
Let where and for all . We call an increasing sequence in .
Then is an increasing sequence in and converges to .
Proof. Define , and for ,
Then is a disjoint collection, and
Now we also have
Hence
Let where and for all . We call a decreasing sequence in .
Then is a decreasing sequence in and converges to .
2.2 Inclusion-Exclusion Formula
Let . Then
In general, for events , we have
Proof. We shall prove this by induction on . The base case has already been shown.
Assume that the formula holds for events. We shall show it holds for events.
By induction hypothesis,
Plugging these into the previous equation gives the result.
Let . We have
Let be events in . Then for any ,
Proof. We shall prove this by induction. The base case is clear.
Suppose the result holds for events. We shall show it holds for events.
Assume that is odd. Then
By the induction hypothesis, since is odd,
Substituting these into the previous equation gives the result. The case where is even is similar.
Remark. If is a finite set and is a probability space with , and define
Take , then by the inclusion-exclusion formula on , we can get
This is the inclusion-exclusion principle in combinatorics.
Let
We want to find .
Define
Then
Note that we have
So
A derangement is a permutation with no fixed points. Let
Pick a permutation at random. Consider the probability that it is in . For , let
Then
So
Using Inclusion-Exclusion Formula 2.4, we have
Therefore,
Note that as , .
2.3 Independence
Let be a probability space. Let . We say that and are independent if
We write .
Let be a collection of events in . We say that are mutually independent if for any finite subset ,
Consider tossing a fair coin twice. Let
Define events
Then we have
Thus, are pairwise independent but not mutually independent.
Proof.
2.4 Conditional Probability
Let be a probability space. Let where . The conditional probability of given is
In particular, if , then
Let is a disjoint sequence in . Then for some where ,
Proof.
Suppose is a disjoint collection if such that and for all . Then for any ,
Proof.
Consider to be a collection of disjoint events in such that and for all . Then for any where ,
Proof. By Conditional Probability 2.11,
By Law of Total Probability 2.13, we have
Also,
Plugging these into the first equation gives the result.
This formula is the basis of Bayesian statistics; if we know the probabilities of and we have a model which gives us , then we can compute the posterior probability .
Suppose that a rare condition affects of the population, i.e. . A test for the condition has a true positive rate and a false positive rate. [For affected individuals, the test is positive with probability , and for unaffected individuals, the test is positive with probability .]
An individual takes the test and tests positive. Consider the probability that they actually have the condition.
Define
We have
Then
Note that we can rewrite this as
Since and are both very close to , so the relavent ration is approximately
But and hence the posterior probability is still quite small.
Consider the probability that a person have 2 boys given that,
-
They have exactly 2 children, one of whom is a boy.
Since no further information is given, we assume that all outcomes are equally likely.
Let
We want to find
-
They have exactly 2 children, the elder of whom is a boy.
We want to find
-
They have exactly 2 children, exactly one of them is a boy, who was born on a Thursday.
Let denote the event that a child is a boy born on a Thursday, and denote the event that a child is a boy not born on a Thursday. We want to find
Suppose the following data is collected for applicants to a Cambridge college:
| Admitted | Rejected | Admission Rate | ||
| All Applicants | State | 25 | 25 | 50% |
| Independent | 28 | 22 | 56% | |
| London Schools | State | 15 | 22 | 41% |
| Independent | 5 | 8 | 38% | |
| Cambridge Schools | State | 10 | 3 | 77% |
| Independent | 23 | 14 | 62% | |
Note that within each school type, state school applicants have a higher admission rate than independent school applicants. However, when considering all applicants, independent school applicants have a higher admission rate than state school applicants.
This phenomenon is called confounding in statistics. It arises when we aggregate data from disparate populations.
In terms of conditional probability, let
We have
but
Note that
Now, if we (falsely) assume that , then
Nonetheless, this does not hold in our example.
2.5 Discrete Probability Distributions
Let be a probability space, where is countable with
If we know for all , then ,
In this case, we say that a discrete probability distribution.
We write for all .
Let be a discrete probability distribution on a countable set . Then
-
for all .
-
.
Let us see some examples of discrete probability distributions.
A Bernoulli distribution models the outcome of a single binary experiment, such as a biased coin toss.
The Bernoulli distribution with parameter is defined by
A binomial distribution models the number of successes in independent Bernoulli trials, each with success probability .
The binomial distribution with parameters and is defined by
where is called the binomial distribution. Note that in the case of a coin toss,
A multinomial distribution models the number of occurrences of each outcome in independent trials, each with possible outcomes with probabilities .
The multinomial distribution with parameters and where is defined by
For example, suppose that there are boxes and we throw balls into these boxes independently, where each ball lands in box with probability . Then
A geometric distribution models the number of Bernoulli trials needed to get the first success, where each trial has success probability .
The geometric distribution with parameter is defined by
Note that,
This models, for example, the number of times we need to toss a biased coin with heads probability until we get the first heads.
A Poisson distribution models the number of events occurring in a fixed interval of time or space, where these events occur with a known constant mean rate and independently of the time since the last event.
The Poisson distribution with parameter is defined by
Note that
Proof. Consider the number of customers arriving at a shop in a time interval . We can discretise the time interval to with , where each interval has a small probability of a customer arriving. Then
3 Discrete Random Variables
Consider a probability space . A random variable is a function , satisfying that ,
We write and , .
Let . The indicator of is a random variable defined by
Suppose is a random variable. Define the probability distribution function of to be
where .
is called a random variable in if
is a function such that ,
Equivalently, all are real random variables.
A random variable is called discrete if its range is countable.
Suppose it takes values in a countable set , then for every we write
We call the probability mass function of , or the distribution of . Note that ,
Recall that two events and are independent if .
Let be discrete random variables with values in . They are independent if for any ,
Consider tossing a -coin times independently. let with , where
and is the result of the -th toss. Then
For all define random variables with . Then
Note that is a Bernoulli random variable with parameter .
Proof.
3.1 Expectation
Let be a countable set and be a discrete random variable.
For , the expectation of is defined by
Alternatively, consider
Then
So the expectation of is the weighted average of the values taken by , with weights given by the probabilities of taking those values.
Consder . Then ,
Then, for the expectation,
Consider with . Then for all ,
Hence
Notation. Let be a random variable. Define
Then
Suppose is discrete. We can define and as in Definition 3.8.
If at least one of and is finite, then we can define the expectation of by
Otherwise, is not defined.
If is well-defined, then we have
-
If , then .
-
If and , then .
-
If , then and .
-
If and are random variables, then .
-
Let and be integrable random variables. Then
-
Suppose that are non-negative random variables. Then
Proof. Suppose that is countable.
For (6), we have
Let . Then
Let and consider defined by . Then is a random variable, with
Proof. Let . Then we have
Hence,
Suppose and takes integer values. Then
Proof. Note that for any ,
Since and , we have
Taking expectation on both sides, we get
With expectation, we can form another proof of the inclusion-exclusion formula.
Proof. [of Inclusion-Exclusion Formula 2.4] Let . Then
More generally, if we have , then
Taking expectation on both sides, since , we get
3.2 Variance and Covariance
The variance of is defined by
The variance is a measure of concentration of the distribution of around its expectation. The smaller the variance, the more concentrated the distribution is around its expectation.
-
.
-
If , then .
-
If , then and .
-
.
-
, with the minimum attained at .
Proof. For (4), we have
For (5), define . Then
Hence, taking derivative with respect to , we get
Therefore, if and only if . Moreover, , so is convex and the minimum is attained at . Hence
-
Consider . Then and
Note that
Hence,
-
Consider . Then and
Hence,
Let and be random variables. The covariance of and is defined by
It is a measure of the dependency between and .
-
.
-
.
-
.
-
Let . Then and .
-
.
-
Let . Then .
-
Let be random variables. Then .
More generally, for all , we have
In particular,
Proof. For (3), we have
For (5), we have
Recall that if are discrete random variables, then they are independent iff for all ,
Proof. We need to show that
We have
Let and be 2 independent random variables, and . Then
Proof. Let and define . Then
Let and be 2 independent random variables. Then
Let be independent random variables. Let
Then . Moreover,
So,
However,
Hence and are not independent, even though .
Let be independent random variables. Then
Consider random variables . Then we have as seen before.
We also know that where are independent and .
Hence, by the above corollary, we have
3.3 Inequalities
Let be a non-negative random variable and . Then
Proof. Observe that
Taking expectation on both sides, we get
Let be a random variable and . Then
Proof. Note that
Let and be random variables. Then
Proof. Assume that and , otherwise there is nothing to prove. Then
Assume that and , otherwise this is the trivial case. Assume WLOG that and are non-negative.
Let and consider . Then
Then
This function is minimised at where . Hence, using the fact that , we get
Remark. The equality holds when , so
A function is called convex if , , we have
Let be a random variable and be a convex function. Then
Proof. We first need an additional claim.
Claim. Let be a convex function. Then is the supremum of all the lines below it, i.e. for all , such that
Proof. Let and . Then for some , we have
By the convexity of , we have
Hence, using the fact that , we get
Let . Then ,
So, this gives that ,
Hence, we can take and get the desired result.
Let . By the claim, such that
Then we have . Taking expectations on both sides, we get
Remark. For the equality case, let be convex with the extra property that for , , such that for all and .
We wish to find a condition of such that we have
We have . Taking expectations gives
Hence, the equality forces .
Since , we must have . By the extra property of , this is equivalent to .
Let , then
Proof.
Claim. Let be a convex function. Then , we have
Proof. Let be a random variable with for all . Then
Let with . This is a convex function. Hence, by the above claim, we have
Rearranging gives the desired result.
3.4 Multiple Discrete Random Variables
3.4.1 Joint Distribution and Conditional Distribution
Recall that if is a discrete random variable, then the distribution of is given by .
Let be discrete random variables. Their joint distribution is defined to be
In particular,
where is called the marginal distribution of .
Let and be two discrete random variables. The conditional distribution of given is defined by
By the law of total probability, we have
3.4.2 Distribution of the Sum of Random Variables
Let and be two independent random variables. We wish to find the distribution of .
Let and with Then
So . Hence
3.4.3 Conditional Expectation
Recall that for ,
Let be a discrete random variable and with . The conditional expectation of given and event is defined by
Note that if , we recover
Let be a discrete random variable and a partition of with for all . Then
Proof. We have
In particular, for the conditional probability of given , we have
Let since this is a function of . Consider what is meant by .
We will define as a random variable, which is a function of .
Let and be discrete random variables. The conditional expectation of given is defined by
where .
Remark. The above notation can be confusing. is a random variable, which is a function of . In particular, if , then .
Bear in mind that random variables are functions. We have , and so really means .
Consider tossing a -coin times independently. For , let
We wish to find . Let
So, .
Let and be discrete random variables. Then for some constant ,
-
.
-
.
In particular, .
-
.
Let and be discrete random variables. Then
Proof. We have
Taking expectation on both sides gives
Let and be independent discrete random variables. Then
Proof. We have
Let be independent discrete random variables. Then for any random variable ,
Proof. Let and .
Proof. We need to show that
We have
Then, by Proposition 3.47, we have .
By Proposition 3.46, we have . Hence, .
Let be two random variables and . Then
Proof. We have
Let be two random variables. Then
-
-
.
Consider tossing a -coin times independently. For , let
We wish to find . By symmetry, for all . Hence,
So, .
3.5 Random Walks
A random walk is a random process that can be expressed as
where is a deterministic constant and are independent and identically distributed (i.i.d.) random variables.
The steps of the random walk are the random variables .
3.5.1 Simple Random Walk
A simple random walk is a random walk satisfying .
If , then we call it a simple symmetric random walk.
Consider as a simple random walk, which is the fortune of a gambler who starts with at time and at every time step, he wins with probability and loses with probability .The game ends if he reaches or if he reaches , whichever comes first.
Let . Then and .
Then,
So we can solve the following system of equations to find for all :
-
For , we get a simple symmetric random walk (SSRW), in which case we have
This leads to
Hence,
Considering boundary conditions, gives . Hence, for a SSRW, we have
-
For , we need to try a solution of the form for some . Then
The general solution is of the form for some constants . Using the boundary conditions, we get
3.5.2 Time to Absorption
Let be a simple random walk. We are interested in the time to absorption, which is the duration of the game. Let the time to absorption be denoted by . Then
We would like to consider the expected time to absorption, i.e. .
By the law of total expectation, we have
Boundary conditions are .
-
For , try a solution of the form . Then
The general solution will be of the form . Using boundary conditions, we get
-
For , try as a solution. Then . Hence
Solving this gives
3.6 Probability Generating Functions
3.6.1 Introduction
Let be a discrete random variable. The probability mass function of is defined by
Let be a discrete random variable taking values in . The probability generating function (PGF) of is defined by
where .
Note that converges absolutely for since and . Hence the radius of convergence of is at least . Therefore, is well-defined for .
Proof. Let and be 2 probability distributions with the same PGF, i.e. for all ,
We need to show that for all . We shall show this by induction. For ,
Assume that for all . Then
Dividing both sides by and letting gives
Consider . Then
Remark. Suppose that are independent random variables with PGFs
Consider the PGF of :
Consider and with . Then
Hence .
Consider . Then
Consider . Then
Consider and with . Then
Hence, .
Let be the PGF of a random variable . Then
Proof.
-
Assume that . Take . Then
So is increasing in and bounded above by . Hence, .
Take . There exists such that
Then,
Hence, .
-
Assume that . Then such that
Then,
Hence, .
Let be the PGF of a random variable . Then
3.6.2 Sum of a Random Number of Random Variables
Let be independent and identically distributed random variables and let be an independent random variable taking values in .
Let
Let be the PGF of and be the PGF of .
Then the PGF of is given by
Proof.
- Proof 1
-
We have
- Proof 2
-
We have
Note that
Hence, we can write . Taking expectation on both sides gives
We have
3.7 Branching Processes
Let , and be the distribution of the number of offspring of the individual in the first generation. [ is the number of individuals in the -th generation.]
Each individual produces an independent number of offspring with the same distribution as .
Let be a family of independent random variables such that has the same distribution as for all . Then we can write
We are interested in the probability of extinction.
3.7.1 Generating Functions of Branching Processes
Let be a branching process with offspring distribution . Then,
Proof. We will show this by induction. For , we have . Assume that . Then
Note that
Thus,
Taking expectation on both sides gives
Let and . Then
Proof. We have
Note that,
Hence,
3.7.2 Extinction Probability
Let
and .
Since , is an increasing sequence that converges to , by the continuity of probability measures.
[Recall that for all implies that .]
Let be the PGF of . Then , and
Proof. If we were given that , then since is continuous as , we have . So we just need to show that .
- Proof 1
-
We have
- Proof 2
-
Conditioning on , let be independent and identically distributed branching processes with the same offspring distribution as . Then
Hence,
Note that in Theorem 3.66, we have
So, in order to evaluate , consider the following cases:
-
If , then as , so we expect that .
-
If , then as , so we expect that .
-
If , then for all , which does not give us an intuition on .
Note that the relation always has a solution at , as shown in the following graphs:
Note that the gradient of the tangent at is . Hence, the first two cases can be justified.
Let be a branching process with offspring distribution . Assume that .
Then the extinction probability is the minimal non-negative solution to the equation .
Moreover, iff .
Proof. We know that in Proposition 3.68.
Let be the smallest non-negative solution to . We will show that .
We will prove by induction that for all , which will imply that as , but since is the smallest non-negative solution to , we have . Hence, .
For , we have . Assume that . Then
since is increasing in . Hence, .
Now, we will show that iff .
Assume that , then , and then
Then in this case, . Then,
Because , we have , and so .
Now assume that and , we shall show that iff . Define
Then . We shall first show that can have at most one more root in . We have
because implies that there exists such that .
Hence is strictly increasing in . Therefore, can have at most one more root other than , due to Rolle’s theorem. [If not, then such that . Then by Rolle’s theorem, there exists such that . Also, such that . This contradicts the fact that is strictly increasing in .]
We therefore have two cases:
-
has no other root other than . Then . We need to show that . We have
This is because and . So
Hence
So , and hence .
-
has another root . So has to be the extinction probability . We need to show that . We have
By Rolle’s theorem, there exists such that . We also have
Since is strictly increasing in , we have . Hence, .
4 Continuous Random Variables
4.1 Probability Distribution Function
Consider a probability space . Recall that a random variable is a function
where , the set is an event in .
Recall Definition of Probability Distribution Function 3.3. Let be the probability distribution function of . We have the following properties of :
Let be the probability distribution function of a random variable . Then, we have the following properties:
-
is increasing, i.e. for all .
-
For all with , we have
-
is right-continuous and always has left limits, i.e. for all , we have
-
.
-
and .
Proof.
For (2), we have
For (3), let be a decreasing sequence with . Then, we want to show that . Define
so
Note that , and hence by the continuity of probability measures.
Now, we have , so . Therefore, .
Since is increasing, left limits always exist, and so .
For (4), consider
Define . Then we have . Hence
A random variable is said to be continuous if its probability distribution function is continuous. Equivalently, for all ,
For the purpose of this course, we will only consider continuous random variables whose PDF is differentiable. These are also known as absolutely continuous random variables.
Let be a continuous random variable with probability distribution function . If is differentiable, then the probability density function of is defined as
Let be a continuous random variable with probability density function . Then, we have the following properties:
-
for all .
-
.
-
for all . More generally, for every ,
[If is discrete, then , so the above formula can be seen as a continuous analogue of the discrete case.]
Intuitively, can be thought as being proportional to the probability that takes values around . Mathematically, for small, we have
4.2 Uniform Distribution
A uniform distribution on is a continuous random variable , denoted , with probability density function
Proof of density validity. We have
Suppose . Then for every ,
Otherwise, if , then , and if , then .
Note that for , we have for all .
4.3 Exponential Distribution
An exponential distribution with parameter is a continuous random variable , denoted , with probability density function
Proof of density validity. We have
Suppose . Then for every ,
Remark. Let , and with . Then, for every ,
Note that is a geometric random variable with parameter . As , .
Hence, converges in distribution to as . So, we can think of an exponential distribution as a continuous analogue of a geometric distribution.
Let with . Let . Then
It is significant that the exponential distribution is the only continuous distribution with the memoryless property.
Let be a positive random variable which is not identically zero or . Then has the exponential distribution iff has the memoryless property, i.e. for all ,
Proof.
[] This is shown in Memoryless Property of the Exponential Distribution 4.7.
[] Suppose that has the memoryless property. Let and
Then . For every , .
In particular, when , we have .
Let . We have
and so .
By our definition, for any ,
so . Therefore, for all . We shall extend this to .
Let , and such that and . Then
Taking , we have .
4.4 Expectation and Variance of a Continuous Random Variable
Let be a continuous random variable with density with . The expectation of is defined as
Let be a non-negative function on . Then, we define
For a general random variable , define and . Then, . If or , then we define
Remark. As in the discrete case, we have
If is a continuous random variable with , then
Proof. We have
Recall that in the discrete case, we have
If is a continuous random variable with , then for any outcome ,
Let be a continuous random variable with . The variance of is defined as
-
Consider . Then the density of is given by
Hence
-
Consider . Then for , and . Hence
4.5 Normal Distribution
4.5.1 Introduction
An normal distribution with parameters and is a continuous random variable , denoted , with probability density function
Proof of density validity. We have
Consider . We have
Changing to polar coordintates with and , we have
Since , we have . Hence, is a valid probability density function.
Let with and . Then
Proof. We have
Hence, . Moreover,
Hence .
4.5.2 Linear Transformations of Normal Distributions
Let have density , and let be a function which is strictly monotone and is differentiable. Then has density
Proof.
-
Suppose that is strictly increasing. Then
-
Suppose that is strictly decreasing. Then
Hence the result follows in either case.
Let with and . Let with . Then
Proof. Define and . We have and . Hence,
So .
Remark. If , then
Let , and consider
Let
Note that
Since , we have . Hence,
We have existing tables of values, so we can compute for any , in particular,
Let be a continuous random variable. The median of , denoted by , is the value such that
Alternatively,
where is the density of .
Suppose that . Then
Hence the median of is .
4.6 Multivariate Density Functions
4.6.1 Introduction
Let be a random vector. We say that has a multivariate density function if there exists a non-negative function such that for all ,
The probability distribution function of is defined as
Therefore,
More generally, for ,
Let be continuous random variables. They are independent if for all ,
Let be a random vector with density .
-
Suppose are independent with densities , then for all ,
-
Conversely, suppose factorises as for some non-negative functions on . Then are independent with densities proportional to .
Proof.
-
We have
So .
-
We have
Note that
Hence
So, are independent with densities proportional to .
4.6.2 Marginal Density Functions
Let be a random vector with density . The marginal density function of is defined as
Proof. Suppose has density . Then consider
So the density of is
4.6.3 Sum of Independent Random Variables
Suppose and are independent random variables with densities and respectively. We want to find the density of .
Recall that in the discrete case, we used the discrete convolution formula
For the continuous case, we have the following analogue of the discrete convolution formula.
Proof. We have
So the density of is given by the convolution formula
4.6.4 Conditional Density Functions
Let and be two random variables with joint density and marginal densities and respectively.
The conditional density function of given is defined as
Let and be two random variables with joint density and marginal densities and respectively. Then, for every ,
Similarly, we can define the conditional expectation of given .
The conditional expectation of given is defined as , where
4.6.5 Transformation of Random Variables
Let be a random variable with values in and density . Let be a bijection with a continuous derivative on , and
Then, the random variable has density
where is the Jacobian determinant of .
Let be independent. Then, consider in polar coordinates with and . We have
We want to find the density of . We have
where
and . Hence,
where and .
Hence, and has density for , and they are independent, by Theorem 4.23 (2).
4.7 Order Statistics of a Random Sample
Let be i.i.d. random variables with probability distribution function and density . Order them from the smallest to the largest as
Lete . Then, are called the order statistics of the random sample .
We aim to find the density of . For the minimum , we have
For the maximum , we have
In order to find with , we have
Hence,
4.7.1 Order Statistics of Exponential Distributions
Let and , where and . Let . Then
Hence, .
If are independent random variables with , then .
Now, consider be i.i.d. random variables with . Let be the order statistics of .
Let . Note that we have found the distribution of above.
Consider the joint density of . We have
Hence, with the transformation of and , we have
Hence, and they are independent, by Theorem 4.23 (2).
4.8 Moment Generating Functions
Let be a random variable with density . The moment generating function (MGF) of is
whenever the integral is finite. Note that .
Suppose that the MGF is defined for an interval of values of , then
4.8.1 Gamma Distribution
For and , the gamma distribution with parameters and is a continuous random variable with density
Proof of density validity. We have
We denote . Then,
If , then .
If are independent random variables, then
Suppose and , where , and . Consider the density of .
We aim to show this by Theorem 4.32. Consider, for ,
Hence
Suppose are i.i.d. with . Then, .
Remark. One could also define with by replacing in the density by
We say if for .
Cauchy Distribution (Non-Examinable). The Cauchy distribution is defined as the distribution with density
The MGF is
Hence, all have the same MGF. However, it is not the case that all have the same distribution. So the assumption that the MGF must be finite for an open interval of values of is necessary in Theorem 4.32.
4.8.2 MGF of the Normal Distribution
Recall that if , then
Hence, the MGF of is
Note that, the exponent is
Hence,
Let and , and . Then
Hence .
4.8.3 Multivariate Moment Generating Functions
Let be a random variable. The MGF of is defined to be
where .
For a multivariate random variable, if the MGF is finite for an open set of values of , then it uniquely determines the distribution of the random variable.
In this case,
Let be a random variable in . Then
iff are independent.
Proof.
[] This is a direct consequence of the definition of MGF.
[] If are independent, then factorises. If factorises, then by Theorem 4.32, are independent.
4.9 Multidimensional Gaussian Random Variables
4.9.1 Introduction
A random variable with values in is called Gaussian (or normal) in if it can be written as
where , , .
If , then the density of is, for ,
Let be a random variable. We say that is a Gaussian vector (or Gaussian in ) if for all ,
is a Gaussian random variable in .
Let be a Gaussian vector. Let be an matrix and . Then is also a Gaussian vector.
Proof. Let . We need to show that is a Gaussian random variable in . Note that
Letting , we have
Since is a Gaussian vector, is a Gaussian random variable in . Note that is a constant. Hence, is also a Gaussian random variable in .
Define
Note that
Hence, is a symmetric matrix.
Consider the random variable for some . We have
is a non-negative definite matrix, i.e. for any ,
Consider the MGF of . We have
Note that is . So
[Recall that if , ]
We have seen that the MGF uniquely characterises the distribution if defined for an open set of values. Hence, to characterise a Gaussian vector, we only need the mean and the covariance matrix .
We have determined the MGF of a Gaussian vector purely from the definition of a Gaussian vector, and we will consider its density function later.
4.9.2 Construction of a Gaussian Random Vector
Proof. We need to show that , is normal in .
The MGF of is
So , and hence is a Gaussian vector.
Remark. We have
We write that
Let and let be a non-negative definite matrix. We want to construct a Gaussian vector with mean and (co)variance matrix , using the standard Gaussian vector .
[Note that in the case, we can construct by letting .]
Note that we will need some form of “square root” of .
Let be a non-negative definite matrix. Consider
and is a diagonal matrix with
Then, the square root of is defined as
Note that .
Let , be a non-negative definite matrix. Let be i.i.d. with , and let .
Let be the square root of . Then, is a Gaussian vector with mean and covariance matrix . i.e., .
Proof. is a Gaussian vector as a linear transformation of a Gaussian vector. We have
4.9.3 Density of a Gaussian Vector
Let . We want to find the density of .
[In the case, we have ]
We shall consider two cases
-
is positive definite, i.e. . We can write
Note that gives . Hence,
-
is non-negative definite, and such that .
By an orthogonal change of basis, we could assume that
Let
We can then write
Let be a Gaussian vector. Let and .
If are independent, then is a diagonal matrix.
Proof 2. Consider the MGF of . We have
since we have
Hence
Since factorises into a product of functions of ‘s, by Theorem 4.32, are independent, and .
Therefore, we can conclude that if is a Gaussian vector, then are independent iff for all .
4.9.4 Bivariate Gaussian Distribution
Let be a Gaussian vector in . Let , and
Then is called a bivariate Gaussian vector with parameters and .
Note that we can write the covariance matrix of as
Let and Then the matrix
is non-negative definite.
Proof. Consider any . Then
-
If , the second line gives .
-
If , the first line gives .
Now, consider . We can write
Let . Then is a Gaussian vector since
Note that
Taking gives . Hence, and are independent. We have
5 Convergence Results and Limit Theorems
5.1 Convergence Results
A sequence of random variables converges to a random variable in probability and we write
if as ,
Let be i.i.d. random variables with mean . Let . Then
i.e.
This is called the weak law of large numbers (WLLN).
Proof. Assume that . We need to show that
We have
By definition, we have . So
Let be a sequence of random variables and be a random variable. We say that converges to almost surely (a.s.) or with probability if
We write as almost surely.
Remark. The above statement, more precisely, means that
where and .
To compare with Definition 5.1, this is equivalently saying that
Note that the quantifiers are now inside .
[It can be clearer to see the difference between convergence in probability and convergence almost surely without any shorthands. Let the underlying probability space be . Let and the sequence be random variables, where . Convergence in probability states that , Convergence almost surely states that i.e. convergence in probability cares about that as , (at each snapshot of big enough) the percentage of the population that behaves badly (away from ) shrinks to zero; whereas convergence almost surely cares about that the percentage of the population that converges to as (without leaving there) is 1. This is a much stronger condition. For a sequence to converge almost surely, the individuals can’t keep jumping away from the target infinitely often, but they could for convergence in probability. This is indeed a challenging topic. See Wikipedia, Maths Stack Exchange for further explanation.]
Proof. We need to show that as . We have
Note that . Hence
So
By assumption, the RHS is exactly .
Let be i.i.d. random variables with mean . Let . Then
This is called the strong law of large numbers (SLLN).
Proof. [Non-examinable.]
Assume that . Set . Then and
Set . We need to show that .
The goal is to show that since this will imply that with probability 1, as .
It suffices to show that . We have
By definition, we have
where is a sum of terms of the form
for distinct . By independence and the fact that , we have . Moreover,
Therefore,
So
5.2 Central Limit Theorem
Let be i.i.d. random variables with mean and variance . Set . By SLLN 5.5, we expect that
Note that
Hence has expectation and variance .
Also, we have
So, if are i.i.d. with , then
The normal distribution is universal as it appears as the limit of the distribution of no matter what the distribution of is, as long as are i.i.d. with mean and variance . This is the content of the central limit theorem (CLT).
A sequence of random variables converges to in distribution as and we write
for all where is continuous.
Proof. Let be a continuity point of . We need to show that . Let .
We have
So we have
Also,
Hence
Therefore, letting , we have
So as required.
Let be i.i.d. random variables with mean and variance . Set . Then
i.e. ,
Proof. We will need the following continuity property for moment generating functions.
Suppose are random variables with for and is a random variable with for . Assume for some .
If as for all in , then
The proof is beyond the scope of this course.
Consider . Then
It is enough to prove the theorem for a sequence , i.i.d. with mean and variance . Set . We need to show that
Assume that such that .
Set . By the continuity property for MGFs, it suffices to show that
Note that
We want to show that
We have
Claim.
Proof. Let .
However,
where . So
So
where .
Then we can conclude, because
and hence
Let be i.i.d. random variables with mean and variance . Set . Then
-
Suppose . Then where are i.i.d. with . So
Therefore, for large ,
-
Suppose with . Then
-
We can approximate Poisson distribution with normal distribution. Suppose with . Then where are i.i.d. with . So
Therefore, for large ,
5.2.1 Sampling Error via the CLT
Suppose we hold a referendum and a proportion of individuals is inclined to write “Yes”. [Every individual writes “Yes” with probability and “No” with probability .] We wish to estimate .
Sample individuals at random (for a large ) and record their votes. Let be if the -th individual voted “Yes” and otherwise.
Let be the number of “Yes” votes. Then by SLLN 5.5,
We have .
We need to testimate with an accuracy of with probability . We now wish to consider how large should be.
where for large . Then we want to find such that
Note that for large ,
Hence
Since we have and that by standard tables, we requrie
Since , taking is sufficient.
5.3 Simulation of Random Variables
Computers can generate random numbers between to . We would like to extend this to general probability distributions beyond uniform distributions.
Suppose is a random variable with probability distribution function . Suppose that we know how to simulate .
-
Assume that is 1-1.
Let . Then
-
If is not 1-1, then define its generalised inverse . Then let . We can show that as well.
Claim. iffProof. [] This follows by definition.
[] such that decreases to and . By the right continuity of ,
So . Therefore, if , then becuase is increasing.
Let . Then
5.3.1 Box-Muller Transform
We want to simulate independent with using two independent . Since
there is no closed form solution. However, recall Example 4.29.
Let be independent. Then let
We get
and hence has the correct density. Using the transformation in Example 4.29, we get :
5.4 Rejection Sampling
Suppose we have a random variable with density
where is the volume of .
Let be i.i.d. with .
Set . Let
We want to show that ,
Note that
