These are Zixuan’s notes for Part IA – Vectors and Matrices at the University of Cambridge in 2025. The notes are not endorsed by the lecturers or the University, and all errors are my own.
The latest version of this document is available at academic.micfong.space. Please direct any comments to my CRSid email or use the contact details listed on the site.
This document is typeset using Typst. All figures are created using Inkscape and Mathematica.
1 Complex Numbers
We should be fairly familiar with complex numbers already, but here is a recap of what should be well-covered.
1.1 Definition
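We construct $\mathbb{C}$ by adding the element $i$ to $\mathbb{R}$, satisfying $i^2 = -1$. Then, any complex number $z \in \mathbb{C}$ has the form
$$z = x + iy,$$
where $x, y \in \mathbb{R}$.
Each complex number $z = x + iy$ consists of a real part $\operatorname{Re}(z) = x$ and an imaginary part $\operatorname{Im}(z) = y$.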
1.2 Properties
-
Addition. Given where , , we can add or subtract them by
-
Multiplication. We can multiply and by
Remark. Both addition and multiplication are associative and commutative.
-
Identity. The identity element for the addition operation is the element 0. [Thus is an Abelian group with identity element 0.]
-
Inverse. For any , the inverse of is given by
and it satisfies . [Thus is an Abelian group with identity element 1.]
Moreover, distributivity is satisfied, i.e. if Then
-
Complex conjugate. For any , the complex conjugate is . With this, we can write .
Properties of complex conjugates include:
-
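Modulus. For any $z = x + iy \in \mathbb{C}$, we define the modulus $|z| \in \mathbb{R}$ to be
$$|z| = \sqrt{x^2 + y^2} = \sqrt{z \bar{z}}.$$
We will sometimes denote $|z|$ by $r$.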
-
Argument. The argument of a complex number $z \neq 0$ is a real number, denoted by $\theta = \arg z$, such that
This is called the polar form of . We can write
If $\theta$ is an argument of $z$, then any $\theta + 2k\pi$ where $k \in \mathbb{Z}$ is also an argument of $z$. Therefore, to make the argument unique, we restrict $\theta$ to the range $-\pi < \theta \leq \pi$. We call the argument within this range the principal value.
We denote the principal value by . We also write .
Remark.
- , since for any , we have .
- Complex numbers of the form are called pure imaginary numbers.
- The representation of a complex number in terms of real and imaginary parts is unique.
Once we have the properties above, here are a few more properties we can get to.
-
is a field.
-
For the modulus operation, we have
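We can also reach the following theorem, though we will not prove it here.
A polynomial $p(z)$ of degree $n \geq 1$ with coefficients in $\mathbb{C}$ can be written as a product of linear factors:
$$p(z) = c\,(z - z_1)(z - z_2) \cdots (z - z_n), \quad c, z_1, \dots, z_n \in \mathbb{C}, \quad c \neq 0.$$
Hence $p$ has at least one root in $\mathbb{C}$, and $n$ roots counted with multiplicity.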
1.3 Argand Diagram
For , we can plot in a 2-dimensional plot.
We can therefore demonstrate some operations on the diagram.
- Addition and subtraction
- Complex conjugates
This method immediately leads to some properties:
- .
1.4 De Moivre’s Theorem
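For any $\theta \in \mathbb{R}$, $n \in \mathbb{Z}$, we have
$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta.$$
Proof. To prove this, we first need a lemma.
Let $z_1 = \cos\theta_1 + i\sin\theta_1$ and $z_2 = \cos\theta_2 + i\sin\theta_2$. Then
$$z_1 z_2 = \cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2).$$
Proof. Multiplying $z_1$ and $z_2$,
$$z_1 z_2 = (\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + i(\sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2) = \cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2).$$
For $n = 0$, we have $(\cos\theta + i\sin\theta)^0 = 1 = \cos 0 + i\sin 0$, which is true.
For $n > 0$, we shall prove by induction.
-
Base case. This statement is true for $n = 1$.
-
Inductive step. Let us assume the statement holds for $n = k$. Then consider the case $n = k + 1$. By the lemma,
$$(\cos\theta + i\sin\theta)^{k+1} = (\cos k\theta + i\sin k\theta)(\cos\theta + i\sin\theta) = \cos (k+1)\theta + i\sin (k+1)\theta.$$
For $n < 0$, we write $n = -m$ with $m > 0$. Thus
$$(\cos\theta + i\sin\theta)^{n} = \frac{1}{(\cos\theta + i\sin\theta)^{m}} = \frac{1}{\cos m\theta + i\sin m\theta} = \cos m\theta - i\sin m\theta = \cos n\theta + i\sin n\theta.$$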
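As a numerical sanity check of de Moivre's theorem, the sketch below compares $(\cos\theta + i\sin\theta)^n$ with $\cos n\theta + i\sin n\theta$ for a few integer exponents, including negative ones. The function name `de_moivre_sides` and the sample values are illustrative choices, not part of the notes.

```python
import math

def de_moivre_sides(theta: float, n: int) -> tuple[complex, complex]:
    """Return both sides of (cos t + i sin t)^n = cos(nt) + i sin(nt)."""
    lhs = complex(math.cos(theta), math.sin(theta)) ** n
    rhs = complex(math.cos(n * theta), math.sin(n * theta))
    return lhs, rhs

# Compare the two sides for negative, zero and positive exponents.
for n in (-3, 0, 1, 5):
    lhs, rhs = de_moivre_sides(0.7, n)
    print(n, abs(lhs - rhs) < 1e-12)
```

The agreement is only up to floating-point rounding, which is why the comparison uses a small tolerance rather than exact equality.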
1.5 Exponential and Trigonometric Functions
1.5.1 Exponential Function
For $z \in \mathbb{C}$, we define
$$\exp(z) = e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!}.$$
This series converges for all $z \in \mathbb{C}$. Some fundamental properties of this function are:
-
-
if , then reduces to the usual exponential for reals
-
-
1.5.2 Trigonometric Functions
For all :
Analogously,
If , then the definitions produce the analogous results to real numbers.
From these definitions, we can write, for all ,
In particular, [Euler’s identity]
If ,
Proof. Write . Then
[] Assume that . Matching real and imaginary parts gives
Hence .
[] Assume that . Then evaluating gives .
Finally, if for , then de Moivre’s Theorem is immediate from the results above.
1.6 Roots of Unity
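Let $z = r e^{i\theta}$ with $r > 0$. If for some integer $n \geq 1$ we have $z^n = 1$, then
$$r^n e^{in\theta} = 1.$$
Hence taking the modulus of both sides leads to $r = 1$.
Also, $n\theta = 2k\pi$ for some $k \in \mathbb{Z}$. Therefore $\theta = 2k\pi/n$.
Therefore, we get
$$z = e^{2\pi i k/n}, \quad k = 0, 1, \dots, n - 1.$$
We call these $n$ roots the $n$th roots of unity.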
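The roots of unity can be listed numerically. The following is a minimal sketch, assuming the standard formula $e^{2\pi i k/n}$ for $k = 0, \dots, n-1$; the helper name `roots_of_unity` is hypothetical.

```python
import cmath

def roots_of_unity(n: int) -> list[complex]:
    """Return the n complex solutions of z**n == 1, i.e. exp(2*pi*i*k/n)."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

# Each root has modulus 1 and satisfies z**5 == 1 up to rounding.
for z in roots_of_unity(5):
    print(abs(abs(z) - 1) < 1e-12, abs(z**5 - 1) < 1e-12)
```

On the Argand diagram these points are the vertices of a regular $n$-gon inscribed in the unit circle.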
1.7 Logarithm and Complex Powers
1.7.1 Logarithm
We define, for and ,
such that
Hence we have
Note that the complex logarithm is multi-valued.
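To illustrate the multi-valuedness, the sketch below generates several values of the logarithm of a non-zero $z$, assuming the usual form $\ln|z| + i(\theta + 2k\pi)$ with $\theta$ the principal argument. The name `log_branches` is an illustrative choice; `cmath.log` returns the principal value.

```python
import cmath

def log_branches(z: complex, ks: range) -> list[complex]:
    """Values ln|z| + i(Arg z + 2*pi*k) for each integer k in ks (z must be non-zero)."""
    principal = cmath.log(z)  # imaginary part is the principal argument
    return [principal + 2j * cmath.pi * k for k in ks]

z = -1 + 1j
for w in log_branches(z, range(-2, 3)):
    # Every branch exponentiates back to the same z.
    print(abs(cmath.exp(w) - z) < 1e-12)
```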
1.7.2 Complex Powers
We can define, for , , that
Note that this is multi-valued in general. However
gives the same value for .
Consider .
Then and . Hence for all .
1.8 Lines and Circles in the Complex Plane
1.8.1 Lines
Taking as a point on the line, and as the direction, then a line can be expressed as
Taking conjugates, .
1.8.2 Circles
For a center and radius , we can describe a circle as
2 Vectors
A vector can be specified by a (positive) magnitude and a direction in space.
2.1 Introduction on Vectors
We can represent a vector $\mathbf{v}$ as a directed line segment between two points $A$ and $B$, and we write $\mathbf{v} = \overrightarrow{AB}$. The vector has length $|\mathbf{v}| = |AB|$ and direction from $A$ to $B$.
If we choose as the origin, then point has position vector .
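A vector space over $\mathbb{R}$ or $\mathbb{C}$ is a set $V$ of abstract vectors equipped with operations of
- vector addition $\mathbf{u} + \mathbf{v} \in V$, and
- scalar multiplication $\lambda\mathbf{v} \in V$
that satisfy the following axioms:
Vector addition axioms
- Commutativity: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.
- Associativity: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$.
- Additive identity: $\exists\, \mathbf{0} \in V$ such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$ for all $\mathbf{v} \in V$.
- Additive inverse: $\forall\, \mathbf{v} \in V$, $\exists\, (-\mathbf{v}) \in V$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$.
Scalar multiplication axioms
- $\lambda(\mathbf{u} + \mathbf{v}) = \lambda\mathbf{u} + \lambda\mathbf{v}$.
- $(\lambda + \mu)\mathbf{v} = \lambda\mathbf{v} + \mu\mathbf{v}$.
- $\lambda(\mu\mathbf{v}) = (\lambda\mu)\mathbf{v}$.
- $1\,\mathbf{v} = \mathbf{v}$.
Remark.
- Vectors under $+$ form an Abelian group.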
In , we define the two operations as follows:
-
Vector addition. Consider and the position vectors of two points respectively. We can construct a parallelogram and then do compositions of two vectors, such that .
-
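Scalar multiplication. Given the position vector $\mathbf{a}$ of a point $A$, and $\lambda \in \mathbb{R}$, $\lambda\mathbf{a}$ is the position vector of a point on the line $OA$, with length $|\lambda||\mathbf{a}|$, in the same direction as $\mathbf{a}$ if $\lambda > 0$ and in the opposite direction if $\lambda < 0$.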
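Consider two vectors $\mathbf{a}, \mathbf{b}$ and scalars $\lambda, \mu \in \mathbb{R}$. Then
$$\lambda\mathbf{a} + \mu\mathbf{b}$$
is a linear combination of $\mathbf{a}$ and $\mathbf{b}$.
In general, we denote all possible linear combinations of two given vectors by
$$\operatorname{span}\{\mathbf{a}, \mathbf{b}\} = \{\lambda\mathbf{a} + \mu\mathbf{b} : \lambda, \mu \in \mathbb{R}\}.$$
This is called the span of $\mathbf{a}$ and $\mathbf{b}$.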
This extends to any number of vectors (possibly more than two).
2.2 Scalar Product (Dot Product)
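For two vectors $\mathbf{a}, \mathbf{b}$ in $\mathbb{R}^3$, let $\theta$ be the angle between them. Then the scalar product of $\mathbf{a}$ and $\mathbf{b}$ is given by:
$$\mathbf{a}\cdot\mathbf{b} = |\mathbf{a}||\mathbf{b}|\cos\theta.$$
Intuitively, this is the product of the parts of $\mathbf{a}$ and $\mathbf{b}$ which are parallel.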
We have some interesting results on scalar products in general.
If are vectors and , we have
- , and if and only if
Moreover, we say that and are orthogonal or perpendicular and denote it by if
In this case, we allow for or to be .
Using the dot product we can write the projection of onto as
We can actually derive Definition 2.6.
For , define to be the angle between them. Then
Proof.
For any , we have
But from the definition of scalar product,
By comparing Equation 1 and Equation 2, we get
We say, for vector space , a map is called an inner product if
- .
- .
- for .
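We can now state an inequality that we will encounter many times, in various forms, in later courses; here we see its simplest form.
For all $\mathbf{a}, \mathbf{b} \in \mathbb{R}^3$,
$$|\mathbf{a}\cdot\mathbf{b}| \leq |\mathbf{a}||\mathbf{b}|.$$
Proof. Consider the expression $|\mathbf{a} + \lambda\mathbf{b}|^2$, where $\lambda \in \mathbb{R}$. Then
$$0 \leq |\mathbf{a} + \lambda\mathbf{b}|^2 = |\mathbf{a}|^2 + 2\lambda\,\mathbf{a}\cdot\mathbf{b} + \lambda^2|\mathbf{b}|^2.$$
Viewed as a quadratic in $\lambda$, the right-hand side has at most one real root, so its discriminant satisfies $4(\mathbf{a}\cdot\mathbf{b})^2 - 4|\mathbf{a}|^2|\mathbf{b}|^2 \leq 0$, which gives the result.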
Here are some important observations for the Cauchy-Schwarz inequality.
Remark.
-
This inequality holds for every scalar product on any real vector space.
-
The equality holds if and only if or for some .
-
Definition 2.6 is now well-defined since Cauchy-Schwarz ensures that .
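For $\mathbf{a}, \mathbf{b} \in \mathbb{R}^3$, we have
$$|\mathbf{a} + \mathbf{b}| \leq |\mathbf{a}| + |\mathbf{b}|.$$
Proof. We have
$$|\mathbf{a} + \mathbf{b}|^2 = |\mathbf{a}|^2 + 2\,\mathbf{a}\cdot\mathbf{b} + |\mathbf{b}|^2 \leq |\mathbf{a}|^2 + 2|\mathbf{a}||\mathbf{b}| + |\mathbf{b}|^2 = (|\mathbf{a}| + |\mathbf{b}|)^2.$$
The result then follows.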
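Both inequalities are easy to spot-check numerically. The sketch below assumes the standard component formula for the dot product; the helper names `dot` and `norm` and the sample vectors are illustrative.

```python
import math

def dot(a, b):
    """Scalar product of two vectors given as equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length |a| = sqrt(a . a)."""
    return math.sqrt(dot(a, a))

a = [1.0, -2.0, 3.0]
b = [4.0, 0.5, -1.0]
total = [x + y for x, y in zip(a, b)]

print(abs(dot(a, b)) <= norm(a) * norm(b))   # Cauchy-Schwarz inequality
print(norm(total) <= norm(a) + norm(b))      # triangle inequality
```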
2.3 Orthonormal Bases
Consider , and consider vectors , , that are orthonormal. Then we have
This is equivalent to choosing Cartesian axes along these directions. We need a few extra definitions to describe this.
We say the set is linearly independent if
Hence, is an orthonormal basis.
We can therefore denote in the following ways:
Now, for , we have
In particular, we can derive the Pythagorean rule, since
For the canonical basis of , the one that we use for the representation in terms of row or column vector,
we can represent the vectors by
respectively.
2.4 Vector Product (Cross Product) in
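Consider $\mathbf{a}, \mathbf{b} \in \mathbb{R}^3$. Their vector product is defined by
$$\mathbf{a}\times\mathbf{b} = |\mathbf{a}||\mathbf{b}|\sin\theta\,\hat{\mathbf{n}},$$
where $\theta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$, and $\hat{\mathbf{n}}$ is a unit vector that is perpendicular to both $\mathbf{a}$ and $\mathbf{b}$, chosen so that $(\mathbf{a}, \mathbf{b}, \hat{\mathbf{n}})$ is right-handed.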
Remark.
-
If we change to , we obtain in the definition of instead.
-
is not defined if . However, we immediately have .
-
is not defined if or .
If are vectors in , then we have
- .
- .
- for some , or either vector is the zero vector.
- .
- .
- .
For two vectors , then is the area of the parallelogram formed by and .
If and , then the area of the triangle is given by .
Fix a vector and consider . Then, computing gives a vector that scales by and rotates it by in a plane that is orthogonal to .
2.4.1 Component Expressions
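Let $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ be a right-handed orthonormal basis. Then
$$\mathbf{e}_1\times\mathbf{e}_2 = \mathbf{e}_3, \quad \mathbf{e}_2\times\mathbf{e}_3 = \mathbf{e}_1, \quad \mathbf{e}_3\times\mathbf{e}_1 = \mathbf{e}_2.$$
Consider $\mathbf{a} = a_1\mathbf{e}_1 + a_2\mathbf{e}_2 + a_3\mathbf{e}_3$ and $\mathbf{b} = b_1\mathbf{e}_1 + b_2\mathbf{e}_2 + b_3\mathbf{e}_3$. We have
$$\mathbf{a}\times\mathbf{b} = (a_2 b_3 - a_3 b_2)\,\mathbf{e}_1 + (a_3 b_1 - a_1 b_3)\,\mathbf{e}_2 + (a_1 b_2 - a_2 b_1)\,\mathbf{e}_3.$$
This is also equivalent to
$$\mathbf{a}\times\mathbf{b} = \begin{vmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \mathbf{e}_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}.$$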
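The component expression translates directly into code. This is a minimal sketch assuming components are taken with respect to a right-handed orthonormal basis; the function names `cross` and `dot` are illustrative.

```python
def cross(a, b):
    """Vector product of two 3-component vectors, using the component formula."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dot(a, b):
    """Scalar product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

a, b = (1, 2, 3), (4, 5, 6)
c = cross(a, b)
print(c)                      # components of a x b
print(dot(a, c), dot(b, c))   # both vanish: a x b is perpendicular to a and b
```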
2.5 Triple Products
2.5.1 Scalar Triple Product
Consider . We write
to be the scalar triple product between .
For , we have
We can interpret Proposition 2.23 using a parallelepiped.
Note that is a signed volume:
-
If , then constitute a right-handed set.
-
iff are coplanar, i.e. one of them is a linear combination of the other two.
2.5.2 Vector Triple Product
Consider . We call
to be the vector triple product between .
For , we have
Note that the vector triple product is not associative. This is because
but
2.6 Lines, Planes and Vector Equations
2.6.1 Lines
Any point on a line through with direction has position vector given by
This form is equivalent to
where is a constant vector.
2.6.2 Planes
Any point on a plane through can be described using directions , where , with the position vector
The normal vector to the plane is
This normal vector is not a unit vector in general.
Then, we can write
The component of along is
and is the perpendicular distance from the origin to the plane.
Remark. If lie in the plane, then we can write the equation of the plane by
Consider the point of intersection between
The line equation can be re-written as . Taking vector product of this with gives
Applying vector triple product property in Proposition 2.25 gives
Hence
If , then we can compute
as the position vector of the point of intersection.
Otherwise, if , is orthogonal to . So either
- the line is parallel to the plane and never intersects the plane, or
- the line is contained within the plane.
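Consider two lines
$$L_1: \mathbf{r} = \mathbf{a}_1 + \lambda\mathbf{u}_1, \qquad L_2: \mathbf{r} = \mathbf{a}_2 + \mu\mathbf{u}_2.$$
Then, the shortest distance between $L_1$ and $L_2$ is attained along a line perpendicular to both lines, with direction
$$\mathbf{n} = \mathbf{u}_1\times\mathbf{u}_2.$$
The shortest distance is then computed by projecting the vector $\mathbf{a}_2 - \mathbf{a}_1$ onto the unit vector in the direction of $\mathbf{n}$, giving
$$d = \frac{|(\mathbf{a}_2 - \mathbf{a}_1)\cdot(\mathbf{u}_1\times\mathbf{u}_2)|}{|\mathbf{u}_1\times\mathbf{u}_2|}.$$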
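The projection argument can be sketched as code. The parametrisation of the lines as a point plus a multiple of a direction, and the names `a1`, `u1`, `a2`, `u2`, are assumptions made for this illustration; parallel lines (where the common perpendicular direction vanishes) are rejected rather than handled.

```python
import math

def cross(a, b):
    """Vector product in components."""
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    """Scalar product in components."""
    return sum(x * y for x, y in zip(a, b))

def line_distance(a1, u1, a2, u2):
    """Shortest distance between the lines r = a1 + s*u1 and r = a2 + t*u2."""
    n = cross(u1, u2)                      # direction perpendicular to both lines
    n_len = math.sqrt(dot(n, n))
    if n_len == 0:
        raise ValueError("lines are parallel; the formula does not apply")
    diff = tuple(q - p for p, q in zip(a1, a2))
    return abs(dot(diff, n)) / n_len       # projection of a2 - a1 onto n

# The x-axis and a line parallel to the y-axis through (0, 0, 1).
print(line_distance((0, 0, 0), (1, 0, 0), (0, 0, 1), (0, 1, 0)))
```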
2.6.3 Spheres
A sphere in with centre and radius is given by
In general, in , a hypersphere with center and radius is given by
2.6.4 Vector Equations
Our goal is to solve equations of the form
for , where are known vectors.
Using the vector triple product identity in Proposition 2.25, we have
so that Equation 3 becomes
Taking the dot product of both sides of Equation 4 with gives
so we obtain
Hence, substituting back into Equation 4 gives
- If , then there is a unique solution given by
-
If , then by Equation 5, either
- there is no solution if , or
-
there are infinitely many solutions if . The set of solutions is given by our derived condition
which represents a plane.
2.7 Index Notation & Summation Conventions
Consider an orthonormal right-handed basis . We write vectors etc. in terms of coordinates in this basis.
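From now on, we will use indices $i, j, k, \dots$ that take values $1, 2, 3$.
The Kronecker delta is defined as
$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$
- It is symmetric: $\delta_{ij} = \delta_{ji}$. Note that, for the basis vectors $\mathbf{e}_i$, we can write
$$\mathbf{e}_i\cdot\mathbf{e}_j = \delta_{ij}.$$
- For vectors $\mathbf{a}, \mathbf{b}$, we can write
$$\mathbf{a}\cdot\mathbf{b} = \sum_{i,j}\delta_{ij}\,a_i b_j = \sum_i a_i b_i.$$
The Levi-Civita epsilon is defined as
$$\epsilon_{ijk} = \begin{cases} +1 & \text{if } (i, j, k) \text{ is an even permutation of } (1, 2, 3), \\ -1 & \text{if } (i, j, k) \text{ is an odd permutation of } (1, 2, 3), \\ 0 & \text{otherwise.} \end{cases}$$
This is to say, that
$$\epsilon_{123} = \epsilon_{231} = \epsilon_{312} = 1, \qquad \epsilon_{132} = \epsilon_{213} = \epsilon_{321} = -1,$$
and all other combinations are zero.
- It is antisymmetric. We can write
$$\mathbf{e}_i\times\mathbf{e}_j = \sum_k \epsilon_{ijk}\,\mathbf{e}_k.$$
- For vectors $\mathbf{a}, \mathbf{b}$, we can write
$$\mathbf{a}\times\mathbf{b} = \sum_{i,j,k}\epsilon_{ijk}\,a_j b_k\,\mathbf{e}_i.$$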
2.7.1 Einstein Summation Convention
Now, we can use a more efficient notation.
In index notation, an index that appears twice in a term is normally summed over. To simplify notation, we omit the summation sign for repeated indices and sum over them implicitly. This is called the Einstein summation convention.
This notation follows the following rules:
-
If an index appears only once in an expression, it is a free index, so it must appear in every term of the equation, and can take any value. [We are not summing over it.]
-
If an index appears twice in a term, it is a contracted index, and we sum over all its possible values. [We are summing over it.]
-
No index can appear more than twice in a term.
Using Einstein summation convention, we can write
-
(which means )
-
-
-
-
For indices taking values $1, 2, 3$, we have
$$\epsilon_{ijk}\,\epsilon_{ipq} = \delta_{jp}\delta_{kq} - \delta_{jq}\delta_{kp}.$$
2.7.2 Proofs Using Index Notation
We can now use index notation to prove the vector triple product identity.
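We want to show that for $\mathbf{a}, \mathbf{b}, \mathbf{c} \in \mathbb{R}^3$,
$$\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = (\mathbf{a}\cdot\mathbf{c})\,\mathbf{b} - (\mathbf{a}\cdot\mathbf{b})\,\mathbf{c}.$$
Proof. Using index notation, the $i$th component of the left-hand side is
$$[\mathbf{a}\times(\mathbf{b}\times\mathbf{c})]_i = \epsilon_{ijk}\,a_j\,\epsilon_{kpq}\,b_p c_q = \epsilon_{kij}\,\epsilon_{kpq}\,a_j b_p c_q = (\delta_{ip}\delta_{jq} - \delta_{iq}\delta_{jp})\,a_j b_p c_q = (a_j c_j)\,b_i - (a_j b_j)\,c_i.$$
This is precisely the $i$th component of the right-hand side.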
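The contraction identity for two Levi-Civita symbols, which drives this proof, can be verified by brute force over all index values. The sketch assumes the identity $\epsilon_{ijk}\epsilon_{ipq} = \delta_{jp}\delta_{kq} - \delta_{jq}\delta_{kp}$ with indices in $\{1, 2, 3\}$; the closed form used for `epsilon` is a convenient shortcut, not the definition in the notes.

```python
from itertools import product

def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

def epsilon(i, j, k):
    """Levi-Civita symbol for indices in {1, 2, 3}, via a closed-form product."""
    return (i - j) * (j - k) * (k - i) // 2

def identity_holds():
    """Check sum_i eps_ijk eps_ipq == d_jp d_kq - d_jq d_kp for all free indices."""
    idx = (1, 2, 3)
    for j, k, p, q in product(idx, repeat=4):
        lhs = sum(epsilon(i, j, k) * epsilon(i, p, q) for i in idx)  # contracted index i
        rhs = delta(j, p) * delta(k, q) - delta(j, q) * delta(k, p)
        if lhs != rhs:
            return False
    return True

print(identity_holds())
```

The explicit `sum` over `i` plays the role of the repeated (contracted) index in the summation convention, while `j, k, p, q` are free indices.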
2.7.3 Spherical Trigonometry
With index notation, we can also consider spherical trigonometry.
For , then
Proof.
Now consider a unit sphere in with centre , and points on the surface of the sphere with position vectors respectively.
The distance from to , , is an arc length on the sphere.
In the same way, .
Hence, we have
which is the cosine rule for spherical triangles.
2.8 Vectors in
We define the following operations for vectors in .
Addition. For , we define
Scalar Multiplication. For and , we define
Any can be written as
where is the standard basis for with in the th position and elsewhere for .
For , we define their dot product to be
Hence, the components of can be determined by
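Notation. If we write vectors in $\mathbb{R}^n$ as columns, then for $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, $\mathbf{x}^T$ and $\mathbf{y}^T$ denote their transposes (row vectors), and their inner product can be written as
$$\mathbf{x}\cdot\mathbf{y} = \mathbf{x}^T\mathbf{y}.$$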
2.8.1 Summation Convention
We have
We define to be the extension of the Levi-Civita epsilon (Definition 2.32) to dimensions.
In , it can be used to define an additional scalar product:
Geometrically, this represents the signed area of the parallelogram formed by and .
2.9 Vectors in
We define the following operations for vectors in .
Addition. For , we define
Scalar Multiplication. For and , we define
- If , then is a real vector space.
- If , then is a complex vector space.
For any , we have
If we are only allowing real scalars, then we can write
where is defined to be the vector with in the th position of the imaginary part and elsewhere.
Note that forms a basis for as a real vector space, with dimension .
If we allow complex scalars, then we can define
and thus . Hence is a complex vector space with dimension . Note that forms a basis for as a complex vector space, with dimension .
2.9.1 Inner Product in
For , we define their inner product to be
2.9.1.1 Properties of the Inner Product
-
Hermitianity. .
-
Linearity and anti-linearity. ,
-
Positive definite. . Equality holds iff .
For , we define its norm to be
We say that are orthogonal if
Remark. The standard basis for is orthonormal, and
2.9.1.2 From Complex to Real Inner Products
For , take , then
Now, write and where . Then we can identify and as vectors in , with and respectively.
Then,
where is the product defined in Section 2.8.1. This recovers both scalar products in .
2.10 General Vector Spaces
- If the scalar field is , then is a real vector space.
- If the scalar field is , then is a complex vector space.
Consider a real vector space , and consider , we can write a linear combination:
for any .
The span of is defined as
Equivalently, a non-empty subset is a subspace if it satisfies that for every and , we have .
In particular, for any ,
is a subspace of .
2.10.1 Linear Independence and Dependence
Consider a vector space , and vectors . Consider a linear combination of these vectors:
If implies , then the vectors are linearly independent.
If there exists , not all zero, such that , then the vectors are linearly dependent.
Remark.
-
A set of vectors is linearly dependent iff one of the vectors can be expressed as a linear combination of the others.
-
In , are linearly independent iff
This can be geometrically interpreted as the vectors not being coplanar [the LHS represents the volume of the parallelepiped spanned by the vectors].
-
in is linearly dependent, noting that .
-
in is linearly independent.
-
Any set containing is linearly dependent.
2.10.2 Inner Products
An inner product on a vector space is a function that assigns to each pair of vectors a scalar , satisfying
-
Hermitianity: .
-
Linearity and anti-linearity: ,
-
Positive definiteness: with equality iff .
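We say that $\mathbf{u}, \mathbf{v} \in V$ are orthogonal if
$$\langle\mathbf{u}, \mathbf{v}\rangle = 0.$$
Non-zero vectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ that are pairwise orthogonal are linearly independent.
Proof. Suppose for contradiction that the vectors are linearly dependent. Then there exist scalars $\lambda_1, \dots, \lambda_n$, not all zero, such that
$$\sum_i \lambda_i\mathbf{v}_i = \mathbf{0}.$$
Then, for each $j$,
$$0 = \Big\langle\mathbf{v}_j, \sum_i \lambda_i\mathbf{v}_i\Big\rangle = \sum_i \lambda_i\langle\mathbf{v}_j, \mathbf{v}_i\rangle = \lambda_j\langle\mathbf{v}_j, \mathbf{v}_j\rangle.$$
By positive definiteness, $\langle\mathbf{v}_j, \mathbf{v}_j\rangle > 0$, so we must have $\lambda_j = 0$. This holds for all $j$, contradicting our assumption that not all $\lambda_j$ are zero.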
2.10.3 Basis and Dimension
A basis of a vector space is a set of vectors in that
-
spans ,
-
the vectors in are linearly independent.
Let $V$ be a vector space of dimension $n$. Then,
-
if a set $S$ spans $V$ and $|S| > n$, we can remove vectors from $S$ to get a basis.
-
if $S$ is a linearly independent set in $V$ with $|S| < n$, we can add vectors to $S$ to get a basis.
3 Matrices
3.1 Linear Maps
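For two vector spaces $V$ and $W$, a linear map is a function
$$T : V \to W$$
such that
$$T(\lambda\mathbf{x} + \mu\mathbf{y}) = \lambda T(\mathbf{x}) + \mu T(\mathbf{y})$$
for all $\mathbf{x}, \mathbf{y} \in V$ and all scalars $\lambda, \mu$.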
Let be a linear map.
-
The image of under is the vector .
The image of is the set
It forms a subspace of .
-
If such that , then is in the kernel of .
The kernel of is the set
It forms a subspace of .
-
For , is called the domain of and the codomain of .
-
The dimension of the image of , , is called the rank of , denoted .
-
The dimension of the kernel of , , is called the nullity of , denoted .
Remark. For , we have
-
The zero linear map is defined by for all .
It has and .
-
The identity map is defined by for all .
It has and .
-
Consider and , with
This is a linear map. In this case, and .
We can carry out several operations on linear maps.
-
Linear combination
Let be linear maps. Then,
is still a linear map, defined by
for all and all scalars .
-
Composition
Let , be linear maps. Then,
is still a linear map, defined by
for all .
Let $T : V \to W$ be a linear map, where $V$ is finite-dimensional. Then,
$$\operatorname{rank}(T) + \operatorname{null}(T) = \dim V.$$
Proof. Let us call and . Since , we have . We have two cases:
-
. Then, , so is the zero map. Thus, and . Therefore, .
-
. Then let be a basis of . Then, for all .
We can extend to the basis of the whole :
We need to show that is a basis of .
-
Spanning. To show that spans , take . Then such that
Since , we can write
for some scalars . Thus,
Therefore, is in the span of .
-
Linear independence. To show that is linearly independent, suppose that
for some scalars . Then, by linearity of , we can write
Thus, . Therefore, since we supposed that is a basis of , we write
for some scalars . But since is a basis of , the representation of is unique. Thus, .
-
- Zero linear map. We have and . Then .
- Identity map. We have and . Then .
3.2 Matrices as Linear Maps
Let be a matrix with entries . Define
such that
where
Given , with
we have
Consider the rows, and the columns of .
The image and kernel of the linear map defined by the matrix are given by
and
Proof. Let us consider the image and kernel of . The components are related in the following form:
If is the standard basis of , then, under ,
Since is a linear map, we can write
Thus, , which is the span of the columns of .
Now, for the kernel, consider .
If , then for all . Thus, is the set of vectors orthogonal to all the rows of .
-
Zero map. The zero map is defined by taking .
-
Identity map. The identity map is defined by taking , where is the identity matrix.
-
Consider the map where . Let be defined by
then, the matrix associated to is
with columns
and rows
Hence, the image and kernel of the linear map are given by
because we have that .
Then, for the kernel, we need
Hence
3.3 Geometric examples in and
3.3.1 In
-
Rotations.
Consider such that . Then, a rotation by an angle about the origin in is given by the matrix
Note that .
-
Reflections.
Consider with . Then, a reflection of angle in is given by the matrix
Note that .
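The two families of maps in the plane can be sketched numerically. The matrices below are the standard ones and are an assumption about the stripped formulas: a rotation by $\theta$ is taken as $\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$, and the reflection is taken in the line through the origin at angle $\varphi$ to the $x$-axis, $\begin{pmatrix}\cos 2\varphi & \sin 2\varphi\\ \sin 2\varphi & -\cos 2\varphi\end{pmatrix}$. The notes may parametrise the reflection differently.

```python
import math

def rotation(theta):
    """Rotation by theta about the origin (standard form, assumed here)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def reflection(phi):
    """Reflection in the line through the origin at angle phi to the x-axis (assumed convention)."""
    c, s = math.cos(2 * phi), math.sin(2 * phi)
    return [[c, s], [s, -c]]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

print(det2(rotation(0.4)))     # close to +1
print(det2(reflection(0.4)))   # close to -1
```

Under these conventions, composing two rotations adds their angles, and applying a reflection twice gives the identity.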
3.3.2 In
-
Rotations.
-
Consider a rotation by an angle about axis . This is given by the matrix
-
Consider a rotation by an angle about the unit vector . In this case, we have
where and .
Then,
or equivalently,
This can be derived by decomposing into components parallel and perpendicular to , and then rotating the perpendicular component in the plane orthogonal to .
with
After applying , we have
-
-
Reflections.
Reflections in a plane through the origin with normal unit vector are given by
Thus we have
where
-
Dilations.
Dilations from the origin with scale factor are given by
Thus, we have
where
-
Shears.
Given with and such that , a shear with parameter is defined by
Thus, we have
where
3.4 Matrices in General
3.4.1 Definitions
Consider a linear map , with and , and take two bases of and of .
Then, can be represented by , which is an array with entries for as the rows and as the columns, such that
for . This automatically ensures that for any , , we can always write and in terms of the bases:
This means that any coefficient from the image can be written as
To summarise, given and which are real or complex vector spaces with and , and given bases of and of , then
- is identified with or .
- is identified with or .
- We identify the linear map with the matrix such that .
Remark. Consider another linear map with matrix representation with respect to the same bases. Then, for scalars ,
is represented by matrix
with coefficients
This is because addition and scalar multiplication of matrices take place entry-wise.
Consider and . Hence and . Consider the map
The map is linear. We want to find the matrix representation of with respect to the bases
of and
of .
To determine , we need to compute for :
Therefore, for ,
Thus, the matrix representation of with respect to the given bases is
3.4.2 Matrix Multiplication
Consider linear maps and such that
We wish to compose them. The composition is given by
such that
for all .
If is represented by the matrix and is represented by the matrix , then is represented by the matrix .
Let
- be a basis of (),
- be a basis of (),
- be a basis of ().
If we consider so that is represented by the matrix , with coefficients given by
Note that
- is an matrix,
- is an matrix,
- is an matrix.
Remark.
- The number of columns of must equal the number of rows of for the product to be defined.
- has the same number of rows as and the same number of columns as .
We can also write
If we apply to a , we obtain
with
and Thus
For any three matrices such that the products below are defined, and for any scalars ,
-
-
-
.
3.4.3 Matrix Inverses
Consider three matrices , satisfying
- The size of is ,
- The size of is ,
- The size of is .
We say that is a left inverse of if
where is the identity matrix of size .
We say that is a right inverse of if
If is a square matrix (i.e., ), then
so the left and right inverses coincide. In this case, we say that is invertible (or non-singular) and we denote its inverse by .
Remark. If has an inverse, then is a square matrix.
Not all square matrices are invertible. For example, the zero matrix is not invertible.
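For two invertible matrices $A$ and $B$ of the same size,
$$(AB)^{-1} = B^{-1}A^{-1}.$$
Proof.
$$(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}B = I,$$
and similarly $(AB)(B^{-1}A^{-1}) = I$.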
-
Rotation. For , we have
-
Shear. Fix . Then, for , we have
-
Reflection. If is a reflection in a plane with normal , then
3.4.4 Transpose and Hermitian Conjugate
Consider a matrix of size . Then, the transpose of is the matrix of size with entries
- If is a column vector , then is the row vector .
If is a square matrix, then is
- symmetric if ,
- antisymmetric if
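Consider a matrix $A$ of size $m \times n$ with complex entries. Then, the Hermitian conjugate of $A$ is the $n \times m$ matrix
$$A^\dagger = \overline{A}^{\,T}$$
with entries
$$(A^\dagger)_{ij} = \overline{A_{ji}},$$
where $\overline{A_{ji}}$ denotes the complex conjugate of $A_{ji}$.
If $A$ is square, then $A$ is
- Hermitian if $A^\dagger = A$, i.e. $A_{ij} = \overline{A_{ji}}$ for all $i, j$,
- anti-Hermitian (or skew-Hermitian) if $A^\dagger = -A$, i.e. $A_{ij} = -\overline{A_{ji}}$ for all $i, j$.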
3.4.5 Trace
Consider any matrix , the trace is defined by
i.e. the sum of the diagonal entries.
- for the identity matrix of size .
3.4.6 Decomposition of Matrices
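Any square matrix is a sum of symmetric and antisymmetric parts. For an $n \times n$ matrix $M$ with real entries, we can write $M$ as $M = S + A$, where
$$S = \tfrac{1}{2}(M + M^T)$$
is the symmetric part and
$$A = \tfrac{1}{2}(M - M^T)$$
is the antisymmetric part.
The symmetric part can be further decomposed:
$$S = T + \tfrac{1}{n}\operatorname{tr}(S)\,I.$$
Note that $\operatorname{tr}(T) = 0$, and we call $T$ traceless. Noting that $\operatorname{tr}(A) = 0$ and $\operatorname{tr}(S) = \operatorname{tr}(M)$, we can write
$$M = T + \tfrac{1}{n}\operatorname{tr}(M)\,I + A.$$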
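The decomposition can be carried out mechanically. The following sketch assumes a square matrix with integer or rational entries, stored as a list of rows, and splits it into a traceless symmetric part, a multiple of the identity, and an antisymmetric part. The function name `decompose` and the return order are illustrative choices.

```python
from fractions import Fraction

def decompose(m):
    """Split a square matrix into (traceless symmetric part, tr(M)/n, antisymmetric part)."""
    n = len(m)
    sym = [[Fraction(m[i][j] + m[j][i], 2) for j in range(n)] for i in range(n)]
    anti = [[Fraction(m[i][j] - m[j][i], 2) for j in range(n)] for i in range(n)]
    mean = sum(sym[i][i] for i in range(n)) / n          # equals tr(M) / n
    traceless = [[sym[i][j] - (mean if i == j else 0) for j in range(n)]
                 for i in range(n)]
    return traceless, mean, anti

m = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
traceless, mean, anti = decompose(m)
print(mean)                                   # tr(M) / 3
print(sum(traceless[i][i] for i in range(3))) # trace of the traceless part
```

Exact rational arithmetic is used so that the three parts add back to the original matrix without rounding error.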
3.4.7 Orthogonal and Unitary Matrices
A real matrix is orthogonal if and only if
or equivalently,
This means that columns and rows of are orthonormal vectors. Equivalently, is orthogonal if and only if preserves the dot product, i.e. for all ,
and in this case, preserves lengths and angles.
A complex matrix is unitary if and only if
or equivalently,
Equivalently, is unitary iff it preserves the complex inner product, i.e. for all ,
and in this case, preserves lengths and angles.
In , consider as an orthogonal matrix. Consider the basis
- preserves norms
- preserves angles, in particular, orthogonality
Thus, we have either
or
3.5 Determinant
Consider a map given by a real matrix , where
for all .
Assume that exists, then
3.5.1 In
Consider , and let . Then,
with
Note that .
Therefore, if , then .
3.5.2 In
We shall attempt to generalise our construction of the to . Take where is a matrix with real entries. We seek a matrix and a scalar such that
We call this scalar the determinant of .
Recall that the scalar triple product of three vectors is defined by
which describes the volume of the parallelepiped formed by the three vectors.
Under the action of a matrix , volumes are scaled by a factor , where
Thus, in , the determinant of a matrix is given by
To construct , note
so that
Thus,
And hence iff is linearly independent. This is equivalent to saying , or that .
Remark. General determinants can be expanded in terms of determinants. For example,
3.5.3 Permutations
Our goal is to generalise the Levi-Civita symbol to dimensions to define the determinant of an matrix.
Consider with
We can write
where and are called cycles.
Note that disjoint permutations commute, but in general permutations do not commute.
Proof. This is because we can write
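The sign of a permutation $\sigma$ is defined by
$$\epsilon(\sigma) = (-1)^k,$$
where $k$ is the number of 2-cycles of $\sigma$ when written as a product of 2-cycles.
In particular, if $\epsilon(\sigma) = +1$, then $\sigma$ is an even permutation, and if $\epsilon(\sigma) = -1$, then $\sigma$ is an odd permutation.
The Levi-Civita symbol in $n$ dimensions is defined by
$$\epsilon_{i_1 i_2 \dots i_n} = \begin{cases} \epsilon(\sigma) & \text{if } (i_1, \dots, i_n) = (\sigma(1), \dots, \sigma(n)) \text{ for some permutation } \sigma, \\ 0 & \text{otherwise.} \end{cases}$$
It is totally antisymmetric.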
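The sign of a permutation can be computed from its cycle decomposition, since a cycle of length $\ell$ is a product of $\ell - 1$ transpositions. The sketch below uses 0-based indices (a permutation of $0, \dots, n-1$ stored as a sequence of images), which is a convention chosen for the code rather than the one used in the notes; the function names are illustrative.

```python
def permutation_sign(perm):
    """Sign of a permutation of 0..n-1, where perm[i] is the image of i."""
    n = len(perm)
    seen = [False] * n
    transpositions = 0
    for start in range(n):
        length = 0
        j = start
        while not seen[j]:        # walk around the cycle containing start
            seen[j] = True
            j = perm[j]
            length += 1
        if length > 0:
            transpositions += length - 1   # an l-cycle is l - 1 transpositions
    return -1 if transpositions % 2 else 1

def levi_civita(*indices):
    """Generalised Levi-Civita symbol on 0-based indices; 0 unless they form a permutation."""
    if sorted(indices) != list(range(len(indices))):
        return 0
    return permutation_sign(indices)

print(permutation_sign((1, 2, 0)), permutation_sign((1, 0, 2)), levi_civita(0, 0, 1))
```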
3.5.4 Alternating Forms
For vectors in or , the rank alternating form is defined by
-
is multilinear in its arguments. i.e.
-
It is totally antisymmetric: for all .
Alternatively, for any permutation .
-
.
-
If for some , then . [Follows from (2).]
-
If for some scalars , then . [Follows from (1) and (4).]
Proof.
[] If the vectors are linearly dependent, then one of them can be written as a linear combination of the others. By property (5), the alternating form is zero.
[] If the vectors are linearly independent, then they span or . In particular, for some matrix , we can write
Hence,
Since , we have
3.5.5 Determinants in and
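Consider an $n \times n$ matrix $M$ with columns $\mathbf{C}_1, \dots, \mathbf{C}_n$ given by
$$(\mathbf{C}_a)_i = M_{ia}.$$
The determinant of $M$ is defined by
$$\det M = \sum_{\sigma} \epsilon(\sigma)\, M_{\sigma(1)1} M_{\sigma(2)2} \cdots M_{\sigma(n)n},$$
where the sum is over all permutations $\sigma$ of $\{1, \dots, n\}$ and $\epsilon(\sigma)$ is the sign of the permutation $\sigma$. We can also write, using the summation convention,
$$\det M = \epsilon_{i_1 i_2 \dots i_n} M_{i_1 1} M_{i_2 2} \cdots M_{i_n n}.$$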
-
The determinant is multilinear in the columns of the matrix. In particular,
for any scalar and any matrix .
-
The determinant is totally antisymmetric in the columns of the matrix. In particular, if we exchange two columns of , then the determinant changes sign.
-
for the identity matrix of any size .
-
If two rows or two columns of are equal, then .
-
If two rows or two columns of are linearly dependent, then .
-
if and only if the columns of are linearly independent.
As a consequence, under a column operation for some , the determinant is unchanged.
-
Hence, all properties above also hold for rows.
-
For any two matrices and ,
In particular, if is invertible, then
-
If is orthogonal, then .
-
If is unitary, then .
Proof. For (5), Suppose for some and scalar . Define given by
Then
Then, the th column of is all zeros. And thus
For (7), take a single term , and a in . We have
Take . Since , we have
For (8), note that swapping columns an even/odd number of times introduces a factor of . Hence,
If in two indices for some , then by property (4). Hence, we only need to consider the case where are all distinct. This means that there exists a permutation such that for all . Thus,
Therefore,
For (9), if is orthogonal, then , and thus
Hence, .
For (10), if is unitary, then , and thus
Hence, .
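The permutation-sum definition of the determinant can be evaluated directly for small matrices, which gives a way to spot-check the properties above. This is a minimal sketch using 0-based indices and computing the sign by counting inversions; it has factorial cost and is meant for illustration only. The names `parity_sign`, `det` and `matmul` are illustrative.

```python
from itertools import permutations

def parity_sign(perm):
    """Sign of a permutation of 0..n-1, computed from its number of inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(m):
    """Determinant as a signed sum over permutations of products of entries."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        term = parity_sign(perm)
        for col in range(n):
            term *= m[perm[col]][col]   # entry in row perm(col), column col
        total += term
    return total

def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

a = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
b = [[1, 2, 0], [0, 1, 1], [3, 0, 1]]
print(det(matmul(a, b)) == det(a) * det(b))   # multiplicativity of the determinant
```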
3.5.6 Minors and Cofactors
We want to find a way to compute determinants of matrices in an efficient way. We do this by defining minors and cofactors.
For an matrix , the cofactor of entry is defined by
Consider the columns and rows of given by
Then, the determinant of can be written as (see proof in Theorem 3.40):
We have
i.e. the cofactor is the determinant of the matrix obtained from by replacing the entry with and all other entries in row and column with .
Hence, we can write the determinant of as
for any fixed column . Alternatively, we can write
for any fixed row .
Consider an matrix. Then, for any fixed ,
Proof.
We have
Consider the permutation that moves to the th position, and leaves everything else in its natural order:
Assume (we can do a similar argument for ). Since we have to perform transpositions for , . Now consider the permutation ,
Note that reorders to . Thus,
Hence, we can rewrite
Reasoning as above, if then
Hence
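The adjugate of a matrix $M$ is defined to be
$$\operatorname{adj}(M) = \Delta^T,$$
where $\Delta$ is the matrix of cofactors, with entries $\Delta_{ij}$.
Remark. From the expression above, note that
$$\operatorname{adj}(M)\,M = \det(M)\,I,$$
and if $\det M \neq 0$, then
$$M^{-1} = \frac{1}{\det M}\operatorname{adj}(M).$$
This suggests a way to compute the inverse of a matrix using only determinants of smaller matrices.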
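The cofactor construction of the inverse can be sketched as follows, assuming a square matrix with integer or rational entries stored as a list of row lists. The determinant here is computed by expansion along the first row; all function names are illustrative, and the method is far too slow for large matrices.

```python
from fractions import Fraction

def minor(m, i, j):
    """Matrix obtained from m by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 0:
        return 1          # determinant of the empty matrix
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def adjugate(m):
    """Transpose of the matrix of cofactors."""
    n = len(m)
    cof = [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)] for i in range(n)]
    return [[cof[j][i] for j in range(n)] for i in range(n)]

def inverse(m):
    """Inverse as adjugate divided by determinant; fails for singular matrices."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(x, d) for x in row] for row in adjugate(m)]

print(inverse([[2, 1], [5, 3]]))
```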
Consider the matrix
for some arbitrary scalar . We want to compute .
By the fact that determinants are conserved under operations of the form ,
3.6 Systems of Linear Equations
3.6.1 Case
Consider the system of equations given by
We can write this system in matrix form as
where
Consider , we have
Similarly, consider , we have
Note that is . Thus, we can write
Equivalently, given , if exists, we can write
3.6.2 General Case
Consider a system of linear equations in unknowns, written in matrix form as
where is an matrix, .
We shall consider three possible scenarios.
-
If , then exists, and therefore there is a unique solution given by
-
If and , then there is no solution.
-
If and , then there are infinitely many solutions. We can find these solutions by considering
where is a particular solution to the system, and .
In more detail, a solution exists for
if and only if we can find for some . This is equivalent to saying that . Then, is also a solution if and only if
satisfies
Thus, the general solution is given by
for any .
Remark. In the first case, note that
In this case, if then we must have . Hence there is a unique solution.
For the other cases,
and thus either
If is a basis for , then the general solution for is
for any scalars , where .
Consider the equation
with
and
where are some scalars. We saw before that
-
Assume . Then exists, and we can construct it from the matrix of cofactors.
[ It can be computed that ]
We have
Note that . This indicates that we can simplify our matrix. Hence, the solution to the equation is
The solution is a point in .
-
Assume that , then
and then with .
The image suggests that we must have to have a solution. In this case, one particular solution is given by . Hence, the general solution is given by
for any scalars , i.e.
If , then there is no solution.
-
The case is similar to case 2.
3.6.3 The Homogeneous Case – Geometrical Interpretation
Consider the equation
Then, if are the rows of , then
Each equation represents a plane in that passes through the origin with normal . The solution to the system, which is , is the intersection of these planes.
The possible scenarios are as follows:
-
, so . This means that all the normals of the three planes are linearly independent, and thus the only intersection point is the origin.
-
. The intersection of the three planes is a line through the origin, and the three normals span a plane.
-
. The intersection of the three planes is a plane through the origin, so all three planes coincide. In this case, all normals are parallel.
3.6.4 The General Case – Geometrical Interpretation
Consider the equation
Then,
These are three planes in with normals , and in general do not pass through the origin.
The possible scenarios are as follows:
-
. All the normals are linearly independent, and thus the three planes intersect at a single point. There is a unique solution for any .
-
.
The existence of solutions depends on . More specifically, whether is in the image of .
-
if , then the planes may intersect in a line as in the homogeneous case, or there is no solution.
-
if , then either all three planes coincide as in the homogeneous case, or there is no solution.
3.6.5 Gaussian Elimination
Consider a system of equations in unknowns:
WLOG, we can assume that (since we can always swap rows).
Step 1. We subtract multiples of the first equation from all other equations to make the coefficients of zero in all equations except the first one.
Step 2. Repeat (1) for and coefficients of in all equations except the second one, and so on.
In all these equations, .
The possible cases are as follows.
-
and for all . Then, there is a unique solution. To obtain it, we can first find from the th equation, then substitute it into the th equation to find , and so on.
-
and for some . Then, there is no solution.
-
(and not necessarily ). Then are undetermined. So, given any values of , we can solve . Then, there are infinitely many solutions given by varying .
Note that this algorithm can also be written in matrix form by
where is an matrix. This algorithm can be reexpressed to obtain
with
which is called the row echelon form of . Note that
-
the first block is upper triangular with non-zero entries on the diagonal.
-
.
-
if , , and if , then . Then, both and are invertible.
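The elimination procedure and the three possible outcomes can be sketched in code. This is a minimal illustration rather than the exact algorithm above: instead of relabelling unknowns to keep the pivots on the diagonal, it simply skips columns without a pivot. The function names `row_echelon` and `classify` are our own, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction

def row_echelon(aug, n_unknowns):
    """Reduce an augmented matrix [A | d] (list of rows) to row echelon form.

    Returns the reduced rows and the list of pivot columns.
    """
    rows = [[Fraction(x) for x in row] for row in aug]
    pivot_cols = []
    r = 0  # index of the next pivot row
    for c in range(n_unknowns):
        # Look for a row at or below r with a non-zero entry in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        rows[r], rows[p] = rows[p], rows[r]  # swap rows
        # Subtract multiples of the pivot row from all rows below it.
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
    return rows, pivot_cols

def classify(aug, n_unknowns):
    """Decide which of the three cases the system falls into."""
    rows, pivot_cols = row_echelon(aug, n_unknowns)
    rank = len(pivot_cols)
    # Rows below the pivots read 0 = d_i; any non-zero d_i is inconsistent.
    if any(row[n_unknowns] != 0 for row in rows[rank:]):
        return "no solution"
    if rank == n_unknowns:
        return "unique solution"
    return "infinitely many solutions"

print(classify([[1, 1, 3], [1, -1, 1]], 2))   # x + y = 3, x - y = 1
print(classify([[1, 1, 1], [2, 2, 3]], 2))    # inconsistent pair
print(classify([[1, 1, 1], [2, 2, 2]], 2))    # one equation repeated
```

The number of pivots found here plays the role of the rank in the discussion above.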
4 Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors can be used to analyse and simplify matrices.
4.1 Introduction
Let be a polynomial of degree . Then
where and .
Then has precisely roots in (counting with multiplicities).
Let (for a real or complex vector space ) be a linear map. Then, a vector with is an eigenvector of if there exists a scalar (or ) such that
The scalar is called the eigenvalue corresponding to the eigenvector .
If or , and is given in terms of a matrix , then
and for a given , this holds for some vector if and only if . This is called the characteristic equation of the matrix .
Furthermore, the polynomial is called the characteristic polynomial of degree of the matrix .
Remark. From the definition of the determinant,
for some coefficients . From here we can conclude
-
has degree , and thus roots by the fundamental theorem of algebra. Hence, an matrix has eigenvalues (counting multiplicities).
-
If are real, then the coefficients are real, and thus the eigenvalues are either real or come in complex conjugate pairs.
-
and . By Vieta’s formulas, the sum of the eigenvalues is equal to the trace of the matrix:
-
Finally,
By Vieta’s formulas, the product of the eigenvalues is equal to the determinant of the matrix:
-
Consider and [representing a 90° rotation]. Then
Hence the eigenvalues are and . To find eigenvectors, for , we have
For , we have
-
Consider with . Then
Hence the only eigenvalue is with multiplicity 2. To find eigenvectors, we have
for any .
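As a quick numerical check of the trace and determinant relations, here is a small sketch for the 2×2 case. We assume the 90° rotation is represented by the matrix with rows (0, −1) and (1, 0); the helper `eigenvalues_2x2` is a name introduced only for this illustration.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Roots of the characteristic polynomial t^2 - (a + d) t + (ad - bc)
    of the matrix [[a, b], [c, d]]."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)  # complex square root
    return (trace + root) / 2, (trace - root) / 2

# Assumed form of the 90-degree rotation matrix: [[0, -1], [1, 0]].
lam1, lam2 = eigenvalues_2x2(0, -1, 1, 0)
print(lam1, lam2)      # a complex conjugate pair
print(lam1 + lam2)     # should match the trace, 0
print(lam1 * lam2)     # should match the determinant, 1
```

Since the entries are real, the two eigenvalues come out as a complex conjugate pair, in line with the remark above.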
4.2 Eigenspaces and Multiplicity
For an eigenvalue of a matrix , we define its eigenspace as
The algebraic multiplicity of an eigenvalue , or , is its multiplicity as a root of the characteristic polynomial .
By the fundamental theorem of algebra, the sum of the algebraic multiplicities of all eigenvalues of an matrix is .
Consider an matrix, and an eigenvalue of . Then
-
Consider
The eigenvalue is with algebraic multiplicity . To find the eigenspace, we solve
Therefore, the eigenvector is with geometric multiplicity .
The eigenspace is with .
-
Consider a reflection matrix in , reflecting in the plane through with normal . Then we have
Hence the eigenvalues are and . We have
-
Consider a rotation in
We have
-
Consider a rotation by angle about . Then
and we have an eigenvalue with eigenspace
There are no other real eigenvalues unless for some integer . A rotation restricted to the plane that is perpendicular to has eigenvalues .
-
Consider the matrix
Then the only eigenvalue is with algebraic multiplicity . To find the eigenspace, we solve
and we have a general solution . Therefore, the eigenspace is
with geometric multiplicity .
The defect is . The eigenvectors do not form a basis for .
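The geometric multiplicity can be computed as the dimension of the kernel of the matrix minus the eigenvalue times the identity, i.e. the size of the matrix minus the rank of that shifted matrix. The sketch below does this with exact arithmetic. The two test matrices (a shear and the identity) are hypothetical examples chosen for illustration, not the matrices from the examples above.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows) via row reduction."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def geometric_multiplicity(matrix, lam):
    """Dimension of the eigenspace: n - rank(A - lam * I)."""
    n = len(matrix)
    shifted = [[matrix[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

# Both matrices have the single eigenvalue 1 with algebraic multiplicity 2.
shear = [[1, 1], [0, 1]]
identity = [[1, 0], [0, 1]]
print(geometric_multiplicity(shear, 1))      # defect 2 - 1 = 1
print(geometric_multiplicity(identity, 1))   # defect 2 - 2 = 0
```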
4.3 Diagonalisation and Similarity
For an matrix acting on or , the following are equivalent:
-
There exists a basis of consisting of eigenvectors of . i.e. we have where
for some eigenvalue .
-
is diagonalisable, i.e. there exists an invertible matrix such that
where is a diagonal matrix, with the eigenvalues of on the diagonal:
We will prove Proposition 4.11 in the following section.
4.3.1 Linearly Independent Eigenvectors
Proof. We shall prove this by contradiction. Suppose that are linearly dependent, such that
for some scalars , not all zero.
Take the minimal for which with [reordering if necessary]
Then, applying gives
which is a linear combination of eigenvectors with non-zero coefficients. This contradicts the minimality of .
Now, we can prove Proposition 4.11.
Proof. [of Proposition 4.11]
For any matrix ,
- has columns
- has columns
where is the th column of .
This means that
where .
[] Given a basis of eigenvectors, we can construct with these eigenvectors as columns, and the above holds.
[] Given such that the above holds, the columns of are eigenvectors of . Since is invertible, its columns form a basis of .
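To see the construction in the proof at work, the following sketch builds a matrix from eigenvectors and checks that conjugating by it gives a diagonal matrix. The symmetric matrix `A` and its eigenvectors (1, 1) and (1, −1), with eigenvalues 3 and 1, are a hypothetical example chosen for this illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [1, 2]]
# Columns of P are the eigenvectors (1, 1) and (1, -1).
P = [[1, 1], [1, -1]]

D = matmul(inverse_2x2(P), matmul(A, P))
print(D)  # diagonal, with the eigenvalues 3 and 1 on the diagonal
```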
4.3.2 Criteria for Diagonalisability
-
[Sufficient but not necessary] An matrix with distinct eigenvalues is diagonalisable.
This implies the existence of eigenvectors which are linearly independent, and then this provides a basis for or .
-
[Sufficient and necessary] For any eigenvalue ,
If with are all the different eigenvalues of a matrix, then is a linearly independent set, and its number of elements is
Hence, it forms a basis of or .
4.3.3 Similarity
We say that two matrices and of size are similar if
for some invertible matrix .
If and are similar, then,
So similar matrices represent the same linear map with respect to different bases.
Remark. For the particular case of
this means that is diagonalisable, and then
4.3.4 Hermitian and Symmetric Matrices
Recall that a matrix is called Hermitian if , and symmetric if .
Recall that the complex inner product is defined as . For , this reduces to the dot product .
Remark. If is Hermitian,
For a Hermitian matrix of size ,
-
Every eigenvalue of is real.
-
Eigenvectors corresponding to distinct eigenvalues are orthogonal.
-
If is symmetric, then for each eigenvalue , we can choose a real eigenvector so that (2) becomes
Proof.
-
Consider an eigenvector with eigenvalue . We have
Since , we have , so .
-
Let be eigenvectors with eigenvalues . Then
Since , we have .
-
We have with and are real. Let , with . Then we have
but since it is an eigenvector, so at least one of is non-zero, and we can choose this as a real eigenvector.
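The first two properties can be checked numerically in the 2×2 case. The sketch below assumes a Hermitian matrix with rows (a, b) and (conj(b), d), where a and d are real and b is non-zero. The closed-form eigenvector (b, λ − a) is a standard choice for this case, and the function names are our own.

```python
import math

def hermitian_eigen_2x2(a, b, d):
    """Eigenpairs of [[a, b], [conj(b), d]] with a, d real and b != 0."""
    mean = (a + d) / 2
    # (a - d)^2 + 4 |b|^2 is non-negative, so both eigenvalues are real.
    half_gap = math.sqrt((a - d) ** 2 + 4 * abs(b) ** 2) / 2
    return [(lam, (b, lam - a)) for lam in (mean + half_gap, mean - half_gap)]

def inner(u, v):
    """Complex inner product, conjugating the first argument."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

a, b, d = 1, 1 + 1j, 2
(lam1, v1), (lam2, v2) = hermitian_eigen_2x2(a, b, d)
print(lam1, lam2)          # two real eigenvalues, close to 3 and 0
print(abs(inner(v1, v2)))  # close to 0: the eigenvectors are orthogonal
```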
4.3.5 Gram-Schmidt Orthogonalisation
Given a linearly independent set of vectors in , say , we can construct a sequence of sets of the form:
- …
so that each set has the same span, each is linearly independent, and are orthonormal to each other, and orthogonal to the -vectors.
We construct this as follows:
-
First step. Let and .
This guarantees that and for all .
-
Next step. Let and .
This guarantees that and
for all .
-
Continue similarly until we reach .
We then find an orthonormal basis for each eigenspace of a Hermitian matrix .
Then, if are the distinct eigenvalues of , we have that
is an orthonormal set of consisting of eigenvectors of .
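A minimal sketch of the Gram-Schmidt procedure for real vectors is given below. It orders the loops slightly differently from the steps above (each vector has the previously constructed directions removed just before it is normalised), which should give the same orthonormal set in exact arithmetic. For complex vectors the dot product would need a complex conjugate. The names `dot` and `gram_schmidt` are our own.

```python
import math

def dot(u, v):
    """Standard dot product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Turn a linearly independent list of real vectors into an orthonormal list
    with the same span."""
    basis = []
    for w in vectors:
        # Remove the components of w along the orthonormal vectors found so far.
        for u in basis:
            coeff = dot(u, w)
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))  # non-zero by linear independence
        basis.append([wi / norm for wi in w])
    return basis

basis = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
for u in basis:
    print([round(x, 4) for x in u])
```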
4.3.6 Unitary and Orthogonal Diagonalisation
Let be a Hermitian matrix of size . Then, is diagonalisable.
More specifically,
-
There exists a basis of eigenvectors with
for eigenvalues ;
and equivalently,
-
There exists an invertible matrix such that
with the columns of representing the eigenvectors .
Remark. In addition, the eigenvectors can be chosen to be orthonormal, so that .
Equivalently, the matrix can be chosen to be unitary, so that , and that
Remark. For an real symmetric matrix , the eigenvectors can be taken to be , and can be chosen such that
Equivalently, can be chosen to be orthogonal, so that , and that
4.4 Change of Basis
Consider to be real or complex vector spaces, with
and
- to be a basis of ;
- to be a basis of ,
such that is represented by the matrix with respect to these bases. This means that
Now consider
- to be another basis of ;
- to be another basis of .
In this case, is represented by another matrix with respect to these new bases, such that
Suppose that the bases are related by
where of size and of size are invertible matrices.
With as above, we have
This defines the change of basis formula for matrices representing linear maps.
and are called the change of basis matrices.
Proof. We have
and also
Comparing coefficients of , we have, in summation notation,
Therefore,
Remark.
-
The definition of which represents with respect to and implies that the column of consists of the components of in the basis .
-
Similarly, the column of consists of the components of in the basis .
-
If we instead change in the other direction, i.e. from to and from to , then and , such that
Consider and , with
Thus, is represented by
Now consider a basis for formed by that relates to by
Hence we have
For , consider a basis formed by that relates to by
Hence we have
Therefore, the matrix representing with respect to the new bases is given by
Remark. [Special cases]
-
If with the same basis change, i.e. and , then and
Therefore, matrices represent the same linear map iff they are similar.
-
If , consider for both the standard basis , then if there exists a basis of eigenvectors of denoted by , denote , and define to be the matrix representing with respect to this basis. Then,
where has columns given by the eigenvectors . By Proposition 4.11, is diagonal, with the eigenvalues of on the diagonal. So
where for each , and thus such that
Since is the th eigenvector of , the columns of are the eigenvectors of expressed in the standard basis. Therefore, is the change of basis matrix, and is also the matrix that diagonalises .
4.4.1 Changes in Vector Components Under Change of Basis
Consider a vector space and . Assume that and are two different bases of , related by and
Then, taking into account that , we have
and hence
and this is the relation between vector components with respect to bases related by . We can write
and thus
Similarly, consider vector space and . Assume that and are two different bases of such that
and with bases related by . Then, we have
Now, if we consider the definition of a linear map in terms of a matrix , we have
Therefore,
This recovers the change of basis formula for matrices representing linear maps:
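The relation between components and the change of basis formula can be checked on a small example. The sketch below assumes the convention that the columns of the change of basis matrices hold the new basis vectors expressed in the old bases, so that old components are obtained by multiplying new components by these matrices. Under that assumption the new matrix is the inverse of `Q` times `A` times `P`. All matrices here are hypothetical, and the names `P`, `Q`, `A` are chosen for this illustration only.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [3, 4]]   # matrix of the map in the old bases
P = [[1, 1], [0, 1]]   # new basis of the domain, as columns (assumed convention)
Q = [[2, 0], [0, 1]]   # new basis of the codomain, as columns (assumed convention)

A_new = matmul(inverse_2x2(Q), matmul(A, P))

x_new = [[1], [2]]                # components in the new basis of the domain
x_old = matmul(P, x_new)          # same vector in the old basis
y_old = matmul(A, x_old)          # image, old components
y_new = matmul(A_new, x_new)      # image, new components
print(y_old, matmul(Q, y_new))    # the two descriptions should agree
```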
4.5 Cayley-Hamilton Theorem
Let be an matrix with
Then,
Proof.
-
Consider a general matrix of size . Then
Then, checking by direct substitution gives the result.
-
For a diagonalisable matrix, consider with eigenvalues , along with an invertible matrix such that
Note that we can compute powers of easily:
Thus
But , and so . Therefore,
-
[Non-examinable.] In the general case, let , and .
Recall that the adjugate matrix is defined such that
We will use
Comparing coefficients of on both sides,
and hence
evaluating in gives
Adding these equations gives
This completes the proof.
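For 2×2 matrices the theorem says that the matrix squared, minus the trace times the matrix, plus the determinant times the identity, is zero. The direct substitution in the first part of the proof can be mimicked with a short sketch, where `cayley_hamilton_residual` is a helper name introduced for this illustration.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def cayley_hamilton_residual(A):
    """Evaluate A^2 - tr(A) A + det(A) I for a 2x2 matrix A."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    A2 = matmul(A, A)
    identity = [[1, 0], [0, 1]]
    return [[A2[i][j] - trace * A[i][j] + det * identity[i][j]
             for j in range(2)] for i in range(2)]

print(cayley_hamilton_residual([[1, 2], [3, 4]]))   # the zero matrix
print(cayley_hamilton_residual([[2, -1], [5, 3]]))  # the zero matrix
```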
4.6 Quadratic Forms
We wish to study functions of the form or in , or more generally, a homogeneous polynomial of degree 2 in variables . It turns out that these can be written in matrix form as for some symmetric matrix .
A quadratic form is a function defined by
where is a real symmetric matrix of size .
We can hence write
where is diagonal with eigenvalues on the diagonal, and is a real orthogonal matrix of size with columns given by orthonormal eigenvectors of .
Setting , we can diagonalise the quadratic form:
Therefore,
Note that is the representation of in the orthonormal basis of eigenvectors of , where is the eigenvector corresponding to eigenvalue . Indeed, since the columns of are the eigenvectors , we have
and
are the components of in the orthonormal basis of eigenvectors , with the new axes along these directions called the principal axes of the quadratic form.
Since these are related to the standard axes by orthogonal , we have
In , consider with .
The eigenvalues are and the eigenvectors are
Then
with
e.g. take , , then , . Then if we set ,
defines an ellipse.
e.g. take , , then . Then if we set ,
defines a hyperbola.
Consider
-
If , then
defines an ellipsoid.
-
If , then the eigenvalues are , and the eigenvectors are
Then,
If we set , it defines a two-sheeted hyperboloid.
If we set , it defines a one-sheeted hyperboloid.
Remark. Given a matrix , can be decomposed as
where is symmetric and is antisymmetric. Note that since is antisymmetric, for all . Therefore,
This is why we only consider symmetric matrices in the definition of quadratic forms.
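The remark can be illustrated numerically: splitting a matrix into its symmetric and antisymmetric parts, only the symmetric part contributes to the quadratic form. The matrix and vector below are hypothetical, and `quadratic_form` and `split` are names introduced here.

```python
from fractions import Fraction

def quadratic_form(M, x):
    """Evaluate x^T M x."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

def split(M):
    """Return the symmetric part (M + M^T)/2 and antisymmetric part (M - M^T)/2."""
    n = len(M)
    S = [[Fraction(M[i][j] + M[j][i], 2) for j in range(n)] for i in range(n)]
    B = [[Fraction(M[i][j] - M[j][i], 2) for j in range(n)] for i in range(n)]
    return S, B

M = [[1, 4], [0, 3]]
S, B = split(M)
x = [2, -1]
print(quadratic_form(M, x), quadratic_form(S, x), quadratic_form(B, x))
```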
4.7 Quadrics and Conics
4.7.1 Quadrics
A quadric in is a hypersurface defined by
for some real symmetric matrix , and .
Hence,
The purpose of this section is to classify the solutions of equations of this kind up to geometrical equivalence, i.e. there is no distinction between solutions related by isometries of , including
-
translations,
-
orthogonal transformations about the origin.
If is invertible, we can complete the square by setting
then
where is a constant.
Hence we have
Now we diagonalise as for the quadratic forms before. The orthonormal eigenvectors of define principal axes (the new coordinate axes), and the eigenvalues of along with determine the shape of the quadric.
-
If all eigenvalues and , then we have an ellipsoid.
-
If eigenvalues are of both signs and , then we have a hyperboloid.
-
If has one or more zero eigenvalues, then our analysis changes. It is simplest in the standard form, where we have linear and quadratic terms to analyse.
4.7.2 Conics
-
If , we get the form
which represents
-
if , then
- if , an ellipse;
- if , a point;
- if , no solutions.
-
if and have opposite signs, then
- if , a hyperbola;
- if , a pair of lines.
-
If , consider and . We can diagonalise into the original formula for quadrics to get
-
If , then the equation reduces to . This represents
- if , a pair of lines;
- if , a single line;
- if , no solutions.
-
If , we can write
for . This represents a parabola.
Note that all changes of coordinates used here are isometries of .
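The classification for the case of two non-zero eigenvalues can be summarised in a short sketch. We assume the conic has already been brought to the diagonal form with eigenvalues `lam1`, `lam2` multiplying the squared coordinates and a constant `k` on the right-hand side. These parameter names and the sign convention for `k` are assumptions made for this illustration.

```python
def classify_central_conic(lam1, lam2, k):
    """Classify lam1 * x^2 + lam2 * y^2 = k when both eigenvalues are non-zero."""
    if lam1 == 0 or lam2 == 0:
        raise ValueError("this sketch only covers non-zero eigenvalues")
    if lam1 * lam2 > 0:
        # Same sign: multiply the equation by -1 if needed so both are positive.
        if lam1 < 0:
            k = -k
        if k > 0:
            return "ellipse"
        if k == 0:
            return "point"
        return "no solutions"
    # Opposite signs.
    return "hyperbola" if k != 0 else "pair of lines"

print(classify_central_conic(1, 4, 1))    # ellipse
print(classify_central_conic(1, -1, 1))   # hyperbola
print(classify_central_conic(1, -1, 0))   # pair of lines
```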
4.7.3 Standard Forms for Conics in Cartesian Coordinates
Consider
-
If eccentricity , this is an ellipse, where the semi-major axis is and the semi-minor axis is . We can write
and the foci are at .
-
If eccentricity , this is a parabola, with focus at and
-
If eccentricity , this is a hyperbola, with semi-major axis and semi-minor axis related by
The foci are at .
4.7.4 Focus-Directrix Property of Conics
The four types of conics (ellipse, parabola, hyperbola, circle) are essentially four different types of cross-sections of a cone. They can also be defined in terms of a focus point and a directrix line.
Consider the expression for the conic:
Conic sections can be defined in terms of the following.
The eccentricity is a non-negative parameter. The eccentricity and scale properties of a conic section satisfy
-
the foci of a conic are ;
-
the directrices are the vertical lines .
A conic is the set of points whose distance from the focus is
unless , in which case we take the other directrix.
We have the following cases.
-
, the conic is an ellipse.
The equation of the ellipse is
or equivalently,
where .
In this case, the semi-major axis is and the semi-minor axis is . Additionally, if , the ellipse is a circle of radius .
-
If , the conic is a hyperbola.
The equation of the hyperbola is
or equivalently,
where .
-
If , the conic is a parabola.
The equation of the parabola is
or equivalently,
4.7.5 Polar Coordinates
We introduce a new parameter such that is the distance from the focus to the directrix. Then,
We can use polar coordinates centered on a focus, such that the focus-directrix property is:
-
For an ellipse with , we have
-
For a hyperbola with , we have
-
For a parabola with , we have
4.8 Jordan Normal Forms
This gives us a classification for complex matrices up to similarity.
Consider a matrix of size corresponding to a linear map , which is similar to a matrix after a change of basis.
Any complex matrix is similar to one of the following:
-
with , with .
-
, with .
-
, with .
Proof. has 2 roots, counting multiplicities, in . We have the following cases.
-
For distinct roots (eigenvalues) , we have . Thus the eigenvectors form a basis of , and is diagonalised with the eigenvectors as the columns of .
-
For a repeated root , with , the same argument as above applies, and is diagonalised.
-
For a repeated root , with and . Let be an eigenvector for and extend it to a basis , where is any vector linearly independent of . Hence,
Then, the matrix of the linear map w.r.t. the basis is
Note that we will only consider , otherwise we will return to case (1). Also, , otherwise we will return to case (2).
Now define . Then, with respect to the basis , the matrix of the linear map is
with , and the columns of given by .
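The third case of the proof can be traced on a concrete matrix. In the sketch below, `A` is a hypothetical matrix with the repeated eigenvalue 2 that is not a multiple of the identity. Rather than choosing an eigenvector first and rescaling it, we pick a vector `v2` that is not an eigenvector and apply the matrix minus twice the identity to it; this should produce the same rescaled eigenvector as in the proof.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[3, 1], [-1, 1]]   # characteristic polynomial (t - 2)^2
lam = 2
N = [[A[0][0] - lam, A[0][1]], [A[1][0], A[1][1] - lam]]  # A - lam * I

v2 = [[1], [0]]          # any vector that is not an eigenvector
w1 = matmul(N, v2)       # an eigenvector, already scaled appropriately

# Columns of P are w1 and v2.
P = [[w1[0][0], v2[0][0]], [w1[1][0], v2[1][0]]]
J = matmul(inverse_2x2(P), matmul(A, P))
print(J)  # the Jordan block with 2 on the diagonal and 1 above it
```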
Any complex matrix is similar to a matrix with block form given by
where each Jordan block is a matrix of the form
with , and are the eigenvalues of and (because they are similar).
Note that the same eigenvalue may appear in multiple Jordan blocks.
is diagonalisable iff all Jordan blocks are of size .
4.9 Symmetries and Transformation Groups
4.9.1 Orthogonal Transformations and Rotations in
[This topic is discussed in more detail in IA Groups.]
Recall that being orthogonal is equivalent to each of the following:
-
,
-
for all ,
-
The columns or rows of form an orthonormal basis of .
Recall that for any orthogonal matrix .
-
preserves lengths and -dimensional (absolute) volumes.
-
preserves orientations, given by the signs of the volumes.
Geometrically, consists of all rotations in , and reflections belong to but not to .
Any element of is of the form:
- ,
- where and .
For a rotation matrix , consider
We can view this in two ways:
-
transformation of vectors (active point of view)
We have where are components of the new vector after with respect to the standard basis.
-
change of basis (passive point of view)
Now are components of the same vector but with respect to a new orthonormal basis where
4.9.2 2D Minkowski Space and Lorentz Transformations
Consider the inner product on given by where .
If and , then
This inner product is not positive definite, since
which is not always positive. Nonetheless, it is still bilinear and symmetric.
Now let us consider how the standard basis vectors behave under this inner product. Consider and . They are orthonormal with respect to this inner product, in the sense that
The inner product defined by
where is called the Minkowski metric.
equipped with the Minkowski metric is called a Minkowski space.
Consider associated to a linear map . This preserves the Minkowski metric iff
The matrices satisfying this condition form a group, with
4.9.2.1 General Form of Lorentz Transformations
We shall determine a general form for matrices in the Lorentz group given the conditions above.
A similar argument as for orthogonal matrices will be followed.
Using
-
, which gives .
We have
-
, which gives .
We have
-
, which gives .
This similarly gives .
-
Combining these equations, we can derive a general form
For elements of the Lorentz group, we have
using hyperbolic trigonometric identities.
4.9.2.2 Physical Interpretation of Lorentz Transformations
Fix to be constant. Any Lorentz transformation over maps it to another vector on the same curve.
Note that and must lie on the same branch of the curve, since .
We have
since and .
Now, for a physical interpretation, define with . [With the speed of light .]
Rename [time coordinate] and [space coordinate]. Then, we can interpret
so Lorentz transformations boost the time and space coordinates for an observer moving at speed relative to another observer. More details on this topic will be covered in IA Dynamics and Relativity.
The factor in Lorentz transformations gives rise to time dilation and length contraction.
For composition of velocities,
where , we get
This is the relativistic velocity addition formula.
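These properties can be checked numerically. The sketch below assumes the Minkowski metric is the diagonal matrix with entries 1 and −1, that a Lorentz transformation in the connected component of the identity has rows (cosh θ, sinh θ) and (sinh θ, cosh θ), and that the velocity is the hyperbolic tangent of θ in units where the speed of light is 1. The sign conventions may differ from those used above, and the function names are our own.

```python
import math

J = [[1, 0], [0, -1]]  # assumed form of the Minkowski metric

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def boost(theta):
    """Lorentz transformation with parameter theta (assumed sign convention)."""
    return [[math.cosh(theta), math.sinh(theta)],
            [math.sinh(theta), math.cosh(theta)]]

def preserves_metric(M, tol=1e-9):
    """Check M^T J M = J up to rounding."""
    lhs = matmul(transpose(M), matmul(J, M))
    return all(abs(lhs[i][j] - J[i][j]) < tol
               for i in range(2) for j in range(2))

def add_velocities(v1, v2):
    """Relativistic velocity addition with c = 1."""
    return (v1 + v2) / (1 + v1 * v2)

theta1, theta2 = 0.3, 0.4
composed = matmul(boost(theta1), boost(theta2))
print(preserves_metric(composed))
print(math.tanh(theta1 + theta2),
      add_velocities(math.tanh(theta1), math.tanh(theta2)))
```

Composing two such transformations adds the parameters, which is where the velocity addition formula comes from.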
