Mathematics Basics - Linear Algebra (Vector)

Submitted by ≡放荡痞女 on 2020-01-25 18:20:18

Vector

Intro

Vectors are the most fundamental concept in linear algebra, and they are used universally in machine learning algorithms. One simple application is to write a familiar system of simultaneous equations in vector form.

$$4x + 3y = 20\\ x + y = 10\tag{1}$$

Equation (1) can be written as,

$$\begin{pmatrix}4&3\\1&1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}20\\10\end{pmatrix}$$

This vector form is what we will use most of the time to write out machine learning equations.
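As a quick aside, the matrix form above can also be handed to a computer. Here is a minimal sketch in plain Python (no libraries; the helper name `solve_2x2` is my own) that solves a 2x2 system $Ax=b$ by Cramer's rule:

```python
# A minimal sketch of solving the 2x2 system in equation (1)
# from its matrix form A x = b, using Cramer's rule.

def solve_2x2(A, b):
    """Solve a 2x2 linear system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    x = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    y = (A[0][0] * b[1] - b[0] * A[1][0]) / det
    return x, y

A = [[4, 3], [1, 1]]   # coefficient matrix from equation (1)
b = [20, 10]
print(solve_2x2(A, b))   # → (-10.0, 20.0)
```

You can verify the result by substituting back: $4(-10) + 3(20) = 20$ and $-10 + 20 = 10$.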

In addition, we can summarize an object's features in a vector. For example, if we have a house of 120 square meters with 2 bedrooms and 1 bathroom located at the city center, we can specify the house parameters as

$$house_A = \begin{pmatrix}120\\2\\1\\1\end{pmatrix}$$

The first entry represents the area of the house. The second and third entries represent the number of bedrooms and bathrooms respectively. The last entry is a boolean (0 or 1) value that indicates whether the house is at the city center (value 1) or outside it (value 0). This vectorized expression can then be used as input to our machine learning algorithms.

Lastly, we can also express our model parameters as a vector. A normal distribution has two parameters, $\mu$ and $\sigma$, that specify its center and spread. So a normal distribution can be represented as

$$\begin{pmatrix}\mu\\\sigma\end{pmatrix}=\begin{pmatrix}1\\2\end{pmatrix}$$

In machine learning, we keep optimizing the model parameters so that they better fit the actual data. During this process, we are effectively updating the $\mu$ and $\sigma$ of our model vector in an iterative manner.

In this chapter, we will cover some essential topics on vectors. We will start by explaining the basic vector operations. Then we will introduce one of the most important vector operations, the dot product. After that, we will see how to use the dot product to calculate the angle between two vectors and to perform vector projections. Lastly, we will discuss the basis to which vectors are referenced and how to change a vector's basis.

Basic Vector Operations

There are four basic operations on vectors, namely addition, subtraction, scalar multiplication, and modulus.

To explain these operations, we first define two vectors $r$ and $s$, where

$$r=\begin{pmatrix}4 \\ 3 \end{pmatrix},\quad s=\begin{pmatrix}1 \\ 2 \end{pmatrix}$$

Plotting $r$ and $s$ on a graph:
Vectors

Addition

To add vectors $r$ and $s$, we add their corresponding elements together.

$$r+s=\begin{pmatrix}4\\3\end{pmatrix}+\begin{pmatrix}1\\2\end{pmatrix}=\begin{pmatrix}4+1\\3+2\end{pmatrix}=\begin{pmatrix}5\\5\end{pmatrix}$$

This can be shown graphically as follows. We shift vector $s$ parallel to itself so that its tail sits at the tip of vector $r$, giving the shifted vector $s'$. The resultant vector then runs from the tail of vector $r$ to the tip of vector $s'$.
Vector Addition
It is also worth noting that vector addition is commutative, i.e. $r+s = s+r$.
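The element-wise rule above can be sketched in a few lines of Python (the helper name `vec_add` is my own):

```python
# A minimal sketch of element-wise vector addition.
def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]

r = [4, 3]
s = [1, 2]
print(vec_add(r, s))   # → [5, 5]
print(vec_add(s, r))   # same result: vector addition is commutative
```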

Subtraction

Vector subtraction is similar to vector addition: we subtract each element of one vector from the corresponding element of the other.

$$r-s=\begin{pmatrix}4\\3\end{pmatrix}-\begin{pmatrix}1\\2\end{pmatrix}=\begin{pmatrix}4-1\\3-2\end{pmatrix}=\begin{pmatrix}3\\1\end{pmatrix}$$

To show this graphically, there are essentially two steps involved. First, we calculate the negative of vector $s$,

$$-s=-\begin{pmatrix}1\\2\end{pmatrix}=\begin{pmatrix}-1\\-2\end{pmatrix}$$

Then, we perform a normal vector addition of vector $r$ and vector $-s$. Vector $-s$ is shifted parallel to the tip of vector $r$ to form the resultant vector. This is illustrated in the graph below.
Vector Subtraction

Scalar Multiplication

Scalar multiplication scales a vector by a given factor. Again, the operation is performed element-wise.

$$2*r=2*\begin{pmatrix}4\\3\end{pmatrix}=\begin{pmatrix}2*4\\2*3\end{pmatrix}=\begin{pmatrix}8\\6\end{pmatrix}$$

It is equivalent to performing vector addition multiple times:

$$2*r=\begin{pmatrix}4\\3\end{pmatrix}+\begin{pmatrix}4\\3\end{pmatrix}=\begin{pmatrix}4+4\\3+3\end{pmatrix}=\begin{pmatrix}8\\6\end{pmatrix}$$

Graphically, this extends the vector along the line on which it lies.
Scalar Multiplication
Multiplying a vector by a negative scalar works almost the same way, except that the vector now extends in the opposite direction.
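Scalar multiplication, including the negative case, can be sketched as follows (the helper name `scalar_mul` is my own). Note that negating a vector, as used in subtraction above, is just multiplication by -1:

```python
# A minimal sketch of scalar multiplication; negating a vector is
# multiplication by -1, which is how subtraction was drawn above.
def scalar_mul(k, v):
    return [k * x for x in v]

r = [4, 3]
s = [1, 2]
print(scalar_mul(2, r))    # → [8, 6]
print(scalar_mul(-1, s))   # → [-1, -2]
```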

Modulus

Lastly, the modulus of a vector is the length of that vector. By the Pythagorean theorem, the square of the hypotenuse (the longest side) equals the sum of the squares of the other two sides. To calculate the modulus of vector $r$, note that $r$ has a horizontal length of 4 and a vertical length of 3. Therefore,

$$|r|=\sqrt{4^2+3^2}=5$$
Modulus
The modulus operation is represented by two vertical bars enclosing the vector, $|r|$. This operation is not limited to 2-dimensional space. The modulus of a vector with more dimensions is calculated the same way - take the square root of the sum of squares of the vector components.
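The n-dimensional rule can be sketched directly (the helper name `modulus` is my own):

```python
# A minimal sketch of the modulus: the square root of the sum of
# squared components, valid in any number of dimensions.
import math

def modulus(v):
    return math.sqrt(sum(x * x for x in v))

print(modulus([4, 3]))      # → 5.0
print(modulus([1, 2, 2]))   # → 3.0  (sqrt of 1 + 4 + 4)
```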

That is all you need to know about the basic vector operations. Let’s move on to our next topic for more advanced vector operations.

Dot Product

The dot product, sometimes called the inner product, is one of the most important vector operations. You are going to see it a lot later when we dive into the derivation of different machine learning algorithms. It is also a prelude to calculating angles between two vectors and projecting one vector onto another.

We have learnt that scalar multiplication multiplies a vector by a scalar. The dot product, on the other hand, multiplies a vector by another vector. In general, for vectors $a$ and $b$, the dot product $a\cdot b$ evaluates to:

$$a\cdot b=\begin{pmatrix}a_1\\a_2\\\vdots\\a_n\end{pmatrix}\cdot\begin{pmatrix}b_1\\b_2\\\vdots\\b_n\end{pmatrix}=a_1*b_1+a_2*b_2+...+a_n*b_n$$

For our previously defined vectors $r$ and $s$,

$$r\cdot s=\begin{pmatrix}4\\3\end{pmatrix}\cdot\begin{pmatrix}1\\2\end{pmatrix}=4*1+3*2=10$$

There are three notable properties of the dot product operation.

  1. The dot product is commutative: $r\cdot s=s\cdot r$.
  2. The dot product is distributive over addition: $r\cdot (s+t)=r\cdot s + r\cdot t$.
  3. The dot product is not associative: $r\cdot(s\cdot t)\neq(r\cdot s)\cdot t$. In fact, since $s\cdot t$ is a scalar, the left-hand side is a scalar multiplication rather than a dot product.

It is also interesting to note that the dot product of a vector with itself equals the square of its modulus.

$$\begin{aligned} r\cdot r &= r_1*r_1 + r_2*r_2 + \cdots + r_n*r_n \\ &= r_1^2+r_2^2+\cdots+r_n^2 \\ &= |r|^2 \end{aligned} \tag{2}$$
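A short sketch of the dot product, including a check of equation (2) and of commutativity (the helper name `dot` is my own):

```python
# A minimal sketch of the dot product, with a check of equation (2):
# the dot product of a vector with itself equals its modulus squared.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

r = [4, 3]
s = [1, 2]
print(dot(r, s))               # → 10
print(dot(r, r))               # → 25, which is |r|^2 = 5^2
print(dot(r, s) == dot(s, r))  # → True: the dot product is commutative
```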

Calculate Angle Between Two Vectors

Now we are ready to derive the angle between two vectors using what we have learnt about the dot product.
Cosine Rule
First, let's refresh our memory of the cosine rule. Given the lengths of two sides of a triangle ($a$ and $b$) and the angle between them ($\theta$), we can calculate the length of the opposite side ($c$) using the following formula.

$$c^2=a^2+b^2-2ab\cos\theta\tag{3}$$

If we take side $a$ as our vector $r$ and side $b$ as our vector $s$, then side $c$ is the vector $r-s$, as shown below.
Cosine Rule in Vector
Equation (3) can be rewritten as

$$|r-s|^2=|r|^2+|s|^2-2*|r|*|s|*\cos\theta \tag{4}$$

Recall from equation (2) that the square of a vector's modulus equals the dot product of the vector with itself; we will use this fact below.

So now we have $r$, $s$, and $r-s$. How can we calculate the angle $\theta$ between $r$ and $s$?

On the left-hand side of equation (4), we can expand $|r-s|^2$ as a dot product:

$$\begin{aligned} |r-s|^2&=(r-s)\cdot(r-s)\\ &=r\cdot r+s\cdot s - r\cdot s - r\cdot s \\ &= |r|^2+|s|^2-2*(r\cdot s) \end{aligned}$$

Substituting this back into equation (4), we get

$$\begin{aligned} |r|^2+|s|^2-2*(r\cdot s) &= |r|^2+|s|^2-2*|r|*|s|*\cos\theta\\ r\cdot s &= |r|*|s|*\cos\theta\\ \cos\theta &= \frac{r\cdot s}{|r|*|s|} \end{aligned} \tag{5}$$

Therefore, the angle $\theta$ between vectors $r$ and $s$ can be calculated from the dot product of $r$ and $s$ and their moduli.

We are also interested in some special angles $\theta$ between $r$ and $s$. For example:

  • When $\theta = 0°$, $r$ and $s$ point in the same direction, and $\cos\theta=\frac{r\cdot s}{|r|*|s|}=1$.
  • When $\theta = 90°$, $r$ and $s$ are orthogonal to each other, and $\cos\theta=\frac{r\cdot s}{|r|*|s|}=0$.
  • When $\theta = 180°$, $r$ and $s$ point in opposite directions, and $\cos\theta=\frac{r\cdot s}{|r|*|s|}=-1$.
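Equation (5) can be sketched as code (the helper names `dot` and `angle_degrees` are my own):

```python
# A minimal sketch of equation (5): recovering the angle between two
# vectors from their dot product and moduli.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def angle_degrees(u, v):
    cos_theta = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    return math.degrees(math.acos(cos_theta))

print(round(angle_degrees([1, 0], [0, 1]), 6))   # → 90.0 (orthogonal)
print(round(angle_degrees([4, 3], [1, 2]), 1))   # → 26.6
```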

Vector Projection

Another important concept for vectors is projection. For vectors $r$ and $s$, we can draw a line $s_\perp$ from the tip of $s$ to $r$ such that $s_\perp$ is perpendicular to $r$. The vector $s'$ along $r$ is the projection of vector $s$ onto vector $r$.
Vector Projection
We know that

$$\cos\theta = \frac{|s'|}{|s|} \tag{6}$$

Substituting equation (6) into (5),

$$\begin{aligned} \frac{|s'|}{|s|}&=\frac{r\cdot s}{|r|*|s|}\\ |s'|&=\frac{r\cdot s}{|r|} \end{aligned}$$

$|s'|$ is called the scalar projection of vector $s$ onto $r$. It has only magnitude, no direction. To find the direction of the projection, we use the following formula:

$$s'=|s'|*\frac{r}{|r|}=\frac{r\cdot s}{|r|}*\frac{r}{|r|}$$

$\frac{r}{|r|}$ is the unit-length vector in the direction of $r$. Multiplying the scalar projection $|s'|$ by the unit-length vector $\frac{r}{|r|}$ gives us the projection in the direction of $r$. This is called the vector projection of $s$ onto $r$.
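Both projections can be sketched as follows (the helper names are my own; note that $\frac{r\cdot s}{|r|}*\frac{r}{|r|}$ simplifies to $\frac{r\cdot s}{r\cdot r}*r$):

```python
# A minimal sketch of the scalar and vector projections of s onto r,
# following the formulas above.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scalar_projection(s, r):
    # |s'| = (r . s) / |r|
    return dot(r, s) / math.sqrt(dot(r, r))

def vector_projection(s, r):
    # s' = (r . s) / (r . r) * r, i.e. |s'| times the unit vector r/|r|
    scale = dot(r, s) / dot(r, r)
    return [scale * x for x in r]

r = [4, 3]
s = [1, 2]
print(scalar_projection(s, r))   # → 2.0  (10 / 5)
print(vector_projection(s, r))   # ≈ [1.6, 1.2]
```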

With this, we have concluded our discussion of the dot product and the calculation of angles and projections between two vectors. In the next topic, we will see these concepts in action when they are applied to changing the basis of a vector.

Changing Basis

So far we have only seen vectors in their own coordinate systems. It is worthwhile to define the coordinate system, or basis, to which our vectors are referenced.

We can express a 2-dimensional vector as the sum of two basis vectors. For our vector $r=\begin{pmatrix}4\\3\end{pmatrix}$, we can define two basis vectors $e_1=\begin{pmatrix}1\\0\end{pmatrix}$ and $e_2=\begin{pmatrix}0\\1\end{pmatrix}$ such that $r = 4e_1 + 3e_2$.
Basis Vectors
However, the choice of basis vectors $e_1$ and $e_2$ is arbitrary. It depends entirely on how the coordinate system is set up. You might even want two basis vectors that are of unequal lengths or that are not orthogonal to each other. Let's see what happens when we change to a different set of basis vectors.

For example, we can define a new set of basis vectors $b_1=\begin{pmatrix}2\\1\end{pmatrix}$ and $b_2=\begin{pmatrix}-2\\4\end{pmatrix}$. $b_1$ and $b_2$ are defined in terms of the basis vectors $e_1$ and $e_2$. What is our vector $r$ expressed in $b_1$ and $b_2$?
New Basis Vectors
This is where vector projection comes into play. We need to calculate the vector projection of $r$ onto the new basis vectors $b_1$ and $b_2$ respectively.

To calculate the vector projection of $r$ onto $b_1$,

$$\begin{aligned} \frac{r\cdot b_1}{|b_1|}*\frac{1}{|b_1|}&=\frac{\begin{pmatrix}4\\3\end{pmatrix}\cdot \begin{pmatrix}2\\1\end{pmatrix}}{\begin{pmatrix}2\\1\end{pmatrix}\cdot \begin{pmatrix}2\\1\end{pmatrix}}\\ &=\frac{11}{5} \end{aligned}$$

$\frac{r\cdot b_1}{|b_1|}$ gives us the scalar projection of $r$ onto $b_1$. Dividing that by the magnitude of $b_1$ again tells us that the projection is $\frac{11}{5}$ of the length of $b_1$, so the vector projection of $r$ onto $b_1$ is $\frac{11}{5}b_1$.

Similarly, we can calculate the vector projection of $r$ onto $b_2$.

$$\begin{aligned} \frac{r\cdot b_2}{|b_2|}*\frac{1}{|b_2|}&=\frac{\begin{pmatrix}4\\3\end{pmatrix}\cdot \begin{pmatrix}-2\\4\end{pmatrix}}{\begin{pmatrix}-2\\4\end{pmatrix}\cdot \begin{pmatrix}-2\\4\end{pmatrix}}\\ &=\frac{4}{20}\\ &=\frac{1}{5} \end{aligned}$$

So our vector $r$ can be expressed as a vector sum of $b_1$ and $b_2$.

$$r=\frac{11}{5}b_1+\frac{1}{5}b_2$$

If we evaluate this expression by substituting in $b_1$ and $b_2$ written in the original $e_1$, $e_2$ basis, we get back our original vector $r$.

$$\begin{aligned} r&=\frac{11}{5}b_1+\frac{1}{5}b_2\\ &=\frac{11}{5}*\begin{pmatrix}2\\1\end{pmatrix}+\frac{1}{5}*\begin{pmatrix}-2\\4\end{pmatrix}\\ &=\begin{pmatrix}\frac{22}{5}\\\frac{11}{5}\end{pmatrix}+\begin{pmatrix}-\frac{2}{5}\\\frac{4}{5}\end{pmatrix}\\ &=\begin{pmatrix}4\\3\end{pmatrix} \end{aligned}$$

Note that $b_1$ and $b_2$ here are orthogonal to each other. We can verify this by calculating the cosine of the angle $\theta$ between $b_1$ and $b_2$.

$$\begin{aligned} \cos\theta&= \frac{b_1\cdot b_2}{|b_1|*|b_2|}\\ &=\frac{\begin{pmatrix}2\\1\end{pmatrix}\cdot \begin{pmatrix}-2\\4\end{pmatrix}}{\sqrt{5}*\sqrt{20}} \\ &=\frac{0}{10} \\ &=0 \end{aligned}$$

Since $\cos\theta=0$, $\theta=90°$.

So we have successfully converted the basis of our vector $r$ from the original basis vectors $e_1$ and $e_2$ to the new basis vectors $b_1$ and $b_2$. This method of changing basis works as long as the new basis vectors are orthogonal to each other. The more general case, where the new basis vectors can have any angle between them, involves a different matrix operation which will be covered in the next chapter.
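The change of basis above can be sketched as code (the helper names `dot` and `coordinate` are my own). Each new coordinate is $\frac{r\cdot b}{b\cdot b}$, which is valid here because $b_1$ and $b_2$ are orthogonal:

```python
# A minimal sketch of the change of basis above: project r onto each
# orthogonal basis vector to get its new coordinates, then reconstruct
# r to verify.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def coordinate(r, b):
    # (r . b) / (b . b) -- valid because the basis vectors are orthogonal
    return dot(r, b) / dot(b, b)

r = [4, 3]
b1 = [2, 1]
b2 = [-2, 4]

c1 = coordinate(r, b1)
c2 = coordinate(r, b2)
print(c1, c2)                    # → 2.2 0.2  (i.e. 11/5 and 1/5)

# Reconstruct r from the new coordinates to verify.
reconstructed = [round(c1 * x + c2 * y, 10) for x, y in zip(b1, b2)]
print(reconstructed)             # → [4.0, 3.0]
```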

When we extend this method to a space of 3 or more dimensions, it is critical that each additional basis vector is not a linear combination of the existing ones. This property is called linear independence. It means we cannot find values $\alpha$ and $\beta$ that satisfy the equation below, so $b_3$ does not lie in the same plane as $b_1$ and $b_2$.

$$b_3=\alpha*b_1+\beta*b_2$$
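One way to sketch this check in 3D (the helper name and the example vectors are my own): solve for $\alpha$ and $\beta$ from the first two components of $b_3=\alpha b_1+\beta b_2$, then see whether the third component also matches.

```python
# A minimal sketch of a linear-independence check in 3D.
def is_combination(b1, b2, b3):
    # Solve the 2x2 system from the first two components by Cramer's rule.
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if det == 0:
        return None   # first two components of b1, b2 are not independent
    alpha = (b3[0] * b2[1] - b2[0] * b3[1]) / det
    beta = (b1[0] * b3[1] - b3[0] * b1[1]) / det
    # b3 is a combination only if the third component agrees too.
    return abs(alpha * b1[2] + beta * b2[2] - b3[2]) < 1e-9

b1 = [2, 1, 0]
b2 = [-2, 4, 0]
print(is_combination(b1, b2, [0, 5, 0]))   # → True: lies in the b1-b2 plane
print(is_combination(b1, b2, [0, 0, 1]))   # → False: linearly independent
```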

That is it! We have completed our discussion on vectors in linear algebra. You have built a solid foundation for what we will explore further in future chapters.


(Inspired by Mathematics for Machine Learning lecture series from Imperial College London)
