ST705 Homework 3 on Linear Algebra and The Normal Equations
1
Let $G(\beta) = (y - X\beta)'W(y - X\beta)$. Derive $\partial{G} / \partial{\beta}$.
\[G(\beta) = (y - X\beta)'W(y - X\beta) = y^T W y - y^T W X \beta - \beta^T X^T W y + \beta^T X^T W X \beta\] \[\begin{align} \partial{G} / \partial{\beta} & = \partial/ \partial{\beta} \Big[ y^T W y - y^T W X \beta - \beta^T X^T W y + \beta^T X^T W X \beta \Big] \\ & = 0 - (y^T W X) ^T - \partial/ \partial{\beta} \Big[ \beta^T X^T W y \Big] + (X^TWX + (X^T W X)^T) \beta \\ & = -X^T W^T y - \partial/ \partial{\beta} \Big[ \beta^T X^T W y \Big] + (X^TWX + (X^T W X)^T) \beta \\ \end{align}\]Notice that $\beta^T X^T W y = (\beta^T X^T W y)^T = y^T W^T X \beta$ because it is a scalar. Therefore,
\[\begin{align} \partial{G} / \partial{\beta} & = -X^T W^T y - \partial/ \partial{\beta} \Big[ y^T W^T X \beta\Big] + (X^TWX + (X^T W X)^T) \beta \\ & = -X^T W^T y - X^T W y + (X^TWX + (X^T W X)^T) \beta. \end{align}\]
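As a quick numerical sanity check (not part of the derivation), we can compare this gradient to a central finite-difference approximation in R. The dimensions, $X$, $W$, $y$, and $\beta$ below are arbitrary, and $W$ is deliberately not symmetric since the derivation never assumed symmetry.

```r
set.seed(1)
n <- 6; p <- 3
X <- matrix(rnorm(n * p), n, p)
W <- matrix(rnorm(n * n), n, n)  # arbitrary, not necessarily symmetric
y <- rnorm(n)
beta <- rnorm(p)

G <- function(b) drop(t(y - X %*% b) %*% W %*% (y - X %*% b))

# gradient from the derivation above
XtWX <- t(X) %*% W %*% X
grad <- -t(X) %*% t(W) %*% y - t(X) %*% W %*% y + (XtWX + t(XtWX)) %*% beta

# central finite differences
eps <- 1e-6
fd <- sapply(seq_len(p), function(j) {
  e <- replace(numeric(p), j, eps)
  (G(beta + e) - G(beta - e)) / (2 * eps)
})
max(abs(drop(grad) - fd))  # numerically zero
```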
2
Let $A \in \mathbb{R}^{n\times p}$.
a
Prove that if $A^{g}$ is a generalized inverse of $A$ (i.e., only satisfying $AA^{g}A = A$), then $(A^{g})'$ is a generalized inverse of $A'$. Conclude from this fact that $P_{X} := X(X'X)^{g}X'$ is symmetric (this finishes the proof of a theorem about $P_{X}$ from lecture).
\[\begin{align} A A^g A & = A \\ (A A^g A)^T & = A^T \\ A^T (A^g)^T A^T & = A^T \end{align}\]Thus, $(A^g)^T$ satisfies the defining property of a generalized inverse of $A^T$.
Now we show that $P_X$ is symmetric:
\[\begin{align} P_X^T & = (X (X^T X)^g X^T )^T \\ & = X ((X^T X)^g)^T X^T \\ & = X (X^T X)^g X^T \\ & = P_X. \end{align}\]The second equality is just the transpose of a product; the third holds because, by the result above, $((X^T X)^g)^T$ is a generalized inverse of $(X^T X)^T = X^T X$, and from lecture $X G X^T$ takes the same value for every generalized inverse $G$ of $X^T X$.
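As a quick R illustration (not part of the proof; it assumes the MASS package and uses the rank-deficient design matrix from problem 6 below), $P_X$ built from a generalized inverse of $X^TX$ indeed comes out symmetric:

```r
library(MASS)  # for ginv()

# rank-3 design matrix (the X of problem 6)
X <- cbind(1, c(-1, 0, 0, 1, 1), c(1, 0, 0, 1, 1), c(-1, 0, 0, 1, 1))
P <- X %*% ginv(t(X) %*% X) %*% t(X)
max(abs(P - t(P)))  # symmetric up to floating point error
```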
b
Prove the existence and uniqueness of the Moore-Penrose generalized inverse, usually denoted $A^{+}$, of $A$.
From our work on the previous homework, we know that every matrix has a Moore-Penrose inverse via its SVD. Thus, we have existence.
Say we have two Moore-Penrose inverses $A_1^+$ and $A_2^+$ of $A$. In order to show uniqueness, we need to show
\[A_1^+ = A_1^+ A A_1^+ = A_1^+ A A_2^+ = A_2^+ A A_2^+ = A_2^+.\]Thus, it is sufficient to show $A A_1^+ = A A_2^+$ and $A_1^+ A = A_2^+ A$. We will use the four defining Moore-Penrose conditions: (1) $A A^+ A = A$, (2) $A^+ A A^+ = A^+$, (3) $(A A^+)^T = A A^+$, and (4) $(A^+ A)^T = A^+ A$.
\[\begin{align} A A_1^+ & = A A_2^+ A A_1^+ & (1)\\ & = (A A_2^+)^T (A A_1^+)^T & (3) \\ & = (A_2^+)^T A^T (A_1^+)^T A^T \\ & = (A_2^+)^T A^T & (1) \\ & = (A A_2^+)^T \\ & = A A_2^+ & (3) \end{align}\] \[\begin{align} A_1^+ A & = A_1^+ A A_2^+ A & (1) \\ & = (A_1^+ A)^T (A_2^+ A)^T & (4) \\ & = A^T (A_1^+)^T A^T (A_2^+)^T \\ & = A^T (A_2^+)^T & (1) \\ & = (A_2^+ A)^T \\ & = A_2^+ A & (4) \end{align}\]
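Uniqueness can also be illustrated numerically (a sketch, not a proof): the pseudoinverse assembled directly from the SVD agrees with MASS::ginv, even though the two are computed separately.

```r
library(MASS)

X <- cbind(1, c(-1, 0, 0, 1, 1), c(1, 0, 0, 1, 1), c(-1, 0, 0, 1, 1))  # rank 3
s <- svd(X)
pos <- s$d > 1e-8  # keep only the nonzero singular values
X_plus <- s$v[, pos] %*% diag(1 / s$d[pos]) %*% t(s$u[, pos])
max(abs(X_plus - ginv(X)))  # the two constructions agree
```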
c
Show that if $A$ has full column rank, then $A^{+} = (A'A)^{-1}A'$.
Notice that since $A$ has full column rank, $\text{rank}( A^T A ) = \text{rank}( A ) = p$. So $A^T A$ is a $p \times p$ matrix of full rank, hence invertible. Now we need to show that the four Moore-Penrose conditions hold.
\[\begin{align} A A^+ A & = A (A'A)^{-1}A' A \\ & = A (A'A)^{-1} (A' A) \\ & = A I \\ & = A \end{align}\] \[\begin{align} A^+ A A^+ & = (A'A)^{-1} A' A (A'A)^{-1}A' \\ & = (A'A)^{-1} (A' A) (A'A)^{-1}A' \\ & = (A'A)^{-1}A' \\ & = A^+ \end{align}\] \[\begin{align} (A^+ A)^T & = ((A'A)^{-1}A' A)^T \\ & = I^T \\ & = I \\ & = (A'A)^{-1} (A' A) \\ & = A^+ A \end{align}\] \[\begin{align} (A A^+)^T & = (A (A'A)^{-1}A' )^T \\ & = A ((A'A)^{-1})^T A' \\ & = A ((A'A)^T)^{-1} A' \\ & = A (A'A)^{-1} A' \\ & = A A^+ \end{align}\]
d
Show that if $A$ has full row rank, then $A^{+} = A' (AA')^{-1}$.
Since $A$ has full row rank, $A^T$ has full column rank, so by the argument in (c), $A A^T$ is invertible. Now we need to verify the four Moore-Penrose conditions.
\[\begin{align} A A^+ A & = A A^T (A A ^T )^{-1} A \\ & = (A A^T) (A A ^T )^{-1} A \\ & = A \end{align}\] \[\begin{align} A^+ A A^+ & = A^T (A A ^T )^{-1} A A^T (A A ^T )^{-1} \\ & = A^T (A A ^T )^{-1} (A A^T) (A A ^T )^{-1} \\ & = A^T (A A ^T )^{-1} \\ & = A^+ \end{align}\] \[\begin{align} (A A^+)^T & = (A A^T (A A ^T )^{-1})^T \\ & = I^T \\ & = I \\ & = (A A^T) (A A ^T )^{-1} \\ & = A A^+ \end{align}\] \[\begin{align} (A^+ A)^T & = (A^T (A A ^T )^{-1} A )^T \\ & = A^T ((A A^T)^{-1})^T A \\ & = A^T ((A A^T)^T)^{-1} A \\ & = A^T (A A^T)^{-1} A \\ & = A^+ A \end{align}\]
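Both closed forms are easy to check numerically; a minimal sketch in R, with a random matrix that has full column rank almost surely:

```r
library(MASS)

set.seed(2)
A <- matrix(rnorm(5 * 3), 5, 3)                 # full column rank
max(abs(ginv(A) - solve(t(A) %*% A) %*% t(A)))  # part (c): ~0
B <- t(A)                                       # full row rank
max(abs(ginv(B) - t(B) %*% solve(B %*% t(B))))  # part (d): ~0
```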
3
Let $S$ be a nonempty subset of an inner product space $V$. The orthogonal complement to the set $S$ is defined as
\[S^{\perp} := \{x \in V : \langle x,y\rangle = 0 \text{ for every } y \in S\}.\]
a
Show that $S^{\perp}$ is a subspace of $V$ for any $S \subseteq V$.
In order to show that $S^{\perp}$ is a subspace, we have to verify three conditions:
1: That it contains the zero vector. For any $y \in S$, $\langle 0 , y \rangle = 0$ by linearity of the inner product in its first argument. Thus, $0 \in S^{\perp}$.
2: That the set is closed under vector addition. Take $a, b \in S^\perp$ and $y \in S$, so that $\langle a , y \rangle = \langle b , y \rangle = 0$. Then,
\[\langle a + b, y \rangle = \langle a , y \rangle + \langle b , y \rangle = 0 + 0 = 0.\]Thus, $a + b \in S^\perp$.
3: That the space is closed under scalar multiplication. Take $x \in S^\perp$, $y \in S$ and $a \in F$. Then,
\[\langle a x , y \rangle = a \langle x , y \rangle = a \cdot 0 = 0.\]Thus, $a x \in S^\perp$.
b
Let $W \subseteq V$ be a finite dimensional subspace, and let $y \in V$. Show that there exist unique vectors $u \in W$ and $z \in W^{\perp}$ such that $y = u + z$.
Take $\{ u_1, \dots, u_k \}$ to be an orthonormal basis for $W$ (which is guaranteed to exist by Gram-Schmidt, since $W$ is finite dimensional). Then define $u = \sum_{ i=1 }^{ k } \langle y , u_i \rangle u_i$ and $z = y - u$, so that $y = u + z$. We first show that $\langle u , z \rangle = 0$.
\[\begin{align} \langle u , z \rangle & = \langle u , y-u \rangle \\ & = \langle u , y \rangle - \langle u , u \rangle \\ & = \Big\langle \sum_{ i=1 }^{ k } \langle y , u_i \rangle u_i , y \Big\rangle - \Big\langle \sum_{ i=1 }^{ k } \langle y , u_i \rangle u_i , \sum_{ j=1 }^{ k } \langle y , u_j \rangle u_j \Big\rangle \\ & = \sum_{ i=1 }^{ k } \langle y , u_i \rangle^2 - \sum_{ i } \sum_{ j } \langle y , u_i \rangle \langle y , u_j \rangle \langle u_i , u_j \rangle \\ & = \sum_{ i=1 }^{ k } \langle y , u_i \rangle^2 - \sum_{ i=1 }^{ k } \langle y , u_i \rangle^2 \\ & = 0, \end{align}\]where the double sum collapses because $\langle u_i , u_j \rangle$ is $1$ when $i = j$ and $0$ otherwise. By construction, $u \in W$, so it remains to show that $z \in W^\perp$. Take an arbitrary $v = \sum_i a_i u_i \in W$.
\[\begin{align} \langle v, z \rangle & = \Big\langle \sum_i a_i u_i , y - \sum_j \langle y , u_j \rangle u_j \Big\rangle \\ & = \Big\langle \sum_i a_i u_i , y \Big\rangle - \Big\langle \sum_i a_i u_i , \sum_j \langle y , u_j \rangle u_j \Big\rangle \\ & = \sum_i a_i \langle u_i , y \rangle - \sum_i a_i \langle y , u_i \rangle \\ & = 0 \end{align}\]by the same orthonormality argument as above. Thus, $z \in W^\perp$. For uniqueness, suppose $u_1 + z_1 = u_2 + z_2 = y$ with $u_1, u_2 \in W$ and $z_1, z_2 \in W^\perp$. Then $u_1 - u_2 = z_2 - z_1$, and this common vector lies in $W \cap W^\perp$. Any $x \in W \cap W^\perp$ satisfies $\langle x , x \rangle = 0$, so $x = 0$. Thus, $u_1 = u_2$ and $z_1 = z_2$.
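The construction in the proof is easy to carry out numerically. A sketch in R, with $W$ a two-dimensional subspace of $\mathbb{R}^5$ spanned by random orthonormal vectors:

```r
set.seed(3)
U <- qr.Q(qr(matrix(rnorm(5 * 2), 5, 2)))  # orthonormal basis u_1, u_2 of W
y <- rnorm(5)

u <- U %*% (t(U) %*% y)  # u = sum_i <y, u_i> u_i
z <- y - u               # so y = u + z
max(abs(t(U) %*% z))     # z is orthogonal to W (numerically zero)
```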
c
Let $X \in \mathbb{R}^{n\times p}$. Verify that column$(X)$ and null$(X')$ are orthogonal complements.
Take $y \in \text{ column }( X )$, so $y = Xv$ for some $v$, and take $w \in \mathcal N (X^T)$, so $X^T w = 0$. Then
\[y^T w = v^T X^T w = v^T (0) = 0.\]Thus, the two spaces are orthogonal. For the dimensions, let $r = \text{rank}( X ) = \text{rank}( X^T )$. Then $\dim(\mathcal C (X)) = r$ and $\dim( \mathcal N (X^T) ) = n - \text{rank}( X^T ) = n - r$, so the dimensions sum to $n$. Orthogonal subspaces of $\mathbb{R}^n$ whose dimensions sum to $n$ are orthogonal complements.
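For a concrete check (illustrative only), take the design matrix from problem 6: its second and third rows agree, as do its fourth and fifth, so the two contrast vectors below span $\mathcal N(X^T)$, and they are orthogonal to every column of $X$:

```r
X <- cbind(1, c(-1, 0, 0, 1, 1), c(1, 0, 0, 1, 1), c(-1, 0, 0, 1, 1))
N <- cbind(c(0, 1, -1, 0, 0), c(0, 0, 0, 1, -1))  # spans null(X^T)
t(X) %*% N   # all zeros: C(X) and null(X^T) are orthogonal
qr(X)$rank   # 3, and dim(null(X^T)) = 5 - 3 = 2, so the dimensions sum to 5
```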
4 (2.8)
Using $X$ and $P_X$ from Example 2.3, and $y^T = (1, 3, 5, 7)$:
a
Construct all solutions to the normal equations.
The normal equations $X^TXb = X^Ty$ have exactly the same solutions as $Xb = P_X y$, so we solve the latter:
\[\begin{align} \begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 2 \\ 1 & 1 & 2 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} & = \frac{ 1 }{ 2 } \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 5 \\ 7 \end{bmatrix} \\ \begin{bmatrix} b_1 + b_3 \\ b_1 + b_3 \\ b_1 + b_2 + 2 b_3 \\ b_1 + b_2 + 2 b_3 \end{bmatrix} & = \begin{bmatrix} \frac{ 1 }{ 2 }(1+3) \\ \frac{ 1 }{ 2 }(1+3) \\ \frac{ 1 }{ 2 }(5+7) \\ \frac{ 1 }{ 2 }(5+7) \end{bmatrix} \end{align}\]Fix $b_1 = x$. Then $b_3 = 2-x$ and
\[\begin{align} x + b_2 + 2 (2-x) & = 6 \\ b_2 & = 2+x. \end{align}\]Thus, all solutions take the form $b = (x,\ 2+x,\ 2-x)^T$ for $x \in \mathbb{R}$.
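A quick R check (not required by the problem) that every member of this family solves the normal equations $X^TXb = X^Ty$:

```r
X <- cbind(1, c(0, 0, 1, 1), c(1, 1, 2, 2))
y <- c(1, 3, 5, 7)
for (x in c(0, 1, -2.5)) {
  b <- c(x, 2 + x, 2 - x)
  print(max(abs(t(X) %*% X %*% b - t(X) %*% y)))  # 0 for every x
}
```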
b
Construct all solutions to $Xb = P_X y$.
\[\begin{align} \begin{bmatrix} b_1 + b_3 \\ b_1 + b_3 \\ b_1 + b_2 + 2 b_3 \\ b_1 + b_2 + 2 b_3 \end{bmatrix} & = \begin{bmatrix} \frac{ 1 }{ 2 }(1+3) \\ \frac{ 1 }{ 2 }(1+3) \\ \frac{ 1 }{ 2 }(5+7) \\ \frac{ 1 }{ 2 }(5+7) \end{bmatrix} \end{align}\]If we set $b_1 = x$ then notice that we will get the same solutions as above.
\[\begin{align} b_1 & = x \\ b_2 & = 2+x \\ b_3 & = 2-x \end{align}\]
5 (2.9)
Using $P_X$ from Example 2.3, show that $1 \in \text{ column }( X )$. Also compute $P_X - P_1$.
$1$ is in the column space of $X$ since the first column of $X$ is the $1$ vector itself.
\[P_X - P_1 = \frac{ 1 }{ 2 } \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix} - \frac{ 1 }{ 4 } \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix} = \frac{ 1 }{ 4 } \begin{bmatrix} 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 \\ -1 & -1 & 1 & 1 \end{bmatrix}\]
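As a side check in R (not asked for by the problem), $P_X - P_1$ is symmetric and idempotent, i.e., itself an orthogonal projection; this holds because $1 \in \mathcal C(X)$ implies $P_X P_1 = P_1 P_X = P_1$:

```r
PX <- matrix(c(1,1,0,0, 1,1,0,0, 0,0,1,1, 0,0,1,1), 4, 4) / 2
P1 <- matrix(1, 4, 4) / 4
D <- PX - P1
all.equal(D, t(D))     # symmetric
all.equal(D %*% D, D)  # idempotent
```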
6 (2.11)
A design matrix for a polynomial regression problem with only three design points takes the form
\[X = \begin{bmatrix} 1 & -1 & 1 & -1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}\]
a
What is the rank of $X$?
The fourth column of $X$ equals its second, while the first three columns are linearly independent, so $\text{rank}( X ) = 3$. (Equivalently, $X$ has only three distinct rows.)
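This is easy to confirm in R:

```r
X <- cbind(1, c(-1, 0, 0, 1, 1), c(1, 0, 0, 1, 1), c(-1, 0, 0, 1, 1))
qr(X)$rank  # 3
```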
b
What is the dimension of $\mathcal{C}(X)$ and a nonzero vector in it.
The dimension of the column space of $X$ is $\text{rank}( X ) = 3$. One nonzero vector in it is the first column of $X$,
\[v = c_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}\]
c
What is the dimension of $\mathcal{N}(X)$ and a nonzero vector in it.
By rank-nullity, the number of columns of $X$ equals $\text{rank}( X ) + \dim(\mathcal N (X))$:
\[\begin{align} 4 & = 3 + \dim(\mathcal N (X)) \\ \dim(\mathcal N (X)) & = 1 \end{align}\]One nonzero vector in it is:
\[v = \begin{bmatrix} 0 \\ 1 \\ 0 \\ -1 \end{bmatrix} \rightarrow X v = \begin{bmatrix} 0 - 1 + 0 + 1 \\ 0 + 0 + 0 + 0 \\ 0 + 0 + 0 + 0 \\ 0 + 1 + 0 - 1 \\ 0 + 1 + 0 - 1 \end{bmatrix} = 0\](This works because the second and fourth columns of $X$ are equal.)
d
Give the dimension of $\mathcal{ C }(X^T)$ and a nonzero vector in it.
$\dim(\mathcal C (X^T)) = \dim(\mathcal R(X)) = \text{rank}( X ) = 3$. One nonzero vector in it is the first row of $X$,
\[v = r_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix}\]
e
Compute $X^T X$ and find a generalized inverse for it. (No need to check $AGA=A$).
\[X^T X = \begin{bmatrix} 1 & 1 & 1 & 1 & 1\\ -1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 \\ -1 & 0 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & -1 & 1 & -1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 1 & 3 & 1 \\ 1 & 3 & 1 & 3 \\ 3 & 1 & 3 & 1 \\ 1 & 3 & 1 & 3 \end{bmatrix}\]We can use R to calculate the SVD of $X$ and its Moore-Penrose generalized inverse.
```r
> svd(X)
$d
[1] 2.940034e+00 2.130947e+00 9.029217e-01 4.950053e-17

$u
           [,1]       [,2]        [,3]          [,4]
[1,] 0.09291154 -0.9146966  0.3933161  1.896487e-17
[2,] 0.21573128 -0.2473949 -0.6263033  7.071068e-01
[3,] 0.21573128 -0.2473949 -0.6263033 -7.071068e-01
[4,] 0.67018187  0.1430414  0.1743428 -1.051618e-16
[5,] 0.67018187  0.1430414  0.1743428 -1.051618e-16

$v
          [,1]       [,2]        [,3]          [,4]
[1,] 0.6342573 -0.5271852 -0.56550283  8.681328e-18
[2,] 0.4242986  0.5634958 -0.04942893 -7.071068e-01
[3,] 0.4875030 -0.2949928  0.82177862  1.110223e-16
[4,] 0.4242986  0.5634958 -0.04942893  7.071068e-01
```
The fourth singular value is numerically zero, which is consistent with $\text{rank}( X ) = 3$.
```r
> library(MASS)
> X_ginv = ginv(X)
```
This gives
\[X^+ = \begin{bmatrix} 0 & 0.5 & 0.5 & 0 & 0 \\ -0.25 & 0 & 0 & 0.125 & 0.125 \\ 0.5 & -0.5 & -0.5 & 0.25 & 0.25 \\ -0.25 & 0 & 0 & 0.125 & 0.125 \end{bmatrix}.\]In particular, since $(X^TX)^+ = X^+(X^+)^T$, a generalized inverse of $X^TX$ can be read off from $X^+$ as well. Now we can check that $X^+$ meets all of the requirements to be the Moore-Penrose inverse.
\[X X^+ X\]
```r
> round(X %*% X_ginv %*% X, 3)
     [,1] [,2] [,3] [,4]
[1,]    1   -1    1   -1
[2,]    1    0    0    0
[3,]    1    0    0    0
[4,]    1    1    1    1
[5,]    1    1    1    1
```
This is equal to $X$, so the first condition is met.
\[X^+ X X^+\]
```r
> round(X_ginv %*% X %*% X_ginv, 3)
      [,1] [,2] [,3]  [,4]  [,5]
[1,]  0.00  0.5  0.5 0.000 0.000
[2,] -0.25  0.0  0.0 0.125 0.125
[3,]  0.50 -0.5 -0.5 0.250 0.250
[4,] -0.25  0.0  0.0 0.125 0.125
```
This is equal to $X^+$, so the second condition is met.
Now we need to check that $X^+X$ and $X X^+$ are symmetric.
```r
> round(X_ginv %*% X, 3)
     [,1] [,2] [,3] [,4]
[1,]    1  0.0    0  0.0
[2,]    0  0.5    0  0.5
[3,]    0  0.0    1  0.0
[4,]    0  0.5    0  0.5
> round(X %*% X_ginv, 3)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1  0.0  0.0  0.0  0.0
[2,]    0  0.5  0.5  0.0  0.0
[3,]    0  0.5  0.5  0.0  0.0
[4,]    0  0.0  0.0  0.5  0.5
[5,]    0  0.0  0.0  0.5  0.5
```
Both products are visibly symmetric, so all four conditions hold and $X^+$ is the Moore-Penrose inverse of $X$.
f
Which of the following vectors could be $P_X y$ for an appropriate vector $y$?
\[\begin{bmatrix} 3 \\ 1 \\ 1\\ 2\\ 2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 1\\ -1\\ 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0\\ 2\\ 2 \end{bmatrix}\] \[\begin{bmatrix} 3 \\ 1 \\ 1\\ 2\\ 2 \end{bmatrix}\]is an appropriate vector because it is in the column space of $X$: writing $c_i$ for the $i$th column of $X$, it equals $c_1 - \frac{ 1 }{ 2 } c_2 + \frac{ 3 }{ 2 }c_3$.
\[\begin{bmatrix} 1 \\ 0 \\ 0\\ 2\\ 2 \end{bmatrix}\]is also an appropriate vector because it is in the column space of $X$: it equals $\frac{ 1 }{ 2 }c_2 + \frac{ 3 }{ 2 } c_3$. The middle vector cannot be $P_X y$: rows 2 and 3 of $X$ are identical, so every vector in $\mathcal C(X)$ has equal second and third entries, which $(1, -1, 1, -1, 1)^T$ does not.
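A vector $v$ lies in $\mathcal C(X)$ exactly when $P_X v = v$, so we can check all three candidates at once in R (assuming MASS is available):

```r
library(MASS)

X <- cbind(1, c(-1, 0, 0, 1, 1), c(1, 0, 0, 1, 1), c(-1, 0, 0, 1, 1))
PX <- X %*% ginv(X)  # orthogonal projector onto C(X)
cands <- cbind(c(3, 1, 1, 2, 2), c(1, -1, 1, -1, 1), c(1, 0, 0, 2, 2))
apply(cands, 2, function(v) max(abs(PX %*% v - v)) < 1e-8)
# TRUE FALSE TRUE: only the first and third candidates are in C(X)
```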
g
Using the right-hand side you gave above in (f), find all solutions to the equations $Xb = P_X y$.
\[\begin{align} Xb & = P_X y \\ \begin{bmatrix} b_1 - b_2 + b_3 - b_4 \\ b_1 \\ b_1 \\ b_1 + b_2 + b_3 + b_4 \\ b_1 + b_2 + b_3 + b_4 \end{bmatrix} & = \begin{bmatrix} 3 \\ 1 \\ 1\\ 2\\ 2 \end{bmatrix} \end{align}\]Rows 2 and 3 give $b_1 = 1$; take $b_2 = x$. Then,
\[\begin{align} b_1 - b_2 + b_3 - b_4 & = 3 \\ 1 - x + b_3 - b_4 & = 3 \\ b_3 & = 2 + x + b_4 \\ \\ b_1 + b_2 + b_3 + b_4 & = 2 \\ 1 + x + 2 + x + b_4 + b_4 & = 2 \\ b_4 & = -1/2 -x \\ b_3 & = 3/2. \end{align}\]So all solutions are $b = (1,\ x,\ 3/2,\ -1/2 - x)^T$ for $x \in \mathbb{R}$. Also,
\[\begin{align} Xb & = P_X y \\ \begin{bmatrix} b_1 - b_2 + b_3 - b_4 \\ b_1 \\ b_1 \\ b_1 + b_2 + b_3 + b_4 \\ b_1 + b_2 + b_3 + b_4 \end{bmatrix} & = \begin{bmatrix} 1 \\ 0 \\ 0\\ 2\\ 2 \end{bmatrix}. \end{align}\]Rows 2 and 3 give $b_1 = 0$; take $b_2 = x$. Then,
\[\begin{align} b_1 - b_2 + b_3 - b_4 & = 1 \\ 0 - x + b_3 - b_4 & = 1 \\ b_3 & = 1 + b_4 + x \\ \\ b_1 + b_2 + b_3 + b_4 & = 2 \\ 0 + x + 1 + b_4 + x + b_4 & = 2 \\ b_4 & = \frac{ 1 }{ 2 }- x \\ b_3 & = \frac{ 3 }{ 2 }. \end{align}\]So all solutions are $b = (0,\ x,\ 3/2,\ 1/2 - x)^T$ for $x \in \mathbb{R}$.
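Both solution families can be spot-checked in R for an arbitrary value of $x$:

```r
X <- cbind(1, c(-1, 0, 0, 1, 1), c(1, 0, 0, 1, 1), c(-1, 0, 0, 1, 1))
x <- 0.7                      # any value works
b1 <- c(1, x, 3/2, -1/2 - x)  # family for P_X y = (3, 1, 1, 2, 2)
b2 <- c(0, x, 3/2,  1/2 - x)  # family for P_X y = (1, 0, 0, 2, 2)
drop(X %*% b1)  # 3 1 1 2 2
drop(X %*% b2)  # 1 0 0 2 2
```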
h
Which of the following has the same column space as $X$?
\[U = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ \end{bmatrix} W = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ \end{bmatrix}\]$U$ has the same column space as $X$; its columns are linear combinations of the columns of $X$:
\[\frac{ 1 }{ 2 } (c_3 - c_2) = \frac{ 1 }{ 2 } \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} - \frac{ 1 }{ 2 } \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} = u_1\] \[c_1 - c_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} = u_2\] \[\frac{ 1 }{ 2 }(c_2 + c_3) = \frac{ 1 }{ 2 } \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} + \frac{ 1 }{ 2 } \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} = u_3\]Since $\text{rank}( U ) = 3 = \text{rank}( X )$ and each $u_i \in \mathcal C(X)$, we get $\mathcal C(U) = \mathcal C(X)$. In contrast, $\mathcal C(W) \neq \mathcal C(X)$: every vector in $\mathcal C(X)$ has equal second and third entries, but the second column of $W$ is $(1, 0, 1, 1, 0)^T$.
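The column-space comparison can also be done in R: two matrices have the same column space exactly when each has the same rank as the two stacked side by side.

```r
X <- cbind(1, c(-1, 0, 0, 1, 1), c(1, 0, 0, 1, 1), c(-1, 0, 0, 1, 1))
U <- cbind(c(1, 0, 0, 0, 0), c(0, 1, 1, 0, 0), c(0, 0, 0, 1, 1))
W <- cbind(1, c(1, 0, 1, 1, 0), c(0, 1, 1, 0, 1))
c(qr(X)$rank, qr(U)$rank, qr(cbind(X, U))$rank)  # 3 3 3: C(U) = C(X)
c(qr(W)$rank, qr(cbind(X, W))$rank)              # 3 5: C(W) != C(X)
```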