Example Problems on Linear Algebra and The Normal Equations

ST705 Homework 3 on Linear Algebra and The Normal Equations

1

Let $G(\beta) = (y - X\beta)’W(y - X\beta)$. Derive $\partial{G} / \partial{\beta}$.

\[G(\beta) = (y - X\beta)'W(y - X\beta) = y^T W y - y^T W X \beta - \beta^T X^T W y + \beta^T X^T W X \beta\] \[\begin{align} \partial{G} / \partial{\beta} & = \partial/ \partial{\beta} \Big[ y^T W y - y^T W X \beta - \beta^T X^T W y + \beta^T X^T W X \beta \Big] \\ & = 0 - (y^T W X) ^T - \partial/ \partial{\beta} \Big[ \beta^T X^T W y \Big] + (X^TWX + (X^T W X)^T) \beta \\ & = -(y^T W X) ^T - \partial/ \partial{\beta} \Big[ \beta^T X^T W y \Big] + (X^TWX + (X^T W X)^T) \beta \\ \end{align}\]

Notice that $\beta^T X^T W y = (\beta^T X^T W y)^T = y^T W^T X \beta$ because it is a scalar.

\[\begin{align} \partial{G} / \partial{\beta} & = -(y^T W X) ^T - \partial/ \partial{\beta} \Big[ y^T W^T X \beta\Big] + (X^TWX + (X^T W X)^T) \beta \\ & = -X^T W^T y - X^T W y + (X^TWX + (X^T W X)^T) \beta \end{align}\]
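As a remark (not required by the problem): if $W$ is symmetric, as in weighted least squares, the gradient simplifies, and setting it to zero gives the weighted normal equations:

\[\partial{G} / \partial{\beta} = -2 X^T W y + 2 X^T W X \beta = 0 \iff X^T W X \beta = X^T W y.\]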

2

Let $A \in \mathbb{R}^{n\times p}$

a

Prove that if $A^{g}$ is a generalized inverse of $A$ (i.e., only satisfying $AA^{g}A = A$), then $(A^{g})’$ is a generalized inverse of $A’$. Conclude from this fact that $P_{X} := X(X’X)^{g}X’$ is symmetric (this finishes the proof of a theorem about $P_{X}$ from lecture).

\[\begin{align} A A^g A & = A \\ (A A^g A)^T & = A^T \\ A^T (A^g)^T A^T & = A^T \end{align}\]

Thus, $(A^g)^T$ is a generalized inverse of $A^T$; that is, we may take $(A^T)^g = (A^g)^T$.

Using part (a) and the fact from lecture that $P_X$ does not depend on the choice of generalized inverse,

\[\begin{align} P_X^T & = (X (X^T X)^g X^T )^T \\ & = X ((X^T X)^g)^T X^T \\ & = X ((X^T X)^T)^g X^T \\ & = X (X^T X)^g X^T \\ & = P_X, \end{align}\]

so $P_X$ is symmetric.

b

Prove the existence and uniqueness of the Moore-Penrose generalized inverse, usually denoted $A^{+}$, of $A$.

From our work on the previous homework, we know that every matrix has a Moore-Penrose inverse via its SVD. Thus, we have existence.
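As a brief sketch of that construction: writing the SVD as $A = U \Sigma V^T$ with $U^T U = I$ and $V^T V = I$,

\[A^+ := V \Sigma^+ U^T, \qquad \text{where } \Sigma^+ \text{ is } \Sigma^T \text{ with each nonzero singular value replaced by its reciprocal},\]

and the four Moore-Penrose conditions can be checked directly from the orthogonality of $U$ and $V$.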

Say we have two Moore-Penrose inverses $A_1^+$ and $A_2^+$ of $A$. In order to show uniqueness we need to show

\[A_1^+ = A_1^+ A A_1^+ = A_1^+ A A_2^+ = A_2^+ A A_2^+ = A_2^+.\]

Thus, it is sufficient to show $A A_1^+ = A A_2^+$ and $A_1^+ A = A_2^+ A$. We will use the defining Moore-Penrose conditions: (1) $A A^+ A = A$, (2) $A^+ A A^+ = A^+$, (3) $(A A^+)^T = A A^+$, and (4) $(A^+ A)^T = A^+ A$.

\[\begin{align} A A_1^+ & = A A_2^+ A A_1^+ & (1)\\ & = (A A_2^+)^T (A A_1^+)^T & (3) \\ & = (A_2^+)^T A^T (A_1^+)^T A^T \\ & = (A_2^+)^T (A A_1^+ A)^T \\ & = (A_2^+)^T A^T & (1) \\ & = (A A_2^+)^T \\ & = A A_2^+ & (3) \end{align}\] \[\begin{align} A_1^+ A & = A_1^+ A A_2^+ A & (1) \\ & = (A_1^+ A)^T (A_2^+ A)^T & (4) \\ & = A^T (A_1^+)^T A^T (A_2^+)^T \\ & = (A A_1^+ A)^T (A_2^+)^T \\ & = A^T (A_2^+)^T & (1) \\ & = (A_2^+ A)^T \\ & = A_2^+ A & (4) \end{align}\]

c

Show that if $A$ has full column rank, then $A^{+} = (A’A)^{-1}A’$.

Notice that since $A$ has full column rank, $\text{rank}( A ) = \text{rank}( A^T A )$. So $A^T A$ is square with full rank, and hence invertible. Now we need to show that the four Moore-Penrose conditions hold for $(A'A)^{-1}A'$.

\[\begin{align} A A^+ A & = A (A'A)^{-1}A' A \\ & = A (A'A)^{-1} (A' A) \\ & = A I \\ & = A \end{align}\] \[\begin{align} A^+ A A^+ & = (A'A)^{-1} A' A (A'A)^{-1}A' \\ & = (A'A)^{-1} (A' A) (A'A)^{-1}A' \\ & = (A'A)^{-1}A' \\ & = A^+ \end{align}\] \[\begin{align} (A^+ A)^T & = ((A'A)^{-1}A' A)' \\ & = (A'A)' ((A'A)^{-1})' \\ & = (A'A) (A'A)^{-1} \\ & = I \\ & = (A'A)^{-1} (A' A) \\ & = A^+ A \end{align}\] \[\begin{align} (A A^+)^T & = (A (A'A)^{-1}A' )' \\ & = A ((A'A)^{-1})' A' \\ & = A (A'A)^{-1} A' \\ & = A A^+ \end{align}\]

The last two verifications use that $(A'A)' = A'A$, so $(A'A)^{-1}$ is symmetric.
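As a quick numerical sanity check of this formula (a sketch, not part of the proof; the matrix here is an arbitrary full-column-rank example), we can compare $(A'A)^{-1}A'$ against the Moore-Penrose inverse returned by MASS::ginv:

library(MASS)                                  # for ginv()
set.seed(1)
A <- matrix(rnorm(20), nrow = 5, ncol = 4)     # 5 x 4, full column rank (almost surely)

A_plus_formula <- solve(t(A) %*% A) %*% t(A)   # (A'A)^{-1} A'
A_plus_svd     <- ginv(A)                      # Moore-Penrose inverse via the SVD

max(abs(A_plus_formula - A_plus_svd))          # numerically zero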

d

Show that if $A$ has full row rank, then $A^{+} = A’ (AA’)^{-1}$.

Since $A$ has full row rank, $A^T$ has full column rank. Thus, $A A^T$ is invertible. Now we need to verify the Moore-Penrose conditions.

\[\begin{align} A A^+ A & = A A^T (A A^T)^{-1} A \\ & = (A A^T) (A A^T)^{-1} A \\ & = A \end{align}\] \[\begin{align} A^+ A A^+ & = A^T (A A^T)^{-1} A A^T (A A^T)^{-1} \\ & = A^T (A A^T)^{-1} (A A^T) (A A^T)^{-1} \\ & = A^T (A A^T)^{-1} \\ & = A^+ \end{align}\] \[\begin{align} (A A^+)^T & = (A A^T (A A^T)^{-1})^T \\ & = ((A A^T)^{-1})^T A A^T \\ & = (A A^T)^{-1} (A A^T) \\ & = I \\ & = (A A^T) (A A^T)^{-1} \\ & = A A^+ \end{align}\] \[\begin{align} (A^+ A)^T & = (A^T (A A^T)^{-1} A)^T \\ & = A^T ((A A^T)^{-1})^T A \\ & = A^T (A A^T)^{-1} A \\ & = A^+ A \end{align}\]

Again, the last two verifications use that $(A A^T)^{-1}$ is symmetric.
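Alternatively (a remark), the same formula follows from part (c) applied to $A^T$, which has full column rank, together with the fact that $(A^T)^+ = (A^+)^T$ (which follows by transposing the four conditions):

\[(A^T)^+ = \big( (A^T)' A^T \big)^{-1} (A^T)' = (A A')^{-1} A, \qquad \text{so} \qquad A^+ = \big( (A^T)^+ \big)^T = A' (A A')^{-1}.\]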

3

Let $S$ be a nonempty subset of an inner product space $V$. The orthogonal complement to the set $S$ is defined as

\[S^{\perp} := \{x \in V : \langle x,y\rangle = 0 \text{ for every } y \in S\}.\]

a

Show that $S^{\perp}$ is a subspace of $V$ for any $S \subseteq V$.

In order to show that $S^{\perp}$ is a subspace we have to show three conditions:

1: That there is a zero element. Take $y \in S$. Then $\langle 0 , y \rangle = 0$ by the definition of the inner product. Thus, $0 \in S^{\perp}$.

2: That the space is closed under vector addition. Take $a, b \in S^\perp$ and $y \in S$, so that $\langle a , y \rangle = \langle b , y \rangle = 0$. Then,

\[\langle a + b, y \rangle = \langle a , y \rangle + \langle b , y \rangle = 0 + 0 = 0.\]

Thus, $a + b \in S^\perp$.

3: That the space is closed under scalar multiplication. Take $x \in S^\perp$, $y \in S$ and $a \in F$. Then,

\[\langle a x , y \rangle = a \langle x , y \rangle = a \cdot 0 = 0.\]

Thus, $a x \in S^\perp$.

b

Let $W \subseteq V$ be a finite dimensional subspace, and let $y \in V$. Show that there exist unique vectors $u \in W$ and $z \in W^{\perp}$ such that $y = u + z$.

Take $\{ u_1, \dots, u_k \}$ to be an orthonormal basis for $W$ (which is guaranteed to exist by Gram-Schmidt). Then define $u = \sum_{ i=1 }^{ k } \langle y , u_i \rangle u_i$ and $z = y - u$. Then $y = u + z$. We first show that $\langle u , z \rangle = 0$.

\[\begin{align} \langle u , z \rangle & = \langle u , y-u \rangle \\ & = \langle u , y \rangle - \langle u , u \rangle \\ & = \langle \sum_{ i=1 }^{ k } \langle y , u_i \rangle u_i , y \rangle - \langle \sum_{ i=1 }^{ k } \langle y , u_i \rangle u_i , \sum_{ j=1 }^{ k } \langle y , u_j \rangle u_j \rangle \\ & = \sum_{ i=1 }^{ k } \langle y , u_i \rangle^2 - \sum_{ i } \sum_{ j } \langle y , u_i \rangle \langle y , u_j \rangle \langle u_i , u_j \rangle \\ & = \sum_{ i=1 }^{ k } \langle y , u_i \rangle^2 - \sum_{ i=1 }^{ k } \langle y , u_i \rangle^2 \\ & = 0 \end{align}\]

Now take an arbitrary $w \in W$ and write $w = \sum a_i u_i$ (with $a_i = \langle w , u_i \rangle$, since the basis is orthonormal). By construction, $u \in W$, so it remains to show that $z \in W^\perp$, i.e. that $\langle w , z \rangle = 0$ for every such $w$.

\[\begin{align} \langle w, z \rangle & = \langle \sum a_i u_i , y - \sum \langle y , u_i \rangle u_i \rangle \\ & = \langle \sum a_i u_i , y \rangle - \langle \sum a_i u_i , \sum \langle y , u_j \rangle u_j \rangle \\ & = \sum a_i \langle u_i , y \rangle - \sum_{ i } \sum_{ j } a_i \langle y , u_j \rangle \langle u_i , u_j \rangle \\ & = \sum a_i \langle y , u_i \rangle - \sum a_i \langle y , u_i \rangle \\ & = 0 \end{align}\]

Thus, $z \in W^\perp$. Now to show uniqueness assume $u_1 + z_1 = u_2 + z_2 = y$. Then $u_1 - u_2 = z_2 - z_1$, where $u_1 - u_2 \in W$ and $z_2 - z_1 \in W^\perp$. This common vector lies in $W \cap W^\perp$, so it is orthogonal to itself and must be $0$. Thus, $u_1 = u_2$ and $z_1 = z_2$.
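A small numerical illustration of this decomposition (a sketch: here $W$ is the column space of an arbitrary matrix, and qr() is used only to produce an orthonormal basis, playing the role of Gram-Schmidt):

set.seed(2)
B <- matrix(rnorm(15), nrow = 5, ncol = 3)   # columns span a subspace W of R^5
Q <- qr.Q(qr(B))                             # orthonormal basis u_1, ..., u_k for W
y <- rnorm(5)

u <- Q %*% (t(Q) %*% y)                      # u = sum_i <y, u_i> u_i, which lies in W
z <- y - u                                   # z = y - u

round(t(Q) %*% z, 10)                        # ~0: z is orthogonal to each u_i, so z is in W-perp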

c

Let $X \in \mathbb{R}^{n\times p}$. Verify that column$(X)$ and null$(X’)$ are orthogonal complements.

Take $y \in \mathcal C ( X )$, so $y = Xv$ for some $v$, and $w \in \mathcal N (X^T)$, so $X^T w = 0$. Then

\[y^T w = v^T X^T w = v^T (0) = 0,\]

so the two spaces are orthogonal. For the dimensions, let $r = \text{rank}( X ) = \text{rank}( X^T )$. Then $\dim(\mathcal C (X)) = r$ and $\dim( \mathcal N (X^T) ) = n - \text{rank}( X^T ) = n - r$, so the dimensions sum to $n$. Being orthogonal subspaces of $\mathbb{R}^n$ whose dimensions sum to $n$, $\mathcal C (X)$ and $\mathcal N (X^T)$ are orthogonal complements.
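The same fact can be checked numerically (a sketch with an arbitrary $X$; ginv is from MASS): the residual from projecting $y$ onto $\mathcal C (X)$ lands in $\mathcal N (X^T)$.

library(MASS)
set.seed(3)
X <- matrix(rnorm(20), nrow = 5, ncol = 4)
y <- rnorm(5)

P_X <- X %*% ginv(t(X) %*% X) %*% t(X)   # projection onto column(X)
z   <- y - P_X %*% y                     # the part of y orthogonal to column(X)

round(t(X) %*% z, 10)                    # ~0, i.e., z is in null(X')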

4 (2.8)

Using $X$ and $P_X$ from Example 2.3, and $y^T = (1, 3, 5, 7)$:

a

Construct all solutions to the normal equations.

The normal equations $X'Xb = X'y$ have exactly the same solutions as the system $Xb = P_X y$, so we solve the latter. With $X$ and $P_X$ from Example 2.3,

\[\begin{align} \begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 2 \\ 1 & 1 & 2 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} & = \frac{ 1 }{ 2 } \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 5 \\ 7 \end{bmatrix} \\ \begin{bmatrix} b_1 + b_3 \\ b_1 + b_3 \\ b_1 + b_2 + 2 b_3 \\ b_1 + b_2 + 2 b_3 \end{bmatrix} & = \begin{bmatrix} \frac{ 1 }{ 2 }(1+3) \\ \frac{ 1 }{ 2 }(1+3) \\ \frac{ 1 }{ 2 }(5+7) \\ \frac{ 1 }{ 2 }(5+7) \end{bmatrix} \end{align}\]

Fix $b_1 = x$. Then $b_3 = 2-x$ and

\[\begin{align} x + b_2 + 2 (2-x) & = 6 \\ b_2 & = 2+x. \end{align}\]

b

Construct all solutions to $Xb = P_X y$

\[\begin{align} \begin{bmatrix} b_1 + b_3 \\ b_1 + b_3 \\ b_1 + b_2 + 2 b_3 \\ b_1 + b_2 + 2 b_3 \end{bmatrix} & = \begin{bmatrix} \frac{ 1 }{ 2 }(1+3) \\ \frac{ 1 }{ 2 }(1+3) \\ \frac{ 1 }{ 2 }(5+7) \\ \frac{ 1 }{ 2 }(5+7) \end{bmatrix} \end{align}\]

If we set $b_1 = x$ then notice that we will get the same solutions as above.

\[\begin{align} b_1 & = x \\ b_2 & = 2+x \\ b_3 & = 2-x \end{align}\]
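As a quick check in R (a sketch, with $X$ and $y$ entered as above from Example 2.3), the family $(x,\ 2+x,\ 2-x)$ satisfies the normal equations $X'Xb = X'y$ for any choice of $x$:

X <- matrix(c(1, 0, 1,
              1, 0, 1,
              1, 1, 2,
              1, 1, 2), nrow = 4, byrow = TRUE)
y <- c(1, 3, 5, 7)

b <- function(x) c(x, 2 + x, 2 - x)   # the claimed solution family

# X'X b(x) - X'y should be zero for every x
for (x in c(-1, 0, 2.5)) print(round(t(X) %*% X %*% b(x) - t(X) %*% y, 10))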

5 (2.9)

Using $P_X$ from Example 2.3, show that $1 \in \text{ column }( X )$. Also compute $P_X - P_1$.

$1$ is in the column space of $X$ since the first column of $X$ is the $1$ vector itself.

\[P_X - P_1 = \frac{ 1 }{ 2 } \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix} - \frac{ 1 }{ 4 } \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix} = \frac{ 1 }{ 4 } \begin{bmatrix} 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 \\ -1 & -1 & 1 & 1 \end{bmatrix}\]
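Since $1 \in \mathcal C (X)$, the difference $P_X - P_1$ is itself an orthogonal projection (onto the part of $\mathcal C (X)$ orthogonal to $1$). A quick R check of symmetry and idempotency (a sketch):

P_X <- matrix(c(1, 1, 0, 0,
                1, 1, 0, 0,
                0, 0, 1, 1,
                0, 0, 1, 1), nrow = 4, byrow = TRUE) / 2
P_1 <- matrix(1, nrow = 4, ncol = 4) / 4

D <- P_X - P_1
all.equal(D, t(D))      # TRUE: symmetric
all.equal(D %*% D, D)   # TRUE: idempotent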

6 (2.11)

A design matrix for a polynomial regression problem with only three design points takes the form

\[X = \begin{bmatrix} 1 & -1 & 1 & -1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}\]

a

What is the rank of $X$?

There are 3 linearly independent rows and columns of $X$, so $\text{rank}( X ) = 3$.

b

What is the dimension of $\mathcal{C}(X)$ and a nonzero vector in it.

The dimension of the column space of $X$ is $\text{rank}( X ) = 3$. One nonzero vector in it is the first column of $X$,

\[v = c_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}.\]

c

What is the dimension of $\mathcal{N}(X)$ and a nonzero vector in it.

\[\begin{align} (\text{number of columns of } X) & = \text{rank}( X ) + \dim(\mathcal N (X)) \\ 4 & = 3 + \dim(\mathcal N (X)) \\ \dim(\mathcal N (X)) & = 1 \end{align}\]

One nonzero vector in it is:

\[v = \begin{bmatrix} 0 \\ 1 \\ 0 \\ -1 \end{bmatrix} \rightarrow X v = \begin{bmatrix} 1(0) - 1(1) + 1(0) - 1(-1) \\ 1(0) + 0(1) + 0(0) + 0(-1) \\ 1(0) + 0(1) + 0(0) + 0(-1) \\ 1(0) + 1(1) + 1(0) + 1(-1) \\ 1(0) + 1(1) + 1(0) + 1(-1) \end{bmatrix} = 0\]

d

Give the dimension of $\mathcal{ C }(X^T)$ and a nonzero vector in it.

$\dim(\mathcal C (X^T)) = \dim(\mathcal R (X)) = \text{rank}( X ) = 3$. One nonzero vector in it is the first row of $X$,

\[v = r_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix}\]

e

Compute $X^T X$ and find a generalized inverse for it. (No need to check $AGA=A$).

\[X^T X = \begin{bmatrix} 1 & 1 & 1 & 1 & 1\\ -1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 \\ -1 & 0 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & -1 & 1 & -1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 1 & 3 & 1 \\ 1 & 3 & 1 & 3 \\ 3 & 1 & 3 & 1 \\ 1 & 3 & 1 & 3 \end{bmatrix}\]

We can use R to calculate the SVD and the Moore-Penrose generalized inverse.

 svd(X)
 $d
[1] 2.940034e+00 2.130947e+00 9.029217e-01 4.950053e-17

$u
          [,1]       [,2]        [,3]          [,4]
[1,] 0.6342573 -0.5271852 -0.56550283  8.681328e-18
[2,] 0.4242986  0.5634958 -0.04942893 -7.071068e-01
[3,] 0.4875030 -0.2949928  0.82177862  1.110223e-16
[4,] 0.4242986  0.5634958 -0.04942893  7.071068e-01

$v
           [,1]       [,2]       [,3]          [,4]
[1,] 0.09291154 -0.9146966  0.3933161  1.896487e-17
[2,] 0.21573128 -0.2473949 -0.6263033  7.071068e-01
[3,] 0.21573128 -0.2473949 -0.6263033 -7.071068e-01
[4,] 0.67018187  0.1430414  0.1743428 -1.051618e-16
[5,] 0.67018187  0.1430414  0.1743428 -1.051618e-16

library(MASS)
X_ginv = ginv(X)

This gives

\[X^+ = \begin{bmatrix} 0 & -0.25 & 0.5 & -0.25 \\ 0.5 & 0 & -0.5 & 0 \\ 0.5 & 0 & -0.5 & 0 \\ 0 & 0.125 & 0.25 & 0.125 \\ 0 & 0.125 & 0.25 & 0.125 \end{bmatrix}.\]

Now we can check that this meets all of the requirements to be the Moore-Penrose inverse.

\[X X^+ X\]
round(X %*% X_ginv %*% X, 3)
 1   1   1   1   1
-1   0   0   1   1
 1   0   0   1   1
-1   0   0   1   1

This is equal to $X$, so the first condition is met.

\[X^+ X X^+\]
round(X_ginv %*% X %*% X_ginv, 3)
0.0 -0.250  0.50 -0.250
0.5  0.000 -0.50  0.000
0.5  0.000 -0.50  0.000
0.0  0.125  0.25  0.125
0.0  0.125  0.25  0.125

This is equal to $X^+$, so the second condition is met.

Now we need to check that $X^+X$ and $X X^+$ are symmetric.

> round(X_ginv %*% X)
1   0   0   0   0
0   1   1   0   0
0   1   1   0   0
0   0   0   1   1
0   0   0   1   1

> round(t(X_ginv %*% X))
1    0    0    0    0
0    1    1    0    0
0    1    1    0    0
0    0    0    1    1
0    0    0    1    1

> round(X %*% X_ginv)
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    1
[3,]    0    0    1    0
[4,]    0    1    0    1
> round(t(X %*% X_ginv))
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    1
[3,]    0    0    1    0
[4,]    0    1    0    1

These are indeed symmetric, so $X^+$ is the Moore-Penrose inverse of $X$.
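Part (e) asks for a generalized inverse of $X^T X$ itself; since ginv returns a Moore-Penrose inverse for any matrix, one option (a sketch, entering the matrix computed above directly) is:

XtX <- matrix(c(5, 1, 3, 1,
                1, 3, 1, 3,
                3, 1, 3, 1,
                1, 3, 1, 3), nrow = 4, byrow = TRUE)
XtX_ginv <- ginv(XtX)                       # a Moore-Penrose, hence generalized, inverse of X'X

round(XtX %*% XtX_ginv %*% XtX - XtX, 10)   # ~0, so the defining condition A G A = A holds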

f

Which of the following vectors could be $P_X y$ for an appropriate vector $y$?

\[\begin{bmatrix} 3 \\ 1 \\ 1\\ 2\\ 2 \end{bmatrix} \quad \begin{bmatrix} 1 \\ -1 \\ 1\\ -1\\ 1 \end{bmatrix} \quad \begin{bmatrix} 1 \\ 0 \\ 0\\ 2\\ 2 \end{bmatrix}\]

Every vector in $\mathcal C (X)$ has equal second and third entries and equal fourth and fifth entries (rows 2 and 3 of $X$ are identical, as are rows 4 and 5), so the second vector cannot be $P_X y$ for any $y$.

\[\begin{bmatrix} 3 \\ 1 \\ 1\\ 2\\ 2 \end{bmatrix}\]

could be $P_X y$ because it is in the column space of $X$: it is $c_1 + \frac{ 3 }{ 2 }c_3 - \frac{ 1 }{ 2 } c_2$.

\[\begin{bmatrix} 1 \\ 0 \\ 0\\ 2\\ 2 \end{bmatrix}\]

could also be $P_X y$ because it is in the column space of $X$: it is $\frac{ 1 }{ 2 }c_2 + \frac{ 3 }{ 2 } c_3$.

g

Using the right-hand side you gave above in (f), find all solutions to the equations $Xb = P_X y$.

\[\begin{align} Xb & = P_X y \\ \begin{bmatrix} b_1 - b_2 + b_3 - b_4 \\ b_1 \\ b_1 \\ b_1 + b_2 + b_3 + b_4 \\ b_1 + b_2 + b_3 + b_4 \end{bmatrix} & = \begin{bmatrix} 3 \\ 1 \\ 1\\ 2\\ 2 \end{bmatrix} \end{align}\]

Then $b_1 = 1$ and take $b_2 = x$. Then,

\[\begin{align} b_1 - b_2 + b_3 - b_4 & = 3 \\ 1 - x + b_3 - b_4 & = 3 \\ b_3 & = 2 + x + b_4 \\ \\ b_1 + b_2 + b_3 + b_4 & = 2 \\ 1 + x + 2 + x + b_4 + b_4 & = 2 \\ b_4 & = -1/2 -x \\ b_3 & = 3/2. \end{align}\]

So all solutions are $(b_1, b_2, b_3, b_4) = (1,\ x,\ 3/2,\ -1/2 - x)$ for $x \in \mathbb{R}$.

Also,

\[\begin{align} Xb & = P_X y \\ \begin{bmatrix} b_1 - b_2 + b_3 - b_4 \\ b_1 \\ b_1 \\ b_1 + b_2 + b_3 + b_4 \\ b_1 + b_2 + b_3 + b_4 \end{bmatrix} & = \begin{bmatrix} 1 \\ 0 \\ 0\\ 2\\ 2 \end{bmatrix}. \end{align}\]

Then take $b_1 = 0$, and $b_2 = x$. Then,

\[\begin{align} b_1 - b_2 + b_3 - b_4 & = 1 \\ 0 - x + b_3 - b_4 & = 1 \\ b_3 & = 1 + b_4 + x \\ \\ b_1 + b_2 + b_3 + b_4 & = 2 \\ 0 + x + 1 + b_4 + x + b_4 & = 2 \\ b_4 & = \frac{ 1 }{ 2 }- x \\ b_3 & = \frac{ 3 }{ 2 }. \end{align}\]

So all solutions are $(b_1, b_2, b_3, b_4) = (0,\ x,\ 3/2,\ 1/2 - x)$ for $x \in \mathbb{R}$.
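A quick numerical check of both solution families (a sketch, assuming $X$ is the $5 \times 4$ design matrix from this problem):

X <- matrix(c(1, -1, 1, -1,
              1,  0, 0,  0,
              1,  0, 0,  0,
              1,  1, 1,  1,
              1,  1, 1,  1), nrow = 5, byrow = TRUE)

b1 <- function(x) c(1, x, 3/2, -1/2 - x)   # solutions for right-hand side (3, 1, 1, 2, 2)
b2 <- function(x) c(0, x, 3/2,  1/2 - x)   # solutions for right-hand side (1, 0, 0, 2, 2)

x <- 0.7
X %*% b1(x)   # (3, 1, 1, 2, 2) for any x
X %*% b2(x)   # (1, 0, 0, 2, 2) for any x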

h

Which of the following has the same column space as $X$?

\[U = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ \end{bmatrix} W = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ \end{bmatrix}\]

$U$ has the same column space as $X$: its columns are linear combinations of the columns of $X$ (shown below), and since $\dim \mathcal C (U) = 3 = \dim \mathcal C (X)$, the two column spaces are equal. $W$ does not: its second column, $(1, 0, 1, 1, 0)^T$, has unequal second and third entries, while every vector in $\mathcal C (X)$ has equal second and third entries (and equal fourth and fifth entries).

\[\frac{ 1 }{ 2 } (c_3 - c_2) = \frac{ 1 }{ 2 } \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} - \frac{ 1 }{ 2 } \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} = u_1\] \[c_1 - c_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} = u_2\] \[\frac{ 1 }{ 2 }(c_2 + c_3) = \frac{ 1 }{ 2 } \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} + \frac{ 1 }{ 2 } \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} = u_3\]
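One way to confirm this numerically (a sketch, reusing the $X$ entered in the previous sketch): two matrices have the same column space exactly when stacking them side by side does not increase the rank of either one.

U <- matrix(c(1, 0, 0,
              0, 1, 0,
              0, 1, 0,
              0, 0, 1,
              0, 0, 1), nrow = 5, byrow = TRUE)
W <- matrix(c(1, 1, 0,
              1, 0, 1,
              1, 1, 1,
              1, 1, 0,
              1, 0, 1), nrow = 5, byrow = TRUE)

c(qr(X)$rank, qr(U)$rank, qr(cbind(X, U))$rank)   # 3, 3, 3: same column space
c(qr(W)$rank, qr(cbind(X, W))$rank)               # 3, 5: different column spaces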