Residual sum of squares

Jameson

Staff member
Problem: Through transformation with orthogonal matrix $O$, the problem $$\displaystyle \hat{b}=\underset{b}{\operatorname{arg min}}||y-Xb||^2$$ is equivalent to $$\displaystyle \hat{b}=\underset{b}{\operatorname{arg min}}||y^{*}-X^{*}b||^2$$, where $y$ and $y^{*}$ are in $\mathbb{R}^m$, $X$ and $X^{*}$ are in $\mathbb{R}^{m \times n}$ ($m \ge n$) and $y^{*}=Oy$ and $X^{*}=OX$. Let $y^{*}=[y_1^{*},y_2^{*}...,y_m^{*}]^T$.

Prove that the residual sum of square $$\displaystyle ||y-X \hat{b}||^2=\sum_{i=n+1}^{m}||y_i^{*}||^2$$.

Solution: I must admit I am a bit overwhelmed by this problem. I believe that $\underset{b}{\operatorname{arg min}}$ just means "for the lowest value of $b$", correct? I think I should start by reading up on the concepts which are needed to solve this. Can someone highlight the main ideas I'll need to know to attack this?

Klaas van Aarsen

MHB Seeker
Staff member
Problem: Through transformation with orthogonal matrix $O$, the problem $$\displaystyle \hat{b}=\underset{b}{\operatorname{arg min}}||y-Xb||^2$$ is equivalent to $$\displaystyle \hat{b}=\underset{b}{\operatorname{arg min}}||y^{*}-X^{*}b||^2$$, where $y$ and $y^{*}$ are in $\mathbb{R}^m$, $X$ and $X^{*}$ are in $\mathbb{R}^{m \times n}$ ($m \ge n$) and $y^{*}=Oy$ and $X^{*}=OX$. Let $y^{*}=[y_1^{*},y_2^{*}...,y_m^{*}]^T$.

Prove that the residual sum of square $$\displaystyle ||y-X \hat{b}||^2=\sum_{i=n+1}^{m}||y_i^{*}||^2$$.

Solution: I must admit I am a bit overwhelmed by this problem. I believe that $\underset{b}{\operatorname{arg min}}$ just means "for the lowest value of $b$", correct? I think I should start by reading up on the concepts which are needed to solve this. Can someone highlight the main ideas I'll need to know to attack this?
Hi Jameson!

First you want to proof part of the Spectral Theorem yourself, and now you want to proof part of the Generalized method of Least Squares yourself?
What kind of course are you following?

Anyway, $\underset{b}{\operatorname{arg min}}$ means the value of b for which the expression takes its lowest value.

Btw, there seems to be something missing from your problem statement.
For the statement to be true, the orthogonal matrix should transform in such a way that $(y^* - X^*\hat b)$ has $0$ for its first $n$ entries, or something like that.

Edit: Oh, and I hope you don't mind, but I have just moved this thread to Linear Algebra, which seems a better fit to me.

Jameson

Staff member
The course is called "Computational methods" and it already seems to be way out of my league to be honest, but I'm hoping we'll switch over to coding more than theory. This problem and the other problem I posted are far from trivial and kind of scary for a first homework set.

Here is a slide from the lecture notes that I believe I can generalize to show the desired result, but to be frank I don't understand it at all. Looks like I have a lot of reading to do!

Klaas van Aarsen

MHB Seeker
Staff member
Well, as I see it, in any course I would suggest to first make sure you know what all the symbols and definitions mean. Then to try and apply the theorems that are given.
It sounds a bit as if for your purpose that is enough.
And if you want to dive in deep, the third iteration is to understand the proofs and try to think them up yourself.

The proofs that you ask for suggest that a bit of linear algebra is a prerequisite. It seems a bit out of scope for a course of computational methods to ask you to think up these proofs yourself from scratch.
What is probably more useful for you is to see how you can apply the formulas that are given.
Do you know for instance what they mean by finding a solution by back substitution?

Jameson

Staff member
I agree with you completely about the levels and methods of understanding. I took linear algebra at this university in spring and we didn't cover a lot of this material, definitely not the Spectral Theorem so I feel professors aren't communicating very well at all.

The back substitution method is an iterative process for solving upper triangular matrices.

For the proof to make sense, I believe $y^{*}=[y_1^{*},y_2^{*}...,y_m^{*}]^T$ should be upper-triangular. If so that would be a good place to start but I'm not sure it must be. I need to first grasp the purpose of $y^{*}=Oy$ and $X^{*}=OX$.

EDIT: Hmm, $y^*$ is in $\mathbb{R}^m$, so it's a column vector. Don't think it can be upper-triangular then.

Klaas van Aarsen

MHB Seeker
Staff member
True, $y^*$ can't be upper-triangular. It is $X^*=OX$ that will be upper-triangular.

Let me try to illustrate with a choice for the variables that is as simple as possible.
Suppose we pick:
$$n=1, \quad m=2, \quad y=\begin{pmatrix}1\\3\end{pmatrix}, \quad X = \begin{pmatrix} 2 \\1\end{pmatrix}$$
The residue is the squared length of the vector between $y$ and the line $Xb$.
The residue takes on its lowest value for $\hat b = (1)$. (Yes, b is 1-dimensional.)

We can draw them like this:

The orthogonal matrix $O$ would be a 2x2 matrix that rotates everything such that 1 of the coordinates becomes zero. At that time the residue is aligned with the y-axis, meaning that the y-component of $y^*$ represents the residue.
The matrix $OX$ will be an upper-triangular matrix now (a rather simple one).

Jameson

Staff member
I think I'm starting to get this a little better. A major problem in my intuition stems from that in linear algebra I'm used to writing $Ax=b$, where $A$ is an $m \times n$ matrix, $b$ is a column vector of weights and $b$ is the result. In this problem and discussion we have the form $y=Xb$, where $X$ is a matrix and $b$ is a column vector.

From your diagram I see that multiplying by the orthogonal matrix $O$ will simply rotate everything, but not change any lengths. So we basically multiply everything by $O$ and then we have the equality: $||y-Xb||=||y^*-X^*b||$. Correct?

Klaas van Aarsen

MHB Seeker
Staff member
Yes. That is correct.
An important property of an orthogonal transformation in general, is that it keeps lengths of vectors unchanged.

Deveno

Well-known member
MHB Math Scholar
One definition of an orthogonal matrix $$\displaystyle O$$ (or even better, an orthogonal linear transformation) is that:

$$\displaystyle OO^T = O^TO = I$$.

It then follows that if we use the definition (relative to some (real) inner product):

$$\displaystyle \|x\| = \sqrt{\langle x,x \rangle}$$

that:

$$\displaystyle \|Ox\| = \sqrt{\langle Ox,Ox \rangle}$$

$$\displaystyle = \sqrt{\langle x,O^T(Ox) \rangle}$$

(using essentially the same kind of proof as in another one of your threads)

$$\displaystyle = \sqrt{\langle x,x \rangle} = \|x\|$$.

Geometrically, orthogonal transformations preserve lengths and volumes, but may reverse orientations (such as a reflection about a line). In fact, it is often convenient to pick a particular reflection, such as:

$$\displaystyle R = \begin{bmatrix}1&0&\dots&0\\0&1&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\0&0&\dots&-1 \end{bmatrix}$$

and express an orthogonal matrix as either:

$$\displaystyle S$$ or $$\displaystyle RS$$, where $$\displaystyle \det(S) = 1$$.

Matrices of the form $$\displaystyle S$$ are called "proper rotations" and matrices of the form $$\displaystyle RS$$ are called "improper rotations" or (rotations followed by a (distinguished) reflection). It follows that 2 improper rotations composed yield a (proper) rotation.

(my apologies in advance for this small diversion).