Vector subspace and basis vectors in the context of data science

  • #1
DumpmeAdrenaline
The book I am studying from presents a vector subspace as an infinite collection of vectors in a vector space that is closed under addition and scalar multiplication, and basis vectors as a way to characterize/write a subspace compactly: all the vectors in the subspace can be written as linear combinations of this subset of vectors. If this subset is a minimal set (meaning its linear combinations recover the whole subspace and no vector in it is redundant), then this subset is called a basis. Let's add context to the above.

Let X be an m×n data matrix, where m is the number of samples and n is the number of variables of interest for the system we are studying (e.g., a reactor). Suppose we are interested in determining the number of independent variables and independent samples.

Is the row space a subspace, namely the one built from the m row vectors
$$
\vec{x}_1 = (a_{11}, a_{12}, \ldots, a_{1n}), \quad \vec{x}_2 = (a_{21}, a_{22}, \ldots, a_{2n}), \quad \ldots, \quad \vec{x}_m = (a_{m1}, a_{m2}, \ldots, a_{mn})?
$$
If we perform an LU decomposition of the matrix X,
$$
PX = LU \quad\text{(so that } U = L^{-1}PX\text{),}
$$
and U has r nonzero rows, r ≤ m, then these first r rows of U form a basis for the row space. Specifically, all vectors within the subspace can be expressed as linear combinations of these r rows. Does this imply that, given the variable values for a new sample, we can represent it as a linear combination of the aforementioned basis row vectors (the old samples)?
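
For concreteness, here is a small sketch of what I mean (my own made-up 4×3 example, using scipy.linalg.lu; the numbers and the tolerance are only illustrative):

```python
# Sketch: recover a basis for the row space of a made-up 4x3 data matrix X
# from its LU decomposition (scipy convention: X = P @ L @ U).
import numpy as np
from scipy.linalg import lu

X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # = 2 * row 0
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 4.0]])  # = row 0 + row 2, so the rank is 2

P, L, U = lu(X)  # P permutation, L lower triangular, U upper triangular

# Left-multiplying by the invertible L^{-1} @ P.T does not change the row
# space, so the nonzero rows of U span the same row space as X does.
tol = 1e-10
nonzero = np.any(np.abs(U) > tol, axis=1)
r = int(nonzero.sum())
print("r =", r)                       # -> 2
print("basis rows:\n", U[nonzero])    # the first r rows of U
```

Here r = 2 < m = 4, and the two nonzero rows of U play the role of the basis row vectors described above.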
 

Attachments

  • 84e397ad-7f7a-4ede-87a3-644a206fd2bc.jpg
  • #2
DumpmeAdrenaline said:
The book I am studying from presents a vector subspace as an infinite collection of vectors in a vector space that is closed under addition and scalar multiplication, and basis vectors as a way to characterize/write a subspace compactly: all the vectors in the subspace can be written as linear combinations of this subset of vectors. If this subset is a minimal set (meaning its linear combinations recover the whole subspace and no vector in it is redundant), then this subset is called a basis. Let's add context to the above.
This is a bit of a complicated point of view, in my opinion. Vector spaces are all about linearity and linear mappings between them, so addition and multiplication by scalars (numbers) are the essential properties. Looking at them as subsets that do not have any structure is unnatural, and demanding the structure afterward to make them a subspace is simply the wrong direction. But it is as it is.

DumpmeAdrenaline said:
Is the row space a subspace, namely the one built from the m row vectors
$$
\vec{x}_1 = (a_{11}, a_{12}, \ldots, a_{1n}), \quad \vec{x}_2 = (a_{21}, a_{22}, \ldots, a_{2n}), \quad \ldots, \quad \vec{x}_m = (a_{m1}, a_{m2}, \ldots, a_{mn})?
$$
It is a subspace because we define it as such: the linear span of row vectors
$$
\mathbb{R}\cdot \vec{x}_1 + \ldots + \mathbb{R}\cdot \vec{x}_m
$$
DumpmeAdrenaline said:
If we perform an LU decomposition of the matrix X,
$$
PX = LU \quad\text{(so that } U = L^{-1}PX\text{),}
$$
and U has r nonzero rows, r ≤ m, then these first r rows of U form a basis for the row space. Specifically, all vectors within the subspace can be expressed as linear combinations of these r rows. Does this imply that, given the variable values for a new sample, we can represent it as a linear combination of the aforementioned basis row vectors (the old samples)?
This is again in the wrong direction. The answer is 'yes', but the reasoning runs the other way around. The moment we write the data as a matrix and consider this matrix as a linear transformation between vector spaces is the moment where linearity happens: we already require that the data form a linear system. Row space considerations come afterward. By writing down the matrix, we have already required that new data be linearly dependent on the given data.

As long as our matrix is only a scheme of numbers, we don't need any linear properties; in this sense, it is only a set. But if we use words like row space and vector space, we automatically assume linearity, since these terms only make sense in a linear situation. So saying "it implies linearity" is the wrong direction. We demand linearity, and thus it, in a way, implies it again. But the implication is a given condition, not a conclusion.
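
Mechanically, once linearity has been demanded, the check itself is routine. A minimal sketch (made-up numbers; the names basis_rows and new_sample are only illustrative) of expressing a new sample in a given row basis:

```python
# Sketch: test whether a new sample lies in the row space spanned by given
# basis rows, by solving a least-squares problem for the coefficients.
import numpy as np

basis_rows = np.array([[1.0, 2.0, 3.0],
                       [0.0, 1.0, 1.0]])                # r x n, rows are a basis
new_sample = 3.0 * basis_rows[0] - 2.0 * basis_rows[1]  # lies in the span

# Solve basis_rows.T @ c = new_sample for the coefficient vector c.
c, *_ = np.linalg.lstsq(basis_rows.T, new_sample, rcond=None)
print("coefficients:", c)                                          # -> approx [ 3. -2.]
print("in row space:", np.allclose(basis_rows.T @ c, new_sample))  # -> True
```

Whether this representation means anything for the experiment is exactly the interpretational question above.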
 
  • #3
fresh_42 said:
This is again in the wrong direction. The answer is 'yes', but the reasoning runs the other way around. [...] We demand linearity, and thus it, in a way, implies it again. But the implication is a given condition, not a conclusion.
Suppose we represent the data as a matrix with m rows and n columns. We can think of the matrix either as a stack of column vectors, each requiring m entries, or as a stack of row vectors, each requiring n entries.

If we think of the matrix as a stack of m×1 column vectors, then the columns belong to one or more subspaces of m×1 column vectors. The columns are obtained through linear operations (linear transformations) on the basis vectors of each subspace; linear transformations have already been performed to yield the scheme of numbers. By performing an LU decomposition on the matrix, we reverse the process: we try to recover the basis vectors that formed the columns.

To stay with the column picture: when we add a new feature or variable whose values have been measured for all m samples, we enlarge the matrix to m×(n+1) by stacking that column vector. In doing so, we demanded linearity from the basis vectors of the same subspaces, or from different/new subspaces if the new column is independent, to form the scheme of numbers. A sketch of that rank check follows below.
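
Here is the sketch (illustrative numbers; the names are mine): append a new column and ask whether the rank of the m×(n+1) matrix grows, i.e., whether the new variable is independent of the old ones.

```python
# Sketch: append a new variable as a column and check whether it is
# linearly independent of the existing columns via the matrix rank.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))            # made-up 5x3 data matrix

new_dep = X @ np.array([1.0, -2.0, 0.5])   # a combination of old columns
new_indep = rng.standard_normal(5)         # generically independent

for col in (new_dep, new_indep):
    X1 = np.column_stack([X, col])         # the enlarged m x (n+1) matrix
    grew = np.linalg.matrix_rank(X1) > np.linalg.matrix_rank(X)
    print("rank increased:", grew)         # -> False, then True
```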
 
  • #4
Yes. And so?

Data written in matrix form are primarily just a number scheme.

If we interpret this number scheme as a matrix of a linear function, then we have to make sure that our data allows such an interpretation. Otherwise, we just have an example of a linear transformation that has nothing to do with our experiment anymore.
 
  • #5
fresh_42 said:
Yes. And so? [...] If we interpret this number scheme as a matrix of a linear function, then we have to make sure that our data allows such an interpretation.
My previous reply was to confirm that I understood your point. So are you suggesting we need a guess or a conjecture about which variables are linearly related? Suppose we add a new variable that describes our system, perform an LU decomposition, and it turns out that the new variable is dependent on existing variables (columns) that we believed it was independent of. In that case, do we discard it and go back to working with the m×n matrix? If we don't put in our understanding of the system we are studying, are we just playing with numbers?
 
  • #6
DumpmeAdrenaline said:
So are you suggesting we need a guess or a conjecture about which variables are linearly related? Suppose we add a new variable that describes our system, perform an LU decomposition, and it turns out that the new variable is dependent on existing variables (columns) that we believed it was independent of.
No, that isn't necessary. My point is the interpretation of the manipulations. Once the data are sorted into a matrix, an LU decomposition is automatically possible by construction; the question is what the result will tell us. In other words, is your row space a linear space at all? Are n+1 measurements, which lead to n+1 rows with n variables if I understood you correctly, necessarily linearly dependent? And is it even meaningful, as a property of your experiment rather than of linear algebra, for linear dependence to hold between different measurements?

You can perform all linear manipulations once you have a matrix. However, the relation between matrix and measurement has to be stated beforehand in order to interpret the result of algebraic transformations.
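
To illustrate the dimension count with made-up numbers: n+1 rows with only n variables are always linearly dependent, whatever was measured, so the dependence by itself carries no information about the experiment.

```python
# Sketch: any n+1 row vectors with n entries are linearly dependent,
# purely by dimension count; no experimental content is involved.
import numpy as np

n = 4
rng = np.random.default_rng(1)
X = rng.standard_normal((n + 1, n))   # n+1 "samples" of n "variables"

print("rank:", np.linalg.matrix_rank(X))   # can never exceed n = 4

# The dependence coefficients w (with w @ X = 0) span the left null
# space of X; the last right-singular vector of X.T provides one.
u, s, vt = np.linalg.svd(X.T)
w = vt[-1]
print("max |w @ X| =", np.max(np.abs(w @ X)))   # ~ 1e-16, i.e. zero
```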
 
  • #7
Notice, though, for the OP, that a matrix itself is just an array of mathematical objects. I believe this is what Fresh was saying. It could be/represent the adjacency matrix of a graph, a correlation matrix, a linear transformation, etc.; as such, it has no intrinsic connection with linear maps.
 
