it's a natural generalization of the spectral theorem you asked about in your last post. for the truly mathematically-inclined, this is motivation enough.
but it can also be viewed in the following ways:
it allows us to compute "the best possible orthogonal bases" of the domain and co-domain of a linear transformation T between finite-dimensional linear spaces, in the sense that the matrix for T in these bases is as "simple as possible" (diagonal).
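geometrically, this allows us to view any linear transformation as: a rotation (or reflection) of the domain, followed by a scaling along the coordinate axes, followed by a rotation (or reflection) of the co-domain.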
one way to see this is to "follow what happens to a unit n-sphere" (under the norm induced by the inner product we are using), for each of the three linear transformations in the decomposition.
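if you'd like to try the "follow the sphere" idea numerically, here is a minimal sketch in python with numpy. the 2x2 matrix A is just something i made up for illustration, and the names step1, step2, step3 are mine. it pushes the unit circle through the three factors that np.linalg.svd returns, one at a time:

```python
import numpy as np

# an arbitrary 2x2 example matrix (an assumption, chosen only for illustration)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# numpy returns A = U @ diag(s) @ Vt, with s sorted in decreasing order
U, s, Vt = np.linalg.svd(A)

# points on the unit circle, one per column
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.vstack([np.cos(theta), np.sin(theta)])

step1 = Vt @ circle          # rotation/reflection: still the unit circle
step2 = np.diag(s) @ step1   # scaling: an axis-aligned ellipse with semi-axes s
step3 = U @ step2            # rotation/reflection: the same ellipse, turned

# the three steps together agree with applying A directly
same_as_A = np.allclose(step3, A @ circle)

# lengths of the image vectors after the first and last step
norms1 = np.linalg.norm(step1, axis=0)
norms3 = np.linalg.norm(step3, axis=0)
```

the lengths in norms3 should all lie between the smallest and largest singular value, which is the "circle becomes an ellipse" picture in numbers.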
it allows us to calculate the pseudo-inverse of a matrix, which is used in finding "least squares" (best fit) solutions, such as the best-fit polynomial of a given degree for a set of data points (the polynomial isn't linear in its "indeterminate" variable, but IS a linear function of its coefficients).
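as a rough sketch of the polynomial fit (python with numpy; the data points are invented for illustration, and in practice you would likely just call np.linalg.pinv or np.linalg.lstsq), the pseudo-inverse can be assembled from the SVD by inverting the non-zero singular values:

```python
import numpy as np

# made-up sample data lying near y = 1 + 2x + 3x^2 (an assumption for illustration)
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x + 3.0 * x**2 + np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1])

# design matrix with columns 1, x, x^2, so the model is linear in the coefficients
A = np.vander(x, 3, increasing=True)

# thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# pseudo-inverse: invert the non-zero singular values and swap the roles of U and V
tol = max(A.shape) * np.finfo(float).eps * s.max()
s_inv = np.array([1.0 / sv if sv > tol else 0.0 for sv in s])
A_pinv = Vt.T @ np.diag(s_inv) @ U.T

# least squares coefficients, lowest degree first
coeffs = A_pinv @ y
```

the coefficients come out close to (1, 2, 3) here because the invented data was built around that quadratic.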
in signal processing, the sizes of the singular values of a matrix are related to "which signals carry information" and "which signals are noise". calculating the SVD allows for "better (noise) filter design".
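one simple way to see the "large singular values carry the information" idea is to keep only the largest singular values and throw the rest away. this is only a toy sketch (python with numpy), not a real filter design: the rank-1 "signal", the noise level and the choice k = 1 are all assumptions i picked so the effect is easy to see.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable

# a made-up rank-1 "signal" matrix plus small random noise (assumptions)
t = np.linspace(0.0, 1.0, 50)
w = np.linspace(0.0, 1.0, 40)
clean = np.outer(np.sin(2.0 * np.pi * t), np.cos(2.0 * np.pi * w))
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

U, s, Vt = np.linalg.svd(noisy, full_matrices=False)

# keep only the k largest singular values; k = 1 because the signal is rank 1
k = 1
denoised = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# distance from the clean signal before and after truncation (Frobenius norm)
err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
```

in this setup the first singular value stands well clear of the others, and err_denoised comes out smaller than err_noisy.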
variations of the SVD are used in such diverse applications as: optical character recognition, radar target recognition profiles, 3d reconstruction from 2d images, fingerprint analysis, and weather prediction.
in general, calculation with a given m×n matrix is expensive: evaluating the image of a given domain vector requires mn multiplications and m(n-1) additions. if m is near n, this is O(n^2) operations. in the bases given by the SVD the matrix is diagonal, which reduces this to O(n) operations (with, of course, an "up-front cost" of calculating the unitary matrices used in the decomposition, and of expressing vectors in the new bases). if one is going to use a particular linear transformation several times, this is well worth the effort. as the great mathematician indiana jones said: "choose (your bases) wisely".
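here is a small sketch of what "diagonal in the right bases" buys you (python with numpy; the random matrix and vector are arbitrary examples). note that the change of basis is itself a matrix-vector product, so the saving only shows up for the work done while you stay in singular coordinates:

```python
import numpy as np

rng = np.random.default_rng(1)    # fixed seed so the sketch is repeatable
n = 6
A = rng.standard_normal((n, n))   # arbitrary example matrix
x = rng.standard_normal(n)        # arbitrary example vector

# up-front cost, paid once: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A)

# coordinates of x in the basis of right singular vectors
x_coords = Vt @ x

# in the singular bases the transformation is diagonal: just n multiplications
y_coords = s * x_coords

# convert back to the standard basis of the co-domain
y = U @ y_coords
```

the result y agrees with A @ x up to rounding.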