Curse of dimensionality - what is it that grows?

  • Thread starter schniefen
  • Start date
  • #1
schniefen
TL;DR Summary
The volume outside a hypersphere inscribed in a unit hypercube converges to 1 as the dimension increases. It seems that the space somehow “gets bigger” as the dimension increases. However, ##\mathbb{R}## has the same cardinality as ##\mathbb{R^2}##.
The volume outside a hypersphere inscribed in a unit hypercube converges to 1 as the dimension increases. It seems that the space somehow “gets bigger” as the dimension increases. However, given data points in the interval ##[0,1]##, the density of points in the interval compared to the density of points for the same points in the unit square doesn’t increase, since ##\mathbb{R}## has the same cardinality as ##\mathbb{R^2}##. Then, what is it that “gets bigger”, gets more? Is sparseness a more relevant term than density?
 
  • #2
Cardinality is not related to volume measurements at all, and you should just not think about it here. ##[0,1]## and ##[0,10000000000]## have the same cardinality, but the former has almost zero volume compared to the latter.
 
  • #3
Suppose one would like to estimate the integral of a 1D and of a 2D function over the unit interval and the unit square respectively, using, say, Monte Carlo. Why are more points needed in the latter than in the former? It seems again that the space has grown and the density of points has decreased.
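One way to see the coverage problem is to count how many nodes a regular grid needs to keep a fixed spacing ##h## in ##d## dimensions: the count is ##(1/h)^d##, i.e. exponential in ##d##. A minimal Python sketch (the helper name `grid_points_needed` is invented here; worth noting that plain Monte Carlo converges at a ##1/\sqrt{N}## rate independent of dimension, so the blow-up below is strictly a statement about grid-style coverage):

```python
# Number of nodes in a tensor-product grid with spacing h on [0,1]^d.
# This grows like (1/h)^d, which is the exponential blow-up behind the
# "more points needed" observation.
def grid_points_needed(h: float, d: int) -> int:
    per_axis = int(round(1.0 / h)) + 1  # nodes along one axis, endpoints included
    return per_axis ** d

for d in (1, 2, 3, 6):
    print(d, grid_points_needed(0.01, d))
# d=1 needs 101 nodes, d=2 already 10201, d=3 over a million.
```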
 
  • #4
The curse of dimensionality is often used in the context of machine learning, where additional attributes to consider mean more dimensions to evaluate and thus longer evaluation times. In ML it’s good to simplify.
 
  • #5
schniefen said:
Summary:: The volume outside a hypersphere inscribed in a unit hypercube converges to 1 as the dimension increases. It seems that the space somehow “gets bigger” as the dimension increases. However, ##\mathbb{R}## has the same cardinality as ##\mathbb{R^2}##.

The volume outside a hypersphere inscribed in a unit hypercube converges to 1 as the dimension increases.
Can you explain this? The hypercube has volume ##1## and the hyperspheres have volume ##\dfrac{2\pi^{n/2}r^{n-1}}{\Gamma(n/2)}.##

We get strange behaviors if we increase the dimension, cp.
https://www.physicsforums.com/threads/math-challenge-december-2019.981217/page-2#post-6274751 f.
 
  • #6
schniefen said:
Suppose one would like to estimate an integral of a 1D and 2D function in the unit interval and square respectively, using say Monte Carlo. Why are more points needed in the latter vs the former? It seems again the space has grown and the density of points decreased.
I remember a colloquium about Banach-Tarski. The lecturer liked to emphasize that it is less AC that contributes to the paradox than our poor understanding of points.
 
  • #7
fresh_42 said:
I remember a colloquium about Banach-Tarski. The lecturer liked to emphasize that it is less AC that contributes to the paradox than our poor understanding of points.
What is AC? What’s up with the points?

With the volume outside a hypersphere inscribed in the unit hypercube converging to 1, I meant that the volume of the hypersphere goes to zero as ##n## increases, right?
 
  • #8
fresh_42 said:
Can you explain this? The hypercube has volume ##1## and the hyperspheres have volume ##\dfrac{2\pi^{n/2}r^{n-1}}{\Gamma(n/2)}.##

We get strange behaviors if we increase the dimension, cp.
https://www.physicsforums.com/threads/math-challenge-december-2019.981217/page-2#post-6274751 f.
The unit sphere has radius 1, so to be fair the unit cube needs to have "radius" 1 also (and hence volume ##2^n##). A correct claim is that the volume of a ball inscribed in a hypercube, divided by the volume of that hypercube, goes to zero as ##n## goes to infinity.
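This ratio can be checked numerically. A minimal Python sketch, using the standard unit-ball volume formula ##V_n = \dfrac{\pi^{n/2}}{\Gamma(n/2+1)}## (the helper name `unit_ball_volume` is invented here):

```python
import math

# Volume of the unit n-ball (radius 1): V_n = pi^(n/2) / Gamma(n/2 + 1).
def unit_ball_volume(n: int) -> float:
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

# The ball is inscribed in the cube [-1, 1]^n, which has volume 2^n.
for n in (2, 5, 10, 20):
    print(n, unit_ball_volume(n) / 2 ** n)
# The ratio starts at pi/4 ~ 0.785 for n=2 and collapses toward zero.
```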
 
  • #9
schniefen said:
What is AC? What’s up with the points?
Banach-Tarski uses the Axiom of Choice as well as the concept of what a point is.
 
  • #10
Is it correct to say that the density of points decreases as the dimension increases, for a fixed number of points? It doesn’t sound right, and yet I’ve come across phrases like “lower data density...” in higher-dimensional spaces.
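One concrete sense in which the density decreases: for a fixed number of uniform random points, the typical nearest-neighbour distance grows with the dimension. A NumPy sketch (assuming NumPy is available; `mean_nn_distance` is a name invented here):

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_nn_distance(n_points: int, d: int) -> float:
    """Mean nearest-neighbour distance of uniform random points in [0,1]^d."""
    pts = rng.random((n_points, d))
    # Pairwise Euclidean distances via broadcasting; mask the diagonal.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    return dist.min(axis=1).mean()

for d in (1, 2, 10, 50):
    print(d, mean_nn_distance(500, d))
# The same 500 points sit ever farther from their nearest neighbour as d grows.
```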
 
  • #11
I don't think you get that. If you set ##r=1##, you have ##\dfrac{2\pi^{n/2}}{\Gamma(n/2)}##.

Taking the log, and letting ##k=n/2##, we get ##k \ln(\pi) - \ln(k!)##. Stirling's approximation says ##\ln(k!) \approx k \ln(k) - k##. The ##k \ln(k)## term dominates the whole thing, so the log of the volume goes to negative infinity as ##k## goes to infinity, and hence the volume has to go to zero.

Although now I realize there is a slight misstatement. Your formula is for the surface area of the hypersphere; the thing people normally talk about is the volume of a ball with radius 1 (though I don't think this changes the result, and the ball is geometrically the more tractable and important object).

The layman's description of what is happening is that as the dimension goes to infinity, all the volume of the ball concentrates near points whose coordinates are all of size ##1/\sqrt{n}##, so the volume of the ball becomes small relative to the cube it's inscribed in.
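The log-volume argument above can be checked directly with `math.lgamma` (a sketch; `log_ball_volume` is a name invented here):

```python
import math

# log V_n = (n/2) ln(pi) - ln Gamma(n/2 + 1); with k = n/2, Stirling's
# ln(k!) ~ k ln(k) - k shows the -k ln(k) term eventually dominates.
def log_ball_volume(n: int) -> float:
    return (n / 2) * math.log(math.pi) - math.lgamma(n / 2 + 1)

for n in (2, 5, 10, 30, 100):
    print(n, log_ball_volume(n))
# The log volume peaks around n=5 and then heads to negative infinity.
```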
 
  • #12
Yes, I saw my mistakes. It was the surface area of the sphere, not the volume of the ball, and I automatically thought of the exponential series when I saw ##\frac{c^n}{n!}##, without recognizing that there is no summation. :frown:
 
  • #13
The volume ratio between the unit ball and the surrounding cube is the probability that the sum of squares of uniform random numbers from 0 to 1 is below 1. For two dimensions, it's the chance that ##x^2+y^2<1## where ##x## and ##y## are both uniform random numbers in ##[0,1]##. Obviously, as we add more and more terms, that probability gets smaller and smaller: every non-zero value of a third summand restricts the range ##x## and ##y## can have.
In other words, the average distance between your points increases, even though the distance in each dimension keeps following the same distribution. If you just have a fixed number of points, then your coverage of the volume gets poorer and poorer the more dimensions you have.
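The probability described above can be estimated by direct simulation (a sketch; `prob_inside_ball` is a name invented here):

```python
import random

# Monte Carlo estimate of P(x_1^2 + ... + x_n^2 < 1) for x_i uniform on
# [0,1] -- exactly the ball-to-cube volume ratio described above.
def prob_inside_ball(n: int, samples: int = 200_000, seed: int = 1) -> float:
    rng = random.Random(seed)
    hits = sum(
        sum(rng.random() ** 2 for _ in range(n)) < 1.0
        for _ in range(samples)
    )
    return hits / samples

for n in (2, 3, 5, 10):
    print(n, prob_inside_ball(n))
# n=2 gives ~pi/4 ~ 0.785; by n=10 the estimate is already tiny.
```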
 

Related to Curse of dimensionality - what is it that grows?

1. What is the curse of dimensionality?

The curse of dimensionality refers to the phenomenon where the performance of machine learning algorithms decreases as the number of features or dimensions increases. This is because as the number of dimensions grows, the amount of data needed to accurately represent the space also grows exponentially, making it difficult for algorithms to find patterns and make accurate predictions.

2. How does the curse of dimensionality affect machine learning?

The curse of dimensionality can lead to overfitting, where a model performs well on the training data but fails to generalize to new data. It can also result in sparsity, where data points become more spread out and it becomes difficult to find meaningful relationships between them. This can ultimately lead to poor performance and inaccurate predictions.

3. What causes the curse of dimensionality?

The curse of dimensionality is caused by the increase in data volume and sparsity as the number of dimensions increases. As more dimensions are added, the data becomes more spread out and it becomes difficult for algorithms to find meaningful patterns and relationships.

4. How can the curse of dimensionality be addressed?

There are several techniques that can be used to address the curse of dimensionality, such as feature selection and dimensionality reduction. These techniques involve selecting the most relevant features and reducing the number of dimensions in the dataset to improve the performance of machine learning algorithms.

5. Is the curse of dimensionality a common problem in machine learning?

Yes, the curse of dimensionality is a common problem in machine learning, especially when dealing with high-dimensional datasets. It can significantly impact the performance of algorithms and must be carefully addressed in order to obtain accurate and reliable results.
