Checking Triangle Inequality for List Similarity Metric

In summary, the conversation discusses a method for creating a distance metric between two lists based on a similarity score. The similarity score is calculated as the percentage overlap between the two lists, and the distance is 0 when the two lists are the same. The speaker also mentions generalizing this method to n lists and calculating distances between them. They question whether the distance formula is valid and if it satisfies the triangle inequality, and discuss potential ways to check this. The conversation also mentions potential issues with the scoring metric and how to correct for them.
  • #1
hoffmann
70
0
Say I have two lists, List1 and List2 containing elements such as words. Some words are common two both List1 and List2. I want to create a distance metric that tells me how far apart the two lists are based on a similarity "score". The similarity score and distance metric are as follows:

Similarity score: Intersection(List1,List2) / Union(List1,List2)
Distance = 1 - Similarity Score

In other words, the similarity score is just the percentage overlap between the two lists and the distance is 0 when the two lists are the same. Say I generalize this to n lists and I calculate the distances between lists (a symmetric matrix of distances). My question is, is this distance formula valid? In other words, does it satisfy the triangle inequality? How do I check this?
 
Mathematics news on Phys.org
  • #2
It's not very hard to prove that it satisfies the triangle inequality if all list sizes are equal. Let N = size of the list, x, y & z = pairwise overlaps between three lists. You must have that x >= y+z-N. Distance between lists that give you x is 1-x/(2N-x). With some algebra you can conclude that the triangle inequality is satisfied for all valid x, y & z.

It seems to be true for arbitrary list sizes too, but I don't see an easy way to prove that right now ...
 
  • #3
yes, that is also not coming to me. if it helps, i know that this scoring metric is skewed when the sizes of the lists are large and their overlaps are small (only in comparison to the sizes of the lists, but still pretty large in comparison to smaller list/overlap sizes). it seems that i need to correct the metric for the size of the overlap to get a more accurate value. also the same sort of situation when the sizes of the lists are small and their overlaps are ~0. finally, when one of the lists is big and the other small (or vice versa), i could also run into a skewed distance value.
 
  • #4
http://pnylab.com/pny/papers/nmet/nmet/index.html
 
  • #5
beautiful! thank you!
 

Related to Checking Triangle Inequality for List Similarity Metric

1. What is the triangle inequality for list similarity metric?

The triangle inequality for list similarity metric is a mathematical concept that states that the distance between two points in a metric space must always be less than or equal to the sum of the distances between those points and a third point. In the context of list similarity, this means that the similarity between two lists should not be greater than the sum of their similarities with a third list.

2. Why is it important to check for triangle inequality in list similarity?

Checking for triangle inequality is important in list similarity because it ensures that the similarity metric being used is consistent and reliable. Violations of triangle inequality can lead to inconsistent results and inaccurate comparisons between lists.

3. How is triangle inequality checked for list similarity?

To check for triangle inequality in list similarity, the distance or similarity between each list and a third list is calculated. Then, the sum of these values is compared to the distance or similarity between the two original lists. If the sum is greater than the original distance or similarity, then triangle inequality has been violated.

4. What happens if triangle inequality is violated in list similarity?

If triangle inequality is violated in list similarity, it means that the similarity metric being used is not consistent and may lead to inaccurate comparisons between lists. In this case, the metric should be re-evaluated and adjusted to ensure that it follows the triangle inequality rule.

5. Are there any exceptions to the triangle inequality rule in list similarity?

There are some cases where the triangle inequality rule may not be applicable in list similarity. For example, if the lists being compared are of different lengths, the triangle inequality may not hold. In these cases, it is important to consider the context and the specific characteristics of the lists being compared.

Similar threads

  • Programming and Computer Science
Replies
2
Views
663
  • Programming and Computer Science
Replies
4
Views
1K
  • General Math
Replies
5
Views
2K
  • Linear and Abstract Algebra
Replies
8
Views
1K
  • Calculus and Beyond Homework Help
Replies
19
Views
1K
  • Quantum Physics
Replies
5
Views
1K
  • Calculus and Beyond Homework Help
Replies
5
Views
3K
Replies
17
Views
9K
Replies
40
Views
2K
  • Sticky
  • General Math
Replies
6
Views
11K
Back
Top