Algorithm to create a composite score

In summary, the conversation discusses how to calculate a composite score from a set of individual scores that evaluate the usability of hypotheses on criteria such as specificity, verifiability, theoretical foundation, and testability. The drawbacks of simple addition are addressed, and alternatives such as taking the product or summing the square roots of the individual scores are suggested. The underlying problem of defining what makes a hypothesis "useful" is also discussed, and a general mathematical formulation for finding a composite score is presented. Assigning a weight to each criterion in order to obtain a more useful score is also suggested.
  • #1
PatternSeeker
Hi everyone!

This is an application question. I would like to get some advice about how to calculate a score based on a set of individual scores in a way that makes most sense.

CONTEXT:
I am going over some criteria for judging the usability of hypotheses. I came up with a whole bunch of hypotheses about a certain topic and now I'm trying to select the best ones. I asked a number of people to evaluate these hypotheses on the criteria below.

A) hypothesis is specific
B) hypothesis is verifiable
C) hypothesis has a strong theoretical foundation
D) hypothesis can be tested using available resources
...
Let's say people evaluated these on a 10-point scale, with 10 being the best.
I want one score based on all four criteria above. The easiest way would be to just add the mean individual scores. For example, if the average ratings of a given hypothesis were 7, 8, 9, and 10 for criteria A, B, C, and D respectively, then the hypothesis would get a score of 34. But I wonder if addition would make sense. Here are some potential challenges:

1) if a hypothesis cannot be tested using available resources (criterion D), then no matter how highly it is rated on criteria A-C, I cannot use it. Under simple addition, such a hypothesis could still score higher than an alternative that was rated lower on criteria A-C but highly on criterion D.
2) some of the criteria are highly correlated with each other. For example, criteria 2 and 3 will be more highly correlated with each other than criteria B and C.
3) even though there may be a nearly perfect correlation between some criteria across the different hypotheses, conceptually these are different. So, averaging scores based on highly correlated criteria would not make sense.

How would you address the challenges above?
What are better alternative ways of obtaining a single score (alternative to A+B+C+D)?

I will greatly appreciate your help!
 
  • #2
No matter what you do it will be quite arbitrary.

Taking the product would heavily disfavor hypotheses that score poorly in one category.
Summing the square root (or some similar function) of the individual scores will also make low ratings more important, but not as much as the product.
You could introduce special rules like "something rated less than X in D cannot get an overall score better than Y"
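A minimal Python sketch of the options above, so their behaviour can be compared on the same ratings. The function names, the example ratings, and the thresholds X = 3 and Y = 10 are illustrative assumptions, not values from the thread.

```python
import math

def sum_score(ratings):
    """Plain sum of the ratings (the A+B+C+D proposal)."""
    return sum(ratings)

def product_score(ratings):
    """Product of the ratings; one low rating pulls the score down sharply."""
    return math.prod(ratings)

def sqrt_sum_score(ratings):
    """Sum of square roots; low ratings matter more than in a plain sum,
    but less than in a product."""
    return sum(math.sqrt(r) for r in ratings)

def capped_score(ratings, gate_index=3, x=3, y=10):
    """Plain sum, capped at y when the gating criterion is rated below x.
    gate_index, x and y are hypothetical values chosen for illustration."""
    total = sum(ratings)
    return min(total, y) if ratings[gate_index] < x else total

balanced = [7, 7, 7, 7]    # evenly rated hypothesis
lopsided = [10, 10, 7, 1]  # strong on A-C, nearly untestable (D = 1)

for name, score in [("sum", sum_score), ("product", product_score),
                    ("sqrt sum", sqrt_sum_score), ("capped", capped_score)]:
    print(f"{name:>8}: balanced={score(balanced):.2f} lopsided={score(lopsided):.2f}")
```

Both hypotheses tie under the plain sum (28 each), while the product (2401 versus 700), the square-root sum, and the capped rule all rank the balanced hypothesis higher.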
 
  • #3
PatternSeeker said:
in a way that makes most sense.

The underlying problem is to define what you mean by "makes sense" - and if you actually mean "most", then you need some way to compare two ways of making a scale and decide which one makes more sense.

If the goal is to create a numerical scale that reflects your own subjective judgements, then we have defined a specific problem - your particular judgements may not interest a lot of people, but at least it is a specific problem and the general idea of the problem is interesting.

One way to formulate a general case is as follows: We are given a list ##L## that orders ##N## things from "best" to "worst". Never mind how this ordering is created - it's just a "given". Each of the ##N## things is rated on ##M## different aspects. We want to find a real-valued function ##f(a_{k,1}, a_{k,2}, \dots, a_{k,M})## defined on the ##M## aspects of the ##k##-th thing such that the values of ##f## reflect the ordering given in the list ##L## - i.e. ##f(a_{i,1}, \dots, a_{i,M}) > f(a_{j,1}, \dots, a_{j,M})## if and only if thing ##i## is better than thing ##j## according to the list ##L##.
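The formulation above can be sketched as a check that a candidate function reproduces a given ordering. This is only an illustration: the helper name `agrees_with_ranking`, the two candidate functions, and the example ratings are assumptions, and the list is taken to be a strict ranking with no ties.

```python
from itertools import combinations

def agrees_with_ranking(f, ranked_items):
    """Return True if f scores every earlier item strictly higher than every later one.

    ranked_items is the list L, ordered from best to worst; each item is a
    tuple of its M aspect ratings.
    """
    scores = [f(*item) for item in ranked_items]
    # combinations() preserves list order, so `better` always precedes `worse` in L.
    return all(better > worse for better, worse in combinations(scores, 2))

def total(a, b, c, d):
    return a + b + c + d

def product(a, b, c, d):
    return a * b * c * d

# Hypothetical ranking of three things, best first, each rated on M = 4 aspects.
ranked = [(9, 8, 9, 10), (7, 8, 9, 4), (10, 10, 10, 1)]

print(agrees_with_ranking(total, ranked))    # False: the sums are 36, 28, 31
print(agrees_with_ranking(product, ranked))  # True: the products are 6480, 2016, 1000
```

For this particular list the product is one admissible ##f## and the sum is not; a different given ordering could favour a different function.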

There are probably many ways to create a function ##f## that agrees with the ordering of list ##L##. However, the mathematical aspects of the problem are still interesting because we can see simple ways to create ##f##. The basic decision is whether you want to solve a mathematical problem or whether you want to discuss the somewhat philosophical question of what makes a hypothesis useful. If you want to discuss what makes scientific hypotheses useful then it would be better to do that in a section of the forum devoted to science.
 
  • #4
Note: each of your evaluation criteria is boolean: the hypothesis either meets the criterion or it does not. This is especially clear for D, where the hypothesis can either be evaluated with available resources or it cannot. I mean: if someone scored that a 4 or a 5 on the 1-10 scale, what would that mean?

Before you can get a proper composite score here you need to be clear about your metric.

To get a score on a range, it must be possible for the thing being scored to exist on a range ... so for A, you want to rank how specific the hypothesis is, not whether it is specific or not. (But how would you rank the specificity of a hypothesis ... can you give an example where something can score a 5/10 for being specific, compared with a 1/10 and a 10/10?)

As you have written it, it makes more sense to use a binary scale: score 0 or 1.

Then the final score S = D(A+B+C) will satisfy the requirement that the highest score is best, and a fail for D gives S = 0, an overall fail.
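A small Python sketch of this gated binary score, assuming each criterion is rated 0 or 1. The function name `gated_score` and the input check are illustrative additions.

```python
def gated_score(a, b, c, d):
    """S = D * (A + B + C), with every rating restricted to 0 or 1."""
    for rating in (a, b, c, d):
        if rating not in (0, 1):
            raise ValueError("each rating must be 0 or 1")
    # D acts as a gate: if the hypothesis cannot be tested, the score is 0.
    return d * (a + b + c)

print(gated_score(1, 1, 1, 0))  # 0: fails D, so fails overall
print(gated_score(1, 0, 1, 1))  # 2
print(gated_score(1, 1, 1, 1))  # 3: the best possible score
```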

In general, you will have to come up with a composite score where the different components have different "weight".
E.g. when evaluating what sort of cars to buy for a company fleet, purchase price will be more important than comfort, with things like mileage and maintenance costs coming in between.

In those situations, a simple mean will not give a useful score.
The way to deal with this is to give each criterion a "weight value" as well: multiply the rating each criterion gets by its weight, sum the results, then divide by the sum of the weights to get a weighted average on the original scale.
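As a rough illustration of the weighting idea, here is a weighted average in Python. The weights below are made up for the example; dividing by the sum of the weights is one common normalization that keeps the result on the same scale as the ratings.

```python
def weighted_average(ratings, weights):
    """Weighted mean of the ratings; the result stays on the ratings' own scale."""
    if len(ratings) != len(weights):
        raise ValueError("ratings and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * r for w, r in zip(weights, ratings)) / total_weight

ratings = [7, 8, 9, 10]  # mean ratings for criteria A, B, C, D
weights = [1, 1, 2, 4]   # hypothetical: D counts four times as much as A

print(weighted_average(ratings, [1, 1, 1, 1]))  # 8.5, the plain mean
print(weighted_average(ratings, weights))       # 9.125
```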

Example:

Consider: evaluating for a second date ... I may rate the subject on:
A. woman (= convincing cis female human)
B. smile
C. sense of humour
D. witty
E. bust wow factor
F. sluttiness
G. education
H. 1st impressions
I have a Y chromosome so sue me :P

I'll rate A as 0 or 1... all else out of 10.
Making myself out to be even more shallow... I could run the calculation as follows:

S = A[3B + 7C + D + 9E + 10F + 6G + 5H]/41 ... dividing by the sum of the weights (41) gives a score out of 10.
The letters are the ratings for each criterion; the numbers in front are the weightings (how important each one is to the evaluation).

Mind you... I may prefer something more like:
S = A[6B + 10C + 3D + E + 2F + 6G + H]/29

The point is to illustrate how flexible this way of doing things is.

You don't have to do a straight weighted average either ... e.g. if you don't want outliers to have an undue influence, you can sum the squares and take the square root. You don't even have to use linear scales. For now though, this will give you the idea.

What I want you to take away from this is that you need to make sure the scoring of each criterion makes sense.
Can you reword your criteria so that they are not binary?
 
  • #5
Simon Bridge said:
Note: each of your evaluation criteria is boolean: the hypothesis either meets the criterion or it does not. This is especially clear for D, where the hypothesis can either be evaluated with available resources or it cannot. I mean: if someone scored that a 4 or a 5 on the 1-10 scale, what would that mean?
10 means you can test it at home, 0 means it would need more than the gross world product of 10 years, 4-5 is something a national lab could do?
 
  • #6
mfb said:
10 means you can test it at home, 0 means it would need more than the gross world product of 10 years, 4-5 is something a national lab could do?
Sure ... now let's see what OP had in mind.
 

Related to Algorithm to create a composite score

1. What is an algorithm to create a composite score?

An algorithm to create a composite score is a set of steps or instructions that are used to combine multiple individual scores or data points into a single overall score. This can be useful in situations where there are multiple factors or criteria that need to be considered in order to make a decision or evaluation.

2. How is a composite score calculated?

A composite score is typically calculated by assigning a weight to each individual score or data point, and then multiplying the weight by the score and summing all of the weighted scores together. The weights are often determined based on the relative importance or relevance of each factor.

3. What is the purpose of using a composite score?

The purpose of using a composite score is to provide a more comprehensive and holistic evaluation of a situation or data set. By combining multiple factors into a single score, it can help to simplify complex information and make it easier to compare and interpret.

4. Are there different types of algorithms for creating composite scores?

Yes, there are different types of algorithms that can be used to create composite scores. Some common ones include weighted average, weighted sum, and standardized scores. The type of algorithm used may depend on the specific goals and requirements of the situation at hand.

5. How can I ensure that my composite score is fair and accurate?

To ensure that a composite score is fair and accurate, it is important to carefully consider the weights assigned to each factor and to ensure that they are based on objective criteria. It can also be helpful to regularly review and update the algorithm as needed to reflect any changes or updates in the data or situation being evaluated.
