Multiple variable correlation - drowning in stats formulas,

In summary, Ralph asks how to find correlations between the parameters measured in a water quality study, and a respondent suggests starting with simple linear regression.
  • #1
ralph86
Hello everyone,

My boss has asked me to find out whether there is any correlation between various sample parameters in a water quality study. We took around 150 samples and measured around 50 parameters for each one, e.g. iron content, free residual chlorine, level of coliforms, pH, etc.

I always avoided stats through school, and having spent a couple of hours reading up about my options for doing this analysis on the internet, I remember why.

So, please, in simple terms, can someone explain to me:
What function or statistical analysis technique(s) I could/should use for this analysis
What (ideally free) software is available to do it

THANK YOU!
Ralph
 
  • #2
Simple linear regression would be the easiest approach, assuming it is suitable for the data you have. Excel can do this well enough for most purposes. The value you would be looking for is r^2 (r squared). With simple linear regression, this tells you how strong the linear relationship is between the dependent and independent variables: it is the fraction of the variance in the dependent variable that the fitted line explains.

I think you need to explain a bit more about what your variables are. I assume you have a dependent variable which is some sort of water quality measure (DO, BOD, etc.?), then a batch of independent variables such as iron content, suspended solids and so on, and you want to be able to predict the water quality from the independent variables.

This kind of analysis can be quite easy, if you're lucky. For example, an ideal situation for you would be that water quality is largely determined by, say, the suspended solids content of the water, so you only need one variable to predict water quality reasonably accurately. However, real-life situations are rarely that convenient, and you may find it difficult to create a simple statistical model from your data.
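
As a rough illustration of the simple linear regression described in this post, here is a minimal Python sketch that fits a least-squares line and reports r^2. The function name `linear_fit` and the sample numbers are made up for illustration; they are not from the study.

```python
def linear_fit(x, y):
    """Fit y = slope * x + intercept by least squares and return r^2 as well."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, r^2 is the squared Pearson correlation
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical values, for illustration only
suspended_solids = [1.0, 2.0, 3.0, 4.0, 5.0]
dissolved_oxygen = [9.1, 8.0, 7.2, 5.9, 5.0]

slope, intercept, r2 = linear_fit(suspended_solids, dissolved_oxygen)
print(f"slope={slope:.3f} intercept={intercept:.3f} r^2={r2:.3f}")
```

An r^2 close to 1 means the single independent variable explains most of the variation in the dependent one; a value near 0 means a straight line is a poor description of the relationship.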
 
  • #3
Hi Richard,

Thanks for your reply.

Just to be sure I understand, a dependent variable is one that is purely affected by another one, which you are also measuring?

If this is the case, the situation is a bit more complex. One indicator, e.g. dissolved oxygen, may be the determining (independent?) variable for some indicators, I guess the microbiological ones, but another, e.g. a certain heavy metal, may be an indicator of industrial pollution, and hence correlate with the presence of various industrial contaminants, e.g. endocrine disruptors, soaps, whatever.

What I would like to do is, in some kind of reasonably automated way, find the correlation between every possible pair of parameters, and report it (by comparing R^2 I suppose).
Can you recommend a program or software for doing this?
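
One possible way to automate the "every possible pair" idea is sketched below in plain Python. The helper names (`pearson_r`, `pairwise_correlations`) and the three small columns are hypothetical, chosen only to show the shape of the computation. It assumes every column has the same number of samples, no missing values, and no constant columns (a constant column would cause a division by zero).

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(columns):
    """Return (name_a, name_b, r) for every pair, strongest |r| first."""
    results = [
        (a, b, pearson_r(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(results, key=lambda item: abs(item[2]), reverse=True)


# Hypothetical measurements, for illustration only
samples = {
    "iron": [0.10, 0.30, 0.20, 0.50, 0.40],
    "ph": [7.2, 6.8, 7.0, 6.4, 6.6],
    "chlorine": [0.5, 0.4, 0.6, 0.5, 0.4],
}

ranked = pairwise_correlations(samples)
for name_a, name_b, r in ranked:
    print(f"{name_a} vs {name_b}: r={r:+.3f}, r^2={r * r:.3f}")
```

With 50 parameters there are 50 × 49 / 2 = 1225 pairs, so some pairs will look strongly correlated purely by chance; the ranking is better treated as a list of candidates to investigate than as a set of findings. If pandas is available, `DataFrame.corr()` produces the same pairwise Pearson matrix in one call.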

Probably there are interactions between combinations, e.g. coliforms will need a certain pH, be more prevalent in the rainy season, be unable to survive in high concentrations of lead, etc., but I guess that with only 150 samples I don't have enough data to look at large numbers of parameters at the same time. If you can think of a way, I'd love to hear it.

Thanks for your help
Ralph
 
  • #4
Yes, there will probably be correlations between a large number of your parameters (known as multicollinearity when the correlated parameters are used together as predictors in a regression).

The best statistics package that I've personally used is SPSS - might be expensive though (I use the one my university pays for so no idea how much a license costs).

To be honest though, given the data you describe and the tasks you need to perform, it sounds as though the statistics you need will be rather complicated and will require a good understanding of how stats works, how to apply the right methods, and how to interpret the results.

If this is something you absolutely have to do, you might be best to sub-contract it out to an actual statistician.
 
  • #5
OK Richard, thanks for your help. I was hoping for an easy way of doing this, and that sounds pretty easy. I will ask around to see if anyone in the department is strong in stats :)
Ralph
 

Related to Multiple variable correlation - drowning in stats formulas,

What is multiple variable correlation?

Multiple variable correlation is a statistical method used to measure the relationship between three or more variables. It assesses the degree to which changes in one variable are associated with changes in the other variables.

How is multiple variable correlation calculated?

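For a single pair of variables, correlation is measured by the correlation coefficient, a numerical value between -1 and 1 that represents the strength and direction of the linear relationship. A value of 0 indicates no linear correlation, while a value of 1 or -1 indicates a perfect positive or negative correlation, respectively. With three or more variables, the coefficient is typically computed for every pair and arranged in a correlation matrix. The relationship between one variable and several others taken together is summarized by the multiple correlation coefficient R, which ranges from 0 to 1.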
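
For reference, the standard (Pearson) correlation coefficient for one pair of variables x and y measured on n samples can be written as follows, where x̄ and ȳ are the sample means. The symbols are introduced here for illustration and do not appear in the thread.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

In a simple linear regression with one independent variable, the reported r^2 is the square of this quantity.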

What is the purpose of multiple variable correlation?

The purpose of multiple variable correlation is to determine whether there is a relationship between three or more variables and to what extent they are related. It can also help identify which variables have the strongest influence on each other.

What are some limitations of multiple variable correlation?

Multiple variable correlation does not imply causation. Just because two variables are highly correlated does not mean that one causes the other. Additionally, a pairwise correlation does not account for other factors (confounders) that may be influencing both variables.

How is multiple variable correlation used in research?

Multiple variable correlation is commonly used in research to explore relationships between variables and to test hypotheses. It can also be used to identify potential confounding variables that may need to be controlled for in further studies. It is often used in fields such as psychology, sociology, and economics.
