Closest Matching Chemical Fingerprint - what analysis?

geetar_king · Apr 16, 2014

Correlation of data sets, chemical composition

I have roughly 80 test results from different samples. Each result set is a list of concentrations of various chemical compounds and proteins obtained through GC-MS (gas chromatography-mass spectrometry).

Each data set contains over 50 of these compound concentrations.

Of these test results, 5 are from samples that are known to originate from the same source.

Looking at the variance between these 5 samples, I can see that some of the compounds show similar concentrations and others do not, likely because of degradation due to exposure to different conditions.

I am trying to determine which of the other 75 samples (not in the 5 known same-source set) most closely resembles the 5.

Can someone recommend a method to correlate or determine which has the best match?

Thanks

Stephen Tashi · Apr 17, 2014

geetar_king said:

I am trying to determine which of the other 75 samples (not in the 5 known same-source set) most closely resembles the 5.

To have a mathematical question, you have to be precise about what it means for one sample to resemble another.

On the one hand, you might have in mind that the 5 samples come from some, say, geographic location such as a freshwater swamp, and you want to know which of the other samples also come from freshwater swamps. In that case "resemble" means "come from similar geographic conditions". So you are asking how to infer a resemblance that is not explicitly part of the data itself.

On the other hand, you might not care where a sample comes from. Perhaps you just want to treat each sample as a vector of numbers and ask which vectors are close to each other in an abstract 50-dimensional space. You could try a "fuzzy clustering" algorithm for that.
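
As a rough illustration of the "vectors in an abstract space" view, here is a minimal sketch that ranks candidate samples by plain Euclidean distance to the centroid of the known group. This is a simpler stand-in, not fuzzy clustering itself, and the function names and the numbers (3 compounds instead of 50) are made up for the example.

```python
import math

def centroid(samples):
    """Mean vector of a list of equal-length concentration vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def euclidean(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_distance(known, candidates):
    """Return (name, distance) pairs sorted from closest to farthest."""
    center = centroid(known)
    scored = [(name, euclidean(vec, center)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Hypothetical data: 3 known same-source samples, 2 unknown candidates.
known = [[1.0, 2.0, 3.0], [1.2, 1.8, 3.1], [0.9, 2.1, 2.9]]
candidates = {"A": [1.0, 2.0, 3.0], "B": [5.0, 0.5, 9.0]}

ranking = rank_by_distance(known, candidates)
for name, dist in ranking:
    print(name, round(dist, 3))
```

Unscaled Euclidean distance lets compounds with large numeric ranges dominate the result, so it is only a starting point.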

If you are trying to do statistical inference, you need an explicit model for how random variation enters your sample data. Statistical analysis requires a probability model. The "bare facts" of data do not provide enough information in themselves. It's tempting to say "I'm going to be purely objective, I won't make any assumptions." If you do that, you won't come to any statistical conclusions either.

geetar_king · Apr 17, 2014

Thanks, I will look at fuzzy clustering.

I do not really care which samples come from a particular source. I also don't really know which compounds and proteins should remain unchanged in the sample over time or after exposure to different conditions; otherwise I would exclude some of the compounds.

I'll look at fuzzy clustering. Otherwise, I'll try to determine which compounds are most similar in the 5 known samples, then look at that set of concentrations in the other 75 samples.
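
One possible way to sketch the "find the compounds that agree in the 5 known samples" step is to compute each compound's coefficient of variation (sample standard deviation divided by mean) across the known samples and keep those below a cutoff. The cutoff of 0.2, the compound names, and the numbers are all assumptions for illustration only.

```python
import statistics

def stable_compounds(known, names, max_cv=0.2):
    """Return names of compounds whose coefficient of variation
    (sample std dev / mean) across the known samples is at most max_cv."""
    keep = []
    for name, values in zip(names, zip(*known)):
        mean = statistics.mean(values)
        if mean <= 0:
            continue  # CV is undefined for a zero mean; skip such compounds
        cv = statistics.stdev(values) / mean
        if cv <= max_cv:
            keep.append(name)
    return keep

# Hypothetical data: 5 known samples, 3 compounds each.
names = ["cmpd_1", "cmpd_2", "cmpd_3"]
known = [
    [10.0, 5.0, 0.20],
    [10.5, 1.0, 0.21],
    [9.8, 9.0, 0.19],
    [10.1, 3.0, 0.20],
    [9.9, 7.0, 0.20],
]

stable = stable_compounds(known, names)
print(stable)
```

With only 5 samples the standard deviation estimates are noisy, so any cutoff chosen this way should be treated as a rough filter rather than a firm rule.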

Stephen Tashi · Apr 17, 2014

geetar_king said:

I'll try to determine which compounds are most similar in the 5 known samples, then look at that set of concentrations in the other 75 samples.

Perhaps the idea of "Mahalanobis distance" would be useful. If you estimate the standard deviation of a given type of concentration, you can use it to rescale the data before you apply clustering. For example, for one type of measurement a difference of 5 ppm might be a "big" difference and for another type it might be a "small" difference. If you can rescale the data so that "big" and "small" have a common meaning for all types of concentration, then clustering would work better.
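
The rescaling idea can be sketched as below, assuming the per-compound standard deviation is estimated from the 5 known samples. This is the diagonal (per-variable) form of the distance; the full Mahalanobis distance uses the inverse covariance matrix, which cannot be estimated from 5 samples in 50 dimensions because the sample covariance is singular. The function name and numbers are hypothetical.

```python
import math
import statistics

def scaled_distance(candidate, known):
    """Distance from candidate to the mean of the known samples, with each
    compound's difference divided by that compound's sample std dev."""
    total = 0.0
    for value, column in zip(candidate, zip(*known)):
        mean = statistics.mean(column)
        sd = statistics.stdev(column)
        if sd == 0:
            continue  # no spread estimate for this compound; skip it
        total += ((value - mean) / sd) ** 2
    return math.sqrt(total)

# Hypothetical data: compound 1 varies by about 8 units, compound 2 by about 0.07.
known = [[100.0, 1.0], [110.0, 1.1], [90.0, 0.9], [105.0, 1.0], [95.0, 1.0]]

d_a = scaled_distance([105.0, 1.5], known)  # small raw shift, but huge for compound 2
d_b = scaled_distance([130.0, 1.0], known)  # large raw shift, only in compound 1
print(round(d_a, 2), round(d_b, 2))
```

In raw units the first candidate looks much closer to the known group than the second, but after rescaling the order reverses, because a shift of 0.5 in the second compound is many standard deviations.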

blue_raver22 · Apr 24, 2014

Thank you for your question. Based on the information provided, it seems you are looking for a way to determine the closest matching chemical fingerprint among your samples. One approach is a correlation analysis of the data sets, in which you compare the chemical composition of the 5 known same-source samples with that of the other 75 samples. This would involve comparing the concentrations of the various compounds and proteins in each sample and measuring how similar they are. You could also use statistical methods, such as principal component analysis, to look for patterns or correlations between the samples. Ultimately, the goal is to find the sample whose chemical composition is most similar to that of the 5 known same-source samples, indicating a close match in terms of chemical fingerprint.
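
A minimal sketch of the correlation approach, assuming each candidate is compared against the mean profile of the known samples using the Pearson correlation coefficient. The helper names and data are invented for illustration, and principal component analysis is not shown here.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def best_match(known, candidates):
    """Return the candidate name with the highest correlation to the
    mean profile of the known samples, plus all scores."""
    mean_profile = [sum(col) / len(known) for col in zip(*known)]
    scores = {name: pearson(vec, mean_profile) for name, vec in candidates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical data: 2 known samples, 4 compounds each.
known = [[1.0, 4.0, 2.0, 8.0], [1.2, 3.8, 2.1, 7.9]]
candidates = {"X": [2.0, 8.0, 4.0, 16.0], "Y": [8.0, 2.0, 4.0, 1.0]}

best, scores = best_match(known, candidates)
print(best, {k: round(v, 3) for k, v in scores.items()})
```

Correlation compares the shape of the profile and ignores overall scale, so a sample with every concentration doubled still scores close to 1. Whether that is desirable depends on whether dilution should count as a difference.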
