Closest Matching Chemical Fingerprint - what analysis?

The thread asks how to determine which of 75 samples most closely resembles 5 samples known to share a source, based on chemical composition. One reply points out that "resemble" must be defined first: either as an inference about shared origin (for example, similar geographic conditions) or as closeness between vectors of concentrations, for which fuzzy clustering is suggested. A later reply suggests the idea of Mahalanobis distance, rescaling each concentration by its standard deviation so that clustering works better. The original poster plans to identify the compounds that are most consistent across the 5 known samples and compare those concentrations in the other 75 samples.
  • #1
geetar_king
Correlation of data sets, chemical composition

I have roughly 80 test results from different samples. Each result set is a list of concentrations of various chemical compounds and proteins obtained through GC-MS (gas chromatography-mass spectrometry).

Each data set contains over 50 of these concentrations.

Of these test results, 5 are from samples that are known to originate from the same source.

Looking at the variance between these samples, I can see that some of the compounds show similar concentrations and others do not, likely because of degradation due to exposure to different conditions.

I am trying to determine which of the other 75 samples (not in the 5 known same-source set) most closely resembles the 5.

Can someone recommend a method to correlate or determine which has the best match?

Thanks
 
  • #2
geetar_king said:
I am trying to determine which of the other 75 samples (not in the 5 known same-source set) most closely resembles the 5.

To have a mathematical question, you have to be precise about what it means for one sample to resemble another.

On the one hand, you might have in mind that the 5 samples come from, say, a geographic location such as a freshwater swamp, and you want to know which of the other samples also come from freshwater swamps. In that case "resemble" means "come from similar geographic conditions". So you are asking how to infer a resemblance that is not explicitly part of the data itself.

On the other hand, you might not care where a sample comes from. Perhaps you just want to treat each sample as a vector of numbers and ask which vectors are close to each other in an abstract 50-dimensional space. You could try a "fuzzy clustering" algorithm for that.

If you are trying to do statistical inference, you need an explicit model for how random variation enters your sample data. Statistical analysis requires a probability model. The "bare facts" of data do not provide enough information in themselves. It's tempting to say "I'm going to be purely objective, I won't make any assumptions." If you do that, you won't come to any statistical conclusions either.
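
For the second interpretation above, where each sample is just a vector of concentrations, a fuzzy c-means clustering can be sketched as follows. This is a minimal illustration, not a recommendation of specific settings: the function name `fuzzy_c_means`, the fuzziness parameter `m=2.0`, the farthest-point initialisation, and the two-dimensional toy data are all assumptions made for the example. Real data would have about 80 rows and 50+ columns, and would probably need rescaling first.

```python
import math

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=50):
    """Return (centers, memberships). memberships[i][k] is the degree to
    which point i belongs to cluster k; each row sums to 1."""
    # Farthest-point initialisation (an assumption of this sketch):
    # deterministic, and it spreads the starting centers apart.
    centers = [list(points[0])]
    while len(centers) < n_clusters:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    u = []
    for _ in range(n_iter):
        # Membership of point i in cluster k is proportional to
        # distance(i, k) ** (-2 / (m - 1)), normalised over the clusters.
        u = []
        for p in points:
            inv = [max(math.dist(p, c), 1e-12) ** (-2.0 / (m - 1.0)) for c in centers]
            total = sum(inv)
            u.append([v / total for v in inv])
        # Each center is the mean of all points, weighted by membership ** m.
        for k in range(n_clusters):
            w = [row[k] ** m for row in u]
            wsum = sum(w)
            centers[k] = [sum(wi * p[d] for wi, p in zip(w, points)) / wsum
                          for d in range(len(points[0]))]
    return centers, u

# Made-up toy data: two groups of three samples, two "compounds" each.
toy = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [5.0, 5.2], [5.1, 4.9], [4.9, 5.0]]
centers, u = fuzzy_c_means(toy, n_clusters=2)
labels = [row.index(max(row)) for row in u]  # hard label = largest membership
```

Unlike a hard clustering, the membership rows in `u` show how strongly each sample belongs to each cluster, so a candidate that sits between groups is visible as such. One way to use this here would be to see which of the 75 samples get high membership in the cluster that contains the 5 known samples.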
 
  • #3
Thanks, I will look at fuzzy clustering.

I do not really care which samples come from a particular source. I also don't really know which compounds and proteins should remain unchanged in the sample over time or after exposure to different conditions; otherwise I would exclude some of the compounds.

I'll look at fuzzy clustering. Otherwise, I'll try to determine which compounds are most similar in the 5 known samples, then look at that set of concentrations in the other 75 samples.
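
The second plan could be sketched roughly like this. It is only an illustration under assumptions: "most similar" is taken to mean a low coefficient of variation (standard deviation divided by mean) across the known samples, the threshold `max_cv=0.15` is arbitrary, the helper names are hypothetical, and the numbers are made up. With only 5 known samples the estimated variation is itself quite rough.

```python
import math
import statistics

def stable_compounds(known, max_cv=0.15):
    """Indices of compounds whose coefficient of variation across the
    known samples is at most max_cv."""
    keep = []
    for j in range(len(known[0])):
        column = [sample[j] for sample in known]
        mean = statistics.mean(column)
        if mean > 0 and statistics.stdev(column) / mean <= max_cv:
            keep.append(j)
    return keep

def rank_candidates(known, candidates, keep):
    """Candidate indices sorted from closest to farthest, using the relative
    distance to the known-sample mean on the stable compounds only."""
    means = {j: statistics.mean(s[j] for s in known) for j in keep}

    def score(sample):
        return math.sqrt(sum(((sample[j] - means[j]) / means[j]) ** 2 for j in keep))

    return sorted(range(len(candidates)), key=lambda i: score(candidates[i]))

# Made-up numbers: 3 known samples and 3 candidates, 3 compounds each.
known = [[10.0, 2.0, 50.0], [10.5, 6.0, 49.0], [9.8, 0.5, 51.0]]
candidates = [[20.0, 2.0, 80.0], [10.2, 9.0, 50.5], [12.0, 3.0, 45.0]]

keep = stable_compounds(known)                      # compound 1 varies too much
ranking = rank_candidates(known, candidates, keep)  # closest candidate first
```

In this toy data the second compound is dropped because it varies widely among the known samples, and the candidates are then ranked on the remaining two.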
 
  • #4
geetar_king said:
I'll try to determine which compounds are most similar in the 5 known samples, then look at that set of concentrations in the other 75 samples.

Perhaps the idea of "Mahalanobis distance" would be useful. If you estimate the standard deviation of a given type of concentration, then it can be used to rescale the data before you use clustering. For example, for one type of measurement a difference of 5 ppm might be a "big" difference and for another type it might be a "small" difference. If you can rescale the data so that "big" and "small" have a common meaning for all types of concentration, then clustering would work better.
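
A small sketch of the rescaling idea, with made-up numbers. Note that the full Mahalanobis distance uses the inverse covariance matrix; dividing each concentration by its own standard deviation, as done here, is the special case that ignores correlations between compounds. That simplification is an assumption of this example, chosen because a full covariance matrix for 50+ compounds would be hard to estimate reliably from about 80 samples. The function name `standardized_distance` is hypothetical.

```python
import math
import statistics

def standardized_distance(x, y, scales):
    """Euclidean distance after dividing each coordinate difference by the
    standard deviation of that coordinate."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, scales)))

# Made-up data: compound A varies by tens of ppm, compound B by tenths of a ppm.
samples = [[100.0, 1.0], [120.0, 1.1], [80.0, 0.9], [105.0, 1.6]]

# One standard deviation per compound, estimated here from all samples.
scales = [statistics.stdev(column) for column in zip(*samples)]

reference = samples[0]
raw = [math.dist(reference, s) for s in samples[1:]]
scaled = [standardized_distance(reference, s, scales) for s in samples[1:]]
```

On the raw numbers the last sample looks closest to the reference, because its 5 ppm difference in compound A is small. After rescaling it becomes the farthest, because its 0.6 ppm difference in compound B is large relative to how much B normally varies.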
 
  • #5
Based on the information provided, it seems you are looking for a way to find the closest matching chemical fingerprint among your samples. One approach is a correlation analysis of the data sets: compare the chemical composition of the 5 known same-source samples with each of the other 75 samples, using the concentrations of the compounds and proteins in each sample to measure how similar they are. You could also use statistical methods such as principal component analysis to identify patterns or correlations between the samples. Ultimately, the goal is to find the sample whose chemical composition is most similar to that of the 5 known same-source samples.
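
As a rough illustration of the correlation approach, the sketch below averages the known samples into one reference profile and computes the Pearson correlation of each candidate with that profile. The averaging step, the sample names "A" and "B", and all numbers are assumptions made up for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Made-up data: 2 known samples and 2 candidates, 4 compounds each.
known = [[10.0, 2.0, 50.0, 7.0], [11.0, 2.5, 48.0, 6.0]]
candidates = {"A": [5.0, 40.0, 3.0, 20.0], "B": [21.0, 4.4, 99.0, 13.5]}

# Reference profile: mean concentration of each compound over the known samples.
profile = [sum(column) / len(column) for column in zip(*known)]

scores = {name: pearson(profile, values) for name, values in candidates.items()}
best = max(scores, key=scores.get)
```

Correlation compares the shape of the profile and ignores overall scale, so candidate "B", which has roughly double every concentration, still scores close to 1. Whether that is desirable depends on what "resemble" is supposed to mean for these samples.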
 

Related to Closest Matching Chemical Fingerprint - what analysis?

1. What is a "Closest Matching Chemical Fingerprint" analysis?

A "Closest Matching Chemical Fingerprint" analysis is a method used by scientists to compare and match the chemical composition of two substances. It involves analyzing the unique set of chemical compounds present in each substance and determining how closely they match or resemble each other.

2. How is a "Closest Matching Chemical Fingerprint" analysis performed?

To perform a "Closest Matching Chemical Fingerprint" analysis, scientists use techniques such as chromatography and mass spectrometry to identify and separate the various compounds present in a substance. These compounds are then compared and matched with those in another substance to determine the level of similarity.

3. What are the applications of "Closest Matching Chemical Fingerprint" analysis?

"Closest Matching Chemical Fingerprint" analysis has various applications in fields such as forensic science, pharmaceuticals, and environmental research. It can be used to identify unknown substances, determine the purity of a substance, and track the source of a substance.

4. How accurate is a "Closest Matching Chemical Fingerprint" analysis?

The accuracy of a "Closest Matching Chemical Fingerprint" analysis depends on the sensitivity and precision of the techniques used, as well as the quality of the samples being analyzed. With advanced technology and proper calibration, this analysis can be highly accurate in identifying and matching chemical compounds.

5. Are there any limitations to "Closest Matching Chemical Fingerprint" analysis?

One limitation of "Closest Matching Chemical Fingerprint" analysis is that it requires a known reference sample or database of chemical fingerprints to compare with the unknown sample. If a reference sample is not available, or if the substances being compared have very different chemical compositions, the analysis may not be accurate. Additionally, the match score itself summarizes overall similarity and does not indicate how much of each compound is present.
