Is it common for physicists to trash data?

In summary: The thread asks whether it is acceptable for an experimentalist to leave some measured data out of a publication. Two cases are discussed: keeping only an eyeballed "linear" time window from an automatically recorded run, and re-measuring a noisy point until it fits the expected trend. The original poster suggests that in the first case a linear regression with a stated threshold on the residuals would be better than eyeballing; the replies stress asking the researcher for his reasons and never destroying data.
  • #1
fluidistic
I'll soon start to take some data (i.e. making measurements) and from the little I've seen from a physicist doing research, I believe he trashes some data. In other words he doesn't publish all the data he extracts from the laboratory. He applies some non rigorous/mathematical criterion to the data to trash and keep only what seems reasonable to him, be it for his data to agree with previous research or because the data is so unexpected that it doesn't fit in any theoretical model and it may mean he has goofed the experiment.
I was wondering whether this behavior is common among physicists and whether there are some non-obvious cases where trashing some data is acceptable.
 
  • #2
It would be very rare for a researcher to publish all of their data. There are plenty of reasons to hold data back or trash it outright, some better than others. Holding it back simply because it doesn't agree with your preconceived ideas about what the results should be or invalidates a previous theory of yours is unethical. Holding it back because it is unexpected and you want to do additional tests to see if it was an error or correct is completely normal.
 
  • #3
fluidistic said:
He applies some non rigorous/mathematical criterion to the data to trash and keep only what seems reasonable to him, be it for his data to agree with previous research or because the data is so unexpected that it doesn't fit in any theoretical model and it may mean he has goofed the experiment.

It's hard to say whether this is good or bad since we don't know the details. He may in fact have a very good reason for rejecting the data but be very bad at explaining the reasons to you. On the other hand, he could very well be trashing data without a good reason.
 
  • #4
fluidistic said:
I'll soon start to take some data (i.e. making measurements) and from the little I've seen from a physicist doing research, I believe he trashes some data. In other words he doesn't publish all the data he extracts from the laboratory. He applies some non rigorous/mathematical criterion to the data to trash and keep only what seems reasonable to him, be it for his data to agree with previous research or because the data is so unexpected that it doesn't fit in any theoretical model and it may mean he has goofed the experiment.
I was wondering whether this behavior was common between physicists and if there are some non obvious cases where trashing some data is acceptable.

Have you ASKED him why he used a certain set of data and not others? That could have easily answered what you are asking here.

And no, as an experimentalist, I do not "trash" ANY data, even faulty data, and even data I do not use for some reason. That is a no-no. All data are kept and archived.

Zz.
 
  • #5
Thank you guys. I apparently don't know how to multi quote automatically.
Holding it back because it is unexpected and you want to do additional tests to see if it was an error or correct is completely normal.
Say that you got an unexpected "point" on the graph of your data, you make additional measurements for that particular point, and the new data seems to fit the apparent curve better... How do you really know that your 2nd measurement was indeed better? Just by the aesthetics of the data when plotted? That's basically what the physicist did.
The little I know about statistics tells me that taking more measurements because the original result doesn't match what I expected is bad and skews the result (http://www.evanmiller.org/how-not-to-run-an-ab-test.html). I am not sure this is applicable in the case of a physics experiment, hence my question here.
ZapperZ said:
Have you ASKED him why he used a certain set of data and not others? That could have easily answered what you are asking here.

And no, as an experimentalist, I do not "trash" ANY data, even faulty data, and even data I do not use for some reason. That is a no-no. All data are kept and archived.

Zz.
Yes. And that happened in two very different cases:
1)There's a device that automatically registers data every x seconds. You start the experiment and the machine that makes the measurements. The system needs, say, 20 minutes to reach a temporary stability that lasts for around 10 minutes. In the end you have data from t=0 to t=40 minutes. You plot the data and, by eye, you pick the part of the graph that looks "linear" or "beautiful" or whatever criterion you assign, and you decide to trash all the data except that from t=22 min to t=29 min.
Is that correct behavior? Shouldn't he at least make a linear regression and set a threshold for the residuals, for all ranges of points? For example, make a linear regression for the data from t=19 min to t=28 min and calculate the residuals. Do the same for t=20 min to t=28 min, etc., and pick the range with the smallest residuals? Of course these calculations would be automated by a program. Wouldn't that be much better than eyeballing the data?
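The window scan proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not what the physicist in question did: the synthetic run, the noise level, the function names `fit_line` and `best_linear_window`, and the `min_points` cutoff are all made up for the example. Two caveats apply. The residuals are normalised by the degrees of freedom (n - 2) so that windows of different lengths can be compared, and a minimum window length is fixed in advance, because otherwise "least residuals" would simply favour the shortest window.

```python
import math
import random

def fit_line(t, y):
    """Ordinary least-squares fit y = a + b*t. Returns (a, b, rms_residual)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx
    a = y_mean - b * t_mean
    ss_res = sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(t, y))
    # Divide by n - 2 degrees of freedom so window lengths are comparable.
    return a, b, math.sqrt(ss_res / (n - 2))

def best_linear_window(t, y, min_points):
    """Scan every contiguous window with at least min_points samples and
    return (start, stop, rms) for the window with the smallest RMS residual.
    The window covers indices start .. stop - 1."""
    best = None
    n = len(t)
    for start in range(n - min_points + 1):
        for stop in range(start + min_points, n + 1):
            _, _, rms = fit_line(t[start:stop], y[start:stop])
            if best is None or rms < best[2]:
                best = (start, stop, rms)
    return best

def signal(m):
    """Hypothetical run: curved transient, linear from 20 to 30 min, curved drift."""
    if m < 20:
        return 10.0 + 0.5 * (m - 20) + 0.2 * (m - 20) ** 2
    if m > 30:
        return 15.0 + 0.5 * (m - 30) - 0.2 * (m - 30) ** 2
    return 0.5 * m

rng = random.Random(0)                      # fixed seed for reproducibility
t = [float(m) for m in range(41)]           # one sample per minute, t = 0..40
y = [signal(m) + rng.gauss(0.0, 0.01) for m in t]

start, stop, rms = best_linear_window(t, y, min_points=8)
a, b, _ = fit_line(t[start:stop], y[start:stop])
print(f"window: t={t[start]:.0f}..{t[stop - 1]:.0f} min, slope={b:.3f}, rms={rms:.4f}")
```

Note that the scan is still a selection step: the criterion (minimum length, residual measure, threshold) should be chosen before looking at the data and reported along with the result.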

2)The experiment is so noisy that measuring twice in a row under the same conditions gives very different results. Hence some averages are made. In the end the plotted data is not supposed to have enormous error bars, at least in what gets published in serious journals. Now when a "point" on the graph looks way above or below the "trend of the expected curve", the physicist trashes the data in the sense that he'd rather not publish that point in a journal. Of course he will reproduce the experiment (and not trash the ugly-looking data... it's just that he will never publish it) until he gets a point that fits well into his graph. He can repeat the experiment many times until that happens. So in the end he will publish the most beautiful data he measured in the laboratory, but under the rock he hides a pack of discarded data. And if he never manages to measure a beautiful value for that ugly point, he will simply discard the point rather than include the ugly one.
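To see why this kind of selection worries me, here is a small simulation sketch. Everything in it is an assumption chosen for illustration (the true value, the noise, the "expected" value, the tolerance, the number of retries, and the function names); it is not a model of the actual experiment. It compares publishing the first measurement with re-measuring until the point lands close to the expected curve and dropping the point if it never does.

```python
import random
import statistics

def publish_first(true_value, sigma, rng):
    """Honest procedure: publish whatever the first measurement gives."""
    return rng.gauss(true_value, sigma)

def publish_when_pretty(true_value, sigma, expected, tolerance, max_tries, rng):
    """Re-measure until the point lies within `tolerance` of the expected
    curve; return None (point dropped from the graph) if it never does."""
    for _ in range(max_tries):
        x = rng.gauss(true_value, sigma)
        if abs(x - expected) <= tolerance:
            return x
    return None

rng = random.Random(1)   # fixed seed for reproducibility
TRUE_VALUE = 0.0         # what nature actually does (hypothetical)
EXPECTED = 1.0           # what the experimenter expects to see (hypothetical)
SIGMA = 1.0              # measurement noise
N_POINTS = 20000         # number of simulated graph points

honest = [publish_first(TRUE_VALUE, SIGMA, rng) for _ in range(N_POINTS)]
attempts = [
    publish_when_pretty(TRUE_VALUE, SIGMA, EXPECTED, 0.5, 5, rng)
    for _ in range(N_POINTS)
]
selected = [x for x in attempts if x is not None]

print("honest:   mean=%.3f  stdev=%.3f" %
      (statistics.mean(honest), statistics.stdev(honest)))
print("selected: mean=%.3f  stdev=%.3f  (published %d of %d points)" %
      (statistics.mean(selected), statistics.stdev(selected),
       len(selected), N_POINTS))
```

In this toy setup the selected points cluster around the expected value rather than the true one, and their scatter is much smaller than the real measurement noise, so the published graph looks cleaner than the experiment really is.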

Edit: It's in this way that I meant "trash", i.e. not publish that ugly looking data. Of course the data is not trashed from the computer or sheet of paper.
 
  • #6
But did you ASK him why he's doing that?! You didn't answer that explicitly, and didn't post what the response from him was if you did.

I have a UV-VIS set up that gives very noisy signal outside of the optical range. It tells me nothing about the data outside of that range. When I publish the result, I don't show that part because it tells me nothing, and I also do not conclude anything outside of that range. It is just not relevant to what I'm doing.

That is just one reason why you don't publish ALL the data. That is why you have to ask this person why he ignored the data outside of that range. Do the data not change anything even if they are included? Are they not in the range of interest? There can be a number of reasons, but this is all speculative unless you directly ask! Asking it here doesn't solve anything.

BTW, don't ever use the term "trash" in this situation. It gives a misleading impression that the data was destroyed and thrown away! You could get yourself (and those you work with) in a hot mess if you are careless in the language and the words that you use.

Zz.
 
  • #7
ZapperZ said:
But did you ASK him why he's doing that?! You didn't answer that explicitly, and didn't post what the response from him was if you did.
I didn't ask him explicitly; he was explaining to me what he was doing. After a few hours of measuring he noticed an unexpected "point" in a graph and he said that he would not include it, that he would redo the experiment, and that if the point was still ugly he would simply remove it from the graph, i.e. not publish that point. I concluded that the reason was an aesthetic judgement.

I have a UV-VIS set up that gives very noisy signal outside of the optical range. It tells me nothing about the data outside of that range. When I publish the result, I don't show that part because it tells me nothing, and I also do not conclude anything outside of that range. It is just not relevant to what I'm doing.

That is just one reason why you don't publish ALL the data. That is why you have to ask this person why he ignored the data outside of that range. Do the data not change anything even if they are included? Are they not in the range of interest? There can be a number of reasons, but this is all speculative unless you directly ask! Asking it here doesn't solve anything.

In that case (case 1 in my post), he didn't include irrelevant data because the data was outside the range of interest, so it's similar to your case. Except that the range of interest is not "well defined" in the case of the physicist I've seen, and he used an eyeballing criterion on the plotted data to set the range of interest. That influences the values of the data in other plots. In your case you have an "optical range"; may I ask whether your optical range is well defined and always the same regardless of your measurements? Or does it change, and if so, how do you determine the region of interest?
BTW, don't ever use the term "trash" in this situation. It gives a misleading impression that the data was destroyed and thrown away! You could get yourself (and those you work with) in a hot mess if you are careless in the language and the words that you use.

Zz.
Sorry about that, you are right.
 
  • #9
If there is an anomalous result, then either the theory behind the experiment is wrong or there is something wrong in the way the experiment was carried out.
It's generally best to check the experimental setup first, and if it becomes obvious that something is wrong with that, then attempt to fix the problem.
A good recent example is this
http://en.wikipedia.org/wiki/Faster-than-light_neutrino_anomaly

There isn't any good reason to destroy the data though; instead, simply note that the experiment was found to be invalid, and why.
 

Related to Is it common for physicists to trash data?

1. Is it common for physicists to intentionally manipulate or discard data?

No, it is not common for physicists to intentionally manipulate or discard data. As scientists, we strive to accurately collect and analyze data in order to make valid conclusions. Manipulating or discarding data goes against the principle of scientific integrity and can lead to false or biased results.

2. Are there any legitimate reasons for physicists to discard data?

Yes, there are legitimate reasons for physicists to discard data. For example, if there are errors in the data collection process or if the data is found to be unreliable, it may be necessary to discard that data in order to ensure the accuracy of the overall results.

3. How do physicists handle data that may be considered outliers or unexpected?

Physicists typically handle outliers or unexpected data by first examining the data closely to determine if there are any errors or anomalies. If the data is found to be valid, it may be included in the analysis, but with a note explaining its unusual nature. In some cases, the data may be discarded if it is deemed to be unreliable or not relevant to the research question.

4. What measures do physicists take to ensure the integrity of their data?

Physicists take several measures to ensure the integrity of their data, including using reliable and accurate data collection methods, properly documenting the data, and conducting thorough analyses to detect any errors or inconsistencies. Additionally, data is often peer-reviewed by other scientists to verify its validity and reliability.

5. Are there any consequences for physicists who are found to have manipulated or discarded data?

Yes, there can be consequences for physicists who are found to have manipulated or discarded data. This can include damage to their reputation and credibility within the scientific community, potential retraction of published research, and in severe cases, legal consequences. Scientific misconduct is taken very seriously and can have serious implications for the individual and the scientific field as a whole.
