Data analysis - Comparing temperature data from multiple sources

In summary, the conversation discusses the process of comparing temperature data from multiple sources, specifically in urban and rural areas. The steps involved in this comparison include averaging the data from each sensor and each location, and then comparing the results. The conversation also touches on the issue of uncertainties and the need to consider various factors such as weather effects and calibration uncertainties. There is also a discussion about the interpretation of results and the use of different statistical tests, such as the T-test. However, there is some confusion about the significance of the data and when it is necessary to check for significance in comparisons between different data sets.
  • #1
Deepak Bikram

Dear Peter,

I am an engineering student from Japan. I have installed 8 sensors, 4 in a rural area and 4 in an urban area. The sensors measure temperature and other properties as time series. I am now using one month of data to compare the temperature tendency between the urban and rural areas. The steps I followed are as follows.
1) I averaged the data from the 4 sensors at each time step (for one day, separately for urban and rural).
2) Likewise, I averaged the values from step 1 over 1 month, again as a time series, for both urban and rural.
3) Now I am comparing the urban and rural data in time series format.

Is this a proper way to compare them, or are there other techniques available?

In this case, do I need to check the significance of the data? If yes, what would be a suitable significance test?
 
  • #2
Who is Peter?

Where are your sensors located? Far away, or within the same region?
Do you have uncertainties from your individual datapoints? Do you have calibration uncertainties? Did you test all 8 sensors in the same place at some point?

Averaging over a whole month and then comparing results is problematic - there is no meaningful way to assign an uncertainty in between those steps. Clearly the temperature values will have large variations due to the day/night cycle and weather effects - but that is common to all sensors.

For each timestep, you can subtract the average (of all 8 sensors) - that way, your values are just sensitive to temperature differences. Averaging those should give a way more relevant figure, and you can use standard uncertainty propagation to find an uncertainty on the average.
Be careful with the interpretation, however - a difference between the measurements in rural and urban areas does not have to come from the areas being rural or urban; there are many other possible sources for differences.
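
The per-timestep subtraction suggested in this post can be sketched as follows. This is only an illustration: the data are synthetic, the function name `urban_minus_rural` is made up, and the uncertainty is taken from the scatter of the per-timestep differences, which assumes the timesteps are independent and ignores calibration offsets between sensors.

```python
import math
import random
from statistics import mean, stdev

def urban_minus_rural(readings, urban_idx, rural_idx):
    """readings: one list of sensor values per timestep.
    Returns (mean urban-minus-rural difference, standard error of that mean)."""
    diffs = []
    for row in readings:
        common = mean(row)  # average of all sensors at this timestep
        # Subtracting it removes the day/night cycle and weather shared by all sensors.
        anomalies = [value - common for value in row]
        urban = mean([anomalies[i] for i in urban_idx])
        rural = mean([anomalies[i] for i in rural_idx])
        diffs.append(urban - rural)
    # Standard error of the mean, assuming independent timesteps (an assumption).
    return mean(diffs), stdev(diffs) / math.sqrt(len(diffs))

# Synthetic hourly data for 30 days: a shared daily cycle plus an invented
# +0.5 degree urban offset and 0.3 degree sensor noise.
random.seed(0)
readings = []
for step in range(30 * 24):
    shared = 15.0 + 8.0 * math.sin(2 * math.pi * step / 24) + random.gauss(0, 2)
    urban = [shared + 0.5 + random.gauss(0, 0.3) for _ in range(4)]
    rural = [shared + random.gauss(0, 0.3) for _ in range(4)]
    readings.append(urban + rural)

difference, std_error = urban_minus_rural(readings, range(4), range(4, 8))
print(f"urban - rural = {difference:.2f} +/- {std_error:.2f}")
```

The large shared swings (8 degrees of daily cycle in this made-up data) drop out entirely, so the remaining scatter reflects only sensor-to-sensor differences.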
 
  • #3
Greetings,

Thank you very much for your reply. I am also having a similar problem with remote sensing data for shortwave radiation.

My points are about 1 km apart from each other, both in the urban area (4 points) and in the rural area (4 points), and the distance between the urban and rural areas is about 6 km in the same plane. I am taking the data from remote sensing. I am analyzing shortwave radiation for daytime only. Because cloud formation differs, the shortwave radiation also varies from point to point. The data are daily time series (6am to 6pm).

To compare the shortwave radiation between rural and urban, I averaged the time series values of the urban points, and did the same for the rural points.

Urban average(i) 6:00 = (U1+U2+U3+U4)/4
Urban average(i) 7:00 = (U1+U2+U3+U4)/4
Urban average(i) 8:00 = (U1+U2+U3+U4)/4
...
Rural average(i) 6:00 = (R1+R2+R3+R4)/4
Rural average(i) 7:00 = (R1+R2+R3+R4)/4
Rural average(i) 8:00 = (R1+R2+R3+R4)/4
...
The urban average and the rural average are calculated in the same way, as time series, for every day of 1 month.

The values (U1, U2, U3, U4) and (R1, R2, R3, R4) vary among themselves because of differences in cloud thickness.

Now, the urban average at each time of day is averaged over the whole month to get Uavg, and likewise for Ravg, both in time series format.
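
The two averaging steps described above can be sketched like this. The numbers are invented purely for illustration, and the function names `site_mean_by_hour` and `monthly_mean_by_hour` are hypothetical; the real data would have one entry per hour from 6:00 to 18:00 and one dict per day of the month.

```python
from statistics import mean

def site_mean_by_hour(day):
    """day: dict hour -> list of the four point values (U1..U4 or R1..R4).
    Returns hour -> average over the four points."""
    return {hour: mean(values) for hour, values in day.items()}

def monthly_mean_by_hour(daily_means):
    """daily_means: list of dicts (hour -> site average), one per day.
    Returns hour -> average over all days (Uavg or Ravg)."""
    hours = daily_means[0].keys()
    return {hour: mean([d[hour] for d in daily_means]) for hour in hours}

# Two made-up days with two hours each, four urban points per hour.
urban_days = [
    {6: [100, 110, 90, 100], 7: [200, 220, 180, 200]},
    {6: [120, 120, 120, 120], 7: [100, 100, 100, 100]},
]
u_avg = monthly_mean_by_hour([site_mean_by_hour(day) for day in urban_days])
print(u_avg)
```

Note that averaging in two stages discards the day-to-day and point-to-point spread, so if an uncertainty is wanted later it has to be computed from the unaveraged values.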

I made a virtual data set, SWmax, by picking the maximum over all days at each time of day, which resembles the ideal shape of the shortwave radiation.

To compare the results I used ((SWmax-Uavg)/Uavg*100)% and ((SWmax-Ravg)/SWmax*100)%

The difference between the percentages is not very large at some times, so I am a little confused about whether these differences are significant, because there is a lot more variation in the data set. Please suggest how I can compare them properly and in a significant way. Deepak
 
  • #4
I don't see what those values are supposed to mean. The first one is "how large is the maximum (only urban or both?) compared to the urban average that day", okay (I don't see the interpretation of this value. What does "50%" tell you, for example?). But the second, where you divide by SWmax? "how large is the rural average compared to the maximum, and then take the negative value of this".
 
  • #5
I took the maximum value at each time of day (in time series format), combining both urban and rural points over the whole month. I simply used the Excel formula =max(..:..) to get this. The final comparison is expressed as a percentage, but the formula I gave above contained a small mistake, sorry.

To compare the results I used ((SWmax-Uavg)/Uavg*100)% and ((SWmax-Ravg)/SWmax*100)% (wrong)
To compare the results I used ((SWmax-Uavg)/SWmax*100)% and ((SWmax-Ravg)/SWmax*100)% (correct)

The percentage gives the fractional reduction of the shortwave flux. 50% means that cloud is present which is thick enough to block 50% of the shortwave radiation at that time.
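
As a check on the corrected formula, here is a minimal sketch of the SWmax and percentage calculation. The values are invented, the function names are hypothetical, and it assumes SWmax is greater than zero at every hour considered.

```python
def max_envelope(series):
    """series: list of dicts hour -> shortwave value, one per point and day
    (urban and rural combined). Returns hour -> maximum, i.e. SWmax."""
    return {hour: max(s[hour] for s in series) for hour in series[0]}

def percent_reduction(sw_max, avg):
    """(SWmax - avg) / SWmax * 100 for each hour; assumes SWmax > 0."""
    return {hour: (sw_max[hour] - avg[hour]) / sw_max[hour] * 100
            for hour in sw_max}

# Made-up values for two hours.
series = [
    {12: 800.0, 13: 750.0},
    {12: 600.0, 13: 780.0},
    {12: 400.0, 13: 390.0},
]
sw_max = max_envelope(series)
u_avg = {12: 400.0, 13: 585.0}   # hypothetical monthly urban averages
reduction = percent_reduction(sw_max, u_avg)
print(reduction)
```

With the denominator fixed to SWmax, the result always lies between 0% and 100% and reads directly as the share of the maximum flux that was lost.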

In my case I got less than 1% difference between the urban and rural percentages. Can I draw conclusions from such a small difference? And do I need to check the significance of the raw data? When I ran a t-test for each time of day, the percentage came out to be more than 40%, and at some times even 90%.
 
  • #6
"In my case I got less than 1% difference between the urban and rural percentages. Can I draw conclusions from such a small difference?"
This is hard to tell without knowing the actual setup and seeing the data, but I guess 1% is below multiple systematic effects, maybe even below the statistical effects.
"And do I need to check the significance of the raw data?"
What is the significance of data? Significance of what?
 
  • #7
I am told that, to validate the data and to compare two different things (urban and rural in my case), I have to perform a t-test between the urban data set and the rural data set. But I am not sure why I should check the significance between urban and rural.

Can you explain when and under what circumstances we need to compare the significance? And do I need to perform a t-test between urban and rural?
 
  • #8
I don't know what you mean with "compare the significance".
"And do I need to perform a t-test between urban and rural?"
I don't think so, but it depends on what exactly you want to find out.

Data analysis methods always have some purpose. You cannot just "analyze data" and then everything is done. You have to define what you want to know first.
 
  • #9
My purpose for the data analysis is to compare the shortwave radiation between urban and rural, as I mentioned in #3. So my question is: for this comparison between urban and rural shortwave radiation, do we need to test the data with a t-test or something else?

Thank you very much for your kind suggestion.
 
  • #10
"My purpose for the data analysis is to compare the shortwave radiation between urban and rural."
That is a very broad topic. And I doubt you can get this at all with your data, as there are so many other factors that can influence your values.

I guess you can use a t-test, but then you have to prepare a data sample that is suitable for it. Especially the required normal distribution of your values looks problematic - none of your measurement series will have a normal distribution. Maybe the difference between the averages, for each time step, can satisfy that.
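
One way to act on the last suggestion is a paired t statistic computed on the per-timestep differences between the urban and rural averages. The sketch below uses invented numbers and a hypothetical function name; it only computes the statistic and degrees of freedom, and the p-value would still have to be read from Student's t distribution. It also assumes the differences are independent and roughly normal, which would need to be checked on the real data.

```python
import math
from statistics import mean, stdev

def paired_t(urban_avg, rural_avg):
    """Paired t statistic for the mean of (urban - rural) being zero.
    Both inputs are averages at the same timesteps, in the same order.
    Returns (t, degrees of freedom)."""
    diffs = [u - r for u, r in zip(urban_avg, rural_avg)]
    n = len(diffs)
    std_error = stdev(diffs) / math.sqrt(n)  # uses the sample standard deviation
    return mean(diffs) / std_error, n - 1

# Made-up urban and rural averages at five timesteps.
urban_avg = [410.0, 520.0, 630.0, 700.0, 640.0]
rural_avg = [400.0, 515.0, 620.0, 705.0, 630.0]
t_stat, dof = paired_t(urban_avg, rural_avg)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

Working with differences rather than the two raw series matters here: the raw values follow the daily cycle and are far from normal, while the differences have that common shape removed.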
 
  • #11
If your hypothesis is that one set is higher than the other, you can rank order the 8 data for each day (or make one giant rank ordering for all data) and apply some non-parametric statistics tests. Then you would not need to make a model for either set of data.
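
As one possible reading of this suggestion, the sketch below rank orders the 8 values for a single day and applies an exact one-sided rank-sum test (the Wilcoxon rank-sum or Mann-Whitney idea). The choice of this particular test, the function names, and the numbers are all assumptions made for illustration.

```python
from itertools import combinations

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_sum_test(urban, rural):
    """Exact one-sided test of 'urban tends to be higher than rural'.
    Returns (urban rank sum, p-value)."""
    ranks = average_ranks(list(urban) + list(rural))
    observed = sum(ranks[:len(urban)])
    # Every way of labelling len(urban) of the values as "urban" is equally
    # likely if the two groups do not differ.
    sums = [sum(c) for c in combinations(ranks, len(urban))]
    p_value = sum(s >= observed for s in sums) / len(sums)
    return observed, p_value

# Made-up values for one day: 4 urban and 4 rural points.
urban = [21.4, 22.0, 21.8, 22.3]
rural = [20.9, 21.1, 21.6, 21.0]
rank_sum, p_value = rank_sum_test(urban, rural)
print(rank_sum, p_value)
```

With only 4 values per group there are just 70 possible labellings, so the smallest p-value a single day can give is 1/70; this is one reason to combine days or rank all the data together.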
 

