What is Statistics: Definition and 998 Discussions
Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.
When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can reasonably extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.
Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation). Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences on mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.
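As a minimal sketch of the descriptive measures named above, the following Python snippet computes a mean (central tendency) and two standard deviations (dispersion) with the standard-library `statistics` module. The sample values are hypothetical and serve only as an illustration.

```python
import statistics

# Hypothetical sample, invented for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Central tendency: the arithmetic mean of the sample.
mean = statistics.mean(sample)

# Dispersion: sample standard deviation (divides by n - 1).
sample_sd = statistics.stdev(sample)

# Population standard deviation (divides by n), shown for comparison.
population_sd = statistics.pstdev(sample)

print(f"mean = {mean}, sample sd = {sample_sd:.3f}, population sd = {population_sd:.3f}")
```

Which of the two standard deviations is appropriate depends on whether the data are treated as a sample or as the whole population.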
A standard statistical procedure involves the collection of data leading to a test of the relationship between two statistical data sets, or a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false, given the data that are used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (null hypothesis is falsely rejected giving a "false positive") and Type II errors (null hypothesis fails to be rejected and an actual relationship between populations is missed giving a "false negative"). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also occur. The presence of missing data or censoring may result in biased estimates and specific techniques have been developed to address these problems.
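To make the two error types concrete, here is a small sketch that computes them exactly for a hypothetical coin-tossing test. The setup is an assumption made for illustration: 20 tosses, a null hypothesis of a fair coin, a rejection region of 5 or fewer or 15 or more heads, and an alternative bias of 0.7. None of these numbers come from the text above.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 20  # number of coin tosses (hypothetical)

# Hypothetical rejection region: very few or very many heads.
reject = [k for k in range(n + 1) if k <= 5 or k >= 15]

# Type I error: rejecting the null (p = 0.5) when it is actually true.
alpha = sum(binom_pmf(k, n, 0.5) for k in reject)

# Type II error: failing to reject when the coin is really biased (p = 0.7).
beta = sum(binom_pmf(k, n, 0.7) for k in range(n + 1) if k not in reject)

print(f"Type I error rate  = {alpha:.4f}")
print(f"Type II error rate = {beta:.4f}")
```

Widening the rejection region lowers the Type II rate at the cost of a higher Type I rate, which is the trade-off the framework has to balance.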
Andrei has a bag of x sweets.
He removes two sweets from the bag simultaneously (without replacement).
He now removes a third sweet.
The probability that the third sweet is red is (x/2) - 1.
How many red sweets were in Andrei's bag to begin with?
Could somebody please tell me if (and how) it is...
This could just as well go under statistics, but I am looking at it from a particle physics point of view...
Why does adding systematic uncertainties worsen the expected upper limits on the signal strength?
I am trying to find where the flaw enters in the following logic:
0. The model most analyses use is...
An interesting article in Ars Technica on p-hacking vs deep data dives:
https://arstechnica.com/science/2017/04/the-peer-reviewed-saga-of-mindless-eating-mindless-research-is-bad-too/
We shouldn't look at data trying to find something interesting but should instead have a hypothesis in mind...
I’m currently evaluating the "realism" of two survival models in R by comparing the respective Kullback-Leibler divergence between their simulated survival time dataset (`dat.s1` and `dat.s2`) and a “true”, observed survival time dataset (`dat.obs`). Initially, directed KLD functions show that...
Homework Statement
Compare and contrast the given data
Homework Equations
None needed for this
The Attempt at a Solution
I'm never too sure what kind of thing I'd be expected to do for something like this.
Here's how I would go about it, but would appreciate any pointers / things to...
Is expected value all that matters? I have heard of the Kelly criterion but what should you do if you cannot allocate the optimal amount?
For example, if you have a 0.01% chance of winning $100,000,000 but a 99.9% chance of losing $10,000 and you could only bet once, would you accept the bet...
I am writing a report for my boss quoting the success rates for tests of various components. If something works 19 times out of twenty, then it's 95%. But what is the uncertainty on this? 95% +/- ?
And if a component passes every test (100%), what is the lower limit on the actual rate? How many...
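One common way to attach an uncertainty to a pass rate like 19 out of 20 is a binomial confidence interval. The sketch below uses the Wilson score interval and, for the all-pass case, the exact one-sided lower bound obtained by solving p^n = 1 - confidence. The choice of these two methods and of a 95% level is an assumption for illustration, not something stated in the question.

```python
from math import sqrt

def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half_width, centre + half_width

def lower_bound_all_pass(trials: int, confidence: float = 0.95) -> float:
    """One-sided exact lower bound on the pass rate when every trial passed.

    Solves p**trials = 1 - confidence for p.
    """
    return (1 - confidence) ** (1 / trials)

low, high = wilson_interval(19, 20)
print(f"19/20 passes: roughly {low:.3f} to {high:.3f}")
print(f"20/20 passes: true rate above {lower_bound_all_pass(20):.3f} at 95% confidence")
```

The interval is asymmetric around 95%, so quoting it as "95% +/- something" would hide that the upper side is capped at 100%.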
Homework Statement
A set of eight numbers has a median of 19.
a) What is the sum of the fourth and fifth data points
b) What is the sum of the fourth and sixth data points
(no answers available)
Homework Equations
(N+1)/2 -- Position of the median
(N+1)/4 -- Position of Q1
3(N+1)/4 --...
Homework Statement
Customers arrive at an ATM at a rate of 12 per hour and spend 2 minutes using it, on average. Model this system using a Bernoulli single-server queuing process with 1-minute frames.
a. Compute the transition probability matrix for the system.
b. If the ATM is idle now, find...
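A possible starting point for part (a), assuming the usual textbook construction of a Bernoulli single-server queue: in each frame at most one customer arrives (probability p_A) and at most one service completes (probability p_S), independently of each other. The function name `transition_row` and the choice to print only the first four states are illustrative assumptions.

```python
arrivals_per_hour = 12
mean_service_minutes = 2
frame_minutes = 1

p_a = arrivals_per_hour * frame_minutes / 60   # P(arrival in one frame) = 0.2
p_s = frame_minutes / mean_service_minutes     # P(service completes in one frame) = 0.5

def transition_row(i: int) -> dict:
    """Nonzero one-frame transition probabilities out of state i (customers in system)."""
    if i == 0:
        # Idle ATM: nobody can depart, so only an arrival changes the state.
        return {0: 1 - p_a, 1: p_a}
    return {
        i - 1: (1 - p_a) * p_s,                # departure, no arrival
        i: (1 - p_a) * (1 - p_s) + p_a * p_s,  # neither event, or both
        i + 1: p_a * (1 - p_s),                # arrival, no departure
    }

# Print the first few rows of the (unbounded) transition matrix.
for state in range(4):
    print(state, transition_row(state))
```

Because the queue has no capacity limit here, the full matrix is infinite; every row from state 1 onward has the same three entries shifted along the diagonal.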
Homework Statement
Suppose your office telephone has two lines, allowing you to talk with someone and have at most one other person on hold. You receive 10 calls per hour and a conversation takes 2 minutes, on average. Use a Bernoulli single-server queuing process with limited capacity and...
In 2D, if we define exchange statistics in terms of the phase change of the wavefunction of two identical particles when there are exchanged via adiabatic transport (https://arxiv.org/abs/1610.09260), we would discover that this phase can be arbitrary due to the topology of relative...
Homework Statement
f(x,y)=49/8*e^(−3.5*y) 0 < y < inf and −y < x < y
0 otherwise
a. Find the marginal probability density function of X, fX(x). Enter a formula in the first box, and a number for the second and the third box corresponding to the range of x. Use * for multiplication, / for...
Homework Statement
My question is why is the assumption necessary to make? (Please see the image).
Homework Equations
The Attempt at a Solution
We can easily proceed by treating the two samples as two different populations, find their individual unbiased estimates of variance and then use the...
'In inferential statistics, the term "null hypothesis" is a general statement or default position that there is no relationship between two measured phenomena, or no association among groups.' (wiki)
The book I'm following has this to say:
Q: In a nutritional study 13 students were given a usual...
I was reading a book that said:
Unpaired t-test is applied to unpaired data of independent observations made on individuals of two different groups (of a single sample) or samples drawn from two populations.
Now what the wiki says is that they are not unpaired; the example given is one with 50 and 50...
(Sorry for the terrible title. If anybody has a better idea, post and I will edit. Also I have no idea of the level so now I just put undergraduate since the problem is fairly easy to state.)
Suppose I buy ##N## sensors which the manufacturer tells me will fail at some point and the failure...
Hi.
I'm afraid I might just be discovering quite a big misunderstanding of mine concerning the meaning of the expectation value of a commutator for a given state.
I somehow thought that if the expectation value of the commutator of two observables ##A, B## is zero for a given state...
Homework Statement
Each week, Stéphane needs to prepare 4 exercises for the following week's homework assignment. The number of problems he creates in a week follows a Poisson distribution with mean 6.9.
a. What is the probability that Stéphane manages to create enough exercises for the...
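A sketch for part (a), under the assumption that "enough" means creating at least the 4 exercises needed in a week (the question is cut off, so this reading is a guess). It sums the Poisson probabilities for 0 to 3 exercises and takes the complement.

```python
from math import exp, factorial

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events for a Poisson distribution with the given mean."""
    return exp(-mean) * mean**k / factorial(k)

mean_created = 6.9  # average number of problems created per week
needed = 4          # exercises required for one homework assignment

# P(X >= needed) = 1 - P(X <= needed - 1)
p_enough = 1 - sum(poisson_pmf(k, mean_created) for k in range(needed))

print(f"P(X >= {needed}) = {p_enough:.4f}")
```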
I've been looking into time series analysis from a statistical perspective (looking to expand my bag of tools in analyzing experimental data) and I repeatedly run into the concept of moment and cumulant spectra. The problem is that my undergraduate course on statistics back in the day...
Problem: I'm interested in studying the probability of an event involving a random vector.
Specifically, I'm interested in
(∂/∂a)Pr[X>( (Y-a)/Z )]
Where "a" is a non-random parameter and the random vector {X,Y,Z} is distributed Normal( µ, Σ)
for µ={0,0,0}
and Σ= {{1, 0.5, 0.5}, {0.5, 1, 0}...
I am currently doing some research on the impact of social media bots. I am looking into the social and cultural aspects of bot use in social media.
The technology I have under wraps but I do not have much information on the social aspect and impact of bot use. I also like to find some...
Homework Statement
I generated data for a dice experiment. For the first case, two dice were rolled and the minimum number and the sum were recorded. For the other cases with three, four, and five dice, the minimum and sum were also recorded. I attached a picture of the tables with my data and...
Hi all,
in several vehicles, I measured the engine torque and speed and the engaged gear while it was driven for around 100km/h. I computed the average engine speed and torque of all the times the vehicle was run with each gear and also I computed the relative frequency of the gear used. So for...
Which do you think is the smarter choice, in terms of employability: an MS in statistics or an MS in data science? Do you think "data science" is just a buzzword that will die out? Is a data scientist someone who can't program as well as the computer engineer, and can't build models as well as a...
I'm trying to solve the following problem from the book 'Learning with kernels', and would really appreciate a little help.
Background information
- Let $\{(x_{1},y_{1}),...,(x_{N},y_{N})\}$ be a dataset, L a Loss function and $H(k)$ a reproducing kernel Hilbert space with kernel $k$. The...
Hey
I'm currently studying psychology and the statistics course really opened my mind. Everything seems intuitive and reasonable and I really want to get deeper into this subject. Not sure if I should just jump into the water and major also in statistics or first just try and take another...
I am working on astrophysical data and I have a large number of redshift values of quasars. Now, each redshift estimate comes with its estimated standard error naturally. If I plot a histogram of these redshifts, I would expect the bins counts to also have some sort of uncertainty.
I am unable...
I have a model which is quadratic (e.g. ##y = k x^2##). I'm comparing it against a large set of data (galaxy cluster masses) which spans several Log10 decades (e.g. ##10^{11}## to ##10^{15}## solar masses). What is the right way to say how well the data fit the model? Obviously the errors in...
This is a problem that I thought I 'solved' many years ago.
In actual fact there are many things about it that are not clear to me, and I would like to hear your opinion, please.
Very briefly, there was this TV programme where a (supposedly psychic) guy had to match 5 (husband-wife) couples...
So the problem asks:
A computer server runs smoothly for Exp(0.2) days and then takes Exp(0.5)days to fix. The server is running fine on Monday morning, t=0. Find the probability that the server was fixed at least once (i.e. at least one complete repair was done) in the next 7 days and the...
Hi,
I was wondering which course will be more beneficial to take first for a first year student majoring in computer science?
Note: I tend to dislike proofs and theories.
Discrete Structures - An introduction to the basic concepts of statistical analysis with special emphasis on engineering...
Hello,
So I'm really having a hard time deciding whether to choose an MS in Statistics or data mining
(please bear with my English as it's not my first language)
a little bit about my background :
.)a bachelor of science in applied mathematics and computer science
.)good Gpa
.)like...
I am looking for software that enables me to do basic data analysis like ANOVA, regression, and factor analysis, and to do nice graphing. I need something similar to SPSS, with data entry in the form of variables and no coding input like SAS.
Homework Statement
We have a 1-D lattice [a line] of ##L## sites. Sites are occupied with probability ##p##. Find the probability that a given site is a member of a cluster of size ##s##. (A cluster is a set of adjacent occupied sites. The cluster size is the number of occupied sites in the...
I am a Mathematics and Chemistry major with a Physics minor. I need to take one more mathematics elective course next semester. I had two picked out but both unfortunately overlap with other classes I am taking, so I am now trying to choose between Probability or Mathematical Statistics (course...
My background is applied mathematics and statistics. A lot of the problems that I have been applying statistical methods towards have been very dry (medical studies, etc).
Actually my favorite subject is physics, but I don't have too much experience in it. I am willing to learn. Any suggestions...
Homework Statement
In a physics lab, Logger Pro software generated statistical estimators such as the standard deviation σ = 0.04021 of a sample of size n = 29.
Among other things, I must calculate the standard error of the mean σmean.
My question is: Must σmean have four sig figs or two...
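For reference, the standard error of the mean is the sample standard deviation divided by the square root of the sample size. The short sketch below evaluates it for the values quoted in the problem; it prints extra digits on purpose and does not take a position on how many significant figures should be reported.

```python
from math import sqrt

sigma = 0.04021  # sample standard deviation reported by Logger Pro
n = 29           # sample size

# Standard error of the mean: sigma / sqrt(n)
sem = sigma / sqrt(n)

print(f"standard error of the mean = {sem:.6f}")
```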
Hi all, I'm considering getting a masters in Analytics at U San Francisco: https://www.usfca.edu/arts-sciences/graduate-programs/analytics
I'm interested in this because I would have met all the prereqs after I finish my undergrad, and also because I live in SF. However, the program costs $45k...
I am reading an article introducing the Nobel Prize for Bose-Einstein condensates, which led me to further reading on bosonic and fermionic statistics in some texts. I know one of the mathematical differences is the +/- 1 term in the denominator of the distribution function, as below
##f_{BE} =...
Hi all,
I am reading an introduction on classical and quantum-mechanical statistics. The material considers a 4-particle system with discrete energy levels 0E, 1E, 2E, 3E, 4E, 5E and 6E. It is said that the classical particle is indistinguishable but you can identify the different particle by...
Three conditions must be met in order for the Poisson Distribution to be used:
1) The average count rate is constant over time
2) The counts occurring are independent
3) The probability of 2 or more counts occurring in the interval $n$ is zero
Simply, why must these conditions be met for valid...
Problem:
Suppose I have a production process that yields output in batches of n items. For each batch, I can test whether they are of good or bad quality. Let q_i ∈ {1,0} be the quality of tested item i.
If more than half of the items are ‘bad’, the batch should be discarded. In other words...
Hi all! I'm an undergrad sophomore in engineering trying to decide on which major to pursue (namely civil/mechanical). Recently I came across this fascinating piece of publication, which contained a massive amount of data regarding engineering enrollments by major, demographic, and school...
Eleanor scores 680 on the mathematics part of the SAT. The distribution of SAT math scores in recent years has been Normal with mean 547 and standard deviation 85.
Gerald takes the ACT Assessment mathematics test and scores 27. ACT math scores are Normally distributed with mean 21.3 and...
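Scores from different tests are usually compared by standardizing them, that is, by expressing each score as a number of standard deviations above its own mean. A minimal sketch follows; only Eleanor's z-score is computed because the ACT standard deviation is cut off in the excerpt above, and the helper name `z_score` is just an illustrative choice.

```python
def z_score(score: float, mean: float, sd: float) -> float:
    """Number of standard deviations by which a score lies above the mean."""
    return (score - mean) / sd

# Eleanor: SAT math score of 680, with mean 547 and standard deviation 85.
eleanor_z = z_score(680, 547, 85)
print(f"Eleanor's z-score = {eleanor_z:.3f}")

# Gerald's z-score would be z_score(27, 21.3, act_sd) once the ACT
# standard deviation (not shown in the excerpt) is known.
```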
Homework Statement
(1) Let the random variable X be the body temperature in °C for a randomly chosen person during waking hours. X is assumed to be normally distributed with mean E(X) = 37.5 and standard deviation sd(X) = 0.3. Let Y be the body temperature in °F for a randomly chosen person...
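Assuming Y is simply X converted from Celsius to Fahrenheit, Y = (9/5)X + 32, the mean and standard deviation follow from the rules for linear transformations: E(aX + b) = a E(X) + b and sd(aX + b) = |a| sd(X). The problem statement is truncated, so this reading is an assumption. A quick sketch:

```python
mean_c = 37.5  # E(X), body temperature in degrees Celsius
sd_c = 0.3     # sd(X)

scale, offset = 9 / 5, 32  # Fahrenheit = scale * Celsius + offset

# The mean is shifted and scaled; the standard deviation is only scaled.
mean_f = scale * mean_c + offset
sd_f = abs(scale) * sd_c

print(f"E(Y) = {mean_f:.1f}, sd(Y) = {sd_f:.2f}")
```

A linear transformation of a normally distributed variable is again normally distributed, so Y would be Normal with these parameters.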
Homework Statement
In a probability experiment, a fair die is rolled twice.
• If the first roll is odd, the outcomes are recorded as they appear.
• If the first roll is even, the recorded outcome for the second die is doubled. For example, if the first die was 2 and the second 4, the...
Homework Statement
(a) Suppose a fair six-sided die is rolled once. Let A be the event that an even face occurs and B be the event that a face less than 4 occurs. Are the events A and B independent? Show this mathematically.
(b) A fair coin is tossed three times. Let A be the event that the...
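For part (a), independence can be checked directly from the definition P(A and B) = P(A) P(B) by enumerating the six equally likely faces. The sketch below uses exact fractions so that the comparison is not affected by floating-point rounding; the variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)

A = {f for f in faces if f % 2 == 0}  # even face: {2, 4, 6}
B = {f for f in faces if f < 4}       # face less than 4: {1, 2, 3}

def prob(event: set) -> Fraction:
    """Probability of an event on one roll of a fair six-sided die."""
    return Fraction(len(event), 6)

p_a, p_b, p_ab = prob(A), prob(B), prob(A & B)
independent = p_ab == p_a * p_b

print(f"P(A) = {p_a}, P(B) = {p_b}, P(A and B) = {p_ab}, P(A)P(B) = {p_a * p_b}")
print("independent" if independent else "not independent")
```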
Question: So I hold a Bachelor's degree in Chemistry and have just started a M.S./Ph.D. track in Chemical Engineering. My dream job is in mineral processing (the biggest dream is with an asteroid mining company), but I understand that the school I'm going to is heavily focused on drug discovery...
Hello. I have a question about a step in the factorization theorem demonstration.
1. Homework Statement
Here is the theorem (begins end of page 1), it is not my course but I have almost the same demonstration : http://math.arizona.edu/~jwatkins/sufficiency.pdf
Screenshot of it:
Homework...
I am trying to better understand the concept of second order coherence G2(τ) (in particular G2(0)) and a few questions have arisen. Note that I am trying to get a physical idea of what is happening so I would appreciate it if your responses can keep the math to the minimum possible. :)
How do...