What is Statistics: Definition and 998 Discussions

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments. When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can reasonably extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.
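A minimal Python sketch of why representative sampling matters. The population, sample size, and seed below are illustrative assumptions, not data from any study: a simple random sample recovers the population mean closely, while a "convenience" sample made up of only the largest values does not.

```python
import random
import statistics

# Synthetic, illustrative population (assumption): 10,000 values drawn from
# a normal distribution with mean 170 and standard deviation 10.
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(10_000)]
pop_mean = statistics.fmean(population)

# Simple random sample: every member has the same chance of being chosen.
random_sample = rng.sample(population, 200)

# Non-representative "convenience" sample: only the 200 largest values.
biased_sample = sorted(population)[-200:]

print(f"population mean:    {pop_mean:.1f}")
print(f"random sample mean: {statistics.fmean(random_sample):.1f}")
print(f"biased sample mean: {statistics.fmean(biased_sample):.1f}")
```

With 200 random draws the sample mean has a standard error of about 10/√200 ≈ 0.7, so it lands close to the population mean, whereas the convenience sample overestimates it by more than two population standard deviations.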
Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation). Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences made using mathematical statistics employ the framework of probability theory, which deals with the analysis of random phenomena.
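As a small illustration of the two families of descriptive statistics, the sketch below applies Python's standard-library `statistics` module to a made-up eight-value data set: mean and median describe central tendency, while standard deviation and interquartile range (IQR) describe dispersion. Quartile conventions differ between textbooks and software; `statistics.quantiles` defaults to its "exclusive" method, which places Q1 and Q3 at positions (N+1)/4 and 3(N+1)/4.

```python
import statistics

# Made-up sample of eight observations (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency (location)
mean = statistics.mean(data)        # 5
median = statistics.median(data)    # 4.5, average of the two middle values

# Dispersion (variability)
pop_sd = statistics.pstdev(data)    # 2.0, divides the sum of squares by N
sample_sd = statistics.stdev(data)  # ~2.138, divides by N - 1
q1, _, q3 = statistics.quantiles(data, n=4)  # 4.0 and 6.5 with the default method
iqr = q3 - q1                       # 2.5

print(f"mean={mean}, median={median}")
print(f"pstdev={pop_sd}, stdev={sample_sd:.3f}, IQR={iqr}")
```

The gap between `pstdev` and `stdev` is the N versus N - 1 divisor; the latter is the usual unbiased-variance choice when the data are a sample rather than the whole population.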
A standard statistical procedure involves the collection of data leading to a test of the relationship between two statistical data sets, or a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between the two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false, given the data that are used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (null hypothesis is falsely rejected giving a "false positive") and Type II errors (null hypothesis fails to be rejected and an actual relationship between populations is missed giving a "false negative"). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis. Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also occur. The presence of missing data or censoring may result in biased estimates and specific techniques have been developed to address these problems.
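To make Type I and Type II errors concrete, here is a small Monte Carlo sketch in Python. It substitutes a two-sample z-test with a known common standard deviation of 1 for the more usual t-test, and the sample size, effect size, and number of trials are illustrative assumptions. When both groups come from the same distribution (the null hypothesis is true), every rejection is a false positive, so the rejection rate estimates the Type I error rate and should sit near α = 0.05; when the second group is shifted, every failure to reject is a false negative.

```python
import math
import random
from statistics import NormalDist, fmean

ALPHA = 0.05    # significance level
N = 30          # observations per group (assumed)
TRIALS = 2000   # simulated experiments per scenario (assumed)
STD_NORMAL = NormalDist()

def two_sample_z_test(a, b, sigma=1.0):
    """Two-sided p-value for H0: equal means, assuming a known common sigma."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def rejection_rate(true_shift, rng):
    """Fraction of simulated experiments in which H0 is rejected at ALPHA."""
    rejections = 0
    for _ in range(TRIALS):
        a = [rng.gauss(0.0, 1.0) for _ in range(N)]
        b = [rng.gauss(true_shift, 1.0) for _ in range(N)]
        if two_sample_z_test(a, b) < ALPHA:
            rejections += 1
    return rejections / TRIALS

rng = random.Random(0)
type_i = rejection_rate(0.0, rng)   # H0 true: each rejection is a false positive
power = rejection_rate(0.5, rng)    # H0 false: each non-rejection is a false negative
type_ii = 1 - power

print(f"estimated Type I error rate:  {type_i:.3f}  (target {ALPHA})")
print(f"estimated Type II error rate: {type_ii:.3f}")
```

With 30 observations per group and a true shift of half a standard deviation, this test has only about 50% power in theory, so roughly half of the real effects are missed; rerunning with a larger `N` is a quick way to see why obtaining a sufficient sample size matters.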

View More On Wikipedia.org
  1. thebosonbreaker

    B Sweets in a bag probability problem

    Andrei has a bag of x sweets. He removes two sweets from the bag simultaneously (without replacement). He now removes a third sweet. The probability that the third sweet is red is (x/2) - 1. How many red sweets were in Andrei's bag to begin with? Could somebody please tell me if (and how) it is...
  2. ChrisVer

    A Help with the statistics of Upper Limits?

    This could just as well go in the statistics forum, but I am looking at it from a particle physics point of view... Why does adding systematic uncertainties worsen the expected upper limits on the signal strength? I am trying to find where the flaw enters in the following logic: 0. The model most analyses use is...
  3. jedishrfu

    News Dangers of Using Statistics Wrongly in Scientific Research

    An interesting article in Ars Technica on p-hacking vs deep data dives: https://arstechnica.com/science/2017/04/the-peer-reviewed-saga-of-mindless-eating-mindless-research-is-bad-too/ We shouldn't look at data trying to find something interesting but should instead have a hypothesis in mind...
  4. N

    A Comparing Kullback-Leibler divergence values

    I’m currently evaluating the "realism" of two survival models in R by comparing the respective Kullback-Leibler divergence between their simulated survival time dataset (`dat.s1` and `dat.s2`) and a “true”, observed survival time dataset (`dat.obs`). Initially, directed KLD functions show that...
  5. gelfand

    How Do Mean, Standard Deviation, and IQR Reflect Differences in Data Sets?

    Homework Statement Compare and contrast the given data Homework Equations None needed for this The Attempt at a Solution I'm never too sure what kind of thing I'd be expected to do for something like this. Here's how I would go about it, but would appreciate any pointers / things to...
  6. B

    B How to determine when to take a bet?

    Is expected value all that matters? I have heard of the Kelly criterion but what should you do if you cannot allocate the optimal amount? For example, if you have a 0.01% chance of winning $100,000,000 but a 99.9% chance of losing $10,000 and you could only bet once, would you accept the bet...
  7. G

    I How to calculate the uncertainty of success rates?

    I am writing a report for my boss quoting the success rates for tests of various components. If something works 19 times out of twenty, then it's 95%. But what is the uncertainty on this? 95% +/- ? And if a component passes every test (100%), what is the lower limit on the actual rate? How many...
  8. E

    One Variable Statistics Homework Question

    Homework Statement A set of eight numbers has a median of 19. a) What is the sum of the fourth and fifth data points b) What is the sum of the fourth and sixth data points (no answers available) Homework Equations (N+1)/2 -- Position of the median (N+1)/4 -- Position of Q1 3(N+1)/4 --...
  9. Z

    Statistics Bernoulli single-server queuing process with ATMs

    Homework Statement Customers arrive at an ATM at a rate of 12 per hour and spend 2 minutes using it, on average. Model this system using a Bernoulli single-server queuing process with 1-minute frames. a. Compute the transition probability matrix for the system. b. If the ATM is idle now, find...
  10. Z

    Statistics Bernoulli single-server queuing process

    Homework Statement Suppose your office telephone has two lines, allowing you to talk with someone and have at most one other person on hold. You receive 10 calls per hour and a conversation takes 2 minutes, on average. Use a Bernoulli single-server queuing process with limited capacity and...
  11. throneoo

    I Defining exchange statistics of anyons in terms of Berry phase

    In 2D, if we define exchange statistics in terms of the phase change of the wavefunction of two identical particles when they are exchanged via adiabatic transport (https://arxiv.org/abs/1610.09260), we would discover that this phase can be arbitrary due to the topology of relative...
  12. R

    Marginal pdf, what am I doing wrong?

    Homework Statement ##f(x,y) = \frac{49}{8}e^{-3.5y}## for ##0 < y < \infty## and ##-y < x < y##, and 0 otherwise. a. Find the marginal probability density function of X, ##f_X(x)##. Enter a formula in the first box, and a number for the second and the third box corresponding to the range of x. Use * for multiplication, / for...
  13. F

    Why is it necessary to assume equal variances in two sample t tests?

    Homework Statement My question is: why is it necessary to make this assumption? (Please see the image.) Homework Equations The Attempt at a Solution We can easily proceed by treating the two samples as two different populations, find their individual unbiased estimates of variance and then use the...
  14. T

    I Is this null hypothesis wrong?

    'In inferential statistics, the term "null hypothesis" is a general statement or default position that there is no relationship between two measured phenomena, or no association among groups.' (wiki) The book I'm following has to say : Q: In a nutritional study 13 students were given a usual...
  15. T

    I Is the Definition of Unpaired t-test in My Book Correct?

    I was reading a book that said: Unpaired t-test is applied to unpaired data of independent observations made on individuals of two different groups (of a single sample) or samples drawn from two populations. Now what wiki says is that they are not unpaired the e.g. given is one with 50 and 50...
  16. S

    I Are Sensor Failures Matching Manufacturer Claims?

    (Sorry for the terrible title. If anybody has a better idea, post it and I will edit. Also, I have no idea of the level, so for now I just put undergraduate since the problem is fairly easy to state.) Suppose I buy ##N## sensors which the manufacturer tells me will fail at some point and the failure...
  17. G

    I Commuting observables vs. exchanging measurements

    Hi. I'm afraid I might just be discovering quite a big misunderstanding of mine concerning the meaning of the expectation value of a commutator for a given state. I somehow thought that if the expectation value of the commutator of two observables ##A, B## is zero for a given state...
  18. R

    Statistics probability questions

    Homework Statement Each week, Stéphane needs to prepare 4 exercises for the following week's homework assignment. The number of problems he creates in a week follows a Poisson distribution with mean 6.9. a. What is the probability that Stéphane manages to create enough exercises for the...
  19. boneh3ad

    Prob/Stats Statistics textbooks covering moments/cumulants

    I've been looking into time series analysis from a statistical perspective (looking to expand my bag of tools in analyzing experimental data) and I repeatedly run into the concept of moment and cumulant spectra. The problem is that my undergraduate course on statistics back in the day...
  20. E

    A Comparative statistics of (trivariate) random event

    Problem: I'm interested in studying the probability of an event involving a random vector. Specifically, I'm interested in (∂/∂a)Pr[X>( (Y-a)/Z )] Where "a" is a non-random parameter and the random vector {X,Y,Z} is distributed Normal( µ, Σ) for µ={0,0,0} and Σ= {{1, 0.5, 0.5}, {0.5, 1, 0}...
  21. Michael27

    Statistics on social media bot use and detection

    I am currently doing some research on the impact of social media bots. I am looking into the social and cultural aspects of bot use in social media. The technology I have under wraps, but I do not have much information on the social aspect and impact of bot use. I would also like to find some...
  22. jdawg

    Statistics: Pattern present in dice data?

    Homework Statement I generated data for a dice experiment. For the first case, two dice were rolled and the minimum number and the sum were recorded. For the other cases with three, four, and five dice, the minimum and sum were also recorded. I attached a picture of the tables with my data and...
  23. S

    A Correlation coefficient among trends

    Hi all, in several vehicles, I measured the engine torque and speed and the engaged gear while it was driven for around 100km/h. I computed the average engine speed and torque of all the times the vehicle was run with each gear and also I computed the relative frequency of the gear used. So for...
  24. A

    Job Skills Masters in Statistics vs in data science? Is DS just buzz?

    Which do you think is the smarter choice in terms of employability: an MS in Statistics or an MS in Data Science? Do you think "data science" is just a buzzword that will die out? Is a data scientist someone who can't program as well as the computer engineer, and can't build models as well as a...
  25. E

    Sparsity of Support vector machines over an RKHS

    I'm trying to solve the following problem from the book 'Learning with kernels', and would really appreciate a little help. Background information - Let $\{(x_{1},y_{1}),...,(x_{N},y_{N})\}$ be a dataset, L a loss function and $H(k)$ a reproducing kernel Hilbert space with kernel $k$. The...
  26. bm1125

    Programs Majoring in Statistics: Seeking Advice

    Hey I'm currently studying psychology and the statistics course really opened my mind. Everything seems intuitive and reasonable and I really want to get deeper into this subject. Not sure if I should just jump into the water and major also in statistics or first just try and take another...
  27. Sunil Simha

    I Computing uncertainties in histogram bin counts

    I am working on astrophysical data and I have a large number of redshift values of quasars. Now, each redshift estimate comes with its estimated standard error naturally. If I plot a histogram of these redshifts, I would expect the bins counts to also have some sort of uncertainty. I am unable...
  28. Jules Winnfield

    I How do I compare a model to logarithmic data?

    I have a model which is quadratic (e.g. ##y = k x^2##). I'm comparing it against a large set of data (galaxy cluster masses) which spans several Log10 decades (e.g. ##10^{11}## to ##10^{15}## solar masses). What is the right way to say how good the data fits the model? Obviously the errors in...
  29. L

    I The statistics of 'psychic challenge'

    This is a problem that I thought I 'solved' many years ago. In actual fact there are many things about it that are not clear to me, and I would like to hear your opinion, please. Very briefly, there was this TV programme where a (supposedly psychic) guy had to match 5 (husband-wife) couples...
  30. iikii

    Computer Server Down Probability

    So the problem asks: A computer server runs smoothly for Exp(0.2) days and then takes Exp(0.5) days to fix. The server is running fine on Monday morning, t=0. Find the probability that the server was fixed at least once (i.e. at least one complete repair was done) in the next 7 days and the...
  31. T

    Courses Discrete Structures or Probability and Statistics Engineers

    Hi, I was wondering which course will be more beneficial to take first for a first year student majoring in computer science? Note: I tend to dislike proofs and theories. Discrete Structures - An introduction to the basic concepts of statistical analysis with special emphasis on engineering...
  32. S

    Programs MS in Statistics or Data-mining

    Hello, so I'm really having a hard time deciding whether to choose an MS in Statistics or Data Mining (please bear with my English, as it's not my first language). A little bit about my background: .)a bachelor of science in applied mathematics and computer science .)good GPA .)like...
  33. A

    Cheap or free statistics software to do the basic stuff

    I am looking for software that enables me to do basic data analysis like ANOVA, regression, and factor analysis, and to make nice graphs. I need something similar to SPSS, with data entry in the form of variables rather than coding input like SAS.
  34. bananabandana

    Percolation Problem Homework: Probability of Cluster Size s

    Homework Statement We have a 1-D lattice [a line] of ##L## sites. Sites are occupied with probability ##p##. Find the probability that a given site is a member of a cluster of size ##s##. (A cluster is a set of adjacent occupied sites. The cluster size is the number of occupied sites in the...
  35. TLeit

    Probability or Mathematical Statistics?

    I am a Mathematics and Chemistry major with a Physics minor. I need to take one more mathematics elective course next semester. I had two picked out but both unfortunately overlap with other classes I am taking, so I am now trying to choose between Probability or Mathematical Statistics (course...
  36. FallenApple

    Physics Areas of physics that uses Statistics

    My background is applied mathematics and statistics. A lot of the problems that I have been applying statistical methods towards have been very dry (medical studies, etc.). Actually my favorite subject is physics, but I don't have too much experience in it. I am willing to learn. Any suggestions...
  37. E

    Calculating Standard Error of Mean with Significant Figures

    Homework Statement In a physics lab, Logger Pro software generated statistical estimators such as the standard deviation σ = 0.04021 of a sample of size n = 29. Among other things, I must calculate the standard error of the mean σmean. My question is: Must σmean have four sig figs or two...
  38. A

    Job Skills Masters in "Analytics" vs Masters in "Statistics"?

    Hi all, I'm considering getting a master's in Analytics at U San Francisco: https://www.usfca.edu/arts-sciences/graduate-programs/analytics I'm interested in this because I would have met all the prereqs after I finish my undergrad, and also because I live in SF. However, the program costs $45k...
  39. K

    I Does the +/- 1 term in bosonic and fermionic statistics matter

    I am reading an article introducing the Nobel Prize for Bose-Einstein condensates, which led me to further reading on bosonic and fermionic statistics in some texts. I know one of the mathematical differences is the +/- 1 term in the denominator of the distribution function as below ##f_{BE} =...
  40. K

    I About classical and quantum-mechanical statistics

    Hi all, I am reading an introduction to classical and quantum-mechanical statistics. The material considers a 4-particle system with discrete energy levels 0E, 1E, 2E, 3E, 4E, 5E and 6E. It is said that the classical particle is indistinguishable but you can identify the different particle by...
  41. chi_rho

    I Why do we require conditions for the Poisson Distribution?

    Three conditions must be met in order for the Poisson Distribution to be used: 1) The average count rate is constant over time 2) The counts occurring are independent 3) The probability of 2 or more counts occurring in the interval $n$ is zero Simply, why must these conditions be met for valid...
  42. E

    A Stopping rule for quality control problem

    Problem: Suppose I have a production process that yields output in batches of n items. For each batch, I can test whether they are of good or bad quality. Let q_i ∈ {1,0} be the quality of tested item i. If more than half of the items are ‘bad’, the batch should be discarded. In other words...
  43. kuan9611

    Engineering Engineering Enrollment Statistics - Thoughts?

    Hi all! I'm an undergrad sophomore in engineering trying to decide which major to pursue (namely civil/mechanical). Recently I came across this fascinating publication, which contained a massive amount of data regarding engineering enrollments by major, demographic, and school...
  44. J

    MHB Statistics Normal Distribution

    Eleanor scores 680 on the mathematics part of the SAT. The distribution of SAT math scores in recent years has been Normal with mean 547 and standard deviation 85. Gerald takes the ACT Assessment mathematics test and scores 27. ACT math scores are Normally distributed with mean 21.3 and...
  45. M

    How Do You Convert Body Temperature from Celsius to Fahrenheit in Statistics?

    Homework Statement (1) Let the random variable X be the body temperature in °C for a randomly chosen person during waking hours. X is assumed to be normally distributed with mean E(X) = 37.5 and standard deviation sd(X) = 0.3. Let Y be the body temperature in °F for a randomly chosen person...
  46. M

    Double Dice Probability: Event A and B in Sample Space | Solved

    Homework Statement In a probability experiment, a fair die is rolled twice. • If the first roll is odd, the outcomes are recorded as they appear. • If the first roll is even, the recorded outcome for the second die is doubled. For example, if the first die was 2 and the second 4, the...
  47. M

    Help with probability question

    Homework Statement (a) Suppose a fair six-sided die is rolled once. Let A be the event that an even face occurs and B be the event that a face less than 4 occurs. Are the events A and B independent? Show this mathematically. (b) A fair coin is tossed three times. Let A be the event that the...
  48. E

    Chemistry Level of statistics required in Process Chemistry and ChemE

    Question: So I hold a Bachelor's degree in Chemistry and have just started a M.S./Ph.D. track in Chemical Engineering. My dream job is in mineral processing (the biggest dream is with an asteroid mining company), but I understand that the school I'm going to is heavily focused on drug discovery...
  49. H

    [Statistics] Factorisation theorem proof

    Hello. I have a question about a step in the proof of the factorization theorem. 1. Homework Statement Here is the theorem (it begins at the end of page 1); it is not my course, but I have almost the same proof: http://math.arizona.edu/~jwatkins/sufficiency.pdf Screenshot of it: Homework...
  50. T

    A Qn on photon statistics (second order correlation function)

    I am trying to better understand the concept of second order coherence G2(τ) (in particular G2(0)) and a few questions have arisen. Note that I am trying to get a physical idea of what is happening so I would appreciate it if your responses can keep the math to the minimum possible. :) How do...