Conceptual Problems with Random Variables and Sample Theory

In summary: Inside P(X=2), "X=2" denotes the set of outcomes ω for which X(ω) = 2. If X really is a constant function, then that set is either the whole sample space S (if 2 is in the range of X) or the empty set (if 2 isn't in the range of X), so P(X=2) is either 1 or 0.
  • #1
siddharth5129
Hi
I'm having a few conceptual difficulties with random variables and I was hoping someone could clear up a few things for me:

1) Firstly, what exactly do we mean when we say that two random variables X and Y are equal? I understand what identically distributed means, but my difficulty is with equality.
My professor says that equality of X and Y means that for every outcome ω in the sample space, X(ω) = Y(ω). Now, if these variables are continuously distributed, isn't it also true that P(X=Y) = 0 and that P(X and Y ∈ (a,b)) < 1 for (a,b) ⊂ ℝ? I don't see any inconsistency here, but it seems off. Is this really the definition of equality?

2) Also, I'm not entirely sure what it means to add two random variables. Can I go with the above and say that Z = X + Y if for every outcome ω of the sample space, Z(ω) = X(ω) + Y(ω)?

3) My final conceptual difficulty is with large sample theory. Why do we look at N observations in a population as N random variables in their own right and not as N instances of a single random variable (which is what they intuitively seem to be)? Is this just a convenient starting point or is there a solid rationale behind it? Surely the random variable is random and variable across the population studied, not for every individual. Does it make sense, for example, to talk about disease frequency being a random variable in the context of a single person?

I'd appreciate any sort of clarification. Thanks :)
 
  • #2
siddharth5129 said:
Hi
I'm having a few conceptual difficulties with random variables and I was hoping someone could clear up a few things for me:

1) Firstly, what exactly do we mean when we say that two random variables X and Y are equal? I understand what identically distributed means, but my difficulty is with equality.
My professor says that equality of X and Y means that for every outcome ω in the sample space, X(ω) = Y(ω). Now, if these variables are continuously distributed, isn't it also true that P(X=Y) = 0 and that P(X and Y ∈ (a,b)) < 1 for (a,b) ⊂ ℝ? I don't see any inconsistency here, but it seems off. Is this really the definition of equality?
If the variables X and Y describe the same thing, they are equal. For example, X(w) might be a binary variable like "is male". X(w) is 1 if w is a male and 0 otherwise. Y equals X by the definition above if Y(w) also equals 1 if and only if w is a male, even if the description of Y might be different.

P(X=Y) would equal zero if, for example, the two are independent and continuously distributed. If X and Y are equal as random variables in the sense above, then P(X=Y) = 1.
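A small simulation can make the two situations concrete. This is only a sketch under assumptions that are not in the thread: each outcome ω is taken to be a pair of independent uniform draws, and the names `X`, `Y_same`, `Y_independent` and `estimate_prob_equal` are illustrative.

```python
import random

def X(omega):
    # X reads the first coordinate of the outcome.
    return omega[0]

def Y_same(omega):
    # Described differently from X, but gives the same value for every outcome,
    # so Y_same and X are equal as random variables.
    return 1.0 * omega[0]

def Y_independent(omega):
    # Reads the second coordinate: independent of X and continuously distributed.
    return omega[1]

def estimate_prob_equal(f, g, n_trials=100_000, seed=0):
    """Estimate P(f = g) as the fraction of sampled outcomes with f(omega) == g(omega)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        omega = (rng.random(), rng.random())  # one outcome from the assumed sample space
        if f(omega) == g(omega):
            hits += 1
    return hits / n_trials

p_same = estimate_prob_equal(X, Y_same)          # equal as functions on the sample space
p_indep = estimate_prob_equal(X, Y_independent)  # independent and continuous
print(p_same, p_indep)
```

In both cases the event "X=Y" is the set of outcomes ω with X(ω) = Y(ω), evaluated at the same ω. What changes between the two cases is how Y depends on ω.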

2) Also, I'm not entirely sure what it means to add two random variables. Can I go with the above and say that Z = X + Y if for every outcome ω of the sample space, Z(ω) = X(ω) + Y(ω)?
That looks right.
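As a minimal sketch of that pointwise definition, assume (purely for illustration) a sample space of two fair dice, with X and Y reading the first and second die. The helper `P` and the dice example are assumptions, not part of the question.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: all ordered pairs from two fair six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))
prob = {omega: Fraction(1, 36) for omega in sample_space}

def X(omega):
    return omega[0]  # value shown by the first die

def Y(omega):
    return omega[1]  # value shown by the second die

def Z(omega):
    # Z = X + Y means Z(omega) = X(omega) + Y(omega) for every outcome omega.
    return X(omega) + Y(omega)

def P(event):
    """Probability of the set of outcomes omega for which event(omega) is true."""
    return sum((prob[omega] for omega in sample_space if event(omega)), Fraction(0))

p_z_is_7 = P(lambda omega: Z(omega) == 7)
print(p_z_is_7)
```

Here Z is an ordinary function on the same sample space as X and Y, so probabilities such as P(Z=7) are computed from the outcomes, exactly as for X or Y alone.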
3) My final conceptual difficulty is with large sample theory. Why do we look at N observations in a population as N random variables in their own right and not as N instances of a single random variable (which is what they intuitively seem to be)? Is this just a convenient starting point or is there a solid rationale behind it? Surely the random variable is random and variable across the population studied, not for every individual. Does it make sense, for example, to talk about disease frequency being a random variable in the context of a single person?

I'd appreciate any sort of clarification. Thanks :)
You probably wouldn't discuss the frequency of disease in a single person, but the probability of a random person having a disease seems fair. Then, if you select 10 random people, you have 10 random (binary) variables, each equal to 1 (has the disease) with probability p. The frequency is the observed proportion, which you would use to make inferences about the true probability in the population at large.
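The idea can be sketched in a few lines, assuming each selected person is modeled as an independent 0/1 indicator with the same probability p. The value `true_p = 0.3` and the function names are hypothetical and chosen only for illustration.

```python
import random

def sample_indicators(n_people, p, seed=0):
    """Draw n_people independent 0/1 indicators, each equal to 1 with probability p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n_people)]

def observed_frequency(indicators):
    """Observed proportion of 1s, used as an estimate of the unknown p."""
    return sum(indicators) / len(indicators)

true_p = 0.3  # hypothetical disease probability, unknown in a real study

small_sample = sample_indicators(10, true_p)       # 10 people -> 10 binary random variables
large_sample = sample_indicators(100_000, true_p)  # many more people

print(small_sample, observed_frequency(small_sample))
print(observed_frequency(large_sample))
```

Each list entry is one realization of one person's indicator variable. The observed frequency is a function of all N of them, which is why the N observations are modeled as N random variables rather than as one.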
 
  • #3
siddharth5129 said:
isn't it also true that P(X=Y) = 0 and that P( X and Y ∈ (a,b)) < 1 for (a,b) ⊂ ℝ .

The notation P(X=Y) is ambiguous. It would be unusual to interpret it to mean "The probability that X(w) = Y(w) for each w in the sample space S". For that to make sense, you'd need to consider a different sample space than S. You'd be considering a sample space where the event (X=Y) is defined by "we pick two random variables at random and find they are equal as random variables". Using that interpretation, you can't say if P(X=Y) = 0 without more information.

You might be thinking of a situation where we are given that X=Y ( as random variables) and we sample two possibly different outcomes w1 and w2 from the space S and define the event "X=Y" to mean X(w1) = Y(w2).

In that case, even given that X and Y are continuous random variables, we can't conclude P(X=Y)=0 unless we have more information, for example information about the joint distribution of w1 and w2. Perhaps you are thinking of a special situation, such as letting w1 and w2 be two independent random samples (i.e. single numbers) taken from a normal distribution.

To be clear, notation has to distinguish between "a random variable" and "a realization of a random variable", but it's common to be careless about notation and leave it to the reader to figure things out. For example, if X is a random variable then the notation "X=2" would literally say "X is the constant function X(w) = 2 for each outcome w". But what most people mean by "X=2" when used inside "P(X=2)" is "the set of all outcomes w such that X(w) = 2".
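The set-of-outcomes reading can be sketched on a small finite sample space. The fair die, the example variable `X(w) = w % 3`, and the helper names `event` and `P` are assumptions made only for this illustration.

```python
from fractions import Fraction

# Assumed sample space: the six faces of a fair die.
sample_space = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in sample_space}

def event(X, value):
    """The set {w in S : X(w) == value}, i.e. what "X = value" means inside P(...)."""
    return {w for w in sample_space if X(w) == value}

def P(outcomes):
    """Probability of a set of outcomes."""
    return sum((prob[w] for w in outcomes), Fraction(0))

def X(w):
    return w % 3  # a non-constant random variable with values 0, 1, 2

def constant_two(w):
    return 2      # the constant function that is literally "X = 2"

def constant_five(w):
    return 5      # a constant function whose range does not contain 2

print(event(X, 2), P(event(X, 2)))
print(P(event(constant_two, 2)), P(event(constant_five, 2)))
```

For the non-constant X the event is a proper subset of S with probability strictly between 0 and 1. For a constant function the event is either all of S or empty, so the probability can only be 1 or 0.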
 

Related to Conceptual Problems with Random Variables and Sample Theory

1. What are random variables?

A random variable is a function that assigns a numerical value to each outcome in a sample space, so its value depends on the outcome of a random experiment. Random variables are used to represent uncertain or unknown quantities in a system.

2. What is sample theory?

Sample theory is a branch of statistics that deals with the study of a subset of individuals or objects from a larger population. It involves the collection, analysis, and interpretation of data from a sample in order to make inferences about the entire population.

3. What are some conceptual problems with random variables?

Some conceptual problems with random variables include the difficulty of accurately modeling complex real-world situations, confusing a random variable with a single realization of it, and the challenge of assigning probabilities to events that may not have a clear definition.

4. How do you calculate the expected value of a random variable?

The expected value of a discrete random variable is calculated by multiplying each possible value of the variable by its probability, and then summing all of these products. The result is the long-run average value of the random variable over a large number of independent trials.
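As a minimal sketch of that calculation, assume a discrete random variable given by a table of values and probabilities. The fair-die table and the function name `expected_value` are illustrative assumptions.

```python
from fractions import Fraction

def expected_value(pmf):
    """Sum of value * probability over a table {value: probability}."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(value * p for value, p in pmf.items())

# Assumed example: the face shown by a fair six-sided die.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

mean_die = expected_value(die_pmf)
print(mean_die)
```

Exact fractions are used so that the check that the probabilities sum to 1 is not affected by floating-point rounding.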

5. What is the Central Limit Theorem?

The Central Limit Theorem states that, for independent and identically distributed observations with finite variance, the distribution of the suitably standardized sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the underlying population distribution. This theorem is fundamental to understanding the behavior of sample means and justifies many statistical methods for making inferences about a population based on a sample.
