EM Method for censored data - Statistical Inference

In summary, the conversation discusses how censoring a random sample affects the likelihood and maximum likelihood estimation (MLE). The EM method is suggested as an alternative approach, but the individual struggles with setting up the E step and M step. The conversation also touches on the memoryless property and the conditional expectation of the censored data, though the individual is unsure how to show it. The individual eventually arrives at an initial guess for the M-step formula.
  • #1
DKOli
For censored data.

Random sample X1,...,Xn

Censored such that x_1,...,x_m are observed but x_{m+1},...,x_n are not - we just know they exceed T.

f(x; θ) = θ exp(−θx) (exponential)



L = ∏_{i=1}^m f(x_i; θ) × ∏_{i=m+1}^n [1 − F(T; θ)]

Using F(x) = ∫₀ˣ f(t) dt I get

L = ∏_{i=1}^m θ exp(−θx_i) × ∏_{i=m+1}^n exp(−θT)

I can now work out the MLE, but I want to use the EM method.

Reading online I get that this censoring (right censoring) gives E(X | X ≥ T) = T + 1/θ, and I get it but don't really know how to show it. I am not sure how to write the complete-data likelihood or log-likelihood for this EM (I'm more used to mixture distributions, or I'd just solve the MLE directly).

I just don't really know how to set up the E step or M step. It should be quite trivial given what I know already, but I just keep confusing myself with the whole

Q(θ, θ_i) = E[l(θ; x_1,...,x_n) | θ_i; x_1,...,x_m].

I have some initial data, and then iterating using the M step should also be trivial; I am just falling down at one of the first hurdles.

Thanks in advance.
 
  • #2
DKOli said:
...Reading online I get that this censoring (right censoring) gives E(X | X ≥ T) = T + 1/θ, and I get it but don't really know how to show it. I am not sure how to write the complete-data likelihood or log-likelihood for this EM (I'm more used to mixture distributions, or I'd just solve the MLE directly)...

Hint: write X_{m+1} = T + Y_{m+1} etc, where the Y_i are iid to X_1.
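The hint can be checked numerically: by the memoryless property, the overshoot X − T given X ≥ T is again exponential with rate θ, so E[X | X ≥ T] = T + 1/θ. A quick Monte Carlo sketch (θ = 2, T = 1, and the sample size are arbitrary illustrative choices, not from the thread):

```python
import random

# Numerical check of the hint: if X ~ Exp(theta), then conditional on X >= T
# the overshoot Y = X - T is again Exp(theta) (memoryless property), hence
# E[X | X >= T] = T + 1/theta.
random.seed(0)
theta, T, n = 2.0, 1.0, 200_000  # arbitrary illustrative choices

samples = [random.expovariate(theta) for _ in range(n)]
exceeders = [x for x in samples if x >= T]
mean_exceed = sum(exceeders) / len(exceeders)

print(mean_exceed)  # close to T + 1/theta = 1.5
```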
 
  • #3
So I can just say E(X) = 1/θ (from 1 - m, as its distribution is exponential) and write X_{m+1} - X_n as T + Y_{m+1} - T + Y_n, where the Y_i are iid to X_i (or was X_1 right? I assumed it was a typo), and thus the expectation of the censored data is simply T + 1/θ.

If I solve as MLE I would have l = m log(θ) − θ Σ_{i=1}^m x_i − (n−m)θT,
but in terms of EM how would I write down the complete-data log-likelihood (in this case I would treat all x_1 - x_n as observed)?
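From l = m log(θ) − θ Σ x_i − (n−m)θT, setting dl/dθ = 0 gives a closed form for the direct MLE, θ̂ = m / (Σ_{i=1}^m x_i + (n−m)T). A minimal sketch with made-up numbers (the function name is my own, not from the thread):

```python
# Closed-form MLE for right-censored exponential data, from the log-likelihood
# l(theta) = m*log(theta) - theta*sum(x_obs) - (n - m)*theta*T:
# setting dl/dtheta = 0 gives theta_hat = m / (sum(x_obs) + (n - m)*T).

def censored_exp_mle(x_obs, n, T):
    """x_obs: the m observed values; n: total sample size; T: censoring point."""
    m = len(x_obs)
    return m / (sum(x_obs) + (n - m) * T)

# Made-up data: 4 observed values, 2 censored at T = 2.0.
theta_hat = censored_exp_mle([0.3, 1.1, 0.7, 1.9], n=6, T=2.0)
print(theta_hat)  # 4 / (4.0 + 2*2.0) = 0.5
```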
 
  • #4
DKOli said:
So I can just say E(X) = 1/θ (from 1 - m, as its distribution is exponential)
No (also, it's better to use "to" or ".." instead of "-" to indicate ranges).
and write X_{m+1} - X_n as T + Y_{m+1} - T + Y_n, where the Y_i are iid to X_i (or was X_1 right? I assumed it was a typo)
Oops - actually the Y_i are iid to none of the X_i (since X_1, ..., X_m are restricted to the range [0, T), you'd need to include a normalizing factor in their distribution). The Y_i are exponential because of the memoryless property.
and thus the expectation of the censored data is simply T + 1/θ.
Yes, but you don't need this fact yet.
If I solve as MLE I would have l = m log(θ) − θ Σ_{i=1}^m x_i − (n−m)θT, but in terms of EM how would I write down the complete-data log-likelihood (in this case I would treat all x_1 - x_n as observed)?
No - the log-likelihood includes random variables because of the unobserved data; this is why the E step is done.
 
  • #5
Right, well, I'll just call my complete data Z = (x_1,...,x_m, T)
where T = (x_{m+1},...,x_n) are the censored/unobserved values.

Then my complete-data log-likelihood will just be:

l(θ) = n log(θ) − θ Σ x_i (the sum runs from i = 1 to n)

Then, given the memoryless property, we have E[X | X ≥ T] = T + 1/θ (which I am still unsure how to show).

I get my E step to be:

Q(θ, θ_i) = n log(θ) − θ(ΣT + (n−m)θ_i)

So my M step becomes:

θ_{i+1} = { ΣT + (n−m)θ_i } / n
 
  • #6
^^^ This is wrong - it should still involve the sum of the x's, but it should also involve T. Initial guess:

θ_{i+1} = { Σx + T + (n−m)θ_i } / n ?
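For what it's worth, one standard way to complete the iteration (a sketch, not necessarily the form intended above): in the E step each censored observation is replaced by its conditional mean E[X | X ≥ T, θ_i] = T + 1/θ_i, so Q(θ, θ_i) = n log θ − θ[Σ_{j=1}^m x_j + (n−m)(T + 1/θ_i)], and maximizing over θ gives the M step θ_{i+1} = n / [Σ_{j=1}^m x_j + (n−m)(T + 1/θ_i)]. A minimal Python sketch with made-up data:

```python
def em_censored_exponential(x_obs, n, T, theta0=1.0, tol=1e-10, max_iter=1000):
    """EM for exponential data right-censored at T; theta0 is the starting value."""
    m = len(x_obs)
    theta = theta0
    for _ in range(max_iter):
        # E step: replace each censored value by E[X | X >= T, theta] = T + 1/theta
        S = sum(x_obs) + (n - m) * (T + 1.0 / theta)
        # M step: maximize Q(t, theta) = n*log(t) - t*S over t, giving t = n/S
        theta_new = n / S
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# Made-up data: 4 observed values, 2 censored at T = 2.0.
theta_em = em_censored_exponential([0.3, 1.1, 0.7, 1.9], n=6, T=2.0)
print(round(theta_em, 6))  # 0.5
```

A sanity check on the sketch: at a fixed point θ = n/S, rearranging gives θ = m/(Σ x_j + (n−m)T), which is exactly the direct censored-data MLE.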
 

Related to EM Method for censored data - Statistical Inference

1. What is the EM method for censored data?

The EM (Expectation-Maximization) method is a statistical technique used to estimate the parameters of a probability distribution when some of the data is censored, meaning the exact values are unknown and are only known to fall within a certain range. It is commonly used in survival analysis and reliability studies.

2. How does the EM method work?

The EM method works by iteratively estimating the missing values in the censored data using a two-step process. In the first step (Expectation), the missing values are estimated based on the current values of the parameters. In the second step (Maximization), the estimated values are used to update the parameters. This process is repeated until the estimates converge to a stable solution.
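The two-step loop described above can be sketched generically; `e_step` and `m_step` below are placeholder functions of my own, and the toy problem (imputing missing values while estimating a mean) is purely illustrative:

```python
# Generic EM loop: alternate an E step (fill in expected values of the missing
# data under the current parameter) and an M step (re-estimate the parameter
# from the completed data) until the estimates converge.

def em(observed, n_missing, e_step, m_step, theta0, tol=1e-10, max_iter=500):
    theta = theta0
    for _ in range(max_iter):
        completed = observed + [e_step(theta)] * n_missing  # E step
        theta_new = m_step(completed)                       # M step
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

# Toy example: estimate a mean with 2 values missing completely at random.
# E step imputes each missing value with the current mean estimate; M step
# takes the sample mean of the completed data.
mu_hat = em([1.0, 2.0, 3.0], n_missing=2,
            e_step=lambda t: t,
            m_step=lambda xs: sum(xs) / len(xs),
            theta0=0.0)
print(round(mu_hat, 6))  # 2.0
```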

3. What are the advantages of using the EM method for censored data?

The EM method allows censored data to be included in a statistical analysis that would otherwise exclude it or produce biased results. It can also give more accurate parameter estimates than alternatives such as complete-case analysis or simple imputation.

4. What are some applications of the EM method for censored data?

The EM method has various applications in the fields of survival analysis, reliability studies, and epidemiology. It is commonly used to estimate the survival function, hazard function, and other parameters in time-to-event data, where censoring is often present.

5. Are there any limitations to the EM method for censored data?

The EM method assumes that the data follows a specific probability distribution, which may not always be the case in real-world scenarios. It also requires a large sample size to produce accurate estimates. Additionally, the convergence of the estimates may be slow or not possible if the initial values of the parameters are far from the true values.
