Can Loan Default Probability Be Predicted from Personal Attributes?

ghostyc · Jun 3, 2011

The data concern whether people default on a loan taken from a particular bank, at identical interest rates and for a fixed period. The information on each individual is their sex (male or female), their income (in pounds), whether the person is a home owner or not, their age (in years), and the amount of the loan (in pounds).

The information recorded is whether the individual defaulted on the loan or not. Study the data and try to understand the relation between a person's characteristics and defaulting. Specifically, what is your estimated probability that a female aged 42, who is not a home owner, has an income of 23,500, and took a loan of 12,000, defaults on the loan?

The table holding the data has headings as follows:

m/f: male=1, female=0
age: age in years
home: home=1 is a home owner, home=0 is not a home owner
inc: income
loan: amount of loan
def: default=1, non-default=0.

The dataset is given in the file "tabl3.txt".

I know it has something to do with a binary response, and we should probably use a GLM to model it. However, I am stuck: I cannot identify which variable is the response in this case.

If I use
[tex]
\log \frac{p}{1-p}=default = sex+age+income+home+loan
[/tex]
with a logit link, it just does not make sense to me, because "default" takes the value 1 or 0, so the predicted value would then be 1 or 0 as well.

Any suggestions?

For those who know R, here is my code.

Code:

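    # Read the data; the file is assumed to have no header row
    Q3 <- read.table("tabl3.txt")
    colnames(Q3) <- c("Sex", "Age", "Home", "Inc", "Loan", "Def")
    # Treat the 0/1 indicator columns as categorical
    Q3$Sex <- as.factor(Q3$Sex)
    Q3$Home <- as.factor(Q3$Home)
    Q3$Def <- as.factor(Q3$Def)
    summary(Q3)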

Thanks!

mmwave · Jun 3, 2011

Hello,

You are right to consider a GLM for these data. Since the goal is to understand the relationship between a person's characteristics and their likelihood of defaulting on a loan, the response is the "def" variable. Logistic regression does not predict the 0/1 value itself; it models the probability that def = 1, which is why the logit of that probability appears on the left-hand side.
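To spell out what is being modelled, here is the usual logistic regression form written in the notation of the question. The coefficient symbols [tex]\beta_0,\dots,\beta_5[/tex] and the linear predictor [tex]\eta[/tex] are notation introduced here for illustration; they do not come from the original post.

[tex]
\eta = \beta_0 + \beta_1\,sex + \beta_2\,age + \beta_3\,home + \beta_4\,income + \beta_5\,loan,
\qquad
\log \frac{p}{1-p} = \eta,
\qquad
p = P(default = 1)
[/tex]

Solving for [tex]p[/tex] gives the fitted probability:

[tex]
p = \frac{e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{-\eta}}
[/tex]

The observed "default" is 0 or 1, while the fitted [tex]p[/tex] lies strictly between 0 and 1 for any finite [tex]\eta[/tex].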

To start, you can try fitting a logistic regression model using the glm() function in R. The formula would look like this:

glm(Def ~ Sex + Age + Home + Inc + Loan, data = Q3, family = binomial)

This model will give you the estimated coefficients for each variable on the log-odds scale, which can help us understand the relationship between the characteristics and defaulting. You can also use the predict() function with type = "response" to calculate the estimated probability of default for a specific individual, such as a female aged 42, who is not a home owner, has an income of 23,500, and took a loan of 12,000.
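The fit-and-predict step can be sketched in R roughly as follows. The contents of "tabl3.txt" are not reproduced in this thread, so the sketch simulates a stand-in table with the same six columns. The simulated values and the default mechanism are made up purely so the code runs, and the names fit, new_person and p_hat are illustrative. With the real data, replace the simulated Q3 with the data frame read from the file.

```r
# Stand-in data (assumption): the real "tabl3.txt" is not available here,
# so simulate a table with the same column names and 0/1 coding.
set.seed(1)
n <- 200
Q3 <- data.frame(
  Sex  = factor(rbinom(n, 1, 0.5), levels = c(0, 1)),
  Age  = round(runif(n, 20, 65)),
  Home = factor(rbinom(n, 1, 0.5), levels = c(0, 1)),
  Inc  = round(runif(n, 10000, 60000)),
  Loan = round(runif(n, 1000, 20000))
)
# Made-up default mechanism, used only to generate example outcomes
eta <- -1 + 0.0001 * Q3$Loan - 0.00005 * Q3$Inc
Q3$Def <- factor(rbinom(n, 1, plogis(eta)), levels = c(0, 1))

# Logistic regression: with a factor response, glm() treats the first
# level ("0") as failure, so the model is for P(Def = 1)
fit <- glm(Def ~ Sex + Age + Home + Inc + Loan, data = Q3, family = binomial)

# The individual from the question: female (0), aged 42,
# not a home owner (0), income 23,500, loan 12,000
new_person <- data.frame(
  Sex  = factor(0, levels = c(0, 1)),
  Age  = 42,
  Home = factor(0, levels = c(0, 1)),
  Inc  = 23500,
  Loan = 12000
)

# type = "response" returns a probability rather than the log-odds
p_hat <- predict(fit, newdata = new_person, type = "response")
p_hat
```

The factor levels in new_person are set explicitly so that they match the levels used when fitting. Calling predict() without type = "response" would return the linear predictor (the log-odds) instead of a probability.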

I hope this helps. Let me know if you have any further questions. Best of luck with your analysis!
