Instrumental Variable in Logistic Regression Model

In the last post, "The Difference of Occupation Choice Among Graduates with Different Majors and Degrees", I built a logistic model with the log of salary as one of the independent variables. However, this variable may be endogenous. Salary may be related to omitted variables that also affect the probability of working at an educational institution, and, because it is the current salary, the occupation itself may affect it. An instrument is therefore needed to address the possible endogeneity in the regression analysis. In the dataset, the variable "satis", which measures satisfaction with salary, seems to be a possible instrument. It is coded as 4 = very satisfied, 3 = somewhat satisfied, 2 = somewhat unsatisfied, and 1 = very unsatisfied.

Let's first take a look at the model without an IV. The dependent variable is whether the graduate works at an educational institution, and the independent variables are degree, major, number of years since graduation, gender, log of salary, and citizenship. The model includes no interaction terms and no weights. The SAS code and estimation results are as follows. The log of salary has a significant negative relationship with the probability of working at an educational institution.

proc logistic data = proj_jobchoice descending;

class degree (ref = '1') ndgmemg (ref = '1') citizen (ref = '1') gender / param = ref;

model jobedu = degree ndgmemg gradyr gender lnsalary citizen;

run;
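For reference, the model fitted by the code above can be written out as an equation. This is only a sketch in my own notation, which does not appear in the original analysis: p_i is the probability that jobedu = 1 (the event modeled because of the descending option, assuming jobedu is coded 0/1), and x_i collects gradyr and the dummy variables for degree, major (ndgmemg), gender, and citizenship.

```latex
% Notation is assumed for illustration; it is not taken from the SAS output.
% p_i = P(jobedu_i = 1), x_i = vector of the covariates other than lnsalary.
\log\frac{p_i}{1 - p_i}
  = \beta_0 + \beta_1\,\mathrm{lnsalary}_i + x_i^{\top}\gamma
```

In this notation, the negative relationship described above corresponds to a negative estimate of \beta_1.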

IV Logistic

But what if we use a logistic model with salary satisfaction as an IV? In SAS, we estimate the logistic model with an instrumental variable in two steps. First, we estimate a linear regression model (the reduced model), with the endogenous variable as the dependent variable and the IV and the other variables as independent variables. Then we take the residuals of the linear model and add them to the logistic model (the full model) as a new independent variable. Because proc reg has no class statement and so cannot handle categorical variables directly, we use proc glm to run the linear model. The output statement writes an output dataset (out = ) that can contain the predicted values (p = ) and the residuals (residual = ). The SAS code and estimation results are as follows.

proc glm data = proj_jobchoice;

class degree ndgmemg citizen gender;

model lnsalary = satis degree ndgmemg gradyr gender citizen / solution;

output out = temp residual = vhat;

run;

proc logistic data = temp descending;

class degree (ref = '1') ndgmemg (ref = '1') citizen (ref = '1') gender / param = ref;

model jobedu = degree ndgmemg gradyr gender lnsalary vhat citizen;

run;
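The two steps above can be sketched as a pair of equations. The notation is my own and purely illustrative: x_i collects gradyr and the dummies for degree, ndgmemg, gender, and citizen, the \pi terms are the first-stage coefficients, \hat v_i is the residual saved as vhat, and \rho is its coefficient in the logistic model. I assume the logistic step models p_i = P(jobedu_i = 1), as the baseline model with the descending option does. Note that satis enters with a single slope because it is not listed in the class statement.

```latex
% Illustrative notation only; the symbols are assumptions, not SAS output.
\begin{align}
\mathrm{lnsalary}_i &= \pi_0 + \pi_1\,\mathrm{satis}_i + x_i^{\top}\pi_2 + v_i \\
\hat v_i &= \mathrm{lnsalary}_i - \widehat{\mathrm{lnsalary}}_i \\
\log\frac{p_i}{1 - p_i} &= \beta_0 + \beta_1\,\mathrm{lnsalary}_i
  + x_i^{\top}\gamma + \rho\,\hat v_i
\end{align}
```

Under this setup, the estimate of \rho is a natural thing to look at. If it is close to zero and insignificant, the residual adds little, which would suggest that endogeneity of lnsalary is not a large concern in this sample. This way of using the first-stage residual is commonly called the control function or two-stage residual inclusion approach.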

2SLS

How, then, do we use 2SLS to run the IV logistic model? Following the two stages of 2SLS, we first estimate a linear model with the endogenous variable as the dependent variable and the IV and the other variables as independent variables. After obtaining the predicted values from this linear model, we replace the endogenous variable with its predicted values in the second-stage logistic model. The SAS code and estimation results are as follows.

proc glm data = proj_jobchoice;

class degree ndgmemg citizen gender;

model lnsalary = satis degree ndgmemg gradyr gender citizen / solution;

output out = temp p = xhat residual = vhat;

run;

proc logistic data = temp descending;

class degree (ref = '1') ndgmemg (ref = '1') citizen (ref = '1') gender / param = ref;

model jobedu = degree ndgmemg gradyr gender xhat citizen;

run;
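Written as equations, the 2SLS-style version looks like the sketch below. The notation is my own and not part of the original output: x_i collects gradyr and the dummies for degree, ndgmemg, gender, and citizen, the hatted \pi terms are the first-stage estimates, and p_i = P(jobedu_i = 1), assuming the logistic step models that event as the baseline model does. The fitted value saved as xhat takes the place of the observed lnsalary.

```latex
% Illustrative notation only; the symbols are assumptions, not SAS output.
\begin{align}
\widehat{\mathrm{lnsalary}}_i &= \hat\pi_0 + \hat\pi_1\,\mathrm{satis}_i
  + x_i^{\top}\hat\pi_2 \\
\log\frac{p_i}{1 - p_i} &= \beta_0 + \beta_1\,\widehat{\mathrm{lnsalary}}_i
  + x_i^{\top}\gamma
\end{align}
```

For a least-squares first stage, lnsalary equals xhat plus vhat for every observation, so this version and the residual-based version in the IV Logistic section draw on the same first-stage fit and differ only in how it enters the logistic model.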

Jason's Blog
