
Posts

Showing posts from March, 2017

Weighted Percentile in Python Pandas

Unfortunately, Python has no built-in function for weighted percentiles. One possible way to compute them is to expand the list of data, repeating each value as many times as its weight, as discussed in a Stack Overflow post. For example, suppose we have the following data:

score   weight
5       2
4       3
2       4
8       1

We first expand the list of scores to {5, 5, 4, 4, 4, 2, 2, 2, 2, 8}, and then find the percentiles we need, such as the 10th or 50th percentile. This method has two limitations: (1) the weights must be integers; (2) the weights cannot be very large. What if we want to calculate the weighted percentiles of a large dataset with very large, non-integer weights? In this article, I want to show you an alternative method using Python pandas.

step1: given a percentile q (0<=q<=1), calculate p = q * sum of weights;
step2: sort the data by the column whose weighted percentile we want to calculate;
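The excerpt breaks off after step 2, so the following is only a minimal sketch of how the idea could be completed, not necessarily the method the full post uses. It assumes the remaining steps take the running sum of the sorted weights and return the first value whose cumulative weight reaches p. The function name `weighted_percentile` and its parameters are hypothetical; the data is the small example from the text. The expanded-list method is included for comparison.

```python
import numpy as np
import pandas as pd

# Example data from the text: scores and their weights.
df = pd.DataFrame({"score": [5, 4, 2, 8], "weight": [2, 3, 4, 1]})


def weighted_percentile(data, value_col, weight_col, q):
    """Hypothetical helper: weighted percentile via cumulative weights.

    Steps 1 and 2 follow the text; the rest is an assumption about how
    the method continues.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must lie in [0, 1]")
    # step 1: p = q * sum of weights
    p = q * data[weight_col].sum()
    # step 2: sort the data by the column of interest
    ordered = data.sort_values(value_col)
    # assumed continuation: cumulative weights, then the first row
    # whose cumulative weight is at least p
    cum_weight = ordered[weight_col].cumsum().to_numpy()
    idx = int(np.searchsorted(cum_weight, p, side="left"))
    idx = min(idx, len(ordered) - 1)
    return ordered[value_col].iloc[idx]


# Expanded-list method from the text (integer weights only).
expanded = np.repeat(df["score"].to_numpy(), df["weight"].to_numpy())

median_weighted = weighted_percentile(df, "score", "weight", 0.5)
median_expanded = np.percentile(expanded, 50)
```

Because the sketch never materializes the expanded list, it also accepts large or non-integer weights. Different interpolation rules can give different answers at the boundaries between values, so the two approaches will not agree for every q.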

Instrumental Variable in Logistic Regression Model

In the last post "The Difference of Occupation Choice Among Graduates with Different Majors and Degrees", I built a logistic model with the log of salary as one of the independent variables. However, salary may be endogenous: it may be related to omitted variables that also affect the probability of working at an educational institution, and, because it is the current salary, the occupation itself may affect it. Some instruments are therefore needed to address the possible endogeneity problem in the regression analysis. In the dataset, the variable "satis", which measures satisfaction with salary, seems to be a good candidate instrument. For this variable, 4 means very satisfied, 3 means somewhat satisfied, 2 means somewhat unsatisfied, and 1 means very unsatisfied. Let's first take a look at the model without the IV. The dependent variable is whether the respondent works at an educational institution, and the independent variables are degree, major, number of years af

The Difference of Occupation Choice Among Graduates with Different Majors and Degrees

The percentages of graduates who work in educational institutions, industry, or government differ across majors and degrees. This article looks at how major, degree, and other factors may affect people's occupation choices. The data are from the 2013 National Survey of Graduates, with a total of 104599 observations and 515 variables. A subset of 87145 observations is extracted, covering respondents who had graduated before 2013 and held a job during the survey reference month (Feb 2013). There are 9 variables of interest:

Variable in the Raw Data | Description | Variable Used in Regression Models | Categories
emsecsm | Employer sector: 1 = education institute, 2 = government, 3 = industry | job_edu | 1 = education institute, 0 = others
dgrdg | Degree: 1 = bachelor, 2 = master, 3 = PhD, 4 = professional degree | | 1 = bachelor, 2 = master, 3 = PhD or professional
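As a rough illustration of the two recodings listed above, the sketch below builds the binary `job_edu` indicator from `emsecsm` and merges the PhD and professional categories of `dgrdg`. The rows are made-up toy values, not survey data, and the column name `degree_recoded` is hypothetical, since the excerpt does not show the name used for the recoded degree variable.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT taken from the survey.
raw = pd.DataFrame({
    "emsecsm": [1, 2, 3, 1, 3],  # 1 = education, 2 = government, 3 = industry
    "dgrdg": [1, 2, 3, 4, 1],    # 1 = bachelor, 2 = master, 3 = PhD, 4 = professional
})

recoded = pd.DataFrame({
    # job_edu: 1 if the employer is an education institute, 0 otherwise
    "job_edu": (raw["emsecsm"] == 1).astype(int),
    # degree_recoded (hypothetical name): fold professional degrees (4) into 3
    "degree_recoded": raw["dgrdg"].replace({4: 3}),
})
```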