
The Difference of Occupation Choice Among Graduates with Different Majors and Degrees

The shares of graduates who work in education institutes, industry, or government differ across majors and degrees. This article looks at how major, degree, and other factors may affect people's occupation choices.

The data come from the 2013 National Survey of Graduates, with 104,599 observations and 515 variables in total. I extracted a subsample of 87,145 respondents who had graduated before 2013 and were employed during the survey reference month (February 2013). There are 9 variables of interest:


| Variable in the Raw Data | Description | Variable Used in Regression Models | Categories |
|---|---|---|---|
| emsecsm | Employer sector: 1 = education institute, 2 = government, 3 = industry | job_edu | 1 = education institute, 0 = others |
| dgrdg | Degree: 1 = bachelor, 2 = master, 3 = PhD, 4 = professional | degree | 1 = bachelor, 2 = master, 3 = PhD or professional |
| ndgmemg | Major: 1 = Computer and mathematical sciences, 2 = Biological, agricultural and environmental life sciences, 3 = Physical and related sciences, 4 = Social and related sciences, 5 = Engineering, 6 = S and E-Related Fields, 7 = Non-S and E Fields | (same) | |
| dgryr | Year of graduation | gradyr | Number of years after graduation: 2013 - dgryr |
| gender | Gender | (same) | |
| salary | Salary | ln_salary | Log of salary |
| ctzn | Citizenship: 1 = US citizen, native, 2 = US citizen, naturalized, 3 = permanent resident, 4 = temporary resident | citizen | 1 = US citizen, 2 = permanent resident, 3 = temporary resident |
| satsal | Satisfaction with salary: 1 = very satisfied, 2 = somewhat satisfied, 3 = somewhat dissatisfied, 4 = very dissatisfied | satis | 4 = very satisfied, 3 = somewhat satisfied, 2 = somewhat dissatisfied, 1 = very dissatisfied |
| wtsurvy | Weight | (same) | |
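
The recoding described in the table could be done in a single DATA step. The following is only a sketch under stated assumptions: "nscg_sub" is a hypothetical name for the already-filtered extract, the raw codes are assumed to be stored as character values (as emsecsm is in the PROC SQL code in this post), and dgryr and salary are assumed to be numeric.

```sas
/* Sketch only: nscg_sub is a hypothetical name for the filtered extract.   */
/* Raw codes are assumed to be character; dgryr and salary numeric.         */
data proj_jobchoice;
    set nscg_sub;

    /* 1 = education institute, 0 = others */
    job_edu = (emsecsm = '1');

    /* Merge PhD (3) and professional (4) into one level */
    if dgrdg in ('3', '4') then degree = 3;
    else degree = input(dgrdg, ?? 1.);

    /* Years since graduation */
    gradyr = 2013 - dgryr;

    /* Natural log of salary; left missing when salary is not positive */
    if salary > 0 then ln_salary = log(salary);

    /* Merge native and naturalized US citizens into one level */
    select (ctzn);
        when ('1', '2') citizen = 1;
        when ('3')      citizen = 2;
        when ('4')      citizen = 3;
        otherwise       citizen = .;
    end;

    /* Reverse the scale so that higher values mean more satisfied */
    satis = 5 - input(satsal, ?? 1.);
run;
```

The `??` modifier suppresses invalid-data notes, so any unexpected code simply becomes a missing value.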


The table below shows the proportion of graduates working in education institutes or industry, by major and degree. In most majors, graduates with PhD degrees are the most likely to work in education institutes, while most graduates with bachelor's or master's degrees work in industry.


| Major | Education Institute: Bachelor | Education Institute: Master | Education Institute: PhD | Industry: Bachelor | Industry: Master | Industry: PhD |
|---|---|---|---|---|---|---|
| Computer & Math | 0.143 | 0.218 | 0.476 | 0.771 | 0.703 | 0.451 |
| Biology, Agriculture, & Environment | 0.199 | 0.320 | 0.499 | 0.624 | 0.478 | 0.386 |
| Physics | 0.192 | 0.327 | 0.406 | 0.662 | 0.541 | 0.502 |
| Social Science | 0.162 | 0.320 | 0.491 | 0.690 | 0.492 | 0.398 |
| Engineering | 0.043 | 0.095 | 0.269 | 0.837 | 0.783 | 0.648 |
| Science & Engineering Related | 0.164 | 0.294 | 0.266 | 0.753 | 0.602 | 0.658 |
| Non Science & Engineering | 0.177 | 0.359 | 0.351 | 0.691 | 0.535 | 0.511 |

The figures in the table above can be produced with SAS PROC SQL and PROC TRANSPOSE as follows:

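proc sql;
/* Proportion of graduates in each sector, by major and degree */
create table job_major_degree as
select degree,
       ndgmemg,
       mean(emsecsm = '1') as Education,
       mean(emsecsm = '2') as Government,
       mean(emsecsm = '3') as Industry
from proj_jobchoice
group by ndgmemg, degree
order by ndgmemg, degree; /* PROC TRANSPOSE with BY needs sorted input */
quit;

/* One row per major, one column per degree: education institutes */
proc transpose data = job_major_degree out = job_major_degree2;
by ndgmemg;
id degree;
var Education;
run;

/* Same layout for industry */
proc transpose data = job_major_degree out = job_major_degree3;
by ndgmemg;
id degree;
var Industry;
run;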
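
The PROC SQL step as written gives every respondent equal weight, whereas the logistic model later in this post uses the survey weight wtsurvy. A weighted variant of the same table might look like the sketch below. The output table name "job_major_degree_w" is hypothetical, and whether weighted or unweighted shares are more appropriate depends on the purpose of the table.

```sas
/* Sketch only: survey-weighted proportions by major and degree.            */
/* job_major_degree_w is a hypothetical output table name.                  */
proc sql;
    create table job_major_degree_w as
    select ndgmemg,
           degree,
           /* weighted share = sum of weights in the sector / sum of all weights */
           sum(wtsurvy * (emsecsm = '1')) / sum(wtsurvy) as Education,
           sum(wtsurvy * (emsecsm = '2')) / sum(wtsurvy) as Government,
           sum(wtsurvy * (emsecsm = '3')) / sum(wtsurvy) as Industry
    from proj_jobchoice
    group by ndgmemg, degree
    order by ndgmemg, degree;
quit;
```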

The graph below shows the proportion of graduates working in industry, education institutes, or government, plotted against year of graduation. A much higher share of those who graduated within the last 5 years (after 2008) work in education institutes. The tenure track may explain this: some faculty members may move to industry during the 5-to-6-year tenure-track period.
The SAS code for the data and the graph is as follows. PROC SGPLOT is used to generate the line plot.

proc sql;
/* Proportion of graduates in each sector, by year of graduation */
create table job_grad as
select dgryr,
       mean(emsecsm = '1') as Education,
       mean(emsecsm = '2') as Government,
       mean(emsecsm = '3') as Industry
from proj_jobchoice
group by dgryr
order by dgryr; /* series plots connect points in data order */
quit;

/* One line per sector, for graduation years 1960 through 2011 */
proc sgplot data=job_grad (where=(1959<dgryr<2012));
title "Occupation of Graduates";
series x=dgryr y=Education / lineattrs = (thickness = 2);
series x=dgryr y=Government / lineattrs = (thickness = 2 pattern = 2);
series x=dgryr y=Industry / lineattrs = (thickness = 2 pattern = 4);
xaxis label = 'Year of Graduation';
yaxis label = 'Percentage';
run;

Next, I fit a logistic regression to study the relationship between occupation sector and degree, major, and other factors. The dependent variable, "job_edu", indicates whether the graduate works at an education institute. The model has 6 independent variables: degree, major (plus the interaction between degree and major), number of years after graduation, gender, log of salary, and citizenship. The SAS code is as follows. The "plots = " option generates diagnostic plots such as the ROC curve and confidence-interval plots; the "ctable" option with "pprob = " produces classification tables at the specified cutoffs; and the "lackfit" option requests the Hosmer-Lemeshow goodness-of-fit test.

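ods graphics on;
/* descending: model the probability that job_edu = 1 */
proc logistic data = proj_jobchoice descending plots=all;
/* reference coding; level 1 is the baseline for degree, major, and citizenship */
class degree (ref = '1') ndgmemg (ref = '1') citizen (ref = '1') gender / param = ref;
/* variable names follow the table above (job_edu, ln_salary) */
model job_edu = degree|ndgmemg gradyr gender ln_salary citizen / ctable pprob = 0.4 0.5 0.6 lackfit;
weight wtsurvy;
run;
ods graphics off;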
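
For reference, the model fitted by the code above can be written out as follows. The notation is mine rather than the original post's: $p_i$ is the probability that graduate $i$ works at an education institute, and $D$, $M$, $C$, and $G$ are reference-coded dummy variables for degree, major, citizenship, and gender, with level 1 as the baseline for the first three.

```latex
\ln\frac{p_i}{1 - p_i}
  = \beta_0
  + \sum_{d=2}^{3} \alpha_d D_{id}
  + \sum_{m=2}^{7} \gamma_m M_{im}
  + \sum_{d=2}^{3} \sum_{m=2}^{7} \delta_{dm} D_{id} M_{im}
  + \beta_1\,\mathrm{gradyr}_i
  + \beta_2\,G_i
  + \beta_3\,\mathrm{ln\_salary}_i
  + \sum_{c=2}^{3} \theta_c C_{ic}
```

A positive coefficient raises the log-odds of working at an education institute, and exponentiating a coefficient gives the corresponding odds ratio. Because of the interaction terms, the effect of a degree level depends on the major, and vice versa.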

Part of the results are posted below, including the estimates, the classification table, and the ROC curve. The estimation table shows that all 6 independent variables are significantly related to the probability of working at an education institute. PhDs are more likely to work at education institutes, and engineering graduates are the least likely to. Women are more likely to work at education institutes. Compared with US citizens, foreigners on temporary visas are more likely to work at education institutes, while permanent residents are less likely to. People who graduated earlier and people who earn higher salaries are less likely to work at education institutes.


