PROC GLM Example: Linear Regression with Dummies

Absenteeism.

Data on 77 employees of the ABX Company have been collected. The dependent variable is absenteeism (ABSENT). The possible explanatory variables are:

COMPLX = measure of job complexity
SENIOR = seniority
SATIS = response to "How satisfied are you with your foreman?"

In this example, use SENINV = 1/SENIOR, the reciprocal of the seniority variable, and COMPLX as two of the explanatory variables. The variable SATIS is coded 1 = very dissatisfied, 2 = somewhat dissatisfied, 3 = neither satisfied nor dissatisfied, 4 = somewhat satisfied, 5 = very satisfied, and should be transformed into indicator (dummy) variables.
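Written out, the model with SATIS expanded into indicators looks roughly as follows. The symbols \beta, \alpha and D are notation introduced here for illustration and do not come from the original exercise. SATIS = 5 is taken as the reference level, which matches PROC GLM's default of setting the last CLASS level to zero.

```latex
\mathrm{ABSENT}_i = \beta_0 + \beta_1\,\mathrm{SENINV}_i + \beta_2\,\mathrm{COMPLX}_i
                  + \sum_{j=1}^{4} \alpha_j D_{ij} + \varepsilon_i ,
\qquad
D_{ij} =
\begin{cases}
1 & \text{if } \mathrm{SATIS}_i = j \\
0 & \text{otherwise}
\end{cases}
```

Under this coding, Q1 corresponds to testing H0: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0, and each \alpha_j is the difference in mean absenteeism between satisfaction level j and level 5 at fixed SENINV and COMPLX.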

Q1. Is there a difference in average absenteeism for employees in different supervisor satisfaction groups?

Q2. Using the chosen model, what is your estimate of the average absenteeism rate for all employees with COMPLX = 60 and SENIOR = 30 who were very dissatisfied with their supervisor? What is the estimate for employees with the same COMPLX and SENIOR values who were very satisfied with their supervisor?

Solution:

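Because Q2 asks for estimates at covariate values that are not in the data, we first use PROC SQL to insert two new observations, leaving ABSENT missing. Observations with a missing response are not used to fit the model, but PROC GLM still computes predicted values for them. A DATA step then generates the new variable SENINV, and finally PROC GLM estimates the model.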

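/* Add the two Q2 scenarios; ABSENT is not listed, so it is left missing */
proc sql;
    insert into ABSENT7 (SENIOR, COMPLX, SATIS)
        values (30, 60, 1)
        values (30, 60, 5);
quit;

/* Create the reciprocal of seniority */
data ABSENT7;
    set ABSENT7;
    SENINV = 1 / SENIOR;
run;

/* Fit the model with SATIS as a classification (dummy) variable */
proc glm data = ABSENT7;
    class SATIS;
    model ABSENT = SENINV COMPLX SATIS / solution ss3 clparm clm;
    lsmeans SATIS / cl;
run;
quit;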

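In PROC GLM, "solution" prints the parameter estimates for all explanatory variables, including the SATIS dummies (the last level, SATIS = 5, serves as the reference level and its estimate is set to 0); "ss3" prints the Type III sums of squares, whose F test for SATIS checks whether mean absenteeism differs across satisfaction groups after adjusting for SENINV and COMPLX (Q1); "clparm" prints confidence intervals for the parameter estimates; "clm" prints confidence intervals for the mean predicted response of each observation, including the two inserted rows (Q2); and "lsmeans SATIS / cl" prints the least-squares mean of ABSENT for each SATIS level, evaluated at the average values of SENINV and COMPLX, together with its confidence interval.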
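As an alternative to inserting rows, the Q2 predictions can also be requested with ESTIMATE statements. The following is a minimal sketch, not part of the original solution. It assumes ABSENT7 already contains SENINV and that SATIS takes all five values 1 to 5, so the SATIS coefficient vector has five entries. The labels are arbitrary. The divisor= option is used so that the SENINV coefficient is exactly 1/30.

```sas
/* Sketch: Q2 via ESTIMATE statements (assumes SATIS has levels 1-5 in the data) */
proc glm data = ABSENT7;
    class SATIS;
    model ABSENT = SENINV COMPLX SATIS / solution clparm;
    /* All coefficients are divided by 30: intercept 1, SENINV 1/30, COMPLX 60 */
    estimate 'SATIS=1, COMPLX=60, SENIOR=30'
        intercept 30 SENINV 1 COMPLX 1800 SATIS 30 0 0 0 0 / divisor = 30;
    estimate 'SATIS=5, COMPLX=60, SENIOR=30'
        intercept 30 SENINV 1 COMPLX 1800 SATIS 0 0 0 0 30 / divisor = 30;
    /* Difference between the two groups at identical SENINV and COMPLX */
    estimate 'SATIS 1 vs 5' SATIS 1 0 0 0 -1;
run;
quit;
```

With "clparm" on the MODEL statement, confidence limits are reported for the ESTIMATE results as well as for the parameter estimates, so the two point estimates here should agree with the predicted values that "clm" reports for the inserted rows.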


Jason's Blog
