See attached data file.
2. Sex discrimination. The dataset salary.dat contains salaries and other characteristics of all faculty
members of a small college. The data were collected for presentation in legal proceedings in which
discrimination against women in salary was at issue. All faculty members represented in the dataset
hold tenured or tenure-track positions; temporary faculty are not included. The data were collected
from personnel files and consist of the following:
SX = Sex, coded 1 for female and 0 for male
RK = Rank, coded 1 for Assistant Professor, 2 for Associate Professor and 3 for Full Professor
YR = Number of years in current rank
DG = Highest degree, coded 1 if Doctorate, 0 if Master's
YD = Number of years since highest degree was earned
SL = Academic year salary in dollars
In this problem, treat the rank variable RK as categorical. That is, replace the rank variable by two
binary dummy variables to account for the different rank levels. For example, you might introduce
one variable indicating non-tenured (Assistant Professor) or tenured (Associate/Full Professor) ranks
and one variable indicating Full Professor rank. (Compare the example of echolocating and non-echolocating
birds and bats discussed in class.)
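The coding suggested above can be sketched as follows. This is a minimal illustration, not part of the assignment: the helper name `rank_dummies` and the indicator names `tenured` and `full` are hypothetical.

```python
# Hypothetical helper: the names `rank_dummies`, `tenured`, and `full` are
# illustrative and do not come from the assignment.
def rank_dummies(rk):
    """Map a rank code (1, 2, 3) to the two indicators (tenured, full)."""
    if rk not in (1, 2, 3):
        raise ValueError(f"unexpected rank code: {rk!r}")
    tenured = 1 if rk >= 2 else 0   # Associate or Full Professor
    full = 1 if rk == 3 else 0      # Full Professor only
    return tenured, full

# Each rank maps to a distinct pair, so the two columns carry the same
# information as the three-level rank factor.
coding = {rk: rank_dummies(rk) for rk in (1, 2, 3)}
print(coding)
```

With this coding, the coefficient on `tenured` compares Associate with Assistant Professors, and the coefficient on `full` compares Full with Associate Professors.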
(a) Test the hypothesis that salary, adjusted for years in current rank, highest degree, and years since
highest degree, is the same for each of the three ranks.
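One common way to test a hypothesis like the one in part (a) is a partial F-test that compares a model without the two rank dummies to a model with them. The sketch below assumes that approach and uses NumPy only. The data are synthetic stand-ins, not the values in salary.dat, and the function names `rss` and `partial_f` are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_f(X_reduced, X_full, y):
    """F statistic and degrees of freedom for nested models (reduced inside full)."""
    n = len(y)
    q = X_full.shape[1] - X_reduced.shape[1]   # number of extra coefficients tested
    df_resid = n - X_full.shape[1]             # residual df of the full model
    rss_r, rss_f = rss(X_reduced, y), rss(X_full, y)
    f_stat = ((rss_r - rss_f) / q) / (rss_f / df_resid)
    return f_stat, q, df_resid

# Synthetic stand-in data (NOT the salary.dat values), for illustration only.
rng = np.random.default_rng(0)
n = 60
rk = rng.integers(1, 4, size=n)                 # rank codes 1..3
yr = rng.integers(0, 20, size=n).astype(float)
dg = rng.integers(0, 2, size=n).astype(float)
yd = yr + rng.integers(0, 15, size=n)
tenured = (rk >= 2).astype(float)
full = (rk == 3).astype(float)
sl = 15000 + 4000 * tenured + 5000 * full + 300 * yr + rng.normal(0, 2000, n)

ones = np.ones(n)
X_reduced = np.column_stack([ones, yr, dg, yd])               # no rank terms
X_full = np.column_stack([ones, yr, dg, yd, tenured, full])   # adds rank dummies
f_stat, q, df_resid = partial_f(X_reduced, X_full, sl)
print(f"F = {f_stat:.2f} on {q} and {df_resid} degrees of freedom")
```

The statistic is then compared with an F distribution on `q` and `df_resid` degrees of freedom. Following the wording of part (a), SX is left out of both models here; including it would be a separate modeling choice.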
(b) Fit a linear model that predicts salary given all other variables, including the categorical dummy
variables. Examine the residuals. Explain the need to transform the response, salary, to some other
scale, and suggest an appropriate transformation.
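A plot of residuals against fitted values is the usual tool for part (b). As a rough numeric companion, the sketch below compares the residual spread in the lower and upper halves of the fitted values, before and after a log transformation. The log is only one candidate transformation and is an assumption here, not the assignment's stated answer. The data are synthetic, not salary.dat, and the names `ols_fit` and `spread_ratio` are hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """Return fitted values and residuals from ordinary least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted

def spread_ratio(fitted, resid):
    """Residual SD in the upper half of fitted values divided by the lower half."""
    order = np.argsort(fitted)
    half = len(order) // 2
    low, high = resid[order[:half]], resid[order[half:]]
    return float(np.std(high) / np.std(low))

# Synthetic data with multiplicative errors (NOT salary.dat), for illustration.
rng = np.random.default_rng(1)
n = 200
yd = rng.uniform(0, 35, size=n)
sl = np.exp(9.6 + 0.04 * yd + rng.normal(0, 0.15, size=n))
X = np.column_stack([np.ones(n), yd])

raw_ratio = spread_ratio(*ols_fit(X, sl))          # response on the dollar scale
log_ratio = spread_ratio(*ols_fit(X, np.log(sl)))  # response on the log scale
print(f"raw scale: {raw_ratio:.2f}, log scale: {log_ratio:.2f}")
```

A ratio well above 1 suggests that the residual spread grows with the fitted value, which is the pattern a variance-stabilizing transformation is meant to remove.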
(c) After transforming the response, examine the 'new' residuals. Comment on their appearance and
the adequacy of the regression model.
(d) Test the hypothesis that salary, adjusted for rank, years in current rank, highest degree, and years
since highest degree, is the same for men and women. Summarize your findings so far in a fashion
that might be useful in court.
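A test like the one in part (d) can be based on the t statistic of the SX coefficient in the full model. The sketch below is one way to compute it with NumPy. It assumes a log-transformed response, which is an assumption rather than something the assignment specifies, and it uses synthetic data with no built-in SX effect, not salary.dat. The function name `ols_summary` is hypothetical.

```python
import numpy as np

def ols_summary(X, y):
    """Coefficient estimates, standard errors, and t statistics for OLS."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / (n - p)   # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance matrix of the estimates
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Synthetic data (NOT salary.dat); SX has no effect by construction.
rng = np.random.default_rng(2)
n = 60
sx = rng.integers(0, 2, size=n).astype(float)
rk = rng.integers(1, 4, size=n)
tenured, full = (rk >= 2).astype(float), (rk == 3).astype(float)
yr = rng.integers(0, 20, size=n).astype(float)
dg = rng.integers(0, 2, size=n).astype(float)
yd = yr + rng.integers(0, 15, size=n)
log_sl = 9.7 + 0.2 * tenured + 0.2 * full + 0.02 * yr + rng.normal(0, 0.1, n)

# Column 1 (after the intercept) is SX.
X = np.column_stack([np.ones(n), sx, tenured, full, yr, dg, yd])
beta, se, t = ols_summary(X, log_sl)
df = n - X.shape[1]
print(f"SX coefficient = {beta[1]:.4f}, SE = {se[1]:.4f}, t = {t[1]:.2f} on {df} df")
```

On a log scale, exp(coefficient) - 1 gives the estimated proportional salary difference for women relative to men with the other variables held fixed, which may be an easier quantity to present in court than a raw coefficient.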
(e) Finkelstein (1980), in a discussion of the use of regression in discrimination cases, wrote that
a "variable may reflect a position or status bestowed by the employer, in which case if there is
discrimination in the award of the position or status, the variable may be 'tainted'." For example,
if there is discrimination in the promotion of faculty to higher ranks, using rank to adjust salaries
before comparing the sexes may not be acceptable to the courts. Fit a model similar to that in parts
(c) and (d) to the data, but without adjusting for the effects of rank. Summarize the results and
compare how leaving out rank effects changes the inferences concerning pay differentials by sex.
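The comparison in part (e) amounts to fitting the same model twice, once with and once without the rank dummies, and comparing the SX coefficients. The sketch below shows the mechanics on deliberately constructed synthetic data in which women reach higher ranks less often and salary has no direct SX term. It says nothing about what salary.dat actually shows. The name `sx_coef` is hypothetical, and DG and YD are omitted for brevity.

```python
import numpy as np

def sx_coef(columns, y):
    """OLS coefficient on the first column after the intercept (SX here)."""
    X = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1])

# Constructed synthetic data (NOT salary.dat), for illustration only.
rng = np.random.default_rng(3)
n = 400
sx = rng.integers(0, 2, size=n).astype(float)
# By construction, women (sx = 1) are tenured less often than men.
p_tenured = np.where(sx == 1, 0.3, 0.7)
tenured = (rng.uniform(size=n) < p_tenured).astype(float)
full = tenured * (rng.uniform(size=n) < 0.5)   # about half of tenured are Full
yr = rng.integers(0, 20, size=n).astype(float)
# Log salary depends on rank and years in rank, with no direct SX term.
log_sl = 9.7 + 0.25 * tenured + 0.25 * full + 0.01 * yr + rng.normal(0, 0.1, n)

with_rank = sx_coef([sx, tenured, full, yr], log_sl)
without_rank = sx_coef([sx, yr], log_sl)
print(f"SX coefficient with rank: {with_rank:.3f}, without rank: {without_rank:.3f}")
```

In this constructed case the SX coefficient moves away from zero once rank is dropped, because the rank difference between the sexes is absorbed into SX. Whether rank is a legitimate adjustment or a "tainted" one is the question Finkelstein raises, and the code cannot settle it.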