# Xtreg and areg - Robust Regression Analysis with STATA

For this exercise, you will write your do-file from scratch. Don't forget the Help option if you don't know how to do something. Make sure your do-file is well organized and well annotated.

Note: At the end of this document, you will find instructions for an alternate way to create a do-file.

Please submit:

a) A printed Stata log file documenting that you've completed each of the operations directed below, along with the results. As always, be sure to run only your corrected do-file so that your printed log file does not contain errors.

b) A printed write-up, which includes typed answers to the questions below.

1. Estimate a fixed effects regression with total expenditures per pupil as the dependent variable. Be sure to use schools as the fixed effect and robust standard errors. Use xtreg with the robust (or r) option. Consult Help if needed.

a. Use the following independent variables:

• percent black

• percent free lunch

• total enrollment

• total enrollment squared

• percent Hispanic

• percent Asian

• percent full-time special education

• percent immigrant

• percent female

• math z score

• year dummies

b. Why are the borough and middle dummies omitted? What would happen if they were included?

3. Now estimate the regression using areg with the absorb option and robust standard errors.

Areg estimates fixed effects by including 711 school dummies (one school is omitted), although it does not display the coefficients on these dummies. Xtreg estimates fixed effects by "demeaning": it subtracts each school's mean over the four years from each year's value of every variable and then uses these demeaned values in the regression.
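The two approaches described above can be compared on a toy example. The following is a minimal Python sketch, not Stata code and not either command's actual implementation: the panel values, the school labels, and the helper names `within_slope`, `lsdv_slope`, and `solve` are all made up for illustration. It fits one regressor both ways and shows that the slope is the same.

```python
# Toy panel: (school, x, y). All values are made up for illustration.
panel = [
    ("A", 1.0, 2.0), ("A", 2.0, 4.5), ("A", 3.0, 6.0),
    ("B", 2.0, 9.0), ("B", 4.0, 13.5), ("B", 6.0, 17.0),
    ("C", 1.0, 0.5), ("C", 3.0, 4.0), ("C", 5.0, 9.0),
]

def group_means(rows, col):
    """Mean of column `col` (1 = x, 2 = y) within each school."""
    sums, counts = {}, {}
    for row in rows:
        sums[row[0]] = sums.get(row[0], 0.0) + row[col]
        counts[row[0]] = counts.get(row[0], 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

def within_slope(rows):
    """Demeaning approach: subtract school means, then regress with no dummies."""
    mx, my = group_means(rows, 1), group_means(rows, 2)
    num = sum((x - mx[g]) * (y - my[g]) for g, x, y in rows)
    den = sum((x - mx[g]) ** 2 for g, x, y in rows)
    return num / den

def solve(a, b):
    """Solve the linear system a * beta = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    beta = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][k] * beta[k] for k in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def lsdv_slope(rows):
    """Dummy approach: regress y on x, a constant, and school dummies (one omitted)."""
    schools = sorted({g for g, _, _ in rows})
    design = [[x, 1.0] + [1.0 if g == s else 0.0 for s in schools[1:]]
              for g, x, _ in rows]
    ys = [y for _, _, y in rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)[0]  # first coefficient is the slope on x

b_within = within_slope(panel)
b_lsdv = lsdv_slope(panel)
print(round(b_within, 6), round(b_lsdv, 6))
```

The point estimates agree; what can differ between the two approaches is how many parameters are counted when standard errors and degrees of freedom are computed.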

Therefore, one of these estimates (areg vs. xtreg) has more variables in it (more k) and therefore a smaller number of degrees of freedom (n-k-1). In light of this explanation:

What differences can you detect between xtreg and areg, if any?
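The degrees-of-freedom arithmetic in the explanation above can be sketched in a few lines of Python. This is only an illustration under assumptions the assignment does not state: a balanced panel (so 711 dummies with one school omitted means 712 schools, each observed in all four years) and three year dummies with one year omitted. How Stata actually reports and adjusts degrees of freedom depends on the command and the variance options used, so compare this against your own output rather than treating it as the expected result.

```python
# Assumption: balanced panel, 712 schools (711 dummies plus one omitted school).
n_schools = 712
n_years = 4
n_obs = n_schools * n_years

# Ten listed regressors plus year dummies (assumed: one year omitted).
k_listed = 10
k_year = n_years - 1
k = k_listed + k_year

# Regression on demeaned data: the school dummies are not counted in k.
df_demeaned = n_obs - k - 1

# Dummy-variable regression: the 711 school dummies are counted in k.
df_dummies = n_obs - (k + (n_schools - 1)) - 1

print(n_obs, df_demeaned, df_dummies)
```

Under these assumptions the gap between the two counts is exactly the number of school dummies, which is why the two commands can report different standard errors for identical coefficients.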

4. Interpret the coefficient (in the fixed effects context) on percent full-time special education.

ALTERNATE WAY TO WRITE A DO FILE

You can write your do-file another way as well, but if you choose this option, you will need to edit and annotate the final file. The alternative is to type commands into the command box in Stata (or even to use the drop-down menus to find commands and put them into the command box). Then you can run each command or set of commands in "real time" to see if it works.

When you are finished, you will paste all of your commands, both those that worked and those that did not, into a do-file. At that point, edit out the wrong commands and annotate the file.

a. To create a new file from the commands you've already used:

i. Right click on the header in the Review panel (where all the commands are listed) and click on Select All

ii. Right click inside the Review panel and select Send to Do-file Editor

iii. Save this as you would any other file using the .do extension

1. If you save the do-file before the end of your commands, then you will need to add new ones. To do so:

In the panel with the list of commands, select the commands that you want to add to your do-file.

a. Note: you can select multiple commands by holding the Ctrl key

b. Note: this means that you can "pick and choose" the commands you add (so you don't have to include the lines that didn't work)

2. Copy and paste these commands into your working do-file

3. Save the updated do-file

iv. Note: if you don't update your do-file as you go along (see the section above) you will need to re-save this at the end of the session - to save over the initial do-file:

1. Close the do-file you created earlier

2. Right click inside the panel with the list of commands and click on Select All

3. Right click in the panel again and select Save All

4. Save the do-file (using the same name as before) - Stata will ask if you want to replace the existing file, which you do

5. You will then need to open the do-file in the editor and delete any commands that were errors


#### Solution Summary

This solution comprises a detailed explanation of regression analysis in Stata. It mainly discusses regression models with actual variables, dummy variables, and other transformations of variables. It also explains fixed effects regression analysis using xtreg and using areg with the absorb option.

Fixed model - Robust Regression Interpretation of results

Question One:

This problem employs a dataset on labor markets in 23 OECD countries for the years 1980 to 1998.

The variables used in the analysis (followed by descriptive statistics) are:

1. Productivity index [prod] = An index measuring country i's economic output (GDP) per hour worked in year t, normalized such that each country's index = 100 in 1995.

2. Unemployment rate [unr] = The total number of unemployed workers in country i and year t divided by the total number of labor force participants in that country and year, multiplied by 100.

3. Union density [ud] = The ratio of total reported union members (minus retired and unemployed members) in country i and year t to the total number of employees earning wages or salaries in that country and year, multiplied by 100.

4. Public sector growth [gempl] = The one-year percentage growth (from year t-1 to year t) in public sector employment in country i (measured as a proportion, 0 to 1).

5. USD exchange rate growth [usd] = The one-year percentage growth (from year t-1 to year t) in the value of country i's currency relative to the US dollar (measured as a proportion, 0 to 1).

6. Labor force (1K) [lf] = The total number of labor force participants in country i and year t, in thousands.

(See attached)

1. While there are no missing years in the dataset, there are missing observations for some of the variables.

a. If there were no missing values for any variables, how many observations (country-years) would there be for every variable in the summary table of descriptive statistics presented above?

b. Given the number of observations for each variable shown in the summary table, knowing there are no missing years in the data, and knowing that Stata regression drops a case when there are any missing values for any variable for a given country in a given year, what is the maximum number of countries that can be used in a regression analysis (assuming nothing is done to replace missing values)?

c. Given that there are 19 years of data in the regression analyses presented in Table 1, how many countries were used in the analyses?

d. Could we estimate the effect of usd on prod with FE if the value of every country's currency (relative to the US dollar) remained the same over the sample time period? Why or why not? Please answer in 2-3 sentences.

2. Write the general equations for the specifications in columns (1) and (2). Use lowercase b for the regression coefficients and, where appropriate, a to indicate fixed effects and/or T to indicate time effects. Use the variable names presented in brackets [ ] on the prior page, and use subscripts as appropriate. You do not have to include an error term.

Column 1:

Column 2:

3. Using the models estimated without time effects, interpret:

A. The effect of a 5-percentage-point increase in union density.

B. The effect of a 10-percent increase in the growth of the public sector.

4. Compare the specifications with time effects to those without time effects. What do the differences in the statistically significant coefficients imply about the time effects? Note that the time effects are jointly statistically significant with a p-value of 0.00. Please answer in 2-3 sentences.

Question Two:

Table 2 presents the results of a study of the effect of differences in the fraction of new immigrants on crime rates in U.S. metropolitan areas (MAs) over nine years.


a. Write the general equation for the regression in column 2. Use β for the regression coefficients (not the actual numbers in column 2), use the variable names presented in the table in brackets [ ], and use subscripts as appropriate. If appropriate, use MA and T as fixed effects.

b. Using the results in column 2:

i. Ignoring significance, what is the effect of a 20-percentage-point (0.2 fraction) increase in new immigrants on the overall crime rate?

ii. Form a 95% confidence interval around the effect you've just calculated.
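The mechanics of scaling a coefficient and building an interval around the scaled effect can be sketched in Python. The coefficient and standard error below are hypothetical placeholders, not values from Table 2 (which is attached separately), and the sketch assumes a normal critical value of 1.96 for a 95% interval. Because the dependent variable is the log crime rate, the resulting numbers are in log points, so multiplying by 100 gives an approximate percent change.

```python
# Hypothetical inputs; NOT taken from Table 2.
beta_imm = -0.50   # hypothetical coefficient on fraction of new immigrants [IMM]
se_imm = 0.20      # hypothetical robust standard error
delta = 0.20       # a 20-percentage-point change, expressed as a fraction

z = 1.96           # assumed normal critical value for a 95% interval

# Scale both the coefficient and its standard error by the size of the change.
effect = beta_imm * delta
se_effect = abs(delta) * se_imm

ci_low = effect - z * se_effect
ci_high = effect + z * se_effect

print(round(effect, 4), round(ci_low, 4), round(ci_high, 4))
```

Scaling the standard error by the same factor as the coefficient is equivalent to forming the interval around the coefficient first and then multiplying both endpoints by the change.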

c. Two parts:

i. In what two ways does the coefficient on Fraction of new immigrants [IMM] differ between columns 1 and 2?

ii. Why does it differ and what does this indicate about the estimates in columns 1 and 2?

d. Using the results in column 2: For an MA with 10% (.10 fraction) Hispanic population, what is the effect of a one percentage point (.01 fraction) increase in the percent female on the metropolitan crime rate?

e. Using the results in column 2, what is the effect of a one percent increase in population of an MA on the overall crime rate of the MA?

f. Two parts:

i. What hypotheses do the p values for F's in column 2 at the bottom of the table test? (Hint: There are two different p values (F's) and thus two different hypotheses.)

ii. What do you conclude from the tests?

Table 2

Regression coefficients: Log metropolitan area (MA) overall crime rate (CR) on various variables

(See attached)

Source: Calculations from Current Population Surveys (CPSs) and Uniform Crime Reports (UCR)

Notes: Robust standard errors are in parentheses. A constant is included but not shown.

~ p-value from an F-test

z: "Fraction" varies from 0 to 1 and differs in measurement from percent, which varies from 0 to 100.