--------------------------------------------------------------------- name: log: /Users/home/Dropbox/Teaching 2018/StataPanelSeminar/Exer > cise/panel_seminar_log.log log type: text opened on: 6 Oct 2018, 13:56:24 . . * Import data . use "NLSY97.dta", clear // Note 'clear' options so you can > rerun the do file. . . ********** Data Preparation . . ***** Generate and format indicator variables . . * Female indicator variable . gen female = . // Start wi > th missing values (152,728 missing values generated) . replace female = 0 if sex == 1 // Set female = 0 if sex in > dicates Male (78,183 real changes made) . replace female = 1 if sex == 2 // Set female = 1 if sex in > dicates Female (74,545 real changes made) . label variable female "Female" // Label variable . drop sex // > Drop old sex variable . . * Urban indicator variable . gen urban = . // > Start with missing values (152,728 missing values generated) . replace urban = 0 if rural_urban == 0 // Set urban = 0 for rural > areas (26,108 real changes made) . replace urban = 1 if rural_urban == 1 // Set urban = 1 for urban > areas (100,313 real changes made) . label variable urban "Urban" // Label variable . drop rural_urban // > Drop old rural_urban variable . . * High school indicator variable . gen highschool = . // Same process for new completed H > S completion indicator (152,728 missing values generated) . replace highschool = 0 if highest_grade_compl < 12 (37,626 real changes made) . replace highschool = 1 if highest_grade_compl >= 12 & /// > !missing(highest_grade_compl) (54,417 real changes made) . label variable highschool "Completed High School" . . /* > Note the !missing(highest_grade_compl) condition when coding the 'h > ighschool' > indicator variable. Since 'highest_grade_compl' has missing values > , I need > the "not missing: condition or else all missing values would be cod > ed as > highschool = 1, since missing values evaluate to the largest possib > le values > in Stata. > */ . . . ***** Generate continuous variables . gen age = year - birthyr // Created > imputed age variable . gen logearnings = log(wage_salary_yr) // Generate log of earnings > variable (74,335 missing values generated) . replace hs_gpa = hs_gpa / 100 /* Note: the label for hs_gpa says > it's a 4.0 > sca > le, but the actual scale is 0-400. So I > res > cale it back to 0-4. */ variable hs_gpa was long now double (102,068 real changes made) . . . . ********** Exploratory Analysis . . set scheme sj /* This sets the color scheme of the Stata graphs s > o that they > are grayscale. I didn't as > k you to do this, but I like > for the graphs to appear th > is way. */ . . . *** Mean Earnings by Age . egen mean_earnings = mean(wage_salary_yr), by(age) // Gen mean earn > gings by age . graph twoway line mean_earnings age, sort /// > title("Mean Earnings by Age") /// > xtitle("age in years") /// > ytitle("Earnings (nominal USD)") . . graph export "earningsbyage.tif", replace (file earningsbyage.tif written in TIFF format) . . *** Scatterplot of GPA and Earnings . graph twoway scatter logearnings hs_gpa if year == 2015 /// > & census_region == 1 & age == 32, /// > title("Log earnings and high school GPA") /// > xtitle("High school GPA (4.0 Scale)") /// > ytitle("Log of earnings") . graph export "earningsandGPA.tif", replace (file earningsandGPA.tif written in TIFF format) . . . ********** Regression Analysis . . xtset person_id year // Set Panel Structure panel variable: person_id (strongly balanced) time variable: year, 1997 to 2015, but with gaps delta: 1 unit . . eststo clear /* When storing regression output to create tables, > it's best > to start with eststo clear > so that if you need to rerun the > code, it won't duplicate an > y table. */ . . eststo: regress logearnings asvab_score hs_gpa highschool /// > yrs_at_job effort_at_work dependable female urban /// > i.age i.race_ethnicity i.census_region i.year, robust Linear regression Number of obs = > 13,228 F(34, 13193) = > 360.56 Prob > F = > 0.0000 R-squared = > 0.4893 Root MSE = > .94647 --------------------------------------------------------------------- > --------- | Robust logearnings | Coef. Std. Err. t P>|t| [95% Conf. > Interval] -------------+------------------------------------------------------- > --------- asvab_score | 1.40e-06 3.87e-07 3.63 0.000 6.44e-07 > 2.16e-06 hs_gpa | -.1585972 .0190748 -8.31 0.000 -.1959866 > -.1212079 highschool | .1644034 .0275626 5.96 0.000 .1103768 > .21843 yrs_at_job | .1295166 .0047264 27.40 0.000 .1202521 > .1387811 effort_at_~k | .0527877 .0076846 6.87 0.000 .0377248 > .0678506 dependable | -.0181929 .009512 -1.91 0.056 -.0368377 > .000452 female | -.2124571 .0172349 -12.33 0.000 -.24624 > -.1786743 urban | .0260445 .020964 1.24 0.214 -.0150479 > .0671369 | age | 17 | .6204708 .098589 6.29 0.000 .4272222 > .8137195 18 | 1.168643 .0963843 12.12 0.000 .979716 > 1.35757 19 | 1.447202 .1064477 13.60 0.000 1.238549 > 1.655855 20 | 1.815458 .1115359 16.28 0.000 1.596832 > 2.034084 21 | 2.080071 .1163558 17.88 0.000 1.851997 > 2.308145 22 | 2.26876 .1206576 18.80 0.000 2.032254 > 2.505266 23 | 2.483872 .1242119 20.00 0.000 2.240399 > 2.727345 24 | 2.638945 .1273527 20.72 0.000 2.389315 > 2.888574 25 | 2.805506 .1300847 21.57 0.000 2.550521 > 3.06049 26 | 2.814619 .1334175 21.10 0.000 2.553102 > 3.076137 27 | 2.98372 .1382175 21.59 0.000 2.712794 > 3.254646 | race_ethni~y | Hispanic | .146562 .0308593 4.75 0.000 .0860733 > .2070508 Mixed Ra..) | .2792948 .0927275 3.01 0.003 .0975356 > .4610539 Non-Black.. | .1751107 .0243466 7.19 0.000 .1273877 > .2228336 | census_reg~n | North Cen.. | -.1023471 .0261842 -3.91 0.000 -.1536719 > -.0510223 South | -.0460424 .0263269 -1.75 0.080 -.0976469 > .0055621 West | -.0118527 .0279806 -0.42 0.672 -.0666986 > .0429932 | year | 2001 | .1159226 .0532017 2.18 0.029 .0116396 > .2202056 2002 | .0804097 .0607477 1.32 0.186 -.0386646 > .199484 2003 | .0992369 .0702071 1.41 0.158 -.038379 > .2368529 2004 | .1251351 .0782955 1.60 0.110 -.0283353 > .2786054 2005 | .2079047 .083853 2.48 0.013 .0435407 > .3722687 2006 | .2509079 .0889275 2.82 0.005 .0765972 > .4252185 2007 | .3049195 .093085 3.28 0.001 .1224596 > .4873794 2008 | .351181 .096862 3.63 0.000 .1613176 > .5410445 2009 | .2884693 .1011238 2.85 0.004 .0902521 > .4866865 | _cons | 6.4587 .1098502 58.80 0.000 6.243378 > 6.674022 --------------------------------------------------------------------- > --------- (est1 stored) . . eststo: xtreg logearnings asvab_score hs_gpa highschool /// > yrs_at_job effort_at_work dependable female urban /// > i.age i.race_ethnicity i.census_region i.year, re robust Random-effects GLS regression Number of obs = > 13,228 Group variable: person_id Number of groups = > 2,313 R-sq: Obs per group: within = 0.5835 min = > 1 between = 0.3221 avg = > 5.7 overall = 0.4865 max = > 10 Wald chi2(34) = > 8764.99 corr(u_i, X) = 0 (assumed) Prob > chi2 = > 0.0000 (Std. Err. adjusted for 2,313 clusters in p > erson_id) --------------------------------------------------------------------- > --------- | Robust logearnings | Coef. Std. Err. z P>|z| [95% Conf. > Interval] -------------+------------------------------------------------------- > --------- asvab_score | 2.43e-06 6.57e-07 3.70 0.000 1.14e-06 > 3.72e-06 hs_gpa | -.1336075 .0321512 -4.16 0.000 -.1966226 > -.0705923 highschool | .1661021 .0397046 4.18 0.000 .0882825 > .2439217 yrs_at_job | .097124 .0060808 15.97 0.000 .0852059 > .1090422 effort_at_~k | .0559893 .0134956 4.15 0.000 .0295384 > .0824402 dependable | -.0217601 .0168253 -1.29 0.196 -.0547371 > .0112168 female | -.2365603 .0292647 -8.08 0.000 -.293918 > -.1792026 urban | .0365131 .0272778 1.34 0.181 -.0169504 > .0899767 | age | 17 | .6701809 .0952641 7.03 0.000 .4834668 > .8568951 18 | 1.236587 .1003886 12.32 0.000 1.039829 > 1.433345 19 | 1.531404 .1174612 13.04 0.000 1.301184 > 1.761623 20 | 1.912966 .1290106 14.83 0.000 1.66011 > 2.165823 21 | 2.182201 .1399739 15.59 0.000 1.907857 > 2.456544 22 | 2.387631 .152673 15.64 0.000 2.088398 > 2.686865 23 | 2.603772 .1643891 15.84 0.000 2.281576 > 2.925969 24 | 2.780479 .1745557 15.93 0.000 2.438356 > 3.122602 25 | 2.941867 .1844739 15.95 0.000 2.580305 > 3.30343 26 | 2.985322 .1939797 15.39 0.000 2.605128 > 3.365515 27 | 3.120994 .2045331 15.26 0.000 2.720117 > 3.521871 | race_ethni~y | Hispanic | .1650294 .0509788 3.24 0.001 .0651127 > .264946 Mixed Ra..) | .2308825 .1550219 1.49 0.136 -.0729549 > .5347199 Non-Black.. | .2052931 .0406527 5.05 0.000 .1256152 > .284971 | census_reg~n | North Cen.. | -.0923181 .044117 -2.09 0.036 -.1787859 > -.0058503 South | -.0227871 .0429106 -0.53 0.595 -.1068904 > .0613163 West | .0087614 .0481999 0.18 0.856 -.0857087 > .1032315 | year | 2001 | .1098782 .0498231 2.21 0.027 .0122267 > .2075297 2002 | .0795058 .0646235 1.23 0.219 -.047154 > .2061655 2003 | .1077043 .0829139 1.30 0.194 -.054804 > .2702126 2004 | .0968735 .1013902 0.96 0.339 -.1018476 > .2955947 2005 | .1895836 .1169321 1.62 0.105 -.039599 > .4187663 2006 | .2232084 .1318664 1.69 0.091 -.035245 > .4816617 2007 | .2930453 .1441148 2.03 0.042 .0105856 > .5755051 2008 | .3402623 .1542698 2.21 0.027 .0378991 > .6426254 2009 | .3135486 .1647981 1.90 0.057 -.0094497 > .636547 | _cons | 6.177176 .1486992 41.54 0.000 5.885731 > 6.468621 -------------+------------------------------------------------------- > --------- sigma_u | .61058115 sigma_e | .77313164 rho | .38412495 (fraction of variance due to u_i) --------------------------------------------------------------------- > --------- (est2 stored) . . eststo: xtreg logearnings asvab_score hs_gpa highschool /// > yrs_at_job effort_at_work dependable female urban /// > i.age i.race_ethnicity i.census_region i.year, fe robust note: asvab_score omitted because of collinearity note: hs_gpa omitted because of collinearity note: effort_at_work omitted because of collinearity note: dependable omitted because of collinearity note: female omitted because of collinearity note: 2.race_ethnicity omitted because of collinearity note: 3.race_ethnicity omitted because of collinearity note: 4.race_ethnicity omitted because of collinearity note: 2009.year omitted because of collinearity Fixed-effects (within) regression Number of obs = > 13,228 Group variable: person_id Number of groups = > 2,313 R-sq: Obs per group: within = 0.5844 min = > 1 between = 0.2576 avg = > 5.7 overall = 0.4641 max = > 10 F(25,2312) = > 315.29 corr(u_i, Xb) = -0.0467 Prob > F = > 0.0000 (Std. Err. adjusted for 2,313 clusters in p > erson_id) --------------------------------------------------------------------- > --------- | Robust logearnings | Coef. Std. Err. t P>|t| [95% Conf. > Interval] -------------+------------------------------------------------------- > --------- asvab_score | 0 (omitted) hs_gpa | 0 (omitted) highschool | .1094046 .0573423 1.91 0.057 -.003043 > .2218522 yrs_at_job | .0739907 .0070301 10.52 0.000 .0602047 > .0877767 effort_at_~k | 0 (omitted) dependable | 0 (omitted) female | 0 (omitted) urban | .0492101 .0324989 1.51 0.130 -.0145198 > .1129401 | age | 17 | .7336689 .0979597 7.49 0.000 .5415708 > .925767 18 | 1.374485 .1037419 13.25 0.000 1.171048 > 1.577922 19 | 1.728548 .1173718 14.73 0.000 1.498383 > 1.958713 20 | 2.158161 .1193282 18.09 0.000 1.92416 > 2.392163 21 | 2.4642 .117919 20.90 0.000 2.232962 > 2.695438 22 | 2.71188 .1172112 23.14 0.000 2.48203 > 2.94173 23 | 2.966728 .1142764 25.96 0.000 2.742633 > 3.190823 24 | 3.192842 .1108302 28.81 0.000 2.975505 > 3.410179 25 | 3.389888 .1083582 31.28 0.000 3.177399 > 3.602378 26 | 3.486306 .1083687 32.17 0.000 3.273796 > 3.698816 27 | 3.653422 .1106961 33.00 0.000 3.436348 > 3.870496 | race_ethni~y | Hispanic | 0 (omitted) Mixed Ra..) | 0 (omitted) Non-Black.. | 0 (omitted) | census_reg~n | North Cen.. | -.0878832 .1041593 -0.84 0.399 -.2921387 > .1163722 South | .0634577 .0919756 0.69 0.490 -.1169056 > .243821 West | .1554264 .1162879 1.34 0.181 -.0726131 > .383466 | year | 2001 | .0721984 .0456853 1.58 0.114 -.01739 > .1617869 2002 | .0086543 .0510119 0.17 0.865 -.0913796 > .1086881 2003 | .0025964 .0559176 0.05 0.963 -.1070575 > .1122503 2004 | -.0580436 .0583332 -1.00 0.320 -.1724346 > .0563473 2005 | .0005902 .0567985 0.01 0.992 -.1107911 > .1119714 2006 | -.0039962 .0520458 -0.08 0.939 -.1060575 > .098065 2007 | .0369685 .0434009 0.85 0.394 -.0481403 > .1220773 2008 | .0494228 .0307423 1.61 0.108 -.0108626 > .1097083 2009 | 0 (omitted) | _cons | 6.167635 .1156249 53.34 0.000 5.940896 > 6.394375 -------------+------------------------------------------------------- > --------- sigma_u | .77735141 sigma_e | .77313164 rho | .50272157 (fraction of variance due to u_i) --------------------------------------------------------------------- > --------- (est3 stored) . . predict earnings_resid, residuals // Save residuals (139,500 missing values generated) . . * Save regression output tables . esttab using "panel_seminar_exercise_regs", /// > title("Regression of earnings on education, experience, and > other determinants") /// > mtitles("Pooled OLS" "Random Effects" "Fixed Effects") /// > se label wrap noabbrev rtf /// > drop(16* 17* 18* 19* 20* 21* 22* 23* 24* 25* 26* 27*) /// > star(* 0.10 ** 0.05 *** 0.01) b(%8.2g) /// > compress one replace (output written to panel_seminar_exercise_regs.rtf) . eststo clear . . *** Graphs of Residuals . graph twoway scatter earnings_resid year if person_id == 2 . graph twoway scatter earnings_resid year if person_id == 13 . graph twoway scatter earnings_resid year if person_id == 21 . graph twoway scatter earnings_resid year if person_id == 28 . graph twoway scatter earnings_resid year if person_id == 35 . graph twoway scatter earnings_resid year if person_id == 41 . graph twoway scatter earnings_resid year if person_id == 49 . graph twoway scatter earnings_resid year if person_id == 53 . . . ******************************************************************* > ********* . **** Interpretation > **** . ******************************************************************* > ********* . /* > > 1. Why are some variables omitted from the fixed effects regression > ? > > The demographic variables (sex and race) as well as the ability mea > sures > (ASVAB score and high school GPA) are omitted from the fixed effect > s > regression because they are time-invariant. When we think of the f > ixed effects > explanatory variable as the demeaned x_it - avg(x)_i, then it's cle > ar when > these variables do not vary over time, they will be collinear with > the > individual fixed effects. > > > > 2. How (and why) do the standard errors of estimates change between > estimates? > > The standard errors get larger going from pooled OLS, to random eff > ects, to > fixed effects regressions. This makes sense given that OLS uses al > l of the > within and between variation, fixed effects uses only within variat > ion, and > random effects uses something inbetween. > > 3. Do the residuals look autocorrelated? > > Yes. There are occasionally big jumps in the error, but for the mo > st part > the residual in one period looks like its position is related to th > e position of > the residual in the previous period. > > */ . ******************************************************************* > ********* . **** General Notes > **** . ******************************************************************* > ********* . /* > > 1. Note the "i." prefix for the census_region, age, and race_ethni > city > controls. In the exercise, we did not create indicator variables, > but in the > regression you can quickly control for indicators for each distinct > value of > these variables by using the "i." prefix. (Note, I have also done t > his for > years). > > Using the "i." prefix is great for anything where you need to gener > ate a lot of > indicators quickly, which are not the main explanatory variables of > analysis. > > 2. Note that when I run the fixed effects regression, I still incl > ude > indicators for each year by using i.year (which are of course, the > year fixed > effects). It is tempting to think that fixed effects regression wi > ll > control for these automatically, but it does not. The fixed effect > estimator > by default controls only for the individual effects (the equivalent > of > i.person_id). > > Hence, you always need to control for time trends by creating time > fixed effects > manually. You can also control for time fixed effects in pooled OL > S and random > effects regression, as the same time trend concerns apply there too > . > > 3. Note that one of the options for the `esttab` command is > "drop(16* 17* 18* 19* 20* 21* 22* 23* 24* 25* 26* 27*)" > > What does this do? Compare the regression output vs the word table > generated > by esttab. The fixed effects for each year (2000-2009) and age (16 > -27) are > missing in the regression table that I exported. Thats because I h > ave omitted > displaying any estimates for variables that start with these values > (hence > the year fixed effects are omitted because of the "20*" term in the > drop. > > These age and year fixed effects are not my primary explanatory var > iables, > just necessary controls. To make the table I present less cluttere > d, I > therefore do not report the estimates for the variables. > */ . end of do-file