Run IPW and unweighted estimation. For IPW, we use a probit model to estimate the selection probabilities. We do not combine the sampling weights (provided by the ONS) with the estimated selection probabilities; only the inverse of the estimated probabilities is used as weights in the IPW estimation. We also have not yet checked the validity of the probit estimation and should run some diagnostic tests on it. Note that the vector Z in the model for R includes all the X's and HOURPAY (a derived variable). [Should HOURPAY be in this model? The assumption is that R, conditional on Z, is independent of Y, but Y (a direct variable) should be correlated with this Z (a derived variable).]
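A minimal sketch of the IPW step, assuming a response indicator R (1 if Y is observed), a covariate matrix Z for the selection model and a regressor matrix X for the outcome model. The data are simulated, `aux` is a hypothetical stand-in for HOURPAY, and the probit is fitted by Fisher scoring in NumPy rather than with any particular package. The last lines show two simple diagnostics that could be run on the weights: the Kish effective sample size and a balance check comparing the weighted respondent means of Z with the full-sample means.

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard normal CDF, applied elementwise via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)

def fit_probit(r, Z, n_iter=100, tol=1e-10):
    """Probit MLE of P(R = 1 | Z) by Fisher scoring."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = Z @ beta
        p = np.clip(norm_cdf(eta), 1e-12, 1 - 1e-12)
        d = norm_pdf(eta)
        v = p * (1.0 - p)
        info = Z.T @ ((d ** 2 / v)[:, None] * Z)  # expected information matrix
        score = Z.T @ (d * (r - p) / v)           # gradient of the log-likelihood
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def wls(y, X, w):
    """Weighted least squares: solve (X'WX) b = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

# Simulated data (illustration only, not survey data).
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)        # outcome Y
aux = y + rng.normal(scale=0.5, size=n)       # hypothetical derived variable correlated with Y
Z = np.column_stack([np.ones(n), x, aux])     # selection covariates
# R depends only on Z (plus independent noise), so R is independent of Y given Z.
r = (Z @ np.array([0.5, 0.3, 0.3]) + rng.normal(size=n) > 0).astype(float)

gamma = fit_probit(r, Z)                      # estimated selection model
p_hat = norm_cdf(Z @ gamma)                   # estimated selection probabilities
resp = r == 1
ipw = 1.0 / p_hat[resp]                       # inverse-probability weights (no ONS weights)

X = np.column_stack([np.ones(n), x])
b_cc = wls(y[resp], X[resp], np.ones(resp.sum()))  # complete-case OLS
b_ipw = wls(y[resp], X[resp], ipw)                 # IPW estimate

# Diagnostics on the weights.
ess = ipw.sum() ** 2 / (ipw ** 2).sum()       # Kish effective sample size
balance = (ipw[:, None] * Z[resp]).sum(axis=0) / ipw.sum() - Z.mean(axis=0)
print("probit coefficients:", gamma)
print("complete-case:", b_cc, "IPW:", b_ipw)
print("max weight:", ipw.max(), "effective n:", ess, "of", resp.sum())
print("weighted minus full-sample means of Z:", balance)
```

If the selection model is adequate, the balance differences should be close to zero; a very small effective sample size or a handful of very large weights would be a reason to revisit the probit specification.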
For the unweighted (i.e. non-IPW) estimators, we run OLS (complete-case analysis), OLS with the income sampling weights, and OLS with the normal weights. Again, we have not yet run the diagnostic tests.
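A sketch of the three non-IPW estimators, assuming the outcome is stored with NaN for nonrespondents and the income and normal weights are available as arrays. The weight arrays below are simulated placeholders (`w_income` and `w_normal` are hypothetical names, not the actual ONS weight variables). All three estimators are the same weighted least squares fit with a different weight vector; weights of one give plain complete-case OLS.

```python
import numpy as np

def wls(y, X, w):
    """Weighted least squares: solve (X'WX) b = X'Wy. Unit weights give OLS."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

def unweighted_estimators(y, X, w_income, w_normal):
    """Complete-case OLS, OLS with income weights, OLS with normal weights.

    Rows with a missing outcome (NaN in y) are dropped (complete-case analysis).
    """
    cc = ~np.isnan(y)
    yc, Xc = y[cc], X[cc]
    return {
        "ols": wls(yc, Xc, np.ones(cc.sum())),
        "ols_income_wt": wls(yc, Xc, w_income[cc]),
        "ols_normal_wt": wls(yc, Xc, w_normal[cc]),
    }

# Simulated example (illustration only).
rng = np.random.default_rng(1)
n = 1_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[rng.random(n) < 0.3] = np.nan           # roughly 30% missing outcomes
w_income = rng.uniform(0.5, 2.0, size=n)  # hypothetical income weights
w_normal = rng.uniform(0.5, 2.0, size=n)  # hypothetical normal weights

est = unweighted_estimators(y, X, w_income, w_normal)
for name, b in est.items():
    print(name, b)
```

Because only relative weights matter in weighted least squares, rescaling a weight vector by a constant leaves these coefficient estimates unchanged.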
EDAGE variable: some respondents are still studying (EDAGE = 96) and some had no education (EDAGE = 97). We delete the group that is still studying and recode 97 to zero.
If the EDAGE = 97 group has few observations, we may want to delete it as well.
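A minimal sketch of the EDAGE cleaning, assuming EDAGE is held as an integer array and that 96 and 97 are the only special codes handled (any other missing-value codes are ignored here). The count of EDAGE = 97 is printed first so the decision to delete that group can be based on its size; `drop_no_education` is a hypothetical switch for that option.

```python
import numpy as np

# EDAGE codes as described in these notes.
EDAGE_STUDYING = 96      # still in education
EDAGE_NO_EDUCATION = 97  # had no education

def clean_edage(edage, drop_no_education=False):
    """Drop respondents still studying and recode 'no education' to 0.

    Returns the cleaned EDAGE values and the boolean row mask, so the same
    rows can be selected from the other variables.
    """
    edage = np.asarray(edage)
    keep = edage != EDAGE_STUDYING
    if drop_no_education:
        # Alternative: delete the EDAGE = 97 group as well.
        keep &= edage != EDAGE_NO_EDUCATION
    recoded = np.where(edage == EDAGE_NO_EDUCATION, 0, edage)
    return recoded[keep], keep

edage = np.array([16, 18, 96, 97, 21, 96, 17])
print("EDAGE = 97 count:", int(np.sum(edage == EDAGE_NO_EDUCATION)))  # check group size first
cleaned, keep = clean_edage(edage)                   # keeps 16, 18, 0, 21, 17
print(cleaned)
print(clean_edage(edage, drop_no_education=True)[0])  # keeps 16, 18, 21, 17
```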