All entries for August 2005
August 11, 2005
(1) Example 4.1 of Wooldridge (2002) shows that E(y|x) can be m(x,theta), a nonlinear function. Also, the model (distribution) of Y|X can be a member of the linear exponential family, yielding the quasi log-likelihood.
(2) In this light, there seem to be four components: the distribution of Y|X, the model for E(Y|X), the distribution of X, and a model of R.
(3) In the case where we have to estimate the selection probabilities, Y and X are required to be observed only when S = 1. Thus, can X be missing along with Y?
Normally, MAR means that R depends on X, which is fully observed. In this case it is still MAR, but R depends on the completely observed Z, so can X be missing?
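One way to look at the question in (3) is to write down the weighted objective. The following is only a sketch in the spirit of an IPW M-estimator, with notation assumed here rather than taken from these notes: w = (y, x), q(w, theta) is the objective (for example, the negative quasi log-likelihood), s is the selection indicator (the S or R above), z is always observed, and p(z) = P(s = 1 | z).

```latex
\hat{\theta}_{\mathrm{IPW}}
  = \arg\min_{\theta} \sum_{i=1}^{N} \frac{s_i}{\hat{p}(z_i)}\, q(w_i, \theta),
\qquad
E\!\left[\frac{s}{p(z)}\, q(w,\theta)\right]
  = E\!\left[\frac{E(s \mid w, z)}{p(z)}\, q(w,\theta)\right]
  = E\!\left[q(w,\theta)\right]
\quad \text{if } P(s = 1 \mid w, z) = p(z) > 0 .
```

Every term involving w_i is multiplied by s_i, so under this assumption x_i as well as y_i enters only for the selected units, while z_i is needed for all units in order to estimate p. This would be consistent with X being missing together with Y, provided the MAR condition is stated given Z.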
August 09, 2005
August 08, 2005
Note about IPW
(1) There is an example of how to use IPW in the notes from the introduction-to-missing-data course, but the missing variable there is a covariate, not a dependent variable. We can follow this example. Notice that the same variables are used in both the structural model and the missing-data model. Must check what Wooldridge has to say about this. Also, the coefficient of the missing covariate is still significant and of the same magnitude even though we delete some observations due to the missingness in this variable. However, the missingness of this covariate does affect the coefficients of other covariates such as "gender". This is quite unexpected.
(2) Tell Mark about the limitation of Heckman's approach, which is that it works only with some special cases of nonlinear models (exponential). This is noted by Wooldridge (2002). However, the nonparametric version of Heckman's approach may be intended as a relaxation of this limitation.
(3) Consider the IPW estimator in the light of Mark's comment. The model of Y given X can be nonlinear, as in the nonlinear LS case. The distribution of X has to satisfy certain conditions, but we don't have to specify it (again, as in the nonlinear LS case; see Wooldridge (2002)). However, we have to model R and make the MAR assumption. Note that although the model of R has to be known, nonparametric modelling is possible. Wooldridge suggests that we read the paper of Hirano, Imbens..
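The IPW idea in (3) can be sketched on simulated data. This is a minimal illustration, not the estimator from any of the papers above. The data-generating process, the logit model for R and the names `fit_logit` and `ipw_ols` are all assumptions made for the example. R depends on a fully observed Z that is correlated with the error in Y, so the complete-case intercept is biased while the weighted one is not.

```python
import numpy as np

def fit_logit(Z, r, n_iter=25):
    """Newton-Raphson fit of a logit model for r given Z (Z includes a constant)."""
    gamma = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ gamma))
        grad = Z.T @ (r - p)
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z
        gamma = gamma + np.linalg.solve(hess, grad)
    return gamma

def ipw_ols(X, y, r, p_hat):
    """Least squares on the complete cases, weighted by 1 / p_hat."""
    obs = r == 1
    w = 1.0 / p_hat[obs]
    Xo, yo = X[obs], y[obs]
    XtW = (Xo * w[:, None]).T
    return np.linalg.solve(XtW @ Xo, XtW @ yo)

# Simulated data (hypothetical): E(y|x) = 1 + 2x, selection driven by z.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
z = rng.normal(size=n)                      # fully observed, independent of x
y = 1.0 + 2.0 * x + z + rng.normal(size=n)  # error term contains z
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

# MAR given z: the response probability depends only on z.
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * z)))
r = (rng.uniform(size=n) < p_true).astype(int)

# Step 1: estimate the response probabilities from (r, z), using all units.
gamma_hat = fit_logit(Z, r)
p_hat = 1.0 / (1.0 + np.exp(-Z @ gamma_hat))

# Step 2: weighted regression using only units with r == 1.
beta_ipw = ipw_ols(X, y, r, p_hat)

# Unweighted complete-case fit, for comparison.
beta_cc = np.linalg.lstsq(X[r == 1], y[r == 1], rcond=None)[0]
print("IPW:", beta_ipw, "complete case:", beta_cc)
```

Note that y and x are used only where r == 1, whereas z is used for every unit. The model for R is parametric here purely for convenience.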
August 03, 2005
Things to do:
(1) Mark suggests that there are 3 components in the model of missing data: the model of Y given X, the model of X, and the model of R (the missingness indicator). For example, TLR assumes that Y is linear in X but does not assume a distribution for X or a parametric model for R. So TLR is parametric in Y given X but not in X and R. The sample selection model of Newey et al. seems not to restrict the model of Y given X but assumes something about the model of R.
So we have to read up on this point and clarify it.
(2) Empirical work: need to read the paper by Skinner and do the IPW estimation with the data set.
Also, after doing IPW, think about the criteria for comparing different estimators. For building the wage distribution, we don't have to estimate the fitted values??
(3) One of the problems discussed was that TLR needs as few X's as possible because we would like to be nonparametric about the distribution of X.
(4) Mark would like us to do (i) descriptive stats on all variables, (ii) complete case analysis (probably both unweighted and weighted using basic weight (i.e. non-income one)), (iii) IPW estimation (wtd & unwtd) with the following variables as our X's:
Years of education (= age left full-time education – 5)
Experience (potential) (= Current age – age left full-time education)
[These 3 variables are the core of the "Mincerian earnings equation" that economists generally start with]
Size of workplace (dummy for 25+)
London & South-East (combined dummy)
[These 5 variables are all commonly used ones and give a good mixture of characteristics of the individual and characteristics of the job]
(5) An issue with IPW is whether X and Z have to be different.
(6) Empirical work: should we include age < 22 in our sample, as Skinner does?
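The covariates listed under (4) could be built along the following lines. This is a rough sketch only. The function name `build_covariates`, the argument names and the region labels "London" and "South East" are hypothetical and would need to be matched to the coding in the actual data set.

```python
import numpy as np

def build_covariates(age, age_left_educ, workplace_size, region):
    """Return columns: years of education, potential experience,
    large-workplace dummy (25+), London & South-East dummy."""
    age = np.asarray(age, dtype=float)
    age_left = np.asarray(age_left_educ, dtype=float)
    years_educ = age_left - 5.0   # age left full-time education - 5
    experience = age - age_left   # current age - age left full-time education
    large = (np.asarray(workplace_size) >= 25).astype(int)
    # Region labels are assumed; replace with the codes used in the survey.
    lse = np.isin(np.asarray(region), ["London", "South East"]).astype(int)
    return np.column_stack([years_educ, experience, large, lse])

# Two made-up respondents, for illustration only.
X = build_covariates(
    age=[30, 45],
    age_left_educ=[16, 21],
    workplace_size=[10, 300],
    region=["London", "Wales"],
)
print(X)
```

The same matrix can then be used for the descriptive statistics, the complete-case analysis and the IPW estimation, so that all three are based on identically defined X's.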