August 14, 2005
We consider efficiency only if the estimators being compared are consistent. Thus, we have to compare the two under circumstances where both are consistent. That is, the theorem for the Weighted estimator holds with Z = X, and the objective function is E[q(W,theta)|X]. In other words, the following scenario must be true.
SCENARIO 5: a feature of the conditional distribution is correctly specified, R depends only on X, X is completely recorded, and some regularity conditions hold.
Under this scenario, it can be shown that, for the Weighted estimator, the asymptotic variance is the same whether the missing probabilities are estimated or known.
This result does not depend on whether a generalised conditional information matrix equality (GCIME) holds. Thus, if all conditions of this scenario are satisfied, the result holds even when there is heteroskedasticity of unknown form in Var(Y|X) in the context of least squares (see Wooldridge 2002).
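A small Monte Carlo sketch of this Scenario 5 result. The data-generating process is an illustrative assumption, not taken from the source: a correctly specified linear mean E(Y|X) = 1 + 2X, a logit response probability that depends only on the always-recorded X, and a hand-written Newton-Raphson logit (`fit_logit`, a hypothetical helper) for the estimated probabilities. If the result holds, the sampling spread of the weighted slope should be about the same with known and with estimated probabilities.

```python
import numpy as np


def wls(X, y, w):
    """Weighted least squares: solve (X'WX) b = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)


def fit_logit(X, r, iters=25):
    """Logit MLE of P(R=1|X) by Newton-Raphson; returns fitted probabilities."""
    g = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ g))
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        g = g + np.linalg.solve(hess, X.T @ (r - p))
    return 1.0 / (1.0 + np.exp(-X @ g))


def one_draw(rng, n=1000):
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    y = 1.0 + 2.0 * x + rng.normal(size=n)          # E(Y|X) correctly specified
    p_true = 1.0 / (1.0 + np.exp(-(0.5 + x)))       # R depends only on X
    r = (rng.uniform(size=n) < p_true).astype(float)
    p_hat = fit_logit(X, r)                         # X is always observed
    # Units with r == 0 get weight 0, so their y never affects the estimate
    return wls(X, y, r / p_true)[1], wls(X, y, r / p_hat)[1]


rng = np.random.default_rng(0)
draws = np.array([one_draw(rng) for _ in range(500)])
sd_known, sd_est = draws.std(axis=0)
print(f"sd of weighted slope: known p {sd_known:.4f}, estimated p {sd_est:.4f}")
```

Since the result does not rely on GCIME, scaling the error in `y` by some function of `x` (heteroskedasticity) should, under the same assumptions, still leave the two spreads close.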
At this point, we know that, under this scenario, both the Weighted and Unweighted estimators are consistent, and it does not matter whether we estimate the selection probabilities or use the known probabilities. (The question then remains: should we weight or not?)
However, if GCIME also holds, we can show that the asymptotic variance of the Unweighted estimator is smaller than that of the Weighted estimator.
GCIME holds for conditional MLE if the conditional DENSITY is correctly specified, with the scaling constant in the GCIME equal to 1 (so both, say, the population conditional mean and the density have to be correctly specified).
So, in SCENARIO 5, we will likely use the Unweighted estimator.
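A sketch of this efficiency ranking under an assumed setup (not from the source): a correctly specified linear mean with homoskedastic normal errors, which is a case where GCIME holds for least squares, and a response probability that depends only on X and is treated as known. The unweighted (complete-case) slope should show a smaller sampling spread than the slope weighted by 1/p(X).

```python
import numpy as np


def wls_slope(X, y, w):
    """Slope from weighted least squares with weights w."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)[1]


rng = np.random.default_rng(1)
n, reps = 1000, 500
slopes = np.empty((reps, 2))
for b in range(reps):
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    # Correct linear mean, homoskedastic normal error: GCIME holds for least squares
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-1.5 * x))          # known response probability, X only
    r = (rng.uniform(size=n) < p).astype(float)
    slopes[b, 0] = wls_slope(X, y, r)           # unweighted: complete cases
    slopes[b, 1] = wls_slope(X, y, r / p)       # weighted by 1 / p(X)

sd_unweighted, sd_weighted = slopes.std(axis=0)
print(f"sd of slope: unweighted {sd_unweighted:.4f}, weighted {sd_weighted:.4f}")
```

If the error variance is made to depend on `x`, GCIME fails and this ranking is no longer guaranteed.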
In SCENARIO 2 below, we say that we will use the Weighted estimator. However, if GCIME holds, we might want to gamble on the specification of the mean function and use the Unweighted estimator.
TO COMPARE THE TWO APPROACHES
To compare them, we have to set up situations in which either one can be used as an alternative to the other.
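For reference, the two estimators being compared can be sketched as follows. W is the full data vector, q(W, theta) the objective function, Z the variables in the response model (Z = X in Scenario 1), and p-hat the estimated response probability (for example from a logit, which is an assumption here). This is the standard inverse-probability-weighting setup; regularity conditions are omitted.

```latex
\begin{aligned}
&\text{MAR (Weighted analysis):} && P(R = 1 \mid W, Z) = P(R = 1 \mid Z) \equiv p(Z) > 0, \\
&\text{Unweighted:} && \hat\theta_U = \arg\min_{\theta \in \Theta} \sum_{i=1}^{N} R_i \, q(W_i, \theta), \\
&\text{Weighted:} && \hat\theta_W = \arg\min_{\theta \in \Theta} \sum_{i=1}^{N} \frac{R_i}{\hat p(Z_i)} \, q(W_i, \theta).
\end{aligned}
```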
SCENARIO 1: R depends only on X, the covariates in the model. That is, there is no Z variable, and Assumption MAR holds as P(R=1|Y,X) = P(R=1|X). This is the most common situation in which both types of analysis can be used. It is also quite realistic since, in practice, it is difficult to find a Z that is not a subset of X and that makes R independent of (Y,X) conditional on Z.
Weighted analysis, advantages:
(1) Does not require correct specification of E(Y|X).
(2) Allows some other variables that are not in X to also affect R.
Weighted analysis, disadvantages:
(1) The model of R has to be correct.
(2) X now has to be completely recorded so that we can estimate the model of R.
(3) Only Y is allowed to be missing.
Unweighted analysis, advantages:
(1) Y and X can be jointly missing, because the conditioning variables in the model of R (X or a subset of it) are allowed to be missing.
(2) If some important variables are incompletely recorded, they can be included in X. In attrition, for example, some covariates are missing in the later waves of a study; such covariates can be included using this approach.
Unweighted analysis, disadvantage:
(1) Requires correct specification of the feature of interest.
AS CAN BE SEEN, which one to use has to be decided case by case. NOTE that MAR versus NMAR is no longer a good criterion for dividing the missing-data literature (at least for econometricians). From the above, the case where X is missing but R depends on X fits the NMAR setting, yet the Unweighted M-estimator can still be consistent there.
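A sketch of this point under an assumed data-generating process (not from the source): a correctly specified linear mean, a response indicator that depends on X, and units with R = 0 losing both X and Y. The response model cannot be estimated here, because X is unobserved exactly when R = 0, yet complete-case least squares (the Unweighted estimator for a linear mean) should still recover the true coefficients, since E(Y|X, R=1) = E(Y|X) when R depends only on X.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)        # true E(Y|X) = 1 + 2X
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))    # response depends on X only
r = rng.uniform(size=n) < p

# When R = 0, BOTH x and y are missing, so P(R=1|X) cannot be fitted
x_obs = np.where(r, x, np.nan)
y_obs = np.where(r, y, np.nan)

complete = ~np.isnan(x_obs) & ~np.isnan(y_obs)
X_cc = np.column_stack([np.ones(complete.sum()), x_obs[complete]])
beta_cc, *_ = np.linalg.lstsq(X_cc, y_obs[complete], rcond=None)
print("complete-case OLS (intercept, slope):", beta_cc.round(3))
```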
SCENARIO 2: If R depends only on X and X is completely recorded:
Then, we should use Weighted analysis, as it allows for misspecification and for some other variables to affect the missing probability.
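A sketch of why weighting protects against misspecification here, under an assumed setup (not from the source): the true mean is E(Y|X) = X^2 but a line is fitted, so the target becomes the population linear projection of Y on (1, X), which for standard normal X has intercept 1 and slope 0. Weighting complete cases by 1/p(X), with p treated as known, should land near the infeasible full-population fit, while the unweighted complete-case line is tilted toward the part of the X distribution that responds more often.

```python
import numpy as np


def wls(X, y, w):
    """Weighted least squares: solve (X'WX) b = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)


rng = np.random.default_rng(3)
n = 400_000
x = rng.normal(size=n)
y = x**2 + rng.normal(size=n)                  # true mean is X^2, but we fit a line
X = np.column_stack([np.ones(n), x])
p = 1.0 / (1.0 + np.exp(-x))                   # response probability, depends on X only
r = (rng.uniform(size=n) < p).astype(float)

beta_pop = wls(X, y, np.ones(n))               # infeasible full-population fit, about (1, 0)
beta_unw = wls(X, y, r)                        # complete cases only
beta_ipw = wls(X, y, r / p)                    # complete cases weighted by 1 / p(X)
for name, b in [("population", beta_pop), ("unweighted", beta_unw), ("weighted", beta_ipw)]:
    print(f"{name:>10}: intercept {b[0]:.3f}, slope {b[1]:.3f}")
```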
SCENARIO 3: If R depends only on X, X is incompletely recorded, and the feature of interest is correctly specified:
Then, we should use Unweighted analysis, because the incompletely recorded X can enter the model of R. (We know for sure that these incomplete X matter for the model of R, and, since the feature of interest is correctly specified, we do not have to worry about misspecification.) Thus, Unweighted analysis yields consistency under a weaker assumption (more variables can enter the model of R, which makes the independence between Y and R more plausible).
However, there could be restrictions such that we cannot use these incomplete variables anyway (Wooldridge (2002) gives an example where the structure of the conditional expectation model prevents us from using incomplete variables). In that case, Weighted analysis might be better.
SCENARIO 4: If R depends only on X and X is incompletely recorded:
Now we do not know whether, say, E(Y|X) is correctly specified. This case is also unclear. We might use Unweighted analysis and gamble on the model specification. Or, if R does not depend significantly on the incompletely recorded X, we may use Weighted analysis.
Note that, in all of this discussion, R has to be independent of Y conditional on some variables anyway. (For Weighted analysis this is Assumption MAR, but for Unweighted analysis it is not MAR, because the conditioning variables in the model of R can be missing.)
Weighted analysis, advantages:
(1) Does not require correct specification of the feature of Y|X (conditional mean, conditional median).
(2) Allows other variables in Z (apart from those in X) to affect R.
(3) Y and X can be jointly missing as long as Z is fully recorded and Assumption MAR (given Z) is satisfied.
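A sketch of advantages (2) and (3) under an assumed setup (not from the source): an always-recorded variable `z` (hypothetical, standing in for the extra elements of Z) is related to both X and the error in Y, response depends only on `z` through a logit, and units with R = 0 lose both X and Y. The response model is fitted on `z` alone with a hand-written Newton-Raphson logit (`fit_logit`, a hypothetical helper). Complete-case least squares should then be biased for E(Y|X) = 1 + 2X, while weighting by 1/p-hat(z) should recover it.

```python
import numpy as np


def fit_logit(Zm, r, iters=25):
    """Logit MLE of P(R=1|Z) by Newton-Raphson; returns fitted probabilities."""
    g = np.zeros(Zm.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Zm @ g))
        hess = (Zm * (p * (1.0 - p))[:, None]).T @ Zm
        g = g + np.linalg.solve(hess, Zm.T @ (r - p))
    return 1.0 / (1.0 + np.exp(-Zm @ g))


def wls(X, y, w):
    """Weighted least squares: solve (X'WX) b = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)


rng = np.random.default_rng(4)
n = 200_000
x = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x + u                            # E(Y|X) = 1 + 2X
z = x + u + rng.normal(size=n)                   # always recorded; related to X and to u
p = 1.0 / (1.0 + np.exp(-(0.5 + z)))             # MAR given z: R depends on z only
r = (rng.uniform(size=n) < p).astype(float)

# Units with R = 0 lose both x and y; zero-fill them, since their weight is exactly 0
x_o = np.where(r == 1, x, 0.0)
y_o = np.where(r == 1, y, 0.0)
X_o = np.column_stack([np.ones(n), x_o])

p_hat = fit_logit(np.column_stack([np.ones(n), z]), r)   # uses only the fully recorded z
beta_unw = wls(X_o, y_o, r)
beta_ipw = wls(X_o, y_o, r / p_hat)
print("unweighted:", beta_unw.round(3), " weighted:", beta_ipw.round(3))
```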
Weighted analysis, disadvantages:
(1) The model of R must be correctly specified.
(2) Z must be completely recorded.
(3) The response probability has to be positive (meaning that we cannot exclude a subset of the population in the sampling process). This may imply that the wage equation example is not valid here, because people who don't work are excluded completely.
Unweighted analysis, advantages:
(1) Missing variables (except Y) can be allowed into the model of R, i.e., the missing-data mechanism, because we do not have to estimate the response probability. Thus, the ignorability assumption (this is not MAR) for the model of R tends to be weaker than that of Weighted analysis in general, since more variables can be conditioned on to make the independence between Y and R more plausible.
(2) Y and X can be jointly missing even when we don't have any variable to serve as Z.
(3) The response probability can be zero for some subset of the population.
Unweighted analysis, disadvantage:
(1) Requires correct specification of the conditional mean, conditional median or conditional distribution.
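A sketch of the zero-response-probability contrast under an assumed setup (not from the source): units with X ≤ 0 are never observed (p = 0) and units with X > 0 respond with probability 0.8. With a correctly specified linear mean, complete-case least squares should still recover the coefficients, whereas the 1/p weights needed by the Weighted estimator cannot be formed on the part of the population where p = 0.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)        # correctly specified linear mean
p = np.where(x > 0, 0.8, 0.0)                 # P(R=1|X) = 0 whenever X <= 0
r = rng.uniform(size=n) < p                   # no unit with X <= 0 is ever observed

# Unweighted: least squares on complete cases only
X_cc = np.column_stack([np.ones(r.sum()), x[r]])
beta_cc, *_ = np.linalg.lstsq(X_cc, y[r], rcond=None)

# Weighted: the 1/p weights are undefined wherever p == 0
ipw_weights_defined = bool(np.all(p > 0))
print("complete-case OLS:", beta_cc.round(3), "| 1/p defined everywhere:", ipw_weights_defined)
```

The consistency here leans entirely on the linear mean being correct; with a misspecified mean the complete-case fit would only describe the X > 0 part of the population.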