All entries for October 2005
October 25, 2005
Plewise's example of whether or not to use non-response weights is very interesting. How is it related to weighting in a regression context? Can we just adopt it? Maybe not. In a conditional-model context, do we know that if the response probability varies with X then the adjusted mean is better than the unadjusted one?
We should try to apply Richard's approach to Plewise's examples.
In these examples, the researcher ignored the information about the missingness mechanism. That is, he still clings to the fact that the response rate is 0.9 in stratum 1 and 0.7 in stratum 2. What if we stratify the population according to the values of the binary variable of interest?
The response rate is then defined not for each stratum but for each possible value of the variable of interest, that is, 0 and 1.
This means that we have to split each stratum into two substrata and then apply Richard's method. It will be very interesting to see what happens if we use this weight instead.
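As a rough numerical illustration of the stratum-level adjustment discussed here, the sketch below compares the unadjusted respondent mean with the mean weighted by the inverse response rate. Only the response rates 0.9 and 0.7 come from the notes; the stratum sizes, the stratum means and the function name `respondent_mean` are made up for illustration. It also assumes that respondents within a stratum have the same mean as the whole stratum, which is exactly the assumption that fails if response depends on the variable of interest itself.

```python
# Hypothetical figures: only the response rates 0.9 and 0.7 come from the notes.
# Stratum sizes and stratum means of the binary outcome are invented.
strata = [
    {"sampled": 1000, "resp_rate": 0.9, "mean_y": 0.30},  # stratum 1
    {"sampled": 1000, "resp_rate": 0.7, "mean_y": 0.60},  # stratum 2
]


def respondent_mean(strata, use_weights):
    """Mean of y among respondents, optionally weighted by 1 / response rate.

    Assumes respondents in a stratum have the same mean as the whole stratum.
    """
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        respondents = s["sampled"] * s["resp_rate"]
        weight = 1.0 / s["resp_rate"] if use_weights else 1.0
        numerator += respondents * weight * s["mean_y"]
        denominator += respondents * weight
    return numerator / denominator


# Full-sample mean that we would get with no non-response at all.
true_mean = sum(s["sampled"] * s["mean_y"] for s in strata) / sum(
    s["sampled"] for s in strata
)
unadjusted = respondent_mean(strata, use_weights=False)
adjusted = respondent_mean(strata, use_weights=True)
print(true_mean, unadjusted, adjusted)
```

With these invented numbers the weighted mean matches the full-sample mean, while the unadjusted mean is pulled towards stratum 1, which responds more often.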
From the sheet about weighting of ESDS, we know that there are 3 types of weights:
(1) Sample Design or Probability Weights;
(2) Non-response weights;
(3) Post-stratification weights.
Weight (1) is an inverse probability weight (IPW): based on observed variables, one calculates the probability of an observation being included and weights the observation by the inverse of this probability.
Weight (2) is also a type of IPW. However, we use an incomplete set of variables to put observations into different classes, and observations in the same class are given the same weight. Thus, we implicitly assume that observations in the same class share the same characteristics. Of course, this could be wrong.
Weight (3) is just a frequency weight that adjusts our sample to represent the real population.
The thing is that they normally combine these weights. As we can see, (1) is like the weight in the IPW M-estimator and (2) is the weight of Richard and Esmeralda. Can we find an optimal way to combine these two weights? Note that, by definition, (1) can vary continuously across observations, but (2) has to be constant for observations in the same class.
Another point is whether there is a difference between weighting survey data in general and weighting in a particular study. For example, in the LFS dataset that we are working on, two weights are provided. However, "hrrate" is not fully observed and we would like to do IPW M-estimation to take account of this missingness. So even though the weights provided (pwt03, piwt03) are calculated using non-response weights, should we calculate our own weights and combine them with the provided ones?
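One simple way to combine a supplied survey weight with our own missingness weight is to multiply them, that is, to divide the survey weight by the estimated probability that the variable is observed. The sketch below does only that. It is not claimed to be the optimal combination asked about above, and the function name `combine_weights` and all numbers are hypothetical.

```python
def combine_weights(design_weights, response_probs):
    """Multiply each survey weight by the inverse of the estimated
    probability that the variable of interest is observed."""
    if len(design_weights) != len(response_probs):
        raise ValueError("inputs must have the same length")
    combined = []
    for w, p in zip(design_weights, response_probs):
        if not 0.0 < p <= 1.0:
            raise ValueError("response probabilities must lie in (0, 1]")
        combined.append(w / p)
    return combined


# Hypothetical values: three observations with a supplied survey weight and an
# estimated probability that "hrrate" is observed for that observation.
survey_weight = [120.0, 80.0, 100.0]
p_observed = [0.5, 0.8, 1.0]
final_weight = combine_weights(survey_weight, p_observed)
print(final_weight)
```

An observation that is always observed keeps its survey weight, while one observed with probability 0.5 has its weight doubled.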
October 19, 2005
(1) Plot "lev" against "iprob" as a way of detecting troublesome observations.
(2) Test whether the normality assumption is suitable for our binary choice model.
(3) If not, find an alternative (note that "unusual" weights assigned to observations could arise because we use the wrong model for e in the latent regression).
(4) We do not include "lhourpay" in the structural model because there is no suitable interpretation of the coefficient of this variable.
(5) We do not include "lhourpay" in the probit model because we would like to use the same set of X's as in the structural model.
(6) Write a report to Mark, showing the results of the various regressions.
(7) Trim "iprob" using the sample size, because this will not affect the asymptotic properties of the estimator (as the threshold "N" goes to infinity, we effectively do not trim the weights).
(8) Try Logit instead of Probit to see whether it affects the two observations with high "iprob".
(9) Try to code people who study for over 30 years as leaving education at 24 to work and then later coming back to study. So start calculating work experience from 24 years of age.
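Item (7) can be sketched as follows, assuming that "iprob" holds the inverse of the estimated probability for each observation (the notes do not define it explicitly). The function name `trim_weights` and the weights are hypothetical.

```python
def trim_weights(iprob, threshold=None):
    """Cap each weight at the threshold; by default the threshold is the
    sample size N, so the cap loosens as the sample grows."""
    cap = float(len(iprob)) if threshold is None else float(threshold)
    return [min(w, cap) for w in iprob]


# Hypothetical weights for a sample of N = 5 observations.
iprob = [1.1, 1.4, 2.0, 37.5, 1.2]
trimmed = trim_weights(iprob)
print(trimmed)
```

Only the one extreme weight is cut back to N; the others are left untouched.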
October 18, 2005
We are interested in the true value of theta, theta0, of f(y|x,theta). In MLE, we define this value as the unique solution to
max E[log f(y|x,theta)], where the maximum is taken over theta. If an iid sample is available, the ML estimator is the solution to the sample analog of this problem.
However, an endogenously stratified sample is available instead of an iid sample.
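A toy illustration of why the sampling scheme matters: the sketch below drops the covariates and uses a Bernoulli outcome, for which the (weighted) ML estimate has a closed form. It weights each observation by the population share of its stratum divided by the sample share, one standard correction when the population shares are known. The setup, the numbers and the name `bernoulli_mle` are hypothetical and are not taken from the notes.

```python
def bernoulli_mle(y, weights=None):
    """Closed-form (weighted) ML estimate of P(y = 1) for a Bernoulli outcome.

    Maximising sum_i w_i * [y_i*log(p) + (1 - y_i)*log(1 - p)] over p gives
    p_hat = sum_i w_i*y_i / sum_i w_i.
    """
    if weights is None:
        weights = [1.0] * len(y)
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Hypothetical setup: in the population P(y = 1) = 0.2, but the sample was
# drawn so that half of the observations have y = 1 (strata defined by y).
population_share = {1: 0.2, 0: 0.8}  # assumed known
y = [1] * 50 + [0] * 50
sample_share = {1: y.count(1) / len(y), 0: y.count(0) / len(y)}

# Weight = population share of the stratum / sample share of the stratum.
weights = [population_share[yi] / sample_share[yi] for yi in y]

unweighted = bernoulli_mle(y)         # reproduces the sample share of ones
weighted = bernoulli_mle(y, weights)  # close to the population value 0.2
print(unweighted, weighted)
```

The unweighted estimate simply returns the sampling design (0.5), whereas the weighted one recovers the assumed population value.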
October 13, 2005
We are not clear about the relationship between "lhourpay" and the regression of "lhrrate" on covariates. For example, if "lhourpay" is included in the regression, is it endogenous?
One thing we can be sure of is that "lhourpay" is "lhrrate" + measurement error! Can this help explain anything?