Favourite blogs for The Ice-skating Gecko

August 31, 2006

What to do tomorrow

1) Check why using “cond()” rather than “replace” leads to a better rate of convergence.
2) Check whether we made any mistake translating the code from intreg to our GMM code (especially the cond() part).
3) If “cond()” is better, then use it instead of “replace”.
4) Is the code for b/sigma correct? That is, the “cond()” and “replace” part.

August 09, 2006

Simulation 2

Today I tried to change the moment conditions for the intercept, the coefficient and sigma. I hoped that, with these modifications, the code might work better. Instead of writing two lines of code for r = 1 and r = 0, I combined them using (1–r) and (r). However, this does not change the results.
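A minimal sketch of why combining the r = 1 and r = 0 lines into one expression leaves the results unchanged, assuming r only takes the values 0 and 1. The functions g_observed and g_missing are hypothetical stand-ins, not the actual moment conditions in the simulation code.

```python
def g_observed(y, x, theta):
    # Hypothetical moment contribution for an observed case (r == 1).
    return x * (y - theta * x)


def g_missing(x, theta):
    # Hypothetical moment contribution for a missing case (r == 0); it does not use y.
    return x * (0.5 - theta * x)


def mean_moment_branch(y, x, r, theta):
    # Two separate lines: one for r == 1 and one for r == 0.
    total = 0.0
    for yi, xi, ri in zip(y, x, r):
        if ri == 1:
            total += g_observed(yi, xi, theta)
        else:
            total += g_missing(xi, theta)
    return total / len(r)


def mean_moment_combined(y, x, r, theta):
    # One expression: r * g1 + (1 - r) * g0.
    total = 0.0
    for yi, xi, ri in zip(y, x, r):
        total += ri * g_observed(yi, xi, theta) + (1 - ri) * g_missing(xi, theta)
    return total / len(r)


y = [1.0, 0.0, 2.5, 0.0]  # 0.0 is a placeholder where y is missing
x = [1.0, 2.0, 3.0, 4.0]
r = [1, 0, 1, 0]
branch = mean_moment_branch(y, x, r, 0.7)
combined = mean_moment_combined(y, x, r, 0.7)
```

One caveat with the combined form: if y were stored as NaN for the missing cases, the product would propagate NaN (0 * NaN is NaN), which is why a finite placeholder is used here.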

August 08, 2006

Simulation 1

Today I compared two simulation studies, each with 40 replications and a discrete missing-data mechanism; one study uses bb–parameters whereas the other uses bs–parameters. It seems that the study with bb–parameters is better. This result contradicts the original result from the LFS data, where I concluded that the bs–parameters are better.

August 07, 2006

The Beginning of The End

(1) Time Line
August : Simulations, Learn Matlab, Chapter 1?
September: Moving out of the office

There are about 8 months left before April!!

What has to be done:
– Simulation studies (a lot)
– finish First Chapter with Richard (difficult)
– Tang Little and Rubin's estimator
– Bounds
– Maximum Score for IPW


Today I tried to fix the simulation programme by changing the "genr"; that is, by using the discrete missing-data mechanism. However, this does not solve the problem. The convergence rate is still poor. Why?

January 05, 2006

Identification on the theory part

On the identification issue: when Y is continuous, Tang, Little and Raghunathan (2003) mention that only a certain type of parametric family can be allowed. We have to argue against this.

We might be able to use Manski's results. But we have to combine the identification in choice-based sampling with missing data. Show that when P(Y|X) is specified, any model is OK.

Show that when H(Y,X) and f(X) are known, the missing-data mechanism is known. So the missing-data problem becomes a choice-based sampling problem. Then, when P(Y|X) is specified, all we have to do is find an objective function whose unique solution is the true theta. In this second step, we may be able to use some material from Manski's chapter about choice-based sampling.

November 24, 2005

Meeting with Mark, Tuesday 22

(1) Tidy up the report by

– find a criterion for deciding whether or not to cut the outliers

– use the pwt03 weight, as piwt03 is not an exogenous weight

– use pweight instead of fweight

– look in the statistics literature for how to adjust extreme weights

– use the hetprob model instead of probit, and find other binary choice models as well.

– read the paper that Mark gave about weighted OLS

– add some details that Mark asks in the report.
(2) prepare for the Selection model.

November 11, 2005

Works to be done in the next week

1 Continue writing Richard's chapter
2 Read Mark's reply and prepare for discussing with him
3 Prepare for teaching next Wednesday
4 Prepare for discussing with Richard about Skinner's paper

There is a possible overlap between (2) and (4).

October 25, 2005

Plewis's example and Weighting in Regression Model

Plewis's example of whether to use non-response weights or not is very interesting. How is it related to weighting in a regression context? Can we just adopt it? Maybe not. In the conditional model context, do we know that if the response probabilities vary with X then the adjusted mean is better than the unadjusted one?

We should try to apply Richard's approach to Plewis's examples.
In these examples, the researcher ignored the information about the missing-data mechanism. That is, he still clings to the fact that the response rate in stratum 1 is 0.9 and in stratum 2 is 0.7. What if we stratify the population according to the values of the binary variable of interest?
Then the response rate is not for each stratum, but for each possible value of the variable of interest, that is, 0 and 1.

This means that we have to split each stratum into two substrata and then apply Richard's method. It will be very interesting to see what happens if we use this weight instead.

An Idea about the future work on Weighting

From the sheet about weighting of ESDS, we know that there are 3 types of weights:
(1) Sample Design or Probability Weights;
(2) Non-response weights ;
(3) Post-stratification weights.
Weight (1): based on observed variables, one calculates the probability of an observation being included and weights the observation by the inverse of this probability.

Weight (2) is also a type of IPW. However, we use an incomplete set of variables to put observations into different classes and observations in the same class are given the same weight. Thus, we implicitly make an assumption that observations in the same class are of the same characteristic. Of course, this could be wrong.

Weight (3) is just a frequency weight that adjusts our sample to represent the real population.

The thing is that these weights are normally combined. As we can see, (1) is like the weight in the IPW M-estimator and (2) is the weight of Richard and Esmeralda. Can we find an optimal way to combine these two weights?? Note that, by definition, (1) can vary continuously across observations but (2) has to be constant for observations in the same class.
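To make the contrast between (1) and (2) concrete, here is a minimal sketch that combines them by simple multiplication. The product rule is only one common convention and is an assumption here, not an answer to the optimality question; the function names and the toy data are hypothetical.

```python
from collections import defaultdict


def class_nonresponse_weights(classes, responded):
    # Weight (2): inverse of the response rate within each weighting class,
    # so it is constant for all observations in the same class.
    n_total = defaultdict(int)
    n_resp = defaultdict(int)
    for c, r in zip(classes, responded):
        n_total[c] += 1
        n_resp[c] += r
    return {c: n_total[c] / n_resp[c] for c in n_total if n_resp[c] > 0}


def combined_weights(incl_prob, classes, responded):
    # Weight (1) is 1 / inclusion probability and may differ for every observation.
    # Non-respondents receive weight 0.
    nr = class_nonresponse_weights(classes, responded)
    return [
        (1.0 / p) * nr[c] if r == 1 else 0.0
        for p, c, r in zip(incl_prob, classes, responded)
    ]


incl_prob = [0.5, 0.5, 0.25, 0.25]
classes = ["a", "a", "b", "b"]
responded = [1, 0, 1, 1]
weights = combined_weights(incl_prob, classes, responded)
```

In this toy data, class "a" has a response rate of one half, so its single respondent carries double its design weight.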

Another point is whether there is a difference between weighting of survey data in general and weighting in a particular study. For example, in the LFS dataset that we are working on, there are two weights provided. However, "hrrate" is not fully observed and we would like to do IPW M-estimation to take account of this missingness. So even though the weights provided (pwt03, piwt03) are calculated using non-response weights, should we calculate our own weights and combine them with the provided ones???

October 19, 2005

Meeting with Mark today

Things to do
(1) Plot "lev" against "iprob" as a measure for detecting troublesome observations.
(2) Test whether the normality assumption is suitable for our binary choice model or not.
(3) If not, find an alternative (note that "unusual" weights assigned to observations could arise because we use the wrong model for e of the latent regression).
(4) We do not include "lhourpay" in the structural model because there is no suitable interpretation of the coefficient of this variable.
(5) We do not include "lhourpay" in the probit model because we would like to use the same set of X's as in the structural model.
(6) Write a report to Mark, showing the results of the various regressions.
(7) Trim the "iprob" using the sample size, because this will not affect the asymptotic properties of the estimator (as the threshold "N" goes to infinity, we effectively do not trim the weights).
(8) Try Logit instead of Probit to see whether it affects the two observations with high "iprob" or not.
(9) Try to code people who study over 30 years as leaving education at 24 to work and then later coming back to study. So start calculating work experience from 24 years of age.
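Item (7) could be sketched as follows. This is only an illustration, assuming that "iprob" holds the inverse selection probabilities and that the cap is the sample size itself; the function name and the example values are hypothetical.

```python
def trim_weights(iprob, cap=None):
    # Cap each inverse-probability weight at `cap`.
    # By default the cap is the sample size, so it grows with N
    # and the trimming vanishes asymptotically.
    if cap is None:
        cap = float(len(iprob))
    return [min(w, cap) for w in iprob]


iprob = [1.2, 5.0, 3.0, 40.0]
trimmed = trim_weights(iprob)  # the cap is 4.0 for this sample of four
```

With a realistic sample size the cap would bind only for the few observations with very large "iprob".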

October 18, 2005

Problem with Theory

We are interested in the true value of theta, theta0, of f(y|x,theta). In MLE, we define this value as the unique solution to max E[log f(y|x,theta)]. If an iid sample is available, the ML estimator is the solution to the sample analog of this problem.
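The population problem and its sample analog can be written out as below, using the log density as is standard for MLE. The parameter space \Theta and the sample size N are notation assumed here, not taken from the notes.

```latex
\theta_0 = \arg\max_{\theta \in \Theta} \; E\left[\log f(y \mid x, \theta)\right],
\qquad
\hat{\theta} = \arg\max_{\theta \in \Theta} \; \frac{1}{N} \sum_{i=1}^{N} \log f(y_i \mid x_i, \theta).
```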

However, an endogenous stratified sample is available instead of an iid sample.

October 13, 2005

Idea about lhourpay and lhrrate

We are not clear about the relationship between "lhourpay" and the regression of "lhrrate" on covariates. For example, if "lhourpay" is included in the regression, is it endogenous?

One thing we can be sure of is that "lhourpay" is "lhrrate" + measurement errors! Can this help explain anything?

October 04, 2005


The problem is that "hourpay" is highly correlated with "hrrate" only. So omitting it will not cause any X's to be endogenous, as "hourpay" is not significantly correlated with them. But should omitting it have another impact? Or not?

August 25, 2005

Econometric Society World Congress 2005


About identification, we can probably look at works by Chescher and Imbens. These are about triangular equations with an endogenous regressor (can we think of R as an endogenous regressor, and put R in the structural equation? If so, we are in their framework). We may be able to say why Y has to be discrete in RS(2003).

Xiaohong Chen wrote a paper with Chunrong Ai about efficient sequential estimation of semi-nonparametric moment models which include Newey and Chamberlain's bounds as special cases.

Jean-Pierre Florens, Uni of Toulouse I, wrote a paper about Endogeneity in nonseparable models: Application to Treatment Models where the Outcomes are durations. Very technical paper. There is some regularisation of irregular estimators.

Han Hong wrote a paper about confidence intervals for identification regions, "Inference on Identified parameter sets".

Francesca Molinari, Cornell, Partial identification of probability distribution with misclassified data. Very interesting. Uses Manski's idea. Maybe we can use something from it to solve the identification problem in our extension of RS(2003).

Sergio Firpo, Puc-Rio and UBC, Inequality treatment effects. See the effect of treatment on the distribution of the targeted variable. Quite interesting to use this idea to measure the effect of introducing the NMW (national minimum wage) on the wage distribution.

What is the relationship between selection (missing data) and the random censoring model? (See a paper by Tamer and Shakeeb Khan (Rochester) for this randomly censored regression.)

Thomas Stoker (UCL and MIT) works on missing in X. See his website for more details.

Cheti Nicoletti from Essex ISER works on bounds and imputation. Her paper raises some issues about using imputation when Z in the imputation regression model is not the same as X in the structural model (X is a proper subset of Z). Since, in Skinner (2002), their X is included in Z, this may be an issue.

Does Bhattacharya's paper allow NMAR? but, although he said that his paper started off from Hirano, Imbens,....'s paper on Attrition with IPW, that paper does not allow NMAR??

Tarozzi Chen..'s paper on Semiparametric Efficiency on moment models is very interesting. But they need refreshment samples. Notice that, for IPW estimators, in order to obtain efficiency, they have to weight with both nonparametric and parametric estimated probabilities (with correct specification). This may mean that the results due to Wooldridge (using estimated probabilities leads to a more efficient estimator) carry over to this. However, since all of these estimators attain the semiparametric efficiency bound, we cannot say that using estimated probabilities is more efficient, but we can see that they have to use estimated probabilities to attain such bounds.

Grant Hillier's paper is very interesting. It is about the foundation of estimation procedures.

August 18, 2005

From Meeting with Mark the day before


we might wanna use EDAGE for exp and YERQAL2 for education.
Why? Think about a student who drops out of his Uni after his first year. He will be 19, assuming he entered the Uni at 18. For him, EDAGE (age leaving full time education) = 19 but YERQAL2 (age obtained highest qualification, which should be A-levels in this case) = 18.


We will come back later to the case where Z is not equal to X. That is, we might add some extra variables to R's model (such as hourpay) apart from those in X.


We decided to have Z = X for the time being.

The coefficient of "londse" seems to vary a lot with changes in the assumptions.

Combine sampling weights and IP weights and use these modified weights in the OLS estimation.

Write a report for Mark about the arguments for using IPW estimator in Wooldridge (2002, 2003).

Do some diagnostic tests.

Try to do Heckman, Newey's selection models.

Next step should be Smith or TLR.

Meet some time in September.

August 17, 2005

Running IPW

Run IPW and Unweighted estimation. For IPW, we use a probit model to estimate the selection probabilities. But we do not combine the sampling weights (provided by the ONS) with the estimated selection probabilities; we only use the inverse probabilities as weights in IPW estimation. Also, we haven't checked the validity of the probit estimation yet. We should run some diagnostic tests on this. Note that the vector Z in R's model includes all X's and HOURPAY (derived variable). [Should it be in this model? R, conditional on Z, is independent of Y. But Y (direct variable) should be correlated with this Z (derived variable).]

For Unweighted estimators, we run OLS (complete case analysis), OLS with income-sampling weights and OLS with normal-weights. Again, we have not done the diagnostic tests.
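The mechanics of the two estimators can be sketched for a single regressor as below. This is not the code used for the LFS runs: the selection probabilities p are simply given rather than estimated by probit, and the data and function names are made up for illustration.

```python
def weighted_line_fit(x, y, w):
    # Weighted least squares for y = a + b * x; returns (a, b).
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def ipw_weights(r, p):
    # r_i / p_i: complete cases get 1 / p_i, incomplete cases get 0.
    return [ri / pi for ri, pi in zip(r, p)]


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 0.0]   # last y is a placeholder for a missing value
r = [1, 1, 1, 1, 0]             # response indicator
p = [0.9, 0.8, 0.5, 0.4, 0.6]   # assumed selection probabilities

a_ipw, b_ipw = weighted_line_fit(x, y, ipw_weights(r, p))
a_cc, b_cc = weighted_line_fit(x, y, [float(ri) for ri in r])  # complete-case OLS
```

Because the observed points here lie exactly on one line, both fits recover the same intercept and slope; with noisy data or a misspecified mean the two would generally differ.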

EDAGE variable: there are people who are still studying (EDAGE = 96) and people who had no education (EDAGE = 97). We delete the group that is studying but code 97 as zero.

If EDAGE = 97 has few observations, maybe we should delete this group as well.
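The recoding described above might look like the sketch below, under the reading that 96 means still studying (dropped) and 97 means no education (set to zero). The function name and the sample values are hypothetical.

```python
STILL_STUDYING = 96
NO_EDUCATION = 97


def recode_edage(edage):
    # Returns None for people still studying (to be dropped),
    # 0 for people with no education, and the value unchanged otherwise.
    if edage == STILL_STUDYING:
        return None
    if edage == NO_EDUCATION:
        return 0
    return edage


raw = [16, 96, 97, 21]
recoded = [recode_edage(e) for e in raw]
cleaned = [v for v in recoded if v is not None]
```

Dropping the 97 group as well would only require returning None for that code too.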

August 16, 2005

Case Analysis concluded from Wooldridge (2002) and (2003)

SCENARIO 1: R depends only on X, the covariates in the model. That is, there is no Z variable and Assumption MAR holds as P(R=1|Y,X) = P(R=1|X).

Other settings are special cases of this scenario, so we will not discuss it.

SCENARIO 2: R depends only on X and X is completely recorded;

The double robustness property of the Weighted estimator makes it more likely to be used in this situation. This property means that we have to correctly specify either E(Y|X) or the model of R to get consistency. Why? Because (in the linear regression context??) if the model of R is correct, Weighted consistently estimates L(Y|X), the linear projection in the population, anyway. If L(Y|X) = E(Y|X), then we have consistency even if the model of R is incorrect.

The drawback arises when GCIME holds and L(Y|X) = E(Y|X), since the Unweighted estimator is then consistent and efficient. But Unweighted is not robust: if L(Y|X) is not equal to E(Y|X), it is inconsistent.

SCENARIO 3: R depends only on X, X is incompletely recorded and the feature of interest is correctly specified;
Use Weighted but also see the comment below.

SCENARIO 4: R depends only on X and X is incompletely recorded;
The Weighted estimator will be inconsistent as we cannot estimate R. Unweighted analysis may be better as it allows the R model to depend on missing X (Ignorability Assumption holds using missing X).

SCENARIO 5: a feature of the conditional distribution is correctly specified, R depends only on X, X is completely recorded and some regularity conditions hold.
Both are consistent and we don't have to bother about getting the model of R correct. It is likely that we will use the Unweighted estimator, hoping that GCIME holds.

SCENARIO 6: Same as SCENARIO 5 + GCIME holds
We will use Unweighted as it is the efficient estimator.

August 15, 2005

IPW in Wooldridge (2003)

To explore the following issues: when Z is not completely recorded and when the model of R is incorrectly specified.

It is shown here that estimated probabilities yield a more efficient estimator (than the one using the known probabilities) as long as the generalised version of the information matrix equality holds in the first-step estimation.
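For reference, the Weighted and Unweighted estimators discussed in these notes can be written as below. This is a sketch in assumed notation: p(z, \gamma) denotes the selection probability model, \hat{\gamma} its first-step estimate, and q(w, \theta) the objective function; the version with known probabilities replaces \hat{\gamma} with the true \gamma_0.

```latex
\hat{\theta}_{w} = \arg\min_{\theta} \sum_{i=1}^{N} \frac{r_i}{p(z_i, \hat{\gamma})} \, q(w_i, \theta),
\qquad
\hat{\theta}_{u} = \arg\min_{\theta} \sum_{i=1}^{N} r_i \, q(w_i, \theta).
```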

Z can be missing when R=1 if the model of R is the conditional log-likelihood function for the censoring values in the context of censored survival or duration analysis.

When the sampling is exogenous (or R depends only on X) and the expectation of the objective function is conditional on X (no misspecification), if we use the Weighted estimator then the selection model (R's model) is allowed to be misspecified.

This should work well in SCENARIO 2. In this scenario, we fully record X and R depends on X. The incentive for using Unweighted is that if the feature of interest is correctly specified and GCIME holds then it will be consistent and efficient. (Note that, in MLE, this requires correct specification of the mean function (for consistency) and the conditional density (for GCIME)!!!)

However, using the Weighted estimator allows misspecification in both the model for the feature of interest (population mean or median functions) and the model for the missing-data mechanism. This sounds very promising indeed.

In terms of efficiency, we note below that the asymptotic variances of the estimated and unestimated (known) IPW estimators are the same under exogenous sampling and correct specification. From the result about misspecification in the selection model, we can relax this result a bit, since we no longer require the first-step estimation to be MLE or its model to be correctly specified. Now we can allow for any regular estimation problem with conditioning variable Z and allow misspecification in the probability of selection model (as long as sampling is exogenous and, say, the conditional median is correctly specified).

This result extends the cases where GCIME holds, that is, where Unweighted is more efficient than Weighted (even though the selection model in Weighted estimation is allowed to be misspecified).

August 14, 2005

Note about MAR, NMAR and the literature

Note that, from the setting of IPW estimators, we can see the problem of using notions like NMAR and MAR to convey our idea. Because econometrics is by nature interested in the conditional model, we always have Y and X. In IPW, the Unweighted estimator is consistent in a case where X is missing and R depends on X. Such a case should be called NMAR, but R does not depend on Y. Normally, when we use NMAR, we would like to imply that R depends on Y as well. In this IPW case, the notion of NMAR is clearly misleading. So we should use something like endogenous missingness instead.

Efficiency of Weighted and Unweighted estimators

We will think about efficiency only if the subjects of comparison are consistent estimators. Thus, we have to compare them under circumstances where both are consistent. That is, the Theorem for the Weighted estimators holds with Z=X and the objective function is E[q(W,theta)|X]. Alternatively, the following scenario must be true.

SCENARIO 5: a feature of the conditional distribution is correctly specified, R depends only on X, X is completely recorded and some regularity conditions hold.

Given this scenario, it can be shown that, for the Weighted estimator, the asymptotic variance is the same whether the missing probabilities are estimated or known.

This result does not depend on whether or not a generalised conditional information matrix equality (GCIME) holds. Thus, if all conditions of this scenario are satisfied, the result holds even when there is heteroskedasticity in VAR of unknown form in the context of LS (see Wooldridge 2002).

At this point, we know that, according to this scenario, both Weighted and Unweighted estimators are consistent and it does not matter whether we estimate the selection probabilities or use the known probabilities. (Then the question remains: should we weight or not weight?)

However, if GCIME also holds, we can show that the asymptotic variance of the Unweighted estimator is smaller than that of the Weighted estimator.

GCIME holds for conditional MLE if the conditional DENSITY is correctly specified with the true variance =1. (So both, say, the population conditional mean and the density have to be correctly specified.)

So, in SCENARIO 5, it is likely that we will use the Unweighted estimator.

In SCENARIO 2 below, we say that we will use the Weighted estimator. However, if GCIME holds, we might wanna gamble with the model specification of the mean function and use the Unweighted estimator.