Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Vladimír Hlásny <vhlasny@gmail.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
st: Re: GMM minimization of regional errors imputed from hhd level model |

Date |
Wed, 11 Dec 2013 13:04:38 +0900 |

Reposting the code without errors due to comments spilling into the following lines. Sorry about that. Vladimir clear all set obs 500 gen x = 5+uniform() /*create individuals' incomes*/ gen psucode=group(20) /*create 10 regions with N/20 sampled individuals each*/ bysort psucode: egen avgx=mean(x) gen weight=0.7*(avgx-4) /*create region-level sampling weights*/ gen population= round(3*_N/20+_N/20*(avgx-4)) /*create region-level true population, 4+ times larger than region sample (correlated with income)*/ gen sub = 0.25 /*create overall sampling rate (e.g., sample has 25% of true population)*/ drop avgx program drop _all prog nonresp, eclass version 10 syntax varlist(numeric) [if] [in] [, samplewt(varlist) subsamplewt(varlist) myidvar(varlist)] local lhs: word 1 of `varlist' replace `lhs' = `lhs'*`subsamplewt' /*multiply regional population by the percentage that we have in the sample*/ local rhs: list varlist - lhs tempvar region egen `region' = group(`myidvar') /*create region indicators*/ matrix b = J(1,`:word count `rhs' _cons',2) matrix b[1,1]=-0.2 matname b `rhs' _cons, c(.) mata: m_nonresp("`lhs'", "`rhs'", "`samplewt'", "`region'") ereturn post b, e(`touse') depname(`lhs') ereturn display end mata: void logistic(todo,b,crit,g,H) { external y,X,W,W2,addobs p = invlogit(X*b') /*response probability = logistic of x [n*1]*/ m = y - (addobs'*(1:/p)) /*region error = region population - (sum of individual inverse probs in region) [regions*1]*/ crit = m'*(W:*m) /*weight errors by size of region, sum up squared errors, and minimize*/ } void m_nonresp(string scalar lhs, string scalar rhs, string scalar samplewt, string scalar region) { external y,X,W,W2,addobs st_view(y1=., ., lhs) st_view(regn=., ., region) st_view(W1=., ., samplewt) st_view(X=., ., rhs) X = X,J(rows(X),1,1) addobs = J(rows(X),max(regn),0) /*create a sparse matrix [n*regions] assigning individuals to regions*/ for (i=1; i<=rows(X); i++) { addobs[i,regn[i]] = 1 } y2 = y1:*addobs y = colmax(y2)' /*create population by region [regions*1]*/ W2 = colmax(W1:*addobs)' W = W2:/y /*region error weight = sampling weight / region population [regions*1]*/ S = optimize_init() optimize_init_evaluator(S, &logistic()) optimize_init_which(S,"min") optimize_init_evaluatortype(S,"d0") optimize_init_params(S,st_matrix("b")) p = optimize(S) st_replacematrix("b",p) } end nonresp population x, samplewt(weight) subsamplewt(sub) myidvar(psucode) bootstrap, reps(3): nonresp population x, samplewt(weight) subsamplewt(sub) myidvar(psucode) On Wed, Dec 11, 2013 at 12:42 PM, Vladimír Hlásny <vhlasny@gmail.com> wrote: > Hi, I have now tried Austin's suggestion - optimize in Mata. The great > news is that I can do the optimization at the regional level (using > matrices), while using individual-level variables to estimate > coefficients - exactly what I needed. The remaining problems are that > 1) the model is sensitive and sometimes doesn't want to converge (even > using a subset of data that have worked), 2) solving is sometimes slow > for 1000+ observations, and 3) bootstrapping of errors also appears > sensitive and extremely slow even with 200 observations. > > Is there a faster way to calculate my standard errors? Being new to > Mata, I wonder if I could define variables or do some calculations > more efficiently. > > In the code below, I create a fake dataset that looks like Current > Population Survey data. Sample size can be varied from say 100 to > 10000. Then essentially 1) the logistic function calculates the > probability of survey-response by each individual as a function of > coefficients, 2) these probabilities, inverted, are added up for each > region to derive the predicted population in each region, and 3) these > predicted regional populations are fitted against observed regional > populations. Refer to equations 2,6,7 in Korinek, Mistiaen and > Ravallion (2007), An econometric method of correcting for unit > nonresponse bias in surveys, J. of Econometrics 136. * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/

- Prev by Date:
**Re: st: _rmcoll issues** - Next by Date:
**RE: st: _rmcoll issues** - Previous by thread:
**st: Re: GMM minimization of regional errors imputed from hhd level model** - Index(es):