` putexcel `

has recently become a very good friend. For those who (or working with people who) find comfort in working with tables in Excel after data processing or estimation in Stata (yes, there are others who don’t find comfort in this.) and already into Stata 13, learning `putexcel`

could be very helpful (put an end to copy-pasting!). A number of user-written commands, such as `outreg`

[1], `outreg2`

, `tabout`

, are also already available for similar purposes. What puts `putexcel`

apart is its ‘user-friendliness’ and flexibility. You can put anything anywhere in Excel without sweat.For example, suppose you are estimating a single model for a number of outcome variables and you are only interested in the parameter estimate of a specific covariate and its corresponding attributes and you want to end up with a table in an Excel sheet that looks like this:

How do you populate this sheet using

`putexcel`

? To illustrate [2], let us use the data produced by Innovations for Poverty Action and used for the impact evaluation of the Philippines KALAHI-CIDSS program.[3] Assuming that “KALAHI-CIDDS Data” folder is in your current directory, we write the following codes to load the data, do some housekeeping, and create the variables of interest:use "KALAHI-CIDDS Data/H08 Expenditure (Food).dta", clear recode Cereal_1- tobacco_2 (-99 = .) egen food = rowtotal(Cereal_1-outsidemeals_2) egen cereal = rowtotal(Cereal_1 cereal_2) gen lnfood = ln(food) gen lncereal = ln(cereal) label var lnfood "Log of total food consumption" label var lncereal "Log of total cereal consumption"

Now, we estimate the average difference in food consumption between two groups: treated (

*treatmen*= 1) and control (

*treatmen*= 0).

`reg lnfood treatmen`

This code will return the following results in the results window:

. reg lnfood treatmen Source | SS df MS Number of obs = 5938 -------------+------------------------------ F( 1, 5936) = 0.17 Model | .076469843 1 .076469843 Prob > F = 0.6828 Residual | 2718.46895 5936 .457963099 R-squared = 0.0000 -------------+------------------------------ Adj R-squared = -0.0001 Total | 2718.54542 5937 .457898842 Root MSE = .67673 ------------------------------------------------------------------------------ lnfood | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- treatmen | .0071772 .0175641 0.41 0.683 -.0272548 .0416092 _cons | 6.932916 .0124176 558.31 0.000 6.908573 6.957259 ------------------------------------------------------------------------------

How do we take only the values we want? This is where

*returned results*come into play. Returned results are calculations stored after a Stata command. Not all commands have stored results but those that matter do. In our case, we want from this table the number of observation, coefficent of

*treatmen*, and its standard error and p-value.

Typing

`ereturn list`

after estimation will return the calculations saved by Stata. In our example, the following will be displayed in the results window:. ereturn list scalars: e(N) = 5922 e(df_m) = 1 e(df_r) = 5920 e(F) = .5202063477728368 e(r2) = .0000878649729487 e(rmse) = .7219142356536746 e(mss) = .2711108253315615 e(rss) = 3085.268168745421 e(r2_a) = -.0000810391039139 e(ll) = -6472.276435645724 e(ll_0) = -6472.53661526113 e(rank) = 2 macros: e(cmdline) : "regress lncereal treatmen" e(title) : "Linear regression" e(marginsok) : "XB default" e(vce) : "ols" e(depvar) : "lncereal" e(cmd) : "regress" e(properties) : "b V" e(predict) : "regres_p" e(model) : "ols" e(estat_cmd) : "regress_estat" matrices: e(b) : 1 x 2 e(V) : 2 x 2 functions: e(sample)

Easily we can see that the number of observations is saved in a scalar called

`e(N)`

and the adjusted R-squared in scalar `e(r2_a)`

. Where are our coefficients, standard errors, and p-value? The coefficients and constant are saved in the vector `e(b)`

and the variance stored in `e(V)`

will let you calculate the standard errors. To show the contents of `e(b)`

and `e(V)`

, type `matrix list e(b)`

and `matrix list e(V)`

. We can use this by explicitly subscripting the elements in the matrices (see Generating scalars for coefficients or standard errors after regression) or use the *_variables*(underscore variables) created by Stata. After estimation, Stata creates for each

*varname*in the model the underscore variable _b[

*varname*] that stores the estimated coefficient and _se[

*varname*] that stores the standard error. What about the p-value? This can be calculated using what are already stored:

`p-value = 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen]))`

For more information, see

`help return`

, `help system variables`

, and Martin Buis’s Stata tip 54: Where did my p-values go?Now, we want the mean of our outcome variable. Instead of the mean of

*lnfood*, let us take the mean of

*food*for the sample observations used in the estimation using

`summarize`

and `e(sample)`

. The `e(sample)`

, another one out of the `ereturn list`

, returns 1 if the observation is used in the previous estimation and 0 otherwise. Thus, if we type `sum food if e(sample)`

, we want to show the summary of the variable *food*only for those observations included in our previous estimation. This returns the following:

. sum food if e(sample) Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- food | 5938 1301.023 1184.682 12 37084

How do we grab that mean? If we type

`ereturn list`

now it will still give us the one stored after the regression estimation. This is because `ereturn list`

returns results only for estimation commands like `regress`

. For general commands, we use `return list`

.. return list scalars: r(N) = 5938 r(sum_w) = 5938 r(mean) = 1301.023093635043 r(Var) = 1403472.291154946 r(sd) = 1184.682358759067 r(min) = 12 r(max) = 37084 r(sum) = 7725475.130004883

There is our mean stored in

`r(mean)`

. To summarize, we now have:----------------------------------------------------------------- What we want| Where it is stored ------------+---------------------------------------------------- N | e(N) Coeffient | _b[treatmen] or _coef[treatmen] s.e. | _se[treatmen] p-value | 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen])) Adj. R2 | e(r2_a) Mean | r(mean) -----------------------------------------------------------------

Next, we will now use

`putexcel`

to populate our Excel sheet.First, we will create results.xlsx; name a sheet to singlediff; create a table title in the first cell (A1) of the sheet; and add column headers in the second row.

putexcel A1 = ("Table 1. My table") A2 = ("Outcome") /// B2 = ("Mean (in units)") C2 = ("N") D2 = ("Coefficient") /// E2 = ("s.e.") F2 = ("p-value") G2 = ("Adj. R2") /// using results, sheet("singlediff") replace

Now, we will populate our table.

loc pvalue = 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen]) putexcel A3 = ("lnfood: Log of total food consumption") /// B3 = (r(mean)) C3 = (e(N)) D3 = (_b[treatmen]) /// E3 = (_se[treatmen]) F3 = (`pvalue') G3 = (e(r2_a)) /// using results, sheet("singlediff") modify

We then estimate the same model for

*lncereal*and append the results in the next row.

`reg lncereal treatmen`

sum cereal if e(sample)

loc pvalue = 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen])) putexcel A4 = ("lncereal: Log of total cereal consumption") /// B4 = (r(mean)) C4 = (e(N)) D4 = (_b[treatmen]) /// E4 = (_se[treatmen]) F4 = (`pvalue') G4 = (e(r2_a)) /// using results, sheet("singlediff") modify

Our updated table now looks like this:

Since the above routines are repeating, we can loop over our outcome variables of interest and shift the rows to which the results will be pasted accordingly. See Nick Cox’s Speaking Stata: How to repeat yourself without going mad.

We can rewrite our codes above as:

putexcel A1 = ("Table 1. My table") A2 = ("Outcome") /// B2 = ("Mean (in units)") C2 = ("N") D2 = ("Coefficient") /// E2 = ("s.e.") F2 = ("p-value") G2 = ("Adj. R2") /// using results, sheet("singlediff") replace * set the row into which the first outcome will be pasted loc row = 3 foreach y of varlist lnfood lncereal { loc ylab: var label `y' reg `y' treatmen loc pvalue = 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen])) loc ynew = subinstr("`y'", "ln", "", 1) sum `ynew' putexcel A`row' = ("`y': `ylab'") B`row' = (r(mean)) /// C`row' = (e(N)) D`row' = (_b[treatmen]) /// E`row' = (_se[treatmen]) F`row' = (`pvalue') /// G`row' = (e(r2_a)) using results, sheet("singlediff") modify loc row = `row' + 1 }

The Stata Blog has featured

`putexcel`

in Export tables to Excel and Retaining an Excel cell’s format when using putexcel.Notes:

[1] The recent version (version 4) of

`outreg`

was completely rewritten and there has been significant changes since. Examples in Writing a formatted regression output using -outreg- written in 2010 may be obsolete.[2] FOR ILLUSTRATION. No effort was exerted for the models to make economic or econometric sense.

[3] For the interested, follow this link to access the data or learn more about the program. Thanks to Shiel Velade and Julien Labonne for sharing the link.

Filed under: Basic functions, Basic Programming, Post-estimation | Tagged: excel, foreach, KALAHI-CIDDS, outreg, putexcel, returned results, system variables, tabout | Leave a comment »