Put anything anywhere in Excel without sweat

putexcel has recently become a very good friend. For those who (or working with people who) find comfort in working with tables in Excel after data processing or estimation in Stata (yes, there are others who don’t find comfort in this.) and already into Stata 13, learning putexcel could be very helpful (put an end to copy-pasting!). A number of user-written commands, such as outreg [1], outreg2, tabout, are also already available for similar purposes. What puts putexcel apart is its ‘user-friendliness’ and flexibility. You can put anything anywhere in Excel without sweat.

For example, suppose you are estimating a single model for a number of outcome variables and you are only interested in the parameter estimate of a specific covariate and its corresponding attributes and you want to end up with a table in an Excel sheet that looks like this:

Screen Shot 2014-07-17 at 11.38.02 PM
How do you populate this sheet using putexcel? To illustrate [2], let us use the data produced by Innovations for Poverty Action and used for the impact evaluation of the Philippines KALAHI-CIDSS program.[3] Assuming that “KALAHI-CIDDS Data” folder is in your current directory, we write the following codes to load the data, do some housekeeping, and create the variables of interest:
use "KALAHI-CIDDS Data/H08 Expenditure (Food).dta", clear
recode Cereal_1- tobacco_2 (-99 = .)
egen food = rowtotal(Cereal_1-outsidemeals_2)
egen cereal = rowtotal(Cereal_1 cereal_2)
gen lnfood = ln(food)
gen lncereal = ln(cereal)
label var lnfood "Log of total food consumption"
label var lncereal "Log of total cereal consumption"

Now, we estimate the average difference in food consumption between two groups: treated (treatmen = 1) and control (treatmen = 0).

reg lnfood treatmen

This code will return the following results in the results window:
. reg lnfood  treatmen

      Source |       SS       df       MS              Number of obs =    5938
-------------+------------------------------           F(  1,  5936) =    0.17
       Model |  .076469843     1  .076469843           Prob > F      =  0.6828
    Residual |  2718.46895  5936  .457963099           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0001
       Total |  2718.54542  5937  .457898842           Root MSE      =  .67673

      lnfood |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    treatmen |   .0071772   .0175641     0.41   0.683    -.0272548    .0416092
       _cons |   6.932916   .0124176   558.31   0.000     6.908573    6.957259

How do we take only the values we want? This is where returned results come into play. Returned results are calculations stored after a Stata command. Not all commands have stored results but those that matter do. In our case, we want from this table the number of observation, coefficent of treatmen, and its standard error and p-value.

Typing ereturn list after estimation will return the calculations saved by Stata. In our example, the following will be displayed in the results window:
. ereturn list

                  e(N) =  5922
               e(df_m) =  1
               e(df_r) =  5920
                  e(F) =  .5202063477728368
                 e(r2) =  .0000878649729487
               e(rmse) =  .7219142356536746
                e(mss) =  .2711108253315615
                e(rss) =  3085.268168745421
               e(r2_a) =  -.0000810391039139
                 e(ll) =  -6472.276435645724
               e(ll_0) =  -6472.53661526113
               e(rank) =  2

            e(cmdline) : "regress lncereal  treatmen"
              e(title) : "Linear regression"
          e(marginsok) : "XB default"
                e(vce) : "ols"
             e(depvar) : "lncereal"
                e(cmd) : "regress"
         e(properties) : "b V"
            e(predict) : "regres_p"
              e(model) : "ols"
          e(estat_cmd) : "regress_estat"

                  e(b) :  1 x 2
                  e(V) :  2 x 2


Easily we can see that the number of observations is saved in a scalar called e(N) and the adjusted R-squared in scalar e(r2_a). Where are our coefficients, standard errors, and p-value? The coefficients and constant are saved in the vector e(b) and the variance stored in e(V) will let you calculate the standard errors. To show the contents of e(b) and e(V), type matrix list e(b) and matrix list e(V). We can use this by explicitly subscripting the elements in the matrices (see Generating scalars for coefficients or standard errors after regression) or use the _variables (underscore variables) created by Stata. After estimation, Stata creates for each varname in the model the underscore variable _b[varname] that stores the estimated coefficient and _se[varname] that stores the standard error. What about the p-value? This can be calculated using what are already stored:

p-value = 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen]))

For more information, see help return, help system variables, and Martin Buis’s Stata tip 54: Where did my p-values go?

Now, we want the mean of our outcome variable. Instead of the mean of lnfood, let us take the mean of food for the sample observations used in the estimation using summarize and e(sample). The e(sample), another one out of the ereturn list, returns 1 if the observation is used in the previous estimation and 0 otherwise. Thus, if we type sum food if e(sample), we want to show the summary of the variable food only for those observations included in our previous estimation. This returns the following:
. sum food if e(sample)
    Variable |       Obs        Mean    Std. Dev.       Min        Max
        food |      5938    1301.023    1184.682         12      37084

How do we grab that mean? If we type ereturn list now it will still give us the one stored after the regression estimation. This is because ereturn list returns results only for estimation commands like regress. For general commands, we use return list.
. return list

                  r(N) =  5938
              r(sum_w) =  5938
               r(mean) =  1301.023093635043
                r(Var) =  1403472.291154946
                 r(sd) =  1184.682358759067
                r(min) =  12
                r(max) =  37084
                r(sum) =  7725475.130004883

There is our mean stored in r(mean). To summarize, we now have:
What we want| Where it is stored
N           | e(N)
Coeffient   | _b[treatmen] or _coef[treatmen]
s.e.        | _se[treatmen]
p-value     | 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen]))
Adj. R2     | e(r2_a)
Mean        | r(mean)

Next, we will now use putexcel to populate our Excel sheet.

First, we will create results.xlsx; name a sheet to singlediff; create a table title in the first cell (A1) of the sheet; and add column headers in the second row.
putexcel A1 = ("Table 1. My table") A2 = ("Outcome") ///
    B2 = ("Mean (in units)") C2 = ("N") D2 = ("Coefficient") ///
    E2 = ("s.e.") F2 = ("p-value") G2 = ("Adj. R2") ///
    using results, sheet("singlediff") replace

Now, we will populate our table.
loc pvalue = 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen])
putexcel A3 = ("lnfood: Log of total food consumption") ///
    B3 = (r(mean)) C3 = (e(N)) D3 = (_b[treatmen]) ///
    E3 = (_se[treatmen]) F3 = (`pvalue') G3 = (e(r2_a)) ///
    using results, sheet("singlediff") modify

We then estimate the same model for lncereal and append the results in the next row.

reg lncereal treatmen
sum cereal if e(sample)

loc pvalue = 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen]))
putexcel A4 = ("lncereal: Log of total cereal consumption") ///
    B4 = (r(mean)) C4 = (e(N)) D4 = (_b[treatmen]) ///
    E4 = (_se[treatmen]) F4 = (`pvalue') G4 = (e(r2_a)) ///
    using results, sheet("singlediff") modify

Our updated table now looks like this:

Screen Shot 2014-07-19 at 5.43.53 PM

Since the above routines are repeating, we can loop over our outcome variables of interest and shift the rows to which the results will be pasted accordingly. See Nick Cox’s Speaking Stata: How to repeat yourself without going mad.

We can rewrite our codes above as:
putexcel A1 = ("Table 1. My table") A2 = ("Outcome") ///
    B2 = ("Mean (in units)") C2 = ("N") D2 = ("Coefficient") ///
    E2 = ("s.e.") F2 = ("p-value") G2 = ("Adj. R2") ///
    using results, sheet("singlediff") replace

* set the row into which the first outcome will be pasted   
loc row = 3    
foreach y of varlist lnfood lncereal {
    loc ylab: var label `y'

    reg `y' treatmen
    loc pvalue = 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen]))

    loc ynew = subinstr("`y'", "ln", "", 1)
    sum `ynew'

    putexcel A`row' = ("`y': `ylab'") B`row' = (r(mean)) ///
        C`row' = (e(N)) D`row' = (_b[treatmen]) ///
    E`row' = (_se[treatmen]) F`row' = (`pvalue') ///
    G`row' = (e(r2_a)) using results, sheet("singlediff") modify

    loc row = `row' + 1

The Stata Blog has featured putexcel in Export tables to Excel and Retaining an Excel cell’s format when using putexcel.


[1] The recent version (version 4) of outreg was completely rewritten and there has been significant changes since. Examples in Writing a formatted regression output using -outreg- written in 2010 may be obsolete.

[2] FOR ILLUSTRATION. No effort was exerted for the models to make economic or econometric sense.

[3] For the interested, follow this link to access the data or learn more about the program. Thanks to Shiel Velade and Julien Labonne for sharing the link.


Leave a Reply