• "Everything should be made as simple as possible, but not simpler." - Anonymous (although often attributed to Albert Einstein)
  • Enter your email address to subscribe.

  • Categories

  • Recent Comments

  • RSS Statalist: the Stata forum

  • RSS Stackoverflow [Stata]

  • Google Analytics Stats

    Period:Last 30 Days
    Total Visits:10919

Put anything anywhere in Excel without sweat


putexcel has recently become a very good friend. For those who (or working with people who) find comfort in working with tables in Excel after data processing or estimation in Stata (yes, there are others who don’t find comfort in this.) and already into Stata 13, learning putexcel could be very helpful (put an end to copy-pasting!). A number of user-written commands, such as outreg [1], outreg2, tabout, are also already available for similar purposes. What puts putexcel apart is its ‘user-friendliness’ and flexibility. You can put anything anywhere in Excel without sweat.

For example, suppose you are estimating a single model for a number of outcome variables and you are only interested in the parameter estimate of a specific covariate and its corresponding attributes and you want to end up with a table in an Excel sheet that looks like this:

Screen Shot 2014-07-17 at 11.38.02 PM
How do you populate this sheet using putexcel? To illustrate [2], let us use the data produced by Innovations for Poverty Action and used for the impact evaluation of the Philippines KALAHI-CIDSS program.[3] Assuming that “KALAHI-CIDDS Data” folder is in your current directory, we write the following codes to load the data, do some housekeeping, and create the variables of interest:
use "KALAHI-CIDDS Data/H08 Expenditure (Food).dta", clear
recode Cereal_1- tobacco_2 (-99 = .)
egen food = rowtotal(Cereal_1-outsidemeals_2)
egen cereal = rowtotal(Cereal_1 cereal_2)
gen lnfood = ln(food)
gen lncereal = ln(cereal)
label var lnfood "Log of total food consumption"
label var lncereal "Log of total cereal consumption"

Now, we estimate the average difference in food consumption between two groups: treated (treatmen = 1) and control (treatmen = 0).

reg lnfood treatmen

This code will return the following results in the results window:
. reg lnfood  treatmen

      Source |       SS       df       MS              Number of obs =    5938
-------------+------------------------------           F(  1,  5936) =    0.17
       Model |  .076469843     1  .076469843           Prob > F      =  0.6828
    Residual |  2718.46895  5936  .457963099           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared = -0.0001
       Total |  2718.54542  5937  .457898842           Root MSE      =  .67673

------------------------------------------------------------------------------
      lnfood |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    treatmen |   .0071772   .0175641     0.41   0.683    -.0272548    .0416092
       _cons |   6.932916   .0124176   558.31   0.000     6.908573    6.957259
------------------------------------------------------------------------------

How do we take only the values we want? This is where returned results come into play. Returned results are calculations stored after a Stata command. Not all commands have stored results but those that matter do. In our case, we want from this table the number of observation, coefficent of treatmen, and its standard error and p-value.

Typing ereturn list after estimation will return the calculations saved by Stata. In our example, the following will be displayed in the results window:
. ereturn list

scalars:
                  e(N) =  5922
               e(df_m) =  1
               e(df_r) =  5920
                  e(F) =  .5202063477728368
                 e(r2) =  .0000878649729487
               e(rmse) =  .7219142356536746
                e(mss) =  .2711108253315615
                e(rss) =  3085.268168745421
               e(r2_a) =  -.0000810391039139
                 e(ll) =  -6472.276435645724
               e(ll_0) =  -6472.53661526113
               e(rank) =  2

macros:
            e(cmdline) : "regress lncereal  treatmen"
              e(title) : "Linear regression"
          e(marginsok) : "XB default"
                e(vce) : "ols"
             e(depvar) : "lncereal"
                e(cmd) : "regress"
         e(properties) : "b V"
            e(predict) : "regres_p"
              e(model) : "ols"
          e(estat_cmd) : "regress_estat"

matrices:
                  e(b) :  1 x 2
                  e(V) :  2 x 2

functions:
             e(sample)

Easily we can see that the number of observations is saved in a scalar called e(N) and the adjusted R-squared in scalar e(r2_a). Where are our coefficients, standard errors, and p-value? The coefficients and constant are saved in the vector e(b) and the variance stored in e(V) will let you calculate the standard errors. To show the contents of e(b) and e(V), type matrix list e(b) and matrix list e(V). We can use this by explicitly subscripting the elements in the matrices (see Generating scalars for coefficients or standard errors after regression) or use the _variables (underscore variables) created by Stata. After estimation, Stata creates for each varname in the model the underscore variable _b[varname] that stores the estimated coefficient and _se[varname] that stores the standard error. What about the p-value? This can be calculated using what are already stored:

p-value = 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen]))

For more information, see help return, help system variables, and Martin Buis’s Stata tip 54: Where did my p-values go?

Now, we want the mean of our outcome variable. Instead of the mean of lnfood, let us take the mean of food for the sample observations used in the estimation using summarize and e(sample). The e(sample), another one out of the ereturn list, returns 1 if the observation is used in the previous estimation and 0 otherwise. Thus, if we type sum food if e(sample), we want to show the summary of the variable food only for those observations included in our previous estimation. This returns the following:
. sum food if e(sample)
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        food |      5938    1301.023    1184.682         12      37084

How do we grab that mean? If we type ereturn list now it will still give us the one stored after the regression estimation. This is because ereturn list returns results only for estimation commands like regress. For general commands, we use return list.
. return list

scalars:
                  r(N) =  5938
              r(sum_w) =  5938
               r(mean) =  1301.023093635043
                r(Var) =  1403472.291154946
                 r(sd) =  1184.682358759067
                r(min) =  12
                r(max) =  37084
                r(sum) =  7725475.130004883

There is our mean stored in r(mean). To summarize, we now have:
-----------------------------------------------------------------
What we want| Where it is stored
------------+----------------------------------------------------
N           | e(N)
Coeffient   | _b[treatmen] or _coef[treatmen]
s.e.        | _se[treatmen]
p-value     | 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen]))
Adj. R2     | e(r2_a)
Mean        | r(mean)
-----------------------------------------------------------------

Next, we will now use putexcel to populate our Excel sheet.

First, we will create results.xlsx; name a sheet to singlediff; create a table title in the first cell (A1) of the sheet; and add column headers in the second row.
putexcel A1 = ("Table 1. My table") A2 = ("Outcome") ///
    B2 = ("Mean (in units)") C2 = ("N") D2 = ("Coefficient") ///
    E2 = ("s.e.") F2 = ("p-value") G2 = ("Adj. R2") ///
    using results, sheet("singlediff") replace

Now, we will populate our table.
loc pvalue = 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen])
putexcel A3 = ("lnfood: Log of total food consumption") ///
    B3 = (r(mean)) C3 = (e(N)) D3 = (_b[treatmen]) ///
    E3 = (_se[treatmen]) F3 = (`pvalue') G3 = (e(r2_a)) ///
    using results, sheet("singlediff") modify

We then estimate the same model for lncereal and append the results in the next row.

reg lncereal treatmen
sum cereal if e(sample)

loc pvalue = 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen]))
putexcel A4 = ("lncereal: Log of total cereal consumption") ///
    B4 = (r(mean)) C4 = (e(N)) D4 = (_b[treatmen]) ///
    E4 = (_se[treatmen]) F4 = (`pvalue') G4 = (e(r2_a)) ///
    using results, sheet("singlediff") modify

Our updated table now looks like this:

Screen Shot 2014-07-19 at 5.43.53 PM

Since the above routines are repeating, we can loop over our outcome variables of interest and shift the rows to which the results will be pasted accordingly. See Nick Cox’s Speaking Stata: How to repeat yourself without going mad.

We can rewrite our codes above as:
putexcel A1 = ("Table 1. My table") A2 = ("Outcome") ///
    B2 = ("Mean (in units)") C2 = ("N") D2 = ("Coefficient") ///
    E2 = ("s.e.") F2 = ("p-value") G2 = ("Adj. R2") ///
    using results, sheet("singlediff") replace

* set the row into which the first outcome will be pasted   
loc row = 3    
foreach y of varlist lnfood lncereal {
    loc ylab: var label `y'

    reg `y' treatmen
    loc pvalue = 2*ttail(e(df_r), abs(_b[treatmen]/_se[treatmen]))

    loc ynew = subinstr("`y'", "ln", "", 1)
    sum `ynew'

    putexcel A`row' = ("`y': `ylab'") B`row' = (r(mean)) ///
        C`row' = (e(N)) D`row' = (_b[treatmen]) ///
    E`row' = (_se[treatmen]) F`row' = (`pvalue') ///
    G`row' = (e(r2_a)) using results, sheet("singlediff") modify

    loc row = `row' + 1
}

The Stata Blog has featured putexcel in Export tables to Excel and Retaining an Excel cell’s format when using putexcel.

 

Notes:
[1] The recent version (version 4) of outreg was completely rewritten and there has been significant changes since. Examples in Writing a formatted regression output using -outreg- written in 2010 may be obsolete.

[2] FOR ILLUSTRATION. No effort was exerted for the models to make economic or econometric sense.

[3] For the interested, follow this link to access the data or learn more about the program. Thanks to Shiel Velade and Julien Labonne for sharing the link.

 

No -usespss- for Mac


In Reading SPSS data file into Stata, I describe Sergiy Radyakin’s -usespss- that loads SPSS data (.sav) into Stata. I was on Windows then. -usespss- is unfortunately not available for Mac OS. StatTransfer could easily do this if you have the software. Another option is to use R.

A quick Google search led me to a simple R routine that does exactly this. Following the steps outlined in Daniel Marcelino’s Loading SPSS (.sav) into Stata, I managed to covert an SPSS data set I downloaded from IDRA UCLA website into a Stata .dta file.

Try it out! It is easy to follow.

An interesting extension is Gabriel Rossman’s importspss.ado (requires R). It implements the R routine as an ado-file.

Tell me, where did I go wrong


If you are Filipino, you are most likely singing the title by now :)

Looking for a missing bracket, a misplaced comma, or a space that shouldn’t be there—or debugging in general—can be a pain sometimes. When the usual error message fails to point out where you messed up, try turning trace on to track down the error. trace literally traces the execution of programs. It echoes the lines that Stata executes internally. Reading through the whole thing on your results window can be daunting but you don’t really have to go through those. You just need to know which line it stops executing and see why it stopped there. To turn on trace, type:

set trace on
And.. don’t forget to turn it off when you don’t need it. They can be really very long.

set trace off
Next time you wonder where you went wrong, use trace before you lose your mind. For more options, see help set trace.

Now, continue with the singing.. “what did I do to make you change your mind completely..”

 
%d bloggers like this: