• "Everything should be made as simple as possible, but not simpler." - Anonymous (although often attributed to Albert Einstein)
  • Categories

  • Recent Comments

  • RSS N.E.C. The Stata Blog

  • RSS Statalist: the Stata forum

  • RSS Stackoverflow [Stata]

  • Google Analytics Stats

  • Enter your email address to subscribe.

Import data from Excel sheets


How do we import data from all sheets in a number of Excel files? Each Excel file has a different number of sheets with names of no discernible pattern, but (thankfully) each sheet has the same structure: the first observation is in the same row and the columns correspond to each other. An example is the set of 17 Excel files of census data of barangays (villages) that was provided to me. Each Excel file corresponds to one region and within each file are sheets corresponding to the province in the region.  How do we consolidate all sheets in all files into one data file?

We wish the answer would be, “With a wave of a wand…” Unfortunately, we are muggles. One way is to import every sheet of every file manually until eternity. Doing this would create a new problem: headache. How do we do it like magic?

Here are the spells we need: fs, foreach, and import excel.

fs, written by Nick Cox, lists the names of files in a specified folder and saved the list in a local macro r(files) and is downloadable via the SSC Archive:

ssc install fs

First, change the working directory to where the Excel files are (see help cd). The following block of codes reads every sheet in each of the of the Excel file in the folder and saves these into a single data file with filename newdata.dta.
fs *.xlsx  
foreach f in `r(files)' {
    import excel using `f', describe
    forvalues s = 1/`r(N_worksheet)' {
        import excel using `f',         ///
            sheet(`r(worksheet_`s')')   ///
            cellrange(`r(range_`s')')   ///
            firstrow clear
        cap append newdata
        save newdata, replace
        }   
    }    

import excel with the option describe does not actually load the data from Excel but only lists all the sheets and the cell ranges. Below is an example of the output:
. import excel using ARMM.xlsx, describe

          Sheet | Range
  --------------+--------------
        BASILAN | A1:AR237
  LANAO DEL SUR | A1:AR1214
    MAGUINDANAO | A1:AR559
           SULU | A1:AR444
      TAWI-TAWI | A1:AR229

import excel using filename, describe saves the number of sheets and the list of sheets and their corresponding ranges in macros. Typing return list gives:
. return list

scalars:
        r(N_worksheet) =  5

macros:
        r(worksheet_1) : "BASILAN"
            r(range_1) : "A1:AR237"
        r(worksheet_2) : "LANAO DEL SUR"
            r(range_2) : "A1:AR1214"
        r(worksheet_3) : "MAGUINDANAO"
            r(range_3) : "A1:AR559"
        r(worksheet_4) : "SULU"
            r(range_4) : "A1:AR444"
        r(worksheet_5) : "TAWI-TAWI"
            r(range_5) : "A1:AR229"


These saved results were used to read the sheets in each file. Note that we are assuming here that the first row of each of the sheets contains the variable names, thus the option firstrow with the second import excel command that loads the Excel sheet into memory. If this is not the case, some cleaning may be required (e.g., dropping unnecessary rows and columns) before the data is appended. How complex the cleaning maybe depends on how unstructured the data is in the sheets.

 

Haiku: cls


clean results window
in Stata thirteen, typing
cls clears screen

 

Where to buy Stata in the Philippines


In the Philippines, there has been an emerging demand in the public sector for data-driven analysis of various govenrment programs (yes, emerging.. don’t ask why.. but the leadership has a lot to do with this.. bless them). I am lucky enough to be invited to introduce Stata to some government agencies (and some private institutions) and how it can be used with their data. Often the the question where to get a copy of Stata pops up. So, for reference: here is the sole distributor of Stata in the country:

Columbia CP Ltd
Unit 515, Building 1-A
Palmdale Heights Pinagbuhatan
Pasig City, METRO MANILA
PHILIPPINES

Tel: 0-917 649-6421 (if calling from the Philippines)
Tel: 63-2-5664488 (if calling from outside the Philippines)

Email: sales@columbiacp.com
URL: http://www.columbiacp.com

I told them they could ask the distributor to come and discuss the best licensing option base on their needs. Although I really don’t know if the distributor really  does this. :p But that would be bad service if they don’t.

 
%d bloggers like this: