-encode- it

Over lunch today, a friend asked how to generate a new variable that will have unique numeric IDs corresponding to the string values in an existing variable.  The first command that comes to mind is -encode-*. -encode- generates a numeric variable from a string variable and uses the string values as labels for the generated numeric values. Its partner, -decode-, does the reverse. To illustrate, let’s use the overused** auto.dta:

sysuse auto, clear
encode make, gen(make_id)

By default, the numeric codes are assigned in alphabetical order of the string values, so the first make alphabetically gets 1, the next gets 2, and so on.
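To see the mapping that -encode- built, you can list the value label and compare the labeled and unlabeled views of the new variable. This is only a sketch: it assumes you did not specify the -label()- option, in which case the value label takes the same name as the new variable (make_id here).

```stata
sysuse auto, clear
encode make, gen(make_id)

* the value label is named after the new variable by default
label list make_id

* same observations, first showing the labels, then the underlying codes
list make make_id in 1/3
list make make_id in 1/3, nolabel
```

The codes follow the alphabetical order of make, not the order of the observations in the data set.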

What -encode- does is save you from writing longer code, such as:

sysuse auto, clear
gen byte make_id = .
replace make_id = 1 if make == "AMC Concord"
replace make_id = 2 if make == "AMC Pacer"
replace make_id = 74 if make == "Volvo 260"
/* By the time you get here, you could have finished an episode of
"The Big Bang Theory" */

label define make 1 "AMC Concord" 2 "AMC Pacer" …
/* and another episode here */


or the more complex but unnecessary

sysuse auto, clear
levelsof make, local(l)
gen byte make_id = .
local id = 1
foreach i of local l {
    replace make_id = `id' if make == "`i'"
    label define make `id' "`i'", add
    local id = `id' + 1
}
label values make_id make

Another way is to use the -group()- function of -egen-. For example:

sysuse auto, clear
egen make_id = group(make)

By default, you still have to create and attach the value labels to make_id yourself. As Nick Cox pointed out in his comment, however, -group()- has a -label- option that does this for you.
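A short sketch of the -label- option mentioned in Nick Cox's comment. With it, -egen- labels each group number with the corresponding value of make, so the result looks much like what -encode- produces:

```stata
sysuse auto, clear
egen make_id = group(make), label

* show every value label in memory, including the one just created
label list

* the new variable displays the make names but stores integers
list make make_id in 1/3
list make make_id in 1/3, nolabel
```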

See -help encode- for more options and for its counterpart -decode-.
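As a quick illustration of the counterpart, the sketch below encodes make and then decodes it back into a new string variable. The name make_str is made up for this example; -assert- simply checks that the round trip returns the original strings.

```stata
sysuse auto, clear
encode make, gen(make_id)

* -decode- turns the labeled numeric variable back into a string
decode make_id, gen(make_str)

* exits with an error if any observation differs
assert make_str == make
```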

*I came across -encode- in Christopher Baum’s An Introduction to Stata Programming.

**Does one lose a byte when data is overused? Sort of the ‘wear-and-tear’ we see in most things that aren’t invisible.

Getting to know “factor variables”

This is an update to the earlier post “i. without the prefix -xi-”. So the i.’s (or “i options”, as Joe Glass called them) have a name. Stata calls them “factor variables”, and there is more to them than i. See -help fvvarlist- for the documentation and some very helpful examples.
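A few factor-variable operators beyond plain i., sketched with auto.dta. These need Stata 11 or later, and the regressions are chosen only to show the syntax, not because the models are sensible:

```stata
sysuse auto, clear

* i. treats rep78 as categorical; no -xi- prefix is needed
regress price i.rep78 weight

* ## gives both main effects plus their interaction;
* c. marks weight as continuous
regress price i.foreign##c.weight

* ib3. makes category 3 the base (omitted) level instead of the lowest one
regress price ib3.rep78 weight
```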

World Bank’s open data policy and -wbopendata-

Last year, a friend from the World Bank (Manila) sent an email about the World Bank’s open access policy, which allows free downloads of thousands of indicators from the World Bank data catalog. As I always had access to World Bank data sets via our institution’s subscription, I took this information for granted. This is not to say that I ignore the implications of this initiative. The World Bank model puts pressure on governments and other development agencies to follow suit. It is odd that there are still countries in the world today where economic data, such as GDP or inflation data, are not made public.

It is only a matter of time before applications become available that not only automate data downloads but also present this wealth of information in ingenious ways. ESRI, for example, published a free web application that maps any one of more than a thousand economic and financial indicators for any region of the world. In the screen shot below, the size of the bubbles represents workers’ remittance inflows to countries in Asia.

To bring out the best of ideas, the World Bank initiated the “Apps for Development” competition, a challenge to software developers and development practitioners to create innovative apps using World Bank data (vote for your favorite apps here).

For Stata users, -wbopendata- (J.P. Azevedo 2011) is the module for accessing data from the World Bank data catalog. -wbopendata- is easy to use, but note that it requires an internet connection. First, install -wbopendata- via SSC:

ssc install wbopendata

-wbopendata- allows you to download (i) all indicators for a specific country for all years, (ii) a specific indicator for all countries and all years, or (iii) a set of indicators within a specified topic for all countries and all years. -wbopendata- loads the data into Stata’s memory. For example, to download all data available for the Philippines for all years, type:

wbopendata, country(phl) clear

This returns data for 972 indicators from 1960 to the latest year available. The default data display is in wide format. To display the data in a long format, use the ‘long’ option:

wbopendata, country(phl) long clear

To download GDP per capita (in constant PPP $) for all countries, type:

wbopendata, indicator(ny.gdp.pcap.pp.kd) clear

Lastly, to download all indicators under the topic “Poverty” for all countries, type:

wbopendata, topics(11) clear

The list of countries, topics, and indicators and their corresponding codes are documented in the help file (see -help wbopendata-). -wbopendata- also has other options not mentioned here.