• "Everything should be made as simple as possible, but not simpler." - Anonymous (although often attributed to Albert Einstein)
  • Categories

  • Recent Comments

  • RSS N.E.C. The Stata Blog

  • RSS Statalist: the Stata forum

  • RSS Stackoverflow [Stata]

  • Google Analytics Stats

  • Enter your email address to subscribe.

Stata-MySQL a first encounter

In late April, I had my first encounter with .myd, .myi, and .frm files. The challenge was to read these directly from Stata so that users can easily work on the data using Stata. I was told that these are SQL databases. I initially thought that this can easily be read using a straightforward use of -odbc- command. But the file extensions .myd, .myi, and .frm are not among the choices avilable in Windows ODBC Data Source Administrator*. Google…google…google.

Thanks to Andrew Dyck’s post Connect to MySQL database using Stata. His Step 0 and Step 1 is exactly the second half of what I needed to do. But, as I had no idea how SQL works, it took me a while to understand what I needed to do in the first half. I confused the term ‘MySQL server’ to a specific physical machine other than my own PC. Eventually I figured I need MySQL server installed. For those who have their first encounters like me, here is the first half: Step -2 and Step -1.

Step -2: Download/Install the MySQL server**
Not knowing any better, I just used all default settings during the installation, and  typed “pass” as password when asked. I did not bother to change the default username “root”.

You did OK if you can open the MySQL Command Line Client (screen shot below) from the start menu. As you can see (or maybe not because they are too small) from the screenshot, I typed “show databases;” to see the databases that the system has access to. I found that only the default MySQL databases are in the list. Why can’t it see my database? Because my folder was in the desktop.

I guess it is like Stata. If you put your data or ado-file in the wrong place, Stata will not be able to find it. Where is the right place?

Step -1: Paste the folder with your data in the right place
That right place can be found in my.ini file (in my case, my.ini file is in “C:\Program Files\MySQL\MySQL Server 5.5″).

#Path to the database root
datadir=”C:\ProgramData\MySQL\MySQL Server 5.5\data”

Note that C:\ProgramData\ is, by default, a hidden folder. If you can’t find it, it is either you have not changed the settings in folder options or you have no admin rights.

Can you change this data directory? Yes. First, you need to stop MySQL from running (closing the MySQL Client window is not enough). You can use Windows task manager to end the program mysqld.exe. Second, move the entire data folder from old path to the new path. Third, change the data directory path in my.ini. In my case, my new data directory path is:


Now, you can move your folder (with the .myd, .myi, and .frm files) to this new data directory. Reboot.

Follow Andrew Dyck’s Step 0 and Step 1. Note that the connectionstring() used in Step 1 was only added to -odbc- late last year (see Stata’s blog). Make sure your Stata is up-to-date.

It also helps to learn a few SQL syntax so that you can load only selected lines or those that meet specific conditions, or load everything (see MySQL documentation). To illustrate, we use the built-in MySQL database information_schema.

Example 1: Load rows 1-20 of table TABLES
odbc load, exec(“SELECT * FROM TABLES LIMIT 20“) conn(“DRIVER={MySQL ODBC 5.1 Driver};SERVER=localhost;DATABASE=information_schema;UID=root;PWD=pass;”) clear

Example 2: Load observations where the variable ROW_FORMAT==”Fixed”
odbc load, exec(“SELECT * FROM TABLES WHERE ROW_FORMAT=’Fixed'”) conn(“DRIVER={MySQL ODBC 5.1 Driver};SERVER=localhost;DATABASE=information_schema;UID=root;PWD=pass;”) clear

Example 3: Load everything from table TABLES
odbc load, exec(“SELECT * FROM TABLES“) conn(“DRIVER={MySQL ODBC 5.1 Driver};SERVER=localhost;DATABASE=information_schema;UID=root;PWD=pass;”) clear

*Control Panel–>Administrtive Tools–>Data Sources (ODBC)

** Complications arise if you have no admin rights to the PC. I am using a 64-bit Windows 7 machine; I am not sure if this works the same in other platforms as well.

-encode- it

Over lunch today, a friend asked how to generate a new variable that will have unique numeric IDs corresponding to the string values in an existing variable.  The first command that comes to mind is -encode-*. -encode- generates a numeric variable from a string variable and uses the string values as labels for the generated numeric values. Its partner, -decode-, does the reverse. To illustrate, let’s use the overused** auto.dta:

sysuse auto, clear
encode make, gen(make_id)

By default, the order of the number generated corresponds to the alphabetical order of the string variable.

What -encode- does is to save you from writing longer codes, such as:

sysuse auto, clear
gen byte make_id = .
replace make_id = 1 if make == “AMC Concord”
replace make_id = 2 if make == “AMC Pacer”
replace make_id = 74 if make == “Volvo 260″
/* By the time you get here, you could have finished an episode of
“The Big Bang Theory” */

label define make 1 “AMC Concord” 2 “AMC Pacer” …
/* and another episode here */


or the more complex but unnecessary

sysuse auto, clear
levelsof make, local(l)
gen byte make_id = .
local id = 1
foreach i of local l{
replace make_id = id' if <em>make </em>== "i'”
label define make id' "i'”, add
local id = `id’ + 1
label values make_id make

Another way is to use -group- under -egen-. Example:

sysuse auto, clear
egen make_id = group(make)

But then you still have to create and attach the value labels to make_id. Nick Cox pointed out in his comment that -group- has a -label- option.

See -help encode- for more options and for its counterpart -decode-.

*I came across -encode- in Christopher Baum’s An Introduction to Stata Programming

**Does one lose a byte when data is overused? Sort of the ‘wear-and-tear’ we see in most things that aren’t invisible.

Getting to know “factor variables”

This is an update to the earlier post i. without the prefix -xi-. So the i.‘s (or “i options” as Joe Glass called it) have a name. Stata calls them “factor variables” and there is more to them than i. .See -help fvvarlist- for the documentation and some very helpful examples.
%d bloggers like this: