_n and _N are Stata system variables: they exist whether you like them or not. They are also referred to as “underscore variables” for the obvious reason that they are written as *_variable*. Little _n contains the number (row position) of the current observation, while its big brother _N contains the total number of observations in the data.

_n is often used to generate unique codes for each observation:

**gen** *code* = _n /* generates the variable *code*, which runs from 1 (for the first observation, _n==1) to _N (for the last observation, _n==_N) */

Or to refer to neighboring observations (also called subscripting):

**gen** *gdplag* = *gdp*[_n-1] /* generates the variable *gdplag*, which is equal to the preceding observation’s *gdp* */

**gen** *gdpgrowth* = (*gdp*/*gdp*[_n-1] - 1)*100 /* generates the variable *gdpgrowth*, the percentage growth rate of the variable *gdp* */

[Note: *gdplag* and *gdpgrowth* will be missing for the first observation (_n==1), since observation [_n-1] does not exist when _n==1.]
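To see the subscripting at work, here is a minimal sketch meant to be run from a do-file. The three-row dataset and its numbers are made up for illustration; only the -gen- lines come from the examples above.

```stata
* Made-up data: three years of a hypothetical gdp series
clear
input year gdp
1998 100
1999 110
2000 121
end

* Order the data so that [_n-1] really is the previous year
sort year

* code runs from 1 to _N (here 1, 2, 3)
gen code = _n

* gdplag is missing in the first row, then holds the previous row's gdp
gen gdplag = gdp[_n-1]

* gdpgrowth is missing in the first row, roughly 10 percent afterwards
gen gdpgrowth = (gdp/gdp[_n-1] - 1)*100

list, clean
```

Listing the data after each -gen- is a cheap way to confirm that the first row is missing and that every other row picked up the neighbor you intended.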

Make sure, however, that you refer to the right neighbor! For example, if you are calculating the growth rate of variable *gdp* between 1999 and 2000, the data must be ordered so that the *gdp* subscripted by [_n-1] is the *gdp* for 1999. This is easily addressed by invoking the -sort- command, “**sort** *year*,” before generating the growth rate variable. There is another complication, however, when you are calculating this for different groups of observations, say by country. Will “**sort** *country year*” before generating the variable suffice? No. Why? Because the [_n-1] for the first observation of country B refers to the last observation of country A. Here is where Super -bysort- comes to the rescue:

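**bysort** *country* (*year*): **gen** *gdplag* = *gdp*[_n-1] /* *year* goes in parentheses so that it only orders the observations within each *country*; written as “bysort country year:”, every country-year pair would be its own group and [_n-1] would always be missing */

**bysort** *country* (*year*): **gen** *gdpgrowth* = (*gdp*/*gdp*[_n-1] - 1)*100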

// Another syntax for bysort is: **by** *country* (*year*), **sort:** …
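The difference between a plain sort and a grouped one is easiest to see side by side. The sketch below uses a made-up two-country panel; the variable names *lag_wrong* and *lag_by* are hypothetical and exist only for this comparison. It assumes the “country (year)” form, where *country* defines the groups and *year* only orders the observations within each group.

```stata
* Made-up panel: two countries, two years each
clear
input str1 country year gdp
"A" 1999 100
"A" 2000 110
"B" 1999 200
"B" 2000 250
end

* Plain sort: the first row of B borrows the last row of A
sort country year
gen lag_wrong = gdp[_n-1]

* Grouped: _n restarts at 1 within each country, ordered by year,
* so the first row of every country is missing
bysort country (year): gen lag_by = gdp[_n-1]

list, sepby(country)
```

In the listing, *lag_wrong* for country B in 1999 carries country A's 2000 value, while *lag_by* is missing there, which is what a within-country lag should be.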

_n may also be used to keep the nth observation by group:

**bysort** *householdid*: **keep** if _n==1 /* keeps the first observation for each *householdid* */

Big brother _N, on the other hand, may be used to generate a variable that contains the number of observations by group:

**bysort** *householdid*: **gen** *householdsize* = _N /* generates the variable *householdsize*, which is equal to the number of observations for each *householdid* */
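Here is a small sketch of both group-wise uses on made-up household data. The variable *memberid* and the helper *member_order* are assumptions added for illustration; they are not part of the examples above.

```stata
* Made-up data: household 1 has three members, household 2 has one
clear
input householdid memberid
1 1
1 2
1 3
2 1
end

* _N within a by-group is the group's number of observations
bysort householdid: gen householdsize = _N

* _n within a by-group is the position inside the group
bysort householdid (memberid): gen member_order = _n

list, sepby(householdid)

* Keep one row per household: the one with the lowest memberid
bysort householdid (memberid): keep if _n == 1
```

Putting *memberid* in parentheses makes “the first observation” well defined. Without a within-group sort key, which row comes first in each household depends on how the data happened to be ordered.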

What we have illustrated above are just a few examples to showcase the potential of underscore variables _n and _N. For sure, you will find other uses of _n and _N. Another underscore variable is the beautiful number π, which, as you would’ve guessed, is written as _pi.


Jhiedonf, on 2 September 2010 at 12:34 AM said: Hi statadaily, I think you can also try

sort year

tsset year /* this declares year as the time variable, so that time-series operators such as l. can be used on gdp */

gen gdp_previous=l.gdp