Functions: inlist()

I was once asked what was wrong with code similar to the following:

gen asean4 = 1 if countryname == "Indonesia" | "Malaysia" | "Philippines" | "Thailand"

This is a common mistake. It is understandable to assume that repeating the left side of the expression, in this case ‘countryname’, is redundant. Alas, Stata requires it, and the correct syntax is:

gen asean4 = 1 if countryname == "Indonesia" | countryname == "Malaysia" ///
| countryname == "Philippines" | countryname == "Thailand"

But we can do better with the built-in function inlist(). Learning a little more about Stata’s built-in functions can be very convenient (and sometimes necessary): shorter code, faster processing, more Facebook time. Using inlist(), the equivalent code is:

gen asean4 = 1 if inlist(countryname, "Indonesia", "Malaysia", "Philippines", "Thailand")

inlist() may also be used for numeric values. For example:

gen asean4 = 1 if inlist(countrycode, 360, 458, 608, 764)
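As a minimal sketch, both forms can be tried on a small made-up dataset. The five rows and the variable name asean4_num are assumptions for illustration only. Note that -gen ... = 1 if- leaves every other observation missing, so the numeric version below is written as a 0/1 indicator instead, which is one common alternative.

```stata
* Hypothetical toy data: four ASEAN-4 countries plus one that is not in the list
clear
input str12 countryname int countrycode
"Indonesia"   360
"Malaysia"    458
"Philippines" 608
"Thailand"    764
"Vietnam"     704
end

* String version: 1 for the listed countries, missing (.) otherwise
gen asean4 = 1 if inlist(countryname, "Indonesia", "Malaysia", "Philippines", "Thailand")

* Numeric version as a 0/1 indicator: inlist() itself returns 1 or 0
gen byte asean4_num = inlist(countrycode, 360, 458, 608, 764)

list
```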

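The difference between numeric and string values is the number of elements allowed in the list (the number of countries in our example). For numeric values, 254 elements were allowed at the time of writing (the help file in recent Stata versions gives a limit of 250 arguments in total); for string values, only 9 are allowed (10 arguments including the variable). Since the limits depend on your version, see -help inlist-.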
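When a string list is longer than the limit and no numeric code is available, one possible workaround is to join several inlist() calls with the | operator, keeping each call within the limit. The sketch below is only an illustration: the country list, the toy data, and the variable name selected are assumptions. The /// continuation works in a do-file, not when typed interactively.

```stata
* Hypothetical toy data: three countries in the list, two outside it
clear
input str12 countryname
"Austria"
"Netherlands"
"Spain"
"Norway"
"Thailand"
end

* Twelve strings in total, split so each inlist() has the variable plus at most 9 values
gen byte selected = inlist(countryname, "Austria", "Belgium", "Denmark", "Finland", ///
        "France", "Germany", "Greece", "Ireland", "Italy") ///
    | inlist(countryname, "Netherlands", "Portugal", "Spain")

list
```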

9 Responses

  1. Thanks Mitch! :) Agree, so convenient. Also, saw it as Cox’s Stata tip #39.

    Keep it coming :)

  2. -inlist- (and the similar function -inrange-) were functions I learned about at the UK Stata Users Group earlier this year and wished I’d known about sooner!

    Another bonus tip: -inlist- can be used the other way around to check whether a single value appears in one of several variables, e.g.:

    gen thai = 1 if inlist("Thailand", country1, country2, country3)

    This way round of using -inlist- is a bit rarer than the first, but still useful from time to time!

  3. Thanks Mitch! It is convenient but doesn’t work when number of countries in your string example increases. I tried it for EU countries and the message “expression too long” came up :(

  4. Amir,
    The number of string entries allowed with inlist is quite small.
    If you’re going to use inlist with all the EU countries, you’d do better to check a numerical variable, if you can use country codes instead of country names. You may find the findit-able command kountry helpful if you have variables with a conventional string version of country names and want to convert them to some conventional numeric code.

    • Thanks Stephen. I think you are right; it is always convenient to replace string variables with numerical codes.

  5. Amir, next time Stata says "expression too long", you could try introducing "/* */"; this tells Stata the command continues on the next line.

    replace GH="1" if inlist("Ghana", country1, country2,/*
    */country3, country4, country5,/*
    */country6, country7, country8,/*

    Hope it helps.

    • thanks. asabere. but i think amir’s problem is that he exceeds the number of arguments allowed with inlist()—10 arguments for strings, including the variable name. one way to go about this is to use a numeric variable as stephen suggested. =D

  6. Hi Mitch

    What would be the code to find values between 2 numbers, say:

    use if inlist(DX1,800, 801,802,803…etc)

    can I do something like

    use if inlist(DX1,800-810)?

    Thank you!

    • Hi, Rafael.

      I am not sure I understand your question correctly, but based on your example, it looks like you want to load a subset of your data.

      You may use the -if- or -in- qualifiers for this. See -help use-.

      syntax: use [varlist] [if] [in] using filename [, clear nolabel]

      The function inlist() does not accept a range of values as an argument; the numbers or strings (but not a mix of both) must be separated by commas.
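For a contiguous range such as 800 to 810, the inrange() function mentioned in an earlier comment may be a better fit than inlist(). A minimal sketch, assuming DX1 is numeric; the toy values and the variable name in_range are made up for illustration:

```stata
* Hypothetical toy data around the 800-810 range
clear
input int DX1
799
800
805
810
811
end

* inrange(z, a, b) is 1 when a <= z <= b (both ends included), 0 otherwise
gen byte in_range = inrange(DX1, 800, 810)

list
```

The same condition could be used when loading a subset, for example -use if inrange(DX1, 800, 810) using filename-, where filename stands for your own dataset.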

