Stata 7 and earlier versions use only one missing value, “.” (without quotes). To exclude missing values in those versions, it was sufficient to use:

**if variable !=.**

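For example, to fit three nested OLS models on the same sample, you would restrict the smaller models to observations that are nonmissing on the variables added later (`regress` itself drops observations with missing values in the variables it uses):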

`regress a b if c!=. & d!=.`

`regress a b c if d!=.`

`regress a b c d`

**This is correct for datasets created in Stata 7 or earlier**; however, if your dataset contains other kinds of missing values, this code is problematic. Stata 8 and later versions allow you to define different types of missing values (the extended missing values .a, .b, ..., .z), each of which begins with a “.” (without quotes). Stata orders them as nonmissing numbers < . < .a < .b < ... < .z, so a condition such as `c!=.` is true when c equals .a. Therefore, if you have these missing values in your dataset and you use old code like the above,

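**you would probably obtain inconsistent numbers of observations across models**, because the first model keeps observations where c or d is an extended missing value such as .a, while `regress` drops them once that variable enters the model.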
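A minimal sketch of the problem, using a hypothetical four-observation variable `c` (the values are made up for illustration): the old-style condition `c != .` lets the extended missing values through, while the `missing()` function recognizes all of them.

```stata
* Hypothetical data: one valid value, one system missing, two extended missing
clear
input c
1
.
.a
.b
end

count if c != .       // counts 1, .a and .b: extended missing values pass
count if missing(c)   // counts ., .a and .b
display (.a != .)     // 1: .a is a different value from .
display (.a > .)      // 1: extended missing values sort above .
```

The same observations are counted as both "not equal to ." and "missing", which is exactly the inconsistency that leaks into the regression samples.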

The correct way to do this is to use

**if c < .** or

**if !mi(c)**, where `mi()` is a synonym for the `missing()` function. The revised code would be similar to:

`regress a b if !mi(c) & !mi(d)`

`regress a b c if !mi(d)`

`regress a b c d`
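To see the difference in practice, the sketch below fits the models on hypothetical simulated data: the seed, the 100 observations and the missing-value pattern (.a in five rows of `c`, system missing in five other rows of `d`) are arbitrary assumptions. It stores `e(N)` from each fit so the sample sizes can be compared.

```stata
* Hypothetical data: c has .a in rows 1-5, d has . in rows 6-10
clear
set seed 12345
set obs 100
gen b = rnormal()
gen c = rnormal()
gen d = rnormal()
gen a = 1 + b + c + d + rnormal()
replace c = .a in 1/5
replace d = .  in 6/10

* Old-style condition: rows with c == .a still pass c!=.
quietly regress a b if c!=. & d!=.
scalar N_old1 = e(N)

* Revised conditions: all three nested models share one sample
quietly regress a b if !mi(c) & !mi(d)
scalar N1 = e(N)
quietly regress a b c if !mi(d)
scalar N2 = e(N)
quietly regress a b c d
scalar N3 = e(N)

display N_old1 " " N1 " " N2 " " N3   // 95 90 90 90
```

Under the old condition the first model uses 95 observations because the five `.a` values pass `c!=.`; with the revised conditions every model uses the same 90.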

What if you have 20 variables in your regression? Such if statements often result in very long lines, thereby reducing the readability of your code.

**There are two easy ways to overcome this:**

1) Create a dummy variable called “touse” that equals 1 when all variables are nonmissing and 0 when at least one of them is missing, then use it as the `if` condition:

`gen touse = !mi(y, a, b, c, d)`

`regress y a b if touse`

`regress y a b c if touse`

`regress y a b c d if touse`

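2) If you don’t like this approach, you can let the `nestreg` prefix handle missing values. It adds the blocks of variables in parentheses one at a time and fits every model on the observations with no missing values in any of the variables: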

`nestreg: reg y (a b) (c) (d) `
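Both approaches can be checked on simulated data. The sketch below is hypothetical: the seed, the 100 observations and the missing-value pattern (.a in `c`, system missing in `d`) are arbitrary assumptions, and the `store()` option of `nestreg` is used only to keep the individual fits (as m1, m2, m3) for inspection; check `help nestreg` in your Stata version before relying on it.

```stata
* Hypothetical data: c has .a in rows 1-5, d has . in rows 6-10
clear
set seed 12345
set obs 100
gen a = rnormal()
gen b = rnormal()
gen c = rnormal()
gen d = rnormal()
gen y = 1 + a + b + c + d + rnormal()
replace c = .a in 1/5
replace d = .  in 6/10

* Approach 1: touse = 1 only when y, a, b, c and d are all nonmissing
gen touse = !mi(y, a, b, c, d)
quietly regress y a b if touse
scalar N1 = e(N)
quietly regress y a b c if touse
scalar N2 = e(N)
quietly regress y a b c d if touse
scalar N3 = e(N)
display N1 " " N2 " " N3        // 90 90 90

* Approach 2: nestreg adds the blocks (a b), (c), (d) in turn;
* store(m) keeps the three fits as m1, m2, m3
nestreg, store(m): regress y (a b) (c) (d)
estimates restore m1
display e(N)                     // sample size of the y-on-a-b model
```

With the `touse` dummy every model uses the 90 complete observations; because `nestreg` fits all blocks on a common sample, its first model should report the same count.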
