Stata: Dealing with date variables

Monday, May 21, 2012

Date variables in a dataset need a few extra steps. If they are not converted correctly, the stored values will be inconsistent and any calculation based on them will be wrong. Here are some tips for working with date variables.

Generating date variables

If you import your data from Excel, your date variables may be treated as 1) string variables or, less likely, 2) integers. An example of a date stored as a string is 1 Jan 1960. A date stored as an integer looks like 17869. Both can easily be converted into date format.

String format

If your values look like 1 Jan 1960, you can use the following code:

gen newvar = date(oldvar, "DMY")

After running this line, the new variable holds values such as 0 or 1046, because Stata stores dates as integers. Each value is the number of days before or after 1 Jan 1960: a positive value is a date after 1 Jan 1960, and a negative value is a date before it. Many datasets store dates this way. Don't be surprised by the numbers; the next step converts them to a readable date format.
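As a rough illustration of the conversion, the sketch below builds a tiny made-up dataset and applies date() to it. The three sample dates are hypothetical; they are only there to show a zero, a positive, and a negative result.

```stata
* Hypothetical sample data: three dates stored as strings
clear
input str11 oldvar
"1 Jan 1960"
"12 Nov 1962"
"31 Dec 1959"
end

* Convert the strings to Stata's integer dates (days since 1 Jan 1960)
gen newvar = date(oldvar, "DMY")

* newvar is 0, 1046 and -1 for the three rows
list oldvar newvar
```

If date() cannot read a string with the given "DMY" order, it returns a missing value, so it is worth checking for missing values in newvar after the conversion.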

Integer format

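If your values are already integers, use one of the following lines to display them in date format. This assumes the integers count days since 1 Jan 1960, as described above. Excel's own serial numbers start from a different base date, so check a few values against the original spreadsheet after formatting.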

Simple version (%td is the current name of the older %d format):

format newvar %td

A slightly more detailed version, which displays month/day/year:

format %tdnn/dd/CCYY newvar

You can choose either one.
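To see what the two formats do to the same stored value, here is a small sketch using the integer 17869 mentioned earlier. The variable name newvar follows the text above; the single observation is made up for illustration.

```stata
* Hypothetical one-row dataset holding the integer 17869
clear
set obs 1
gen newvar = 17869

* Default date display, e.g. 03dec2008
format newvar %td
list newvar

* Month/day/year display of the same value, e.g. 12/03/2008
format %tdnn/dd/CCYY newvar
list newvar
```

The format command changes only how the value is displayed. The stored integer stays the same, so calculations give the same result under either format.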

Date comparisons

Once your variables are stored as dates, comparing two of them is easy. To find the number of days between two date variables, use the gen command and subtract one from the other.
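A minimal sketch of that subtraction is shown here. The variable names start_date and end_date and the sample dates are assumptions made for the example, not names from any particular dataset.

```stata
* Hypothetical sample data: two date columns stored as strings
clear
input str11 start_str str11 end_str
"1 Jan 2000"  "31 Jan 2000"
"15 Mar 2001" "1 Mar 2001"
end

* Convert both columns to Stata dates and make them readable
gen start_date = date(start_str, "DMY")
gen end_date   = date(end_str, "DMY")
format start_date end_date %td

* Days from start_date to end_date (negative if end_date is earlier)
gen days_between = end_date - start_date
list start_date end_date days_between
```

In this sample the first row gives 30 and the second gives -14, because the second end date falls before its start date.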

Adding or subtracting days from a date

Adding or subtracting a number of days, for example 120, works much like a date comparison. Simply generate another variable that equals your date variable plus or minus 120.
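The following sketch shifts a single made-up date by 120 days in each direction. The names visit_date, plus120 and minus120 are hypothetical and chosen only for this example.

```stata
* Hypothetical sample data: one date stored as a string
clear
input str11 visit_str
"1 Jan 2000"
end

gen visit_date = date(visit_str, "DMY")

* Shift the date forward and backward by 120 days
gen plus120  = visit_date + 120
gen minus120 = visit_date - 120

* Apply a date format so the results are readable
format visit_date plus120 minus120 %td
list visit_date plus120 minus120
```

For 1 Jan 2000 this gives 30 Apr 2000 and 3 Sep 1999. New variables created with gen do not inherit the date format, which is why the format line is needed.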

Comparison with a specific date

If you want to compare a date variable with a specific date, such as 1 Jan 2000, you can use the following code:

gen newvar = (mdy(1,1,2000) - oldvar)

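The code above gives the number of days between each date and 1 Jan 2000. The result is positive when the date falls before 1 Jan 2000 and negative when it falls after. If your date variable is a birthday, you can divide the result by 365.25 to get the approximate age of each participant on 1 Jan 2000.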
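As a worked illustration of the age calculation, the sketch below uses two made-up birthdays. The variable names birthday, days_to_2000 and age_2000 are assumptions for this example.

```stata
* Hypothetical sample data: two birthdays stored as strings
clear
input str11 birth_str
"1 Jan 1980"
"15 Jun 1990"
end

gen birthday = date(birth_str, "DMY")
format birthday %td

* Days from each birthday to 1 Jan 2000
gen days_to_2000 = mdy(1, 1, 2000) - birthday

* Approximate age in years on 1 Jan 2000
gen age_2000 = days_to_2000 / 365.25
list birthday days_to_2000 age_2000
```

The first participant is 7305 days old on 1 Jan 2000, which works out to 20 years. Dividing by 365.25 is an approximation that averages over leap years, so the result can be off by a fraction of a day's worth of age.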

Further reading

Using dates in Stata http://www.ats.ucla.edu/stat/stata/modules/dates.htm