Stata: Dealing with Date variables

Monday, May 21, 2012

Date variables in a dataset need a few extra steps. If you skip them, your dates may end up in inconsistent formats and your calculations will be wrong. Here are some tips for working with date variables.

Generating date variables

If you import your data from Excel, your date variables may be treated as 1) string variables, or 2) integers (less likely). An example of a date in string format is 1 Jan 1960. In integer format, it would look like 17869. Both formats can easily be converted into date format.

String format

If your value is like 1 Jan 1960, you can use the following code:

gen newvar = date(oldvar, "DMY")

After running this line, you will see values like 0 or 1046, because Stata stores dates as integers. The value is the number of days before or after 1 Jan 1960: positive values fall after that date and negative values fall before it. This storage format is common across datasets. Don't be surprised; we will apply a date display format later.
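Here is a small sketch of the conversion on made-up data. The variable names oldvar and newvar follow the line above; the three date strings are only illustrative.

```stata
* Toy data: three day-month-year strings (values chosen for illustration)
clear
input str11 oldvar
"1 Jan 1960"
"12 Nov 1962"
"31 Dec 1959"
end

* Convert the strings into Stata daily dates (days since 1 Jan 1960)
gen newvar = date(oldvar, "DMY")
list oldvar newvar
```

The three rows should come out as 0, 1046, and -1, which matches the "days before or after 1 Jan 1960" rule.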

Integer format

If your values are already integers that count days from 1 Jan 1960, you only need one of the following lines to display them in date format.

Simple version

format newvar %d

A slightly more detailed version:

format %tdnn/dd/CCYY newvar

You can choose either one.
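As a quick check, the sketch below applies both display formats to the integer 17869 from the example above. It uses %td, the current spelling of the daily date format; the one-row dataset is made up.

```stata
* Toy data: a single row holding the integer from the text
clear
set obs 1
gen newvar = 17869

* Default daily display format
format newvar %td
list newvar

* Month/day/four-digit-year display
format newvar %tdnn/dd/CCYY
list newvar
```

Both listings show the same stored value, 17869, which corresponds to 3 Dec 2008; only the display changes. One caveat, as far as I know: if the integers are Excel serial numbers rather than Stata dates, they count from a different starting day, and the Stata documentation suggests adding td(30dec1899) before formatting (for Excel's 1900 date system).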

Date comparisons

Once your variables are in date format, comparing two dates is easy. To get the number of days between two date variables, use the gen command and subtract one from the other.
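A minimal sketch of the subtraction, assuming two hypothetical date variables named startdate and enddate built from made-up strings:

```stata
* Toy data: one row with a start and an end date as strings
clear
input str11 s_start str11 s_end
"1 Jan 2012" "21 May 2012"
end

* Convert both strings to Stata daily dates
gen startdate = date(s_start, "DMY")
gen enddate   = date(s_end, "DMY")
format startdate enddate %td

* Days between the two dates
gen days_between = enddate - startdate
list startdate enddate days_between
```

For these two dates the difference should be 141 days.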

Add to or subtract from a date

If you want to add, for example, 120 days to your date variable, or subtract 120 days from it, the approach is very similar to date comparisons. Simply generate another variable equal to your date variable plus or minus 120.
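A short sketch of shifting a date by 120 days. The variable name visit and the starting date are assumptions for illustration.

```stata
* Toy data: one row with a hypothetical visit date of 1 Jan 2012
clear
set obs 1
gen visit = mdy(1, 1, 2012)

* Shift the date forward and backward by 120 days
gen plus120  = visit + 120
gen minus120 = visit - 120

* Show all three as dates rather than raw integers
format visit plus120 minus120 %td
list
```

Here plus120 should be 30 Apr 2012 and minus120 should be 3 Sep 2011.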

Comparison with a specific date

If you want to compare a date variable with a specific date, such as 1 Jan 2000, you can use the following code:

gen newvar = (mdy(1,1,2000) - oldvar)

The code above gives the number of days between oldvar and 1 Jan 2000: positive if oldvar falls before that date, negative if it falls after. If your date variable is a birthday, dividing the value by 365.25 gives roughly how old your participants were on 1 Jan 2000.
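A small sketch of the age calculation, assuming a hypothetical birthday variable with one made-up value. Dividing by 365.25 is an approximation, so treat the fractional part loosely.

```stata
* Toy data: one participant born on 15 Jun 1980 (made-up value)
clear
input str11 s_birth
"15 Jun 1980"
end
gen birthday = date(s_birth, "DMY")
format birthday %td

* Days from the birthday to 1 Jan 2000
gen days = mdy(1, 1, 2000) - birthday

* Approximate age in years, then the completed years
gen age = days / 365.25
gen age_years = floor(age)
list
```

This participant is 7139 days old on 1 Jan 2000, which works out to about 19.5 years, so age_years is 19.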

Further reading

Using dates in Stata

How to know if one journal is a SSCI or SCI journal?

Friday, April 27, 2012

SCI stands for Science Citation Index, and SSCI stands for Social Sciences Citation Index. I personally do not think SCI and SSCI matter much in American academia; however, in some countries, such as Taiwan and China, they are used to rank universities. In that situation, knowing whether your target journal is an SCI or SSCI journal is critical.

There are a few ways to find out.

1) Check the official SCI and SSCI sites (recommendation: 4 out of 5)


Take the SSCI site as an example. If I want to know whether Foreign Language Annals is an SSCI journal, I simply type foreign in the search box.
Hit search and you will see the results. Voila, it is an SSCI journal.

The second approach (recommendation: 5 out of 5) is to use the following site:

This site is self-explanatory, so I do not see the need for any screenshots.

The third approach is to use Web of Science (recommendation: 3 out of 5), but it requires an annual subscription. Please check whether your library subscribes to it.

Select publication name and type foreign l*.

Click analyze results.

It will show the fields that you can analyze. Since we are looking for the journal, choose source title and click analyze.

The fourth approach is to use Journal Citation Reports, but this one also requires a library subscription (recommendation: 4 out of 5).

Stata: Count groups by individuals

Wednesday, April 25, 2012

One friend asked me the following question:

How can I transform the following format into:

id level
1 A
1 A
1 B
2 A
2 B
3 B

this one?

id level #ofA #ofB
1 A 2 1
1 A 2 1
1 B 2 1
2 A 1 1
2 B 1 1
3 B 0 1

Well, I do not think there is one command for this task. This is not very difficult if there are only two groups to count.

The way I achieve this task is:

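use "", clear
* level is a string, so build a numeric group id (A = 1, B = 2)
egen acount = group(level)
gsort +id +acount
* Count the A rows within each id; B rows are left missing for now
by id: egen acount2 = count(acount) if acount==1
* A rows sort first, so carry their count forward onto the B rows
bys id: replace acount2 = acount2[_n-1] if acount2==.
* ids with no A rows are still missing; set them to zero
replace acount2=0 if acount2==.
* Same idea for B, but sort B rows first so the count carries onto the A rows
bys id: egen bcount2 = count(acount) if acount==2
gsort +id -level
by id: replace bcount2 = bcount2[_n-1] if bcount2==.
replace bcount2=0 if bcount2==.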

The level variable is a string variable, so I use egen to get a numeric group id. If you are interested in the details, you can check my previous post, Stata: Create id by group.

After creating the new group id, I sort by id and group. How do I count the As and Bs? I count the number of As using egen, but you may notice that on rows where the level is B, the number of As is missing. So my next step is to fill in each missing value with the value from the previous record. This is why I sorted the data at the beginning.

If there is no A, then replace the value from missing to zero.

The way I count # of Bs is similar. The only difference is sorting.

You may be curious: what if I have three or more groups? Well, my code only works for two groups, and I have not found a way to count three or more groups by individual.

If you have tips or code to achieve the task, please let me know!


One friend shared her code with me:
* total() is the current name of egen's sum(); the closing brace needs its own line
foreach i in A B C D E F G H I J K L N P Q R S T U V W X Y Z {
bys id: egen nof`i'=total(level=="`i'")
}
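To try the loop on the six-row example from the question, here is a sketch that reads the list of levels from the data with levelsof instead of typing the letters by hand. That swap is my own assumption, not part of my friend's code, and it only works when every level value is a valid ending for a variable name.

```stata
* The example data from the question
clear
input id str1 level
1 "A"
1 "A"
1 "B"
2 "A"
2 "B"
3 "B"
end

* Collect the distinct values of level (here A and B) into a local macro
levelsof level, local(levels)

* One count variable per level: total() sums the 0/1 indicator within each id
foreach i of local levels {
    bysort id: egen nof`i' = total(level == "`i'")
}
list, sepby(id)
```

For this data, nofA is 2 for id 1, 1 for id 2, and 0 for id 3, while nofB is 1 everywhere, which matches the target table. Because the loop runs once per level, it is not limited to two groups.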

Stata: Create id by group

Sunday, April 22, 2012

When doing your data analysis, you will sometimes encounter the following situation: everyone in your dataset has a unique id, but the IDs are long and each participant has multiple records (that is, the dataset is in long format).

To visualize your data, you need to create a new ID for each individual regardless of how many records each person has. For example, the first person has three records, and we would like to assign a new ID 1 for the first person, and the second person would be 2.

Though it sounds tedious, it is not difficult to do.

egen id = group(oldid)

Just one line and your problem will be solved.
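Here is a small sketch of that one line on made-up data. The long ID strings are hypothetical; group() numbers the distinct values of oldid in sorted order.

```stata
* Toy data in long format: two people with long, made-up IDs
clear
input str10 oldid
"TW20120045"
"TW20120045"
"TW20120045"
"TW20120112"
"TW20120112"
end

* Assign 1, 2, ... to each distinct oldid, repeated across that person's records
egen id = group(oldid)
list, sepby(id)
```

The first person's three records all get id 1, and the second person's two records get id 2.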