data dsf;
set crsp.dsf(keep=permno date ret);
where permno in(10001, 10002, 10003);
run;
instead of
data dsf;
set crsp.dsf(keep=permno date ret);
if permno=10001 or permno=10002 or permno= 10003;
run;
What happens when you do not know in advance what permnos you want to keep? Or if you do know, but the list is a headache to type out?
In that case, you can replace the contents of the where with a macro variable. Suppose some previous steps produce a dataset, named KEEPTHESE that contains the permnos you want to keep:
Obs PERMNO
1 10000
2 10006
3 10009
4 10021
Given this dataset, I want to subset CRSP.DSF, keeping only the permnos listed in KEEPTHESE. To do this, I first run the following data step, which creates several macro variables.
data _null_;
set keepthese nobs=nobs;
if _n_ =1 then call symput("nobs", nobs);
call symput("permname"||trim(left(put(_N_,4.))), permno);
run;
The first line after the SET statement creates a macro variable named NOBS, which contains the number of observations in KEEPTHESE. The second line creates a series of macro variables named permname1, permname2, permname3, ... containing the permnos in the order in which they appeared in KEEPTHESE. Now inside a macro, I concatenate all these macro variables.
%macro justconcatenate;
%let listofpermnos=&permname1;
%do i =2 %to &nobs;
%let listofpermnos =%sysfunc(catx(%str(,),&listofpermnos, "%trim(&&permname&i)"));
%end;
%let listofpermnos=(&listofpermnos);
%mend;
%justconcatenate;
This macro just creates a macro variable named LISTOFPERMNOS whose value is set to the contents of the macro variable PERMNAME1. I then join all the remaining PERMNAMEs to this LISTOFPERMNOS with commas as delimiters.
Finally, I can use the macro variable so created:
data dsf;
set crsp.dsf(keep=permno date ret);
where permno in &listofpermnos;
run;
This might seem like a lot of bother for nothing, but supposing you are work with a large dataset, like TAQ, and you need to keep observations with symbols in a dataset like KEEPTHESE. Sorting and merging is not an option. This method works well, as would hashes(see section 12). I have not compared this method against a hash.