
Don't sort more than strictly necessary

Sorting is often by far the most time-consuming step in a program.

Take advantage of datasets that are already sorted. The CRSP files DSF and MSF, for example, are sorted by permno date. A splendid example of this is the code on the WRDS website which merges TAQ.CT with TAQ.CQ.
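As a minimal sketch of what this looks like in practice, the data step below does BY-group processing on crsp.msf directly, with no proc sort in front of it. It assumes, as stated above, that MSF is stored sorted by permno date; the output dataset name firstobs is made up for illustration.

```sas
/* Sketch: rely on the existing sort order of crsp.msf (permno date). */
/* "firstobs" is a hypothetical output name.                           */
data firstobs;
  set crsp.msf(keep=permno date ret);
  by permno date;      /* SAS stops with an error if the data are not in this order */
  if first.permno;     /* keep only the first monthly record of each permno */
run;
```

If the assumption about the sort order is wrong, the BY statement fails with an error rather than silently producing wrong groups, so nothing is lost by trying this before reaching for proc sort.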

Instead of sorting in order to merge, use hashes (Section 12), which do not require sorting and are faster anyway.
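For orientation, a hash lookup might be sketched as follows. The dataset work.lookup and its variable grp are hypothetical, invented only for this illustration; the sketch assumes work.lookup has one row per permno and fits in memory. Neither input needs to be sorted.

```sas
/* Sketch: attach a variable from a small lookup table to crsp.msf */
/* without sorting either dataset.                                  */
/* work.lookup (permno, grp) is a hypothetical dataset.             */
data merged;
  if _n_ = 1 then do;
    if 0 then set work.lookup(keep=permno grp);  /* defines grp in the data step, reads nothing */
    declare hash h(dataset: 'work.lookup');      /* load the lookup table into memory */
    h.defineKey('permno');
    h.defineData('grp');
    h.defineDone();
  end;
  set crsp.msf(keep=permno date ret);
  if h.find() = 0;     /* find() returns 0 on a match; keep matched rows only */
run;
```

The subsetting if keeps only rows whose permno appears in the lookup table, which corresponds to an inner join. Keeping unmatched rows as well would require resetting grp to missing whenever find() fails.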

Instead of sorting just so that you can use proc means with a by statement, that is, instead of saying

proc sort data=crsp.msf out=msf;
  by date;
run;

proc means data=msf noprint;  
  by date;
  var ret;
  output out=summarystats(drop=_type_ _freq_) mean=meanret;
run;

say

proc means data=crsp.msf noprint nway;  
  class date;
  var ret;
  output out=summarystats(drop=_type_ _freq_) mean=meanret;
run;

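I replaced the ``by'' with a ``class''. Unlike BY, CLASS does not require the input to be sorted. I also used the NWAY option, which restricts the output to the highest ``_TYPE_'' level, that is, to the groups formed by crossing all of the CLASS variables. Without it, SAS also produces output rows for every subset of the CLASS variables, including an overall row (_TYPE_=0) computed over the whole dataset. With a single CLASS variable, as here, omitting NWAY would add one extra row holding the mean of ret over all dates.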
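The effect of NWAY is easier to see with two CLASS variables. The sketch below uses a small made-up dataset (toy, with invented variables grp and yr and invented values) and runs proc means with and without the option.

```sas
/* Toy data, invented purely for illustration: 2 groups x 2 years. */
data toy;
  input grp $ yr ret;
  datalines;
a 2001 0.10
a 2002 0.20
b 2001 0.30
b 2002 0.40
;
run;

/* Without NWAY: one output row per level of every subset of (grp, yr). */
proc means data=toy noprint;
  class grp yr;
  var ret;
  output out=allways mean=meanret;
run;

/* With NWAY: only the grp*yr combinations are output. */
proc means data=toy noprint nway;
  class grp yr;
  var ret;
  output out=nwayonly mean=meanret;
run;
```

Here allways should contain nine rows: one overall row (_TYPE_=0), two for yr alone (_TYPE_=1), two for grp alone (_TYPE_=2), and four for the grp by yr combinations (_TYPE_=3). The dataset nwayonly contains only the last four.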


Andre de Souza 2012-11-19