%let startdate= '31Jan1980'd; %let enddate= '31Jan2000'd; %macro justaloop; %let startdate=%sysfunc(putn(&startdate, 8.)); %let enddate=%sysfunc(putn(&enddate, 8.)); %let date=&startdate; %do %while(&date <= &enddate); data tempdsf; set crsp.dsf; where intnx('month', date, 1)-1 = &date; run; <code to calculate correlations> %let date=%eval(%sysfunc(intnx(month, &date+1, 1))-1); %end; %mend; %justaloop;
This is inefficient because it takes long to subset DSF each time. Instead, do the following:
Sorting DSF and creating an index are simple:
proc sort data=crsp.dsf out=dsf; where date>=&startdate and date<=&enddate; by date; run; data dsfindex(keep=date firstobs lastobs); set dsf; by date; retain firstobs; if first.date then do; firstobs=_n_; end; if last.date then do; lastobs=_n_; output; end; run;
DSFINDEX now contains, for each date, the observation numbers at which the observations for that date begin and end.
Within the loop, you then say:
data _null_; set dsfindex; where date=&date; call symput('firstobs', firstobs); call symput('lastobs', lastobs); run; data tempdsf; set dsf(firstobs=&firstobs obs=&lastobs); run;
The DATA _NULL_ statement just means that I am not creating a new dataset when I subset DSFINDEX. The only reason I am opening DSFINDEX is to get the values of FIRSTOBS and LASTOBS into the corresponding macro variables.
I then subset DSF to get TEMPDSF, keeping only the observations numbered between &FIRSTOBS and &LASTOBS. Note that the obs= statement tells you the last observation that will be read, not the number of observations to read.
TEMPDSF now contains all the observations for which DATE equals &DATE. The savings here are enormous: this takes virtually no time.
The complete code thus reads:
%let startdate= '31Jan1980'd; %let enddate= '31Jan2000'd; proc sort data=crsp.dsf out=dsf; where date>=&startdate and date<=&enddate; by date; data dsfindex(keep=date firstobs lastobs); set dsf; by date; retain firstobs; if first.date then do; firstobs=_n_; end; if last.date then do; lastobs=_n_; output; end; run; %macro justaloop; %let startdate=%sysfunc(putn(&startdate, 8.)); %let enddate=%sysfunc(putn(&enddate, 8.)); %let date=&startdate; %do %while(&date <= &enddate); data _null_; set dsfindex; where date=&date; call symput('firstobs', firstobs); call symput('lastobs', lastobs); run; data tempdsf; set dsf(firstobs=&firstobs obs=&lastobs); run; <code to calculate correlations> %let date=%eval(%sysfunc(intnx(month, &date+1, 1))-1); %end; %mend; %justaloop;