For instance, I may want to calculate the demeaned value of a variable. To do this, I first use proc means to calculate the mean of that variable and then merge proc means' output dataset into the parent dataset.
I describe a construct that does this efficiently.
Suppose the parent dataset is crsp.msf, and I wish to calculate the demeaned value of returns.
I first find the mean value using proc means:
    proc means data=crsp.msf noprint;
        var ret;
        output out=summarystats(drop=_type_ _freq_) mean=meanret;
    run;
SUMMARYSTATS has one variable named meanret, and one observation. To merge this back into msf, I say:
    data msf;
        if _n_ = 1 then set summarystats;
        set msf;
    run;
This creates the additional variable meanret in msf and copies its one value to all observations in msf.
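With meanret available on every observation, the demeaned return is a single subtraction in the same data step. The following is a minimal sketch of the two steps put together, not a definitive implementation. It assumes the data step reads crsp.msf directly; the output dataset name msf_demeaned and the variable name demeaned_ret are hypothetical names chosen for illustration.

```sas
/* Step 1: build a one-row dataset holding the overall mean of ret. */
proc means data=crsp.msf noprint;
    var ret;
    output out=summarystats(drop=_type_ _freq_) mean=meanret;
run;

/* Step 2: attach meanret to every observation and subtract it.     */
/* msf_demeaned and demeaned_ret are hypothetical names.            */
data msf_demeaned;
    if _n_ = 1 then set summarystats;  /* read the single row once; its value is retained */
    set crsp.msf;                      /* assumption: read the parent dataset directly */
    demeaned_ret = ret - meanret;      /* missing whenever ret is missing */
run;
```

Writing to a new dataset rather than overwriting msf keeps the original intact, which makes it easy to rerun the step while checking the result.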
Why does this work? The construct relies on two things: the implicit RETAIN in every SET statement, and the behaviour of SAS when it reaches the end of an input dataset. Both are fairly complicated and not essential here, so I shall not go into them. Interested readers may look at Sections 16.1 and 16.2.