The efficient way is to use the Frisch-Waugh theorem, which says that when you regress Y on two sets of independent variables, X_1 and X_2, the coefficients on X_2 are the same as the ones you get from the following two steps:

- Regress each variable in X_2 on X_1 and keep the residuals.
- Regress Y on those residuals.
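
For reference, the same statement in matrix form. This is only a sketch in notation that is not in the text above: M_1 is the matrix that turns any variable into its residual from a regression on X_1.

```latex
y = X_1\beta_1 + X_2\beta_2 + \varepsilon,
\qquad
M_1 = I - X_1 (X_1' X_1)^{-1} X_1'

\hat\beta_2
  = (X_2' M_1 X_2)^{-1} X_2' M_1 y
  = \big( (M_1 X_2)' (M_1 X_2) \big)^{-1} (M_1 X_2)' y
```

The second equality holds because M_1 is symmetric and idempotent, so regressing y on the residuals M_1 X_2 reproduces the coefficient from the full regression.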

In our application, X_1 is the set of dummies and X_2 is lagsize.

The residual from regressing a variable on a full set of group dummies is the variable demeaned by group. So regressing X_2, i.e. lagsize, on the dummies is the same as demeaning lagsize by group, in this case by permno.
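
Written out, with i indexing permno and t indexing the observations within a permno (the index names and T_i, the number of observations for permno i, are my own notation), the fitted value from the dummy regression is just the group mean, so the residual is:

```latex
\bar{x}_i = \frac{1}{T_i} \sum_{t=1}^{T_i} x_{it},
\qquad
\hat{e}_{it} = x_{it} - \bar{x}_i
```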

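To do this, you can calculate the means with proc means by permno, merge them back in by permno, and subtract them. Or you can do it in one step with proc standard. Note that the data must be sorted by permno for the BY statement, and that proc standard writes the demeaned values to the OUT= data set rather than changing msf itself:

    proc sort data=msf; by permno; run;
    proc standard data=msf mean=0 out=msf_dm;
      by permno;
      var lagsize;
    run;
    proc reg data=msf_dm;
      model ret = lagsize;
    run;
    quit;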

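You can check that the coefficient on lagsize is the same as in the regression with the dummies. The standard errors will be slightly too small, though, because proc reg does not count the permno means as estimated parameters in its degrees of freedom.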
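
For comparison, here is a minimal sketch of the longer route with proc means and a merge. The data set names lagsize_means and msf_demeaned and the variable names lagsize_mean and lagsize_dm are made up for illustration. The sketch assumes msf has one row per permno and date.

```sas
/* BY-group processing requires the data to be sorted by permno */
proc sort data=msf;
  by permno;
run;

/* Step 1: mean of lagsize within each permno (non-missing values only) */
proc means data=msf noprint;
  by permno;
  var lagsize;
  output out=lagsize_means(drop=_type_ _freq_) mean=lagsize_mean;
run;

/* Step 2: merge the means back in and subtract them */
data msf_demeaned;
  merge msf lagsize_means;
  by permno;
  lagsize_dm = lagsize - lagsize_mean;
run;

/* Step 3: regress ret on the demeaned variable */
proc reg data=msf_demeaned;
  model ret = lagsize_dm;
run;
quit;
```

The coefficient on lagsize_dm should match the one from the proc standard version, as long as both use the same observations.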

An alternative is to use proc glm with the absorb statement, but I do not completely understand proc glm, and I am reluctant to use it. The approach above is simple, transparent, and just as easy to code. If you have multiple independent variables (other than the dummies), it extends easily: put the additional variables on the VAR statement in PROC STANDARD and, as usual, on the MODEL statement in PROC REG.
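
A sketch of the extension to two regressors, with the proc glm absorb version included only as a cross-check. The second regressor lagbm and the output data set msf_dm2 are hypothetical names, not something from the question. Both procedures need the data sorted by permno.

```sas
proc sort data=msf;
  by permno;
run;

/* Demean every regressor by permno; lagbm is a hypothetical second variable */
proc standard data=msf mean=0 out=msf_dm2;
  by permno;
  var lagsize lagbm;
run;

proc reg data=msf_dm2;
  model ret = lagsize lagbm;
run;
quit;

/* Cross-check: absorb the permno effects instead of demeaning by hand */
proc glm data=msf;
  absorb permno;
  model ret = lagsize lagbm / solution;
run;
quit;
```

If any of the variables have missing values, the two versions may not use exactly the same observations when computing the means, so it is safer to drop incomplete rows first before comparing coefficients.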