The logic of the DATA step

The DATA step has some subtleties that are worth understanding.

The DATA step is nothing but an implicit loop. The standard DATA step looks like this:

      data a;
         set b;
         <do something>;
      run;

This reads dataset b one observation at a time, does something to that observation, and then writes it to dataset a. The loop is implicit: the statements in the step are executed once for each observation in b. In pseudocode, the loop might be made explicit as follows:

     DO I = 1 TO ROWS(B);
         <do something to CURRENT_OBSERVATION>;
         <write CURRENT_OBSERVATION to A>;
     END;

Here A and B are treated as matrices, so that ROWS(B) is the number of observations in B and CURRENT_OBSERVATION is B[I,.], the Ith row of the matrix B (that is, the Ith observation).
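
As a concrete illustration of the implicit loop, here is a minimal sketch. The variable names x and y and the three data values are hypothetical, chosen only for this example; they are not from any particular dataset.

      /* Build a small input dataset b with one variable, x. */
      data b;
         input x;
         datalines;
      1
      2
      3
      ;
      run;

      /* The statements between SET and RUN are executed once per    */
      /* observation of b. No loop is written, yet y is computed for */
      /* every observation.                                          */
      data a;
         set b;
         y = 2 * x;
      run;

      proc print data=a;
      run;

Dataset a should end up with one observation for each observation of b, each carrying both x and the newly computed y.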

Two things are important to know. One is the role of the OUTPUT statement. The other is the role of the automatic variable _N_, which counts the iterations of the implicit loop.
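
A minimal sketch of both, again using a hypothetical variable x and a hypothetical new variable obs_number. It relies on two standard behaviours of the DATA step: when a step contains an explicit OUTPUT statement, the automatic output at the end of each iteration no longer happens, and _N_ itself is not written to the output dataset, so it has to be copied into an ordinary variable if it is to be kept.

      /* Input dataset b with x = 1, 2, 3. This step has no SET, so */
      /* it iterates once; the explicit OUTPUT inside the DO loop   */
      /* writes one observation per pass.                           */
      data b;
         do x = 1 to 3;
            output;
         end;
      run;

      data a;
         set b;
         /* _N_ is 1 on the first iteration, 2 on the second, ... */
         obs_number = _N_;
         /* Because OUTPUT appears in this step, an observation is */
         /* written only when this statement actually executes.    */
         if x > 1 then output;
      run;

Under these assumptions, a should contain two observations (x = 2 and x = 3), with obs_number equal to 2 and 3 respectively; the observation with x = 1 is read but never written.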


Andre de Souza 2012-11-19