 
 
 
 
 
   
The HASH object allows you to load a "small" dataset into RAM as a hash. A hash, or associative array, is a collection of data items indexed by a key. For instance, the key variables of the MSF dataset are permno and date: for a particular permno-date combination, the MSF dataset contains exactly one observation, holding the return, price, shares outstanding and so on. Given a permno-date combination, I could find its data by opening the MSF dataset and reading through it sequentially until I hit the matching observation. A hash instead lets you look up the data for any specified permno-date combination directly, without scanning the entire dataset.
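To make the idea of a keyed lookup concrete, here is a minimal sketch. The dataset toy_msf and its three observations are made up purely for illustration; only the hash methods themselves (defineKey, defineData, defineDone, find) are the ones used in the rest of this section.

```sas
/* Hypothetical toy data standing in for MSF: one row per permno-date. */
data toy_msf;
    input permno date :yymmdd10. ret;
    format date yymmdd10.;
    datalines;
10001 2020-01-31 0.01
10001 2020-02-29 -0.02
10002 2020-01-31 0.03
;
run;

data _null_;
    /* Never executes, but defines permno, date and ret at compile time. */
    if 0 then set toy_msf;

    /* Load the toy dataset into memory, keyed on permno and date. */
    declare hash h(dataset: 'toy_msf');
    h.defineKey('permno', 'date');
    h.defineData('ret');
    h.defineDone();

    /* Look up one permno-date combination directly. */
    permno = 10001;
    date = '29FEB2020'd;
    rc = h.find();   /* 0 means the key was found and ret was filled in */
    if rc = 0 then put 'Found: ' permno= date= ret=;
    else put 'Not found: ' permno= date=;

    stop;   /* no input is actually read, so end the step explicitly */
run;
```

The lookup goes straight to the requested key rather than reading toy_msf row by row, which is the property the merge below relies on.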
How do you create and use the hash? An example is probably the best way to explain.
In this example, I want to merge the monthly-frequency Fama-French factors with the MSF dataset, so I can run regressions to calculate betas.
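/* Monthly stock returns: keep only observations with a non-missing return. */
data msf;
    set crsp.msf(keep=permno date ret);
    where ret is not missing;
    /* Move each date to the last calendar day of its month. */
    date=intnx('month', date, 1)-1;
run;
/* Monthly Fama-French factors, with dates aligned the same way. */
data ff;
    set ff.factors_monthly(keep=date mktrf smb hml umd rf);
    date=intnx('month', date, 1)-1;
run;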
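All standard so far. Observe that I move the dates in both datasets to the last calendar day of the month, so that the date key takes the same value in both datasets for any given month.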
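The end-of-month adjustment can be checked in isolation. The sketch below uses an arbitrary example date and compares the expression used above with the alignment argument of intnx, which should give the same result; the variable names are illustrative only.

```sas
data _null_;
    d = '15MAR2021'd;                      /* arbitrary mid-month date */
    eom1 = intnx('month', d, 1) - 1;       /* first day of next month, minus one day */
    eom2 = intnx('month', d, 0, 'end');    /* last day of the current month */
    put eom1= date9. eom2= date9.;         /* both should be the last day of March 2021 */
run;
```

Either form works; what matters for the merge is that both datasets use the same convention.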
Now I define and use the hash.
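data msf;
    if _n_=1 then do;
        /* Load the FF dataset into memory once, on the first iteration. */
        declare hash h(dataset:'ff');
        h.defineKey('date');
        h.defineData('mktrf', 'smb', 'hml', 'umd', 'rf');
        /* Create the factor variables in the data step so the hash can write to them. */
        call missing(mktrf, smb, hml, umd, rf);
        h.defineDone();
    end;
    set msf;
    /* find() returns 0 when the date is in the hash; keep only matched observations. */
    if h.find() = 0 then output;
run;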
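What this does is: on the first iteration of the data step (_n_=1), it loads the FF dataset into a hash keyed on date, with the five factor variables as the data. The call missing statement creates those factor variables in the data step, which the hash needs in order to have somewhere to put the values it retrieves. Then, for each observation read from MSF, h.find() looks up that observation's date in the hash. If the date is found, find() returns 0, the factor values are copied into the observation, and the observation is output; otherwise the observation is dropped. If you would rather keep every MSF observation and simply flag whether a match was found, use this instead: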
data msf;
    if _n_=1 then do;
        /* Same hash definition as before. */
        declare hash h(dataset:'ff');
        h.defineKey('date');
        h.defineData('mktrf', 'smb', 'hml', 'umd', 'rf');
        call missing(mktrf, smb, hml, umd, rf);
        h.defineDone();
    end;
    set msf;
    /* Keep every observation and record whether its date was found in FF. */
    if h.find() = 0 then foundinFF=1;
    else foundinFF=0;
run;
The foundinFF variable will tell you whether the observation's date was found in the FF dataset or not.
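One way to use the flag is to tabulate it and list the dates that failed to match. This is only a sketch of a sanity check on the msf dataset created above, not part of the merge itself.

```sas
/* How many observations matched (1) and did not match (0)? */
proc freq data=msf;
    tables foundinFF;
run;

/* Which month-end dates had no counterpart in FF? */
proc sql;
    select distinct date format=date9.
    from msf
    where foundinFF = 0;
quit;
```

If the second query returns many dates, the likely culprit is a mismatch in how the dates were aligned in the two datasets.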
Some caveats:
The hash is of enormous utility in performing sortless merges, but it is also very useful in other situations. An example of this is many-to-many merges, explored in Section 12.4.2.
Because it is a headache to type all this out every time you want to do a merge with a hash, I wrote a macro that does it for me. See Section 10.3.2 for the macro.
A nice introduction to the hash object can be found in Secosky and Bloom (2006).
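For contrast with the sortless merge described above, here is a sketch of how the same inner join would conventionally be written with sorts and a merge statement. The output dataset names (msf_bydate, ff_bydate, msf_merged) are hypothetical and chosen only for this illustration.

```sas
/* Both inputs must first be sorted by the merge key. */
proc sort data=msf out=msf_bydate;
    by date;
run;

proc sort data=ff out=ff_bydate;
    by date;
run;

/* Many-to-one merge: keep only dates present in both datasets. */
data msf_merged;
    merge msf_bydate(in=a) ff_bydate(in=b);
    by date;
    if a and b;
run;
```

This should produce the same observations as the first hash version, but it requires sorting the large MSF dataset by date, and the result comes out in date order rather than in the original order of MSF. The hash version avoids both.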
 
 
 
 
