A useful way to have control while not doing too much work is to use proc import on a small subsample first, and then use what proc import writes to the log to construct your own input statement, changing variable names or formats as you like.
For instance, I have the MSF dataset stored as a CSV file: text.csv. The first two lines are
PERMNO,DATE,RET 10000,19860228,-0.257143A proc import would read:
proc import file='text.csv' out=out dbms=csv; run;and produce the following in the log file:
11 data WORK.OUT ; 12 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ 13 infile 'text' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ; 14 informat PERMNO best32. ; 15 informat DATE best32. ; 16 informat RET best32. ; 17 format PERMNO best12. ; 18 format DATE best12. ; 19 format RET best12. ; 20 input 21 PERMNO 22 DATE 23 RET 24 ; 25 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */ 26 run;Notice that the date variable is stored as a number, and it would have been a hassle to get it into the correct format.
But working from this log, it is trivial to produce the following clean input data step:
data WORK.OUT; infile 'text' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2; informat PERMNO best32. ; informat DATE YYMMDD8. ; informat RET best32. ; format PERMNO best12. ; format DATE date9. ; format RET best12. ; input PERMNO DATE RET ; run;
Notice the change in the informat and format for date, so that the output dataset has date correctly coded.