SAS::Parser - Parse a SAS program file
use SAS::Parser;
$p = new SAS::Parser;
$p->parse_file('mysas.sas');     # returns a SAS::Parser object
or
$file = shift @ARGV;
$p->parse_file($file, {options});
After parsing, you can access the information stored in the SAS::Parser object as follows:
@procs = $p->procs();          # get list of procs called
@datasets = $p->datasets();    # get list of datasets created
$macros = $p->macros();        # get string of macros called
SAS::Parser is a base tool for writing applications that deal with .sas programs. It can be used as a documentation tool, e.g., to extract lists of procedures used, data sets created, macros used, etc., and produce a nicely formatted header in a consistent format, or to produce standard documentation headers for SAS macros. It can also be used as a pre-processor to a SAS code formatter, to produce WWW documents, etc. It is not likely to be useful as a SAS syntax checker without a good deal of additional work. It does as reasonable a job on SAS macros as can be expected without being an actual macro processor.
I had written a large number of specialized scripts for some of these tasks, and found that I was re-doing similar stuff each time. SAS::Parser is an attempt to bring this to the next level, where the basic statement parsing can be assumed, and your application can just work with the info extracted.
It's just a beginning, and all the rest depends on writing Perl code making use of SAS::Parser to accomplish such tasks. See SAS::Header for one such extension.
Any parser works by segmenting text into 'interesting units' for the purpose at hand.
SAS::Parser parses a SAS program into statements when the parse() or parse_file() methods are called. Each statement is classified as a statement type, and further parsed depending on that statement type. Information about libnames, filenames, data sets created, procs called, macros called, and macros defined is stored in the SAS::Parser object.
In addition, the parsed description of each statement selected by the store option (its type, the statement name, and statement text) may be stored in an array for further processing.
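The segmentation idea can be illustrated with a minimal, self-contained sketch. It is not the code used by SAS::Parser: the helper names split_statements and classify_steps are hypothetical, and the approach (split on ';' outside quoted strings, then track the enclosing DATA or PROC step) is only an assumption about how such a parser might begin.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SAS::Parser: split SAS source into
# statements ending in ';', treating quoted strings as opaque.
sub split_statements {
    my ($text) = @_;
    my @stmts;
    while ($text =~ /\G\s*((?:'[^']*'|"[^"]*"|[^;'"])*;)/g) {
        my $stmt = $1;
        $stmt =~ s/\s+/ /g;    # collapse internal whitespace
        push @stmts, $stmt;
    }
    return @stmts;
}

# Hypothetical helper: a DATA or PROC statement starts a new step;
# every other statement inherits the current step ('' before any step).
sub classify_steps {
    my @out;
    my $step = '';
    for my $stmt (@_) {
        $step = lc $1 if $stmt =~ /^(data|proc)\b/i;
        push @out, [$step, $stmt];
    }
    return @out;
}

my $sas = "title 'a;b';\ndata test;\n  x = 1;\nproc print data=test;\n";
printf "%-5s %s\n", @$_ for classify_steps(split_statements($sas));
```

The sketch ignores comments, macros, and cards data, all of which a real parser has to handle.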
Presently, we just collect the information from the SAS program. To do more interesting things, one should define sub-classes for more specialized tasks. See, for example, SAS::Header. These can add items to the object structure, which, like Topsy, just grows.
The external interface to SAS::Parser is:
Create a new, but empty SAS::Parser object. The object constructor takes no arguments.
Parse the $string as a SAS program. The $string argument is typically a series of lines (separated by \n) read from a file. The parse() method may be called several times with different chunks of a large file, or with lines read from different files. The parse() method does most of the work, but most applications directly use the parse_file() method, which in turn calls parse() with the text of a file. The return value is a reference to the parser object.
This method can be called to parse text from a file. The argument can be a filename or an already opened file handle. The return value from parse_file() is a reference to the parser object.
On Unix systems, parse_file() also attempts to locate and parse the autoexec.sas file, in order to locate pre-defined libname and filename statements which may be referenced in the SAS program.
The parse() and parse_file() methods take the following options as an optional second argument. All options are included as a hash of (option_name, option_value) pairs.
Setting doincludes=>1 (non-zero) causes the parser to insert the text of included files (%include statements) in the input stream at that point, if the included file can be read. In this case, line numbers refer to the total stream, not individual files.
Setting trim=>1 (non-zero) causes each statement to be trimmed of leading/trailing whitespace, and all internal C-style comments (/* ... */) to be removed before the statement is stored or printed.
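As a rough illustration of what trimming a statement involves, here is a minimal sketch. The helper trim_statement is hypothetical and only mimics the effect described for trim=>1; the module's own implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the effect described for trim=>1.
sub trim_statement {
    my ($stmt) = @_;
    $stmt =~ s{/\*.*?\*/}{}gs;     # remove /* ... */ comments, even multi-line
    $stmt =~ s/^\s+|\s+$//g;       # strip leading and trailing whitespace
    return $stmt;
}

print trim_statement("   proc means /* summary stats */ data=test;  \n"), "\n";
```

Note that this simple version would also remove a /* ... */ sequence that appears inside a quoted string.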
The store option specifies either 'ALL', or 'NONE', or a list of statement types whose contents and descriptors are stored in the SAS::Parser object. The default is store = qw(data proc).
For example, to store all data and proc statements, use
$p->parse_file($file, {store=>qw(data proc)});
For each stored statement, the SAS::Parser object stores a list of the following 5 elements:
($lineno, $step, $type, $stmt, $statement)
The parse_file() method uses the following call to parse the autoexec.sas file silently, storing no statements (but recording filename and libname information):
$self -> parse($auto, {silent=>1, store=>qw(none)}) if $auto;
The print option specifies either 'ALL', or 'NONE', or a list of statement types whose contents and descriptors are printed as they are parsed. The default, print = qw(data proc), prints information about each data and proc step. This option is mainly used for debugging or testing.
Setting silent=>1 (non-zero) suppresses the printout of statements as they are parsed. This is equivalent to setting the print option to 'NONE'.
The following methods are available in the SAS::Parser class. Except for the output() method, they all work as both constructors and accessors. If called with an argument, that argument is added to the corresponding entry in the SAS::Parser object. If called with no argument, they return that entry.
As a convenience, the accessors which ordinarily return lists (e.g., procs(), macros(), datasets(), etc.) will return a blank-separated string if called in a scalar context, or an array if called in a list context. (But note that "print $p->procs();" supplies a list context.)
The items for all these lists are stored and returned in the order found in the file(s) parsed. To use or print these in a sorted order, use the sort() function (which also supplies a list context).
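The context-sensitive return described above is usually built on Perl's wantarray. The following minimal sketch uses a hypothetical stand-in function, names(), in place of a real accessor such as procs(); how SAS::Parser implements this internally is not shown in this documentation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an accessor such as procs(); the real
# implementation inside SAS::Parser may differ.
sub names {
    my @names = qw(reg means);
    return wantarray ? @names : join(' ', @names);
}

my @list   = names();          # list context: two elements
my $string = names();          # scalar context: one blank-separated string
print scalar(@list), " names; as a string: '$string'\n";

# sort supplies a list context, so the individual names are sorted.
print "sorted: ", join(', ', sort { $a cmp $b } names()), "\n";
```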
Appends the named procedure to the list of procedures called. The constructor use of these methods is used internally during parsing.
Returns a list of the unique names of procedures called in PROC statements or a blank-separated string in a scalar context. The list accessor functions such as this are used as follows:
my @procs = $p->procs();     # list context
print "procs called: ", join(', ', @procs), "\n" if scalar @procs;

my $procs = $p->procs();     # scalar context
print "procs called: $procs\n" if $procs;
Returns a list of the unique names of macros invoked explicitly in the form %macname [(args);] or a blank-separated string in a scalar context. This includes macros invoked within %let statements, e.g., %let nv = %nvar(&vars);, but not those invoked within other macro statements.
Returns a list of the unique names of macros defined or a blank-separated string in a scalar context.
Returns a list of the unique names of datasets created in DATA statements or a blank-separated string in a scalar context. Output datasets created by procedures are not tracked.
Returns a list of the unique names of included files from %include statements or a blank-separated string in a scalar context.
Returns a list of the unique names of IML modules defined or a blank-separated string in a scalar context.
Returns a hash of the names of SAS libraries defined. The key for each element of the hash is the libref, and the corresponding value is a string containing the folder or directory name.
The libnames and corresponding directory names (if any) may be printed as follows:
my %libnames = $p->libnames();
while (($libref,$value) = each %libnames) {
   print " libname: $libref=$value\n";
}
Returns a hash of the names of SAS filenames defined. Non-disk filenames (pipe, printer, tape, etc) are ignored. The key for each element of the hash is the fileref, and the corresponding value is a string containing the filename, or a folder or directory name, or a blank-separated list of folder/directory names (for a filename aggregate).
Returns a list-of-lists of the SAS statements stored, which consists of all statements whose type matches the store option.
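A list-of-lists of this kind can be walked as in the sketch below. It works on hand-written sample records, not on output from SAS::Parser, and it assumes each stored entry is an array reference holding the five elements ($lineno, $step, $type, $stmt, $statement). The helper of_type is hypothetical.

```perl
use strict;
use warnings;

# Illustrative data only: three records shaped like the 5-element lists
# described above. The line numbers are made up for this sketch.
my @stored = (
    [1, 'data', 'data', 'test',  'data test;'],
    [6, 'proc', 'proc', 'reg',   'proc reg data=test;'],
    [8, 'proc', 'proc', 'means', 'proc means data=test;'],
);

# Hypothetical helper: keep the records whose $type (third element) matches.
sub of_type {
    my ($type, @records) = @_;
    return grep { $_->[2] eq $type } @records;
}

for my $rec (of_type('proc', @stored)) {
    my ($lineno, $step, $type, $stmt, $statement) = @$rec;
    print "line $lineno: PROC $stmt ($statement)\n";
}
```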
Sets an end-of-file condition which terminates parsing after the current statement has been processed. The eof() method may be used by a sub-class of SAS::Parser to end the parsing after the required information has been extracted.
This method is used to produce output from the parser as each statement is parsed. The default method provided in SAS::Parser simply prints the values of $step, $type, $stmt, and $statement. It uses a negative value of $lineno as a flag for initial processing. Sub-classes of SAS::Parser may override this method for other purposes.
For example, the following lines define a short SAS program as a here document, and parse it with SAS::Parser.
use SAS::Parser;
my $sascode = <<END;
data test;
   do x=1 to 20;
      y=x + normal(0);
      output;
      end;
proc reg data=test;
   model y=x;
proc means data=test;
   var y x;
END
;
my $p = new SAS::Parser;
$p -> parse($sascode);
When run, this produces the following printed output:
data   data   test    data test;
proc   proc   reg     proc reg data=test;
proc   proc   means   proc means data=test;
The parsing of each statement returns the variables $lineno, $step, $type, $stmt, and $statement, which may be printed by parser() and/or stored in the SAS::Parser object (depending on the options: silent, print, store).
$lineno is the source line number of the first line of the statement. $step is one of 'data', 'proc', or '' (for global statements outside of PROC or DATA steps). $type is a general statement type, $stmt sometimes gives a further keyword or name associated with the statement, and $statement is the actual text of the statement (possibly trimmed of whitespace and embedded /* comments */, depending on the trim option).
The statement $types currently used are:
parser() could not classify this statement.
an assignment statement. $stmt contains the name of the variable assigned.
cards;, datalines;, etc.
a C-style comment: /* ... */
a DATA statement. $stmt contains the name of the first data set mentioned.
a SAS global statement: options, title, run, axis, etc. $stmt contains the statement keyword.
%include statement. The parser handles the forms %include 'path/filename';, %include fileref;, and %include fileref(file); where fileref was defined in a filename statement, possibly in the autoexec.sas file. If the fileref was defined, the name of the actual file is found, if the file exists.
actual data lines following cards;
a macro call statement. $stmt contains the macro name.
a macro comment statement: %* ... ;
a macro definition statement: %macro(). $stmt contains the macro name. $statement contains the text of the macro definition statement, including all arguments and default values.
%mend statement
some other macro statement: %display, %do, %else, %end, etc. $stmt contains the statement keyword.
null statement
a PROC statement. $stmt contains the name of the procedure called.
a statement comment: * ... ;
some other SAS statement: all DATA step statements, and PROC step statements. $stmt contains the statement keyword.
The following methods are available in the SAS::Parser class for specialized parsing of particular statement types, to extract or operate on additional information in a statement. They are designed so that they may be overridden for particular applications.
Those listed as NOOP do nothing here, except reserve a place for such additional processing. For example, you can override parse_mdef() to do further parsing of macro arguments.
Parse a data statement, finding all dataset names created, and storing these in $self->{datasets}. We don't bother distinguishing between permanent and temporary datasets, nor do we store information about the SAS libraries referred to. We handle (implicit) _data_, as in data;, but don't resolve these to DATA1, DATA2, etc.
Parse a filename statement to determine fileref and corresponding folder(s).
Parse a %include statement to determine pathname of included file(s). For this to work, we must have seen and parsed the filename statements for any %include fileref; or %include fileref(file);. We don't actually include the file, but leave that to the higher-ups.
Returns: the resolved pathname of the included file, if it exists.
Parse a libname statement to determine libref and corresponding folder.
Parse a macro statement. As implemented here, this just looks for user-defined macro functions invoked in a %let statement, e.g.,
%let nv = %words(&vars);
This will add %words to the list of macros called.
The following subroutines are exported by default.
Find the autoexec.sas file, and return its pathname if found, else return undef. If the environment variable SAS_OPTIONS defines -autoexec, we look there first. Otherwise, we search the current directory, the user's HOME directory, or a directory specified by the environment variable SASROOT, in that order.
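The directory search order described above can be sketched as follows. This is not the exported subroutine itself: first_autoexec is a hypothetical helper, and the sketch leaves out the SAS_OPTIONS -autoexec lookup because the exact format of that variable is not specified here.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first autoexec.sas found in the given
# directories, or undef. Undefined or empty directory entries are skipped.
sub first_autoexec {
    my @dirs = grep { defined($_) && length($_) } @_;
    for my $dir (@dirs) {
        my $path = File::Spec->catfile($dir, 'autoexec.sas');
        return $path if -f $path;
    }
    return;
}

# Search order described above: current directory, HOME, then SASROOT.
my $auto = first_autoexec(File::Spec->curdir, $ENV{HOME}, $ENV{SASROOT});
print defined $auto ? "found $auto\n" : "no autoexec.sas found\n";
```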
Protect special characters from the parser by remapping them into some other string.
Read a file, given filename (complete path) or filehandle (assumed open). Returns the file contents or undef if not found.
Uses SAS_OPTIONS and SASROOT to locate autoexec.sas.
parse() does not handle certain types of complex macros particularly well. When %do...; stuff %end; is used inside another statement to generate conditional code, that text, up to the next ';' is appended appropriately to the current statement. In other cases, it may fail, returning '?' as the statement type, because it's a static parser, not a true macro interpreter. In these cases, the parser swallows text up to the next ';' as the current statement, and soldiers on. Following statements are parsed correctly.
The logic used to handle ';' inside quoted strings is fooled by unmatched quotes, even those inside comments. For example,
*--don't expect this comment to parse correctly;
There are still some problems with parsing line labels that look like statement types or keywords. For example, the macro statement
%done: options notes;
gets classified as a %do statement.
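One way such a misclassification can arise is a keyword test that matches only a prefix. The sketch below contrasts a prefix match with a match that requires a whole keyword not followed by a label colon. Both functions are hypothetical illustrations and are not taken from the SAS::Parser source.

```perl
use strict;
use warnings;

# Hypothetical classifiers; SAS::Parser's own matching code may differ.
# Prefix match: anything starting with %do counts.
sub naive_is_do  { return $_[0] =~ /^\s*%do/i ? 1 : 0 }
# Whole keyword only, and not when used as a line label (%do:).
sub strict_is_do { return $_[0] =~ /^\s*%do\b(?!\s*:)/i ? 1 : 0 }

for my $stmt ('%do i = 1 %to 3;', '%done: options notes;') {
    printf "%-24s naive=%d strict=%d\n",
        $stmt, naive_is_do($stmt), strict_is_do($stmt);
}
```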
SAS::Header, SAS::Index
Michael Friendly, friendly@yorku.ca
Copyright 1999- Michael Friendly. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.