[Chapter 10] 10.3 PLVprsps: Parsing PL/SQL Strings

10.3 PLVprsps: Parsing PL/SQL Strings

The PLVprsps (PL/Vision PaRSe) PL/SQL package builds upon all the other string parsing and analyzing packages to provide easy-to-use, high-level programs to parse PL/SQL strings and programs. The parsed output is passed back into a PL/SQL table you provide in your calls to PLVprs modules. You can then work with the contents of the PL/SQL table as you see fit.

This parsing process separates PL/SQL code into individual atomics, which can then be used for any of the following purposes:

Analyze the contents of a PL/SQL program. What programs are defined in the package? Are any variables not being used?
Reformat or pretty-print the PL/SQL program. Once the atomics are separated, you can put them back together however you want and end up with the same program (as long as you preserve the order of the atomics).

The PLVprsps package offers several different levels of parsing programs (parse a string, parse a line, parse a program). With PLVprsps, you can also specify precisely which type of language elements you want to return in your parse.

The following sections show how to use the different elements of PLVprsps.

10.3.1 Selecting Token Types for Parsing

One of the users of PLVprsps is the PLVcat package (described in Chapter 18, PLVcase and PLVcat: Converting and Analyzing PL/SQL Code ), which catalogues the contents and usage of PL/SQL program units. You could ask, for example, to generate a list of those builtins which are used by a particular program. Or you could request to see only those nonkeyword references, which would give you the list of all application-specific identifiers in your code.

The way PLVcat is able to offer this flexibility is by offering you the ability in PLVprsps to request that the output from a call to the parsing programs (plsql_string or module ) return only certain kinds of tokens. The different types currently recognized by PLVprsps are:

Any keywords
Builtin functions, procedures, and packages
Application-specific identifiers (non-keyword identifiers)
All tokens

For each of these token types, PLVprsps offers toggles so that you direct the package to keep only the tokens in which you are interested. These are the "keep" and "nokeep" programs. The headers for these programs are:

PROCEDURE keep_all;
PROCEDURE keep_kw;
PROCEDURE keep_nonkw;
PROCEDURE keep_bi;

PROCEDURE nokeep_all;
PROCEDURE nokeep_kw;
PROCEDURE nokeep_nonkw;
PROCEDURE nokeep_bi;

So if I wanted to keep builtins and non-keywords when I perform my parse, I would issue these two calls:

PLVprsps.keep_nonkw;
PLVprsps.keep_bi;

All keywords which are not builtins would, therefore, be discarded in the parse. You would not see such atomics as IF and =.

If, on the other hand, I want to obtain all non-keywords, but reject all keywords in the parse, I would call these two programs:

PLVprsps.keep_nonkw;
PLVprsps.nokeep_kw;

10.3.2 Parsing PL/SQL Code

PLVprsps offers two programs for PL/SQL source code parsing: plsql_string and module . The plsql_string procedure parses the string passed to it. The module procedure parses all the lines of code for a specified program -- by calling the plsql_string program for each line in that program. Both of these programs are explained below.

10.3.2.1 plsql_string procedure

The header for plsql_string is:

PROCEDURE plsql_string
   (line_in IN VARCHAR2,
    tokens_out IN OUT PLVtab.vc2000_table,
    num_tokens_out IN OUT INTEGER,
    in_multiline_comment_out IN OUT BOOLEAN);

The line_in argument is the line of code to be parsed. The tokens_out PL/SQL table holds the distinct tokens found in the line. The num_tokens_out argument indicates the number of tokens found and in the PL/SQL table. The in_multiline_comment_out argument returns TRUE if the line of code has initiated or is part of a multiline comment block. You should initialize this IN OUT argument to FALSE to make sure that the parsing is performed as expected.

Comments are not considered atomics for the purposes of parsing. The comment text is parsed but never written to the PL/SQL table. For this reason, all of the following strings will be parsed into precisely the same set of tokens:

v_err := 'ORA-' || TO_CHAR (SQLERRM);
v_err := 'ORA-' || /* Concatenate! */ TO_CHAR (SQLERRM);
/* ASSIGN VALUE */ v_err := 'ORA-' || TO_CHAR (SQLERRM);
*/ v_err := 'ORA-' || TO_CHAR (SQLERRM);
v_err := 'ORA-' || TO_CHAR (SQLERRM); -- end of line

In all of these cases, the last argument of plsql_string will be returned as FALSE. The same set of tokens will be returned with the following string as well, but in this case the last argument of plsql_string will be returned as TRUE since a multiline comment block has been started:

v_err := 'ORA-' || TO_CHAR (SQLERRM); /* big comment coming:

10.3.2.2 Script to test plsql_string

The anonymous block shown below (and found in file PLVprsps.tst ) illustrates how to set up variables and then call the plsql_string program. It also uses PLVtab.display to easily show the contents of my PL/SQL table of tokens.

DECLARE
   full_string VARCHAR2(100)
       := 'v_err := ''ORA-'' || TO_CHAR (SQLERRM)';
   strings PLVtab.vc2000_table;
   num INTEGER := 1;
   incmnt BOOLEAN := FALSE;
BEGIN
   p.l (full_string);
   PLVprsps.plsql_string (full_string, strings, num, incmnt);
   PLVtab.display (strings, num);
   p.l (incmnt);
END;
/

Here are the results from an execution of the test script:

SQL> start PLVprsps.tst
v_err := 'ORA-' || TO_CHAR (SQLERRM)
Contents of Table
v_err
:=
'ORA-'
||
TO_CHAR
(
SQLERRM
)
FALSE

10.3.2.3 module procedure

Use PLVprsps.module to parse an entire module or PL/SQL program unit. The module procedure is overloaded in the following two versions:

PROCEDURE module
   (module_in IN VARCHAR2 := NULL, 
    tokens_out IN OUT PLVtab.vc2000_table,
    num_tokens_out IN OUT INTEGER);

PROCEDURE module
   (tokens_out IN OUT PLVtab.vc2000_table,
    num_tokens_out IN OUT INTEGER);

In the first version, you provide the name of the module that you wish to parse and the data structures that will be filled with the parsed tokens: a PL/SQL table and a count of the rows filled. In the second version, you simply supply the PL/SQL table and the variable to hold the number of tokens. This version of module assumes that you have already set the current object with a call to PLVobj.setcurr .

The three-argument version of module simply calls the two-argument version as shown below:

PROCEDURE module
   (module_in IN VARCHAR2 := NULL, 
    tokens_out IN OUT PLVtab.vc2000_table,
    num_tokens_out IN OUT INTEGER)
IS
BEGIN
   PLVobj.setcurr (module_in);
   module (tokens_out, num_tokens_out);
END;

10.3.2.4 Implementing a module parser

The two-argument version of PLVprsps.module that assumes that the object has already been set is not much more complex than the one you see above. The reason that it is so straightforward is that it relies on the plsql_string program to parse each line of code. And it gets those lines of code by using the PLVio package. The implementation of module is shown below:

PROCEDURE module
   (tokens_out IN OUT PLVtab.vc2000_table,
    num_tokens_out IN OUT INTEGER)
IS
   srcline PLVio.line_type;
   in_multiline_comment BOOLEAN := FALSE;
BEGIN
   init_table (tokens_out, num_tokens_out);
   PLVio.asrc;
   LOOP
      PLVio.get_line (srcline);
      EXIT WHEN srcline.eof;
      plsql_string 
         (srcline.text, tokens_out, num_tokens_out,
          in_multiline_comment);
   END LOOP;
END;

In the declaration section of the procedure, I declare a record to hold the line of code extracted with a call to the PLVio.get_line procedure. I also declare a Boolean variable, which is required for a call to plsql_string .

In the body of the procedure I first initialize the table that will hold the parsed tokens. Then I request reading of the source code from the ALL_SOURCE data dictionary view. Since I am going to read all of the lines of code in the specified program (assumed to be the current object in PLVobj), my call to PLVio.asrc does not have any arguments.

Now all I have to do is loop through the lines of code using the PLVio.get_line procedure. If I have not reached "end of file," I parse that string. Notice that I do not have to manually add the new parsed tokens to my table. The plsql_string program automatically places the new tokens in the rows after the current value of the num_tokens_out variable (that is why the third argument of plsql_string is an IN OUT parameter).

It is amazing how easy it can be to implement complex new functionality when you build upon preexisting elements.

10.3.3 Initializing a Table of Tokens

PLVprsps provides a program to initialize the PL/SQL table and row counter you will use to store parsed tokens. It is called by the module program; its header is shown below:

PROCEDURE init_table


   (tokens_out IN OUT PLVtab.vc2000_table,
    num_tokens_out IN OUT INTEGER);

At this time the init_table procedure does nothing more than assign an empty table to the PL/SQL table argument passed to it and set the row counter to 0.

Why bother building a program like init_table ? There are two good reasons:

By hiding this logic behind a programmatic interface, I make it easier to enhance this initialization process if the need arises in the future. If I rely on users to prepare the PL/SQL table to receive the parsed tokens, it will be very difficult for them to upgrade their own code to meet the new requirements of my package.
If the user can call init_table , she does not have to worry about the specific steps necessary to prepare the PL/SQL table. The individual steps are abstracted into a named action and taken care of by PLVprsps itself.

The init_table procedure is a gesture of respect for the users of PLVprsps. You don't have time to worry about insignificant details. Furthermore, you are much more likely to use PLVprsps if I offer this kind of feature.

10.3.4 Using PLVprsps

To give you a sense of how PLVprsps breaks apart PL/SQL code, let's see what it does with the following package body:

PACKAGE BODY testcase
IS
   PROCEDURE save (string_in IN VARCHAR2)
   IS
      n INTEGER := DBMS_SQL.OPEN_CURSOR;
   BEGIN
      UPDATE PLV_output SET program = string_in;
      IF SQL%ROWCOUNT = 0
      THEN
         INSERT INTO PLV_output VALUES (string_in);
      END IF;
      PLVcmt.perform_commit;
   END;
END testcase;

This program has calls to builtins, as well as application-specific identifiers. Now take a look at the following script (found in modprs.sql ). It takes two SQL*Plus arguments: the name of the program to be parsed and the type of tokens to retain and then display:

DECLARE
   strings PLVtab.vc2000_table;
   num INTEGER := 1;
BEGIN
   PLVprsps.nokeep_all;
   PLVprsps.keep_&2;
   PLVprsps.module ('&1', strings, num);
   PLVtab.display (strings, num);
END;
/

Let's take a look at the output generated by different calls to this script. The first call to modprs requests the parse to display only builtins. The second call shows all those identifiers which are not keywords in the PL/SQL language.

SQL> start modprs b:testcase bi
Contents of Table
DBMS_SQL.OPEN_CURSOR
UPDATE
ROWCOUNT
INSERT

SQL> start modprs b:testcase nonkw
Contents of Table
testcase
save
string_in
n
PLV_output
program
string_in
PLV_output
string_in
PLVcmt.perform_commit
testcase