[Chapter 11] 11.2 Character Function Examples

11.2 Character Function Examples

Now that you have a solid grounding in all the different character functions, you should find the extended examples in the following sections easy to follow. I also hope that you'll find that the concepts embodied by the examples come in handy in your own applications.

11.2.1 Parsing a Name

Have you ever noticed that a person's name is a complicated data structure? You've got a first name, last name, middle initial, title, salutation, and who knows what else. It is also impossible to guarantee the uniqueness of a person's name, no matter how many attributes you plan to maintain. That's why I memorized my nine-digit Social Security number early in life.

The name of a human being is made up of many components. To make things even more complicated, though, there are at least two common formats for displaying and specifying a person's name:

LAST, FIRST MIDDLE, as in Feuerstein, Eli S.

and:

FIRST MIDDLE LAST, as in Eli Silva Feuerstein

But of course a name could also be formatted as follows:

TITLE FIRST MIDDLE LAST1 LAST2 GENERATION SUFFIX

as in Mr. Eli S. Hamilton Feuerstein, III, PhD

Why should you care about these different formats? Suppose my application has a table of callers structured like this:

CREATE TABLE caller
   (caller_id NUMBER PRIMARY KEY,
    first_name VARCHAR2(30),
    last_name VARCHAR2(30),
    middle_name VARCHAR2(30)
   );

and that the caller_id column is a foreign key in a number of other tables.

I strongly believe that users should never have to see or enter the actual foreign key values, unless there is a specific application requirement to do so. Instead, the user should be able to work with the name of the caller and leave it to the PL/SQL code to get the key. Users don't need to know that the caller named "Steven Feuerstein" has called ID number 14067. They just know the name and they should be allowed to enter the name in whatever format they choose:

FIRST MIDDLE LAST
LAST, FIRST MIDDLE
LAST, FIRST
LAST
FIRST LAST

and so on. My application should be smart enough to figure out the intended name. To do this, I would have to parse the name entered by the user into its component parts, dynamically interpreting the type of format the user has employed. This parsing process is performed using a variety of string functions.

Now if you think I am going to provide you with code that will be able to make sense of a name like Mr. Eli S. Hamilton Feuerstein, III, PhD, then I would ask you to please lower your expectations. Instead, I will present several versions of a procedure that parses a name assuming a simpler format. I will then leave the more general and truly bewildering cases as, ahem, exercises for the reader.

11.2.1.1 Parsing first and last names only

The procedure that follows, parse_name, accepts a full name and returns the first and last names, separated out of that combined string. The first version of this procedure understands and interprets three different formats:

FIRST LAST
LAST, FIRST
LAST

The parse_name algorithm is straightforward, applying the following rules:

If the full name has a comma, then everything to the left of the comma is the last name and everything to the right of the comma is the first name.
If the full name does not have a comma but does have a space, then everything to the left of the space is the first name and everything to the right of the space is the last name.
If the full name has neither a space nor a comma, then the entire string is the last name and the procedure returns a NULL value for the first name.

In all of these cases, the first and last names are returned in uppercase.

The following table shows how parse_name will interpret various name strings:

Full Name	First	Last
SANDRA BARKER	SANDRA	BARKER
QUINCE,HARRY	HARRY	QUINCE
JOG BOG, HOLLER	HOLLER	JOG BOG
Steve Samson	STEVE	SAMSON
Eli Silva Feuerstein	ELI	SILVA FEUERSTEIN
HALLY		HALLY

The following version of parse_name doesn't check for middle names and uses LTRIM, RTRIM, INSTR, and SUBSTR to separate a full name into its first and last name components:

/* Filename on companion disk: parsenm1.sp */
PROCEDURE parse_name
              (fullname_in IN VARCHAR2,
               fname_out OUT caller.first_name%TYPE,
               lname_out OUT caller.last_name%TYPE)
IS
   /* Trimmed and uppercased version of name. */
   fullname_int VARCHAR2(255) := LTRIM (RTRIM (UPPER (fullname_in)));

   /* The location of the comma delimiter in the name. */
   delim_comma NUMBER := INSTR (fullname_int, ',');

   /* Location of the first space delimiter in the name. */
   delim_space NUMBER := INSTR (fullname_int, ' ');

BEGIN
   IF delim_comma = 0 AND delim_space = 0
   THEN
      /* No comma, no space: return whole string as LAST name. */
      lname_out := fullname_int;
      fname_out := NULL;

   ELSIF delim_comma != 0
   THEN
      /* Found a comma in the string! Use LAST, FIRST format.
      ||
      || Use SUBSTR to grab everything from the beginning of the
      || name to just before comma and assign it to last name.
      */
      lname_out := SUBSTR (fullname_int, 1, delim_comma-1);
      /*
      || Use SUBSTR to grab everything from just after the comma
      || and assign it to the first name. Use LTRIM to
      || get rid of any blanks between the comma and the first name.
      */
      fname_out := LTRIM (SUBSTR (fullname_int, delim_comma+1));

   ELSIF delim_space != 0
   THEN
      /* Found a space in the string! Use FIRST LAST format.
      ||
      || Use SUBSTR to grab everything from the beginning of the
      || name to just before the space and assign to first name.
      */
      fname_out := SUBSTR (fullname_int, 1, delim_space-1);
      /*
      || Use SUBSTR to grab everything from just after the space
      || and assign it to the last name. Use RTRIM to
      || get rid of any multiple spaces before the last name.
      */
      lname_out := LTRIM (SUBSTR (fullname_int, delim_space+1));
   END IF;
END parse_name;

As you saw in the previous table, parse_name does not do such a great job with middle names when provided in the format FIRST MIDDLE LAST. In fact, it doesn't recognize middle names as a separate component in a person's name. In the next section we look at a version of parse_name that does handle middle names intelligently.

11.2.1.2 Parsing first, last, and middle names

In order to handle potential middle names, I need to come up with a new set of possible formats which will be recognized by the procedure. I can think of the following:

FIRST MIDDLE LAST
LAST, FIRST MIDDLE
LAST, FIRST
LAST
FIRST LAST

I could also try to handle MIDDLE LAST, FIRST -- this would seem to me, however, to be a very unusual and awkward way to enter a name. I am going to ignore it.

Given these formats, I will enhance the parse_name procedure shown in the previous section so that it applies the following additional rules:

If the format is FIRST MIDDLE LAST, then the last name is made up of everything after the last blank in the name, and the first name is everything up to the first blank.
If the format is LAST, FIRST MIDDLE, then the last name is made up of everything up to the comma. The first name is everything after the comma and before the first blank after the comma (or the end of the string, in which case there is no middle name).

We've already seen how to find the first space in a string. How can we find the last occurrence of the space in a string? We will make use of the negative starting point for the INSTR function as follows (the INSTR function is discussed earlier in the chapter):

last_space_loc := INSTR (fullname_int, ' ', -1, 1);
lname_out   := SUBSTR (fullname_int, delim_space+1);

The enhanced version of parse_name is shown below. It takes the full name and returns up to three different components of that name. The three component parameters are given modes of IN OUT, rather than just OUT, because I make reference to the parameters in the program as I manipulate the values. Because of this they cannot be simply OUT parameters.

Here is the version of parse_name that checks for middle names:

/* Filename on companion disk: parsenm2.sp */
PROCEDURE parse_name
   (fullname_in IN VARCHAR2,
    fname_out IN OUT caller.first_name%TYPE,
    mname_out IN OUT caller.last_name%TYPE,
    lname_out IN OUT caller.middle_name%TYPE)
/*
|| Note: comments are provided only for new elements of code.
*/
IS
   fullname_int VARCHAR2(255) := LTRIM (UPPER (fullname_in));
   delim_comma NUMBER := INSTR (fullname_in, ',');

   /* Location of the first and last space delimiters. */
   first_space NUMBER := INSTR (fullname_in, ' ');
   last_space  NUMBER;

BEGIN
   IF delim_comma = 0 AND first_space = 0
   THEN
      /* Just one word, so it all goes to the last name */
      lname_out := fullname_int;
      fname_out := NULL;
      mname_out := NULL;

   ELSIF delim_comma != 0
   THEN
      /* LAST, FIRST or FIRST MIDDLE format:
      || Strip off last name, then do first and middle names
      */
      lname_out := SUBSTR (fullname_int, 1, delim_comma-1);
      /*
      || Assign the remainder of the name to the first name.
      || If there are no more spaces, there is no middle name
      || and the first name will be the rest of the name.
      ||
      || Find the first space after the comma. I use LTRIM
      || to remove any leading blanks between the comma and
      || the first name.
      */
      fname_out := LTRIM (SUBSTR (fullname_int, delim_comma+1));
      first_space := INSTR (fname_out, ' ');
      IF first_space = 0
      THEN
         /* No space, so there is no middle name. */
         mname_out := NULL;
      ELSE
         /* Split up the name into first and middle parts. */
         fullname_int := fname_out; -- hold a copy
         fname_out := SUBSTR (fullname_int, 1, first_space-1);
         mname_out := SUBSTR (fullname_int, first_space+1);
      END IF;

   ELSIF first_space != 0
   THEN
      /*
      || We have a "FIRST MIDDLE LAST" scenario. See if there is
      || another space, and make sure that there are non-blank
      || characters between the two blanks.  If this is the case,
      || we have a middle name. If not, just split up the name into
      || first and last. Well? Didn't I say that names are
      || complicated?
      */
      last_space := INSTR (fullname_in, ' ', -1, 1);
      IF first_space = last_space
      THEN
         /* Just one space. Split name as in parse_name. */
         fname_out := SUBSTR (fullname_int, 1, first_space-1);
         lname_out := SUBSTR (fullname_int, first_space + 1);
      /*
      || Check to see if there is anything besides blanks between
      || the first and last names. The user might have typed:
      || "JOE    SHMO" just to play games with your program. You
      || don't want to look silly, so you make the effort and
      || use RTRIM to squeeze out the blanks, leaving behind NULL
      || if that's all there was.
      */
      ELSE
         /*
         || The hardest part about the middle name is that you have
         || to figure out its length. You need to subtract 1 from
         || the difference between last and first since the
         || first_space locator is actually always an offset from
         || zero. For example:
         ||          Loc = 12345678901234567890
         ||       String = thelma hoke peterson
         ||   first_space (7) ||-^    ^||- last_space (12)
         ||       Length of middle name "Hoke" = 12 - 7 - 1 = 4
         */
         mname_out :=
            LTRIM (RTRIM
                     (SUBSTR (fullname_int,
                              first_space + 1,
                              last_space - first_space - 1)));

         /* Now the easy part. */
         fname_out := SUBSTR (fullname_int, 1, first_space - 1);
         lname_out := SUBSTR (fullname_int, last_space + 1);
      END IF;
   END IF;
END;

You can see how the code grew considerably more complex just by adding the necessary parsing for middle name. You can see why I don't want to show you how to handle titles, suffixes, and salutations. Of course, in many data entry screens, these other elements of the name are isolated into separate fields or items on the screen.

11.2.2 Implementing Word Wrap for Long Text

Suppose you have a very long line of text and you need to wrap it to fit within a certain line length. Instead of breaking apart words, you want to fit as many separate words as you can onto a single line and then start a new line.

The wrap package contains the to_paragraph procedure,[ 2 ] which takes one long line of text and breaks it into separate lines, depositing those separate lines into a PL/SQL table. The procedure also contains calls to the built-in DBMS_OUTPUT package so it can display a trace of the way it performs the analysis to break apart the long text.

[2] This example, including the wonderful test script, was provided by David Thompson. It contains a number of features of PL/SQL covered later in this book, including built-in package modules and programmer-defined packages. Thanks as well to Lawrence Pit for his help with this example.

/* Filename on companion disk: wordwrap.spp */
create or replace PACKAGE wrap
IS
   TYPE paragraph_tabletype IS TABLE OF VARCHAR2 (80)
      INDEX BY BINARY_INTEGER;

   PROCEDURE to_paragraph
      (text_in IN VARCHAR2,
       line_length IN INTEGER,
       paragraph_out IN OUT paragraph_tabletype,
       num_lines_out IN OUT INTEGER,
       word_break_at_in IN VARCHAR2 := ' ');
END wrap;
/

create or replace PACKAGE BODY wrap
IS
    replace_string VARCHAR2(100) := NULL;

   PROCEDURE to_paragraph
      (text_in IN VARCHAR2,
       line_length IN INTEGER,
       paragraph_out IN OUT paragraph_tabletype,
       num_lines_out IN OUT INTEGER,
       word_break_at_in IN VARCHAR2 := ' ')
   IS
      len_text INTEGER := LENGTH (text_in);
      line_start_loc INTEGER := 1;
      line_end_loc INTEGER := 1;
      last_space_loc INTEGER;
      curr_line VARCHAR2(80);

      PROCEDURE set_replace_string IS
      BEGIN
        replace_string := RPAD ('@', LENGTH (word_break_at_in), '@');
      END;

      PROCEDURE find_last_delim_loc
      (line_in IN VARCHAR2, loc_out OUT INTEGER)
      IS
         v_line VARCHAR2(1000) := line_in;
      BEGIN
         IF word_break_at_in IS NOT NULL
         THEN
            v_line :=
               TRANSLATE (line_in, word_break_at_in, replace_string);
         END IF;
         loc_out := INSTR (v_line, '@', -1);
      END;

    BEGIN
       set_replace_string;

      IF len_text IS NULL
      THEN
         num_lines_out := 0;
      ELSE
         num_lines_out := 1;
         LOOP
           EXIT WHEN line_end_loc > len_text;
           line_end_loc := LEAST (line_end_loc + line_length, len_text + 1);

           /* get the next possible line of text */
           curr_line := SUBSTR (text_in || ' ', line_start_loc, line_length +1);

           /* find the last space in this section of the line */
           find_last_delim_loc (curr_line, last_space_loc);

           /* When NO spaces exist, use the full current line*/
           /* otherwise, cut the line at the space.          */
           IF last_space_loc > 0
           THEN
              line_end_loc := line_start_loc + last_space_loc;
           END IF;

           /* Add this line to the paragraph */
           paragraph_out (num_lines_out) :=
              substr (text_in,
                       line_start_loc,
                       line_end_loc - line_start_loc);

           num_lines_out := num_lines_out + 1;
           line_start_loc := line_end_loc;
         END LOOP;
         num_lines_out := num_lines_out - 1;
      END IF;
   END to_paragraph;
END wrap;
/

The following script tests the to_paragraph procedure and provides a trace of the lines that are being placed in the table. Notice that David uses RPAD to generate a "ruler," which shows clearly how the long text will need to be broken up into the paragraph format:

/* Filename on companion disk: testwrap.sql */
DECLARE
   /* Declare a PL/SQL table to hold the lines of the paragraph. */
   loc_para wrap.paragraph_tabletype;

   /* The test text. */

   loc_text VARCHAR2 (2000) :=
      'This is a very long line of text which we will wrap to smaller lines';

   loc_line_length INTEGER := 10;
   loc_lines INTEGER;

BEGIN
   /* Display a header containing the text and ruler. */
   DBMS_OUTPUT.PUT_LINE ('====================');
   DBMS_OUTPUT.PUT_LINE (loc_text);
   DBMS_OUTPUT.PUT_LINE (RPAD('1',length(loc_text),'2345678901'));
   DBMS_OUTPUT.PUT_LINE
      ('Wrapping to: '||TO_CHAR(loc_line_length)||' characters.');
   DBMS_OUTPUT.PUT_LINE('====================');

   /* Wrap the long text into a paragraph. */
   wrap.to_paragraph(loc_text, loc_line_length, loc_para, loc_lines);

   DBMS_OUTPUT.PUT_LINE('====================');
END;
/

Here are two examples of the output from the test script. The first number in square brackets shows the position of the last space found before the line break. The second number in square brackets shows the starting position in the current line and location of the first character in the next line:

====================
This is a very long line of text which we will wrap to smaller lines
12345678901234567890123456789012345678901234567890123456789012345678
Wrapping to: 10 characters.
====================
"This is a v"[ 10][  1] [ 11]This is a
"very long l"[ 10][ 11] [ 21]very long
"line of tex"[  8][ 21] [ 29]line of
"text which "[ 11][ 29] [ 40]text which
"we will wra"[  8][ 40] [ 48]we will
"wrap to sma"[  8][ 48] [ 56]wrap to
"smaller lin"[  8][ 56] [ 64]smaller
"lines "     [  6][ 64] [ 70]lines
====================

====================
This is a very long line of text which we will wrap to smaller lines
12345678901234567890123456789012345678901234567890123456789012345678
Wrapping to: 6 characters.
====================
"This is"[  5][  1] [  6]This
"is a ve"[  5][  6] [ 11]is a
"very lo"[  5][ 11] [ 16]very
"long li"[  5][ 16] [ 21]long
"line of"[  5][ 21] [ 26]line
"of text"[  3][ 26] [ 29]of
"text wh"[  5][ 29] [ 34]text
"which w"[  6][ 34] [ 40]which
"we will"[  3][ 40] [ 43]we
"will wr"[  5][ 43] [ 48]will
"wrap to"[  5][ 48] [ 53]wrap
"to smal"[  3][ 53] [ 56]to
"smaller"[  0][ 56] [ 62]smalle
"r lines"[  2][ 62] [ 64]r
"lines " [  6][ 64] [ 70]lines
====================

Why would you want to break up text into paragraphs? One reason is to be able to place that text into a formatted document, such as a newsletter or newspaper. Such text usually needs to be filled out to the margins. The output from a call to wrap.to_paragraph would very naturally then become input to the fill_text function, described in the next section.

11.2.3 Filling Text to Fit a Line

It's very easy to left-align a string in PL/SQL -- LTRIM it. It is also easy to right-align a string in a field or item -- LPAD it with spaces to the full length of the field or item; that will force the string all the way to the right. But how would you justify the string to make it look like a line in a newspaper column, with the first word on the left margin and the last letter of the last word on the right margin?

To justify a string you need to add blanks to spaces between words until the line has been stretched to fill the field or item. You also want to space the blanks evenly across the column, so that it's as readable as possible. The fill_text function shown in the next example accepts a string and line size and returns a string of the specified length, with spaces distributed throughout the string.

In order to achieve an even distribution of spaces, fill_text employs two, nested loops. The inner loop moves from word to word in the string and adds a single blank space after each word. The outer loop calls the inner loop again and again until the string is filled to the specified size. A local module skips over contiguous spaces to find the location of the next nonblank character:

/* Filename on companion disk: filltext.sf */
FUNCTION filled_text (string_in IN VARCHAR2, linesize_in IN NUMBER)
RETURN VARCHAR2
/*
|| Parameters:
||    string_in   - the string to justify.
||    linesize_in - the length of the line to which string is justified.
*/
IS
   -- The length of the string.
   len_string NUMBER := LENGTH (string_in);

   -- I will give a name to the blank to make the code more readable.
   a_blank CONSTANT VARCHAR2(1) := ' ';

   -- The starting location for INSTR checks.
   start_loc NUMBER;

   -- Flag to control execution of the inner WHILE loop.
   keep_adding_blanks BOOLEAN;

   -- The location in the string of the next blank.
   nextblank_loc NUMBER;

   /*
   || The return value for the function is initially set to the incoming
   || string. I also LTRIM and RTRIM it to get rid of end spaces.
   || I am going to assume a max line size of 500. You could raise this
   || limit to 32K, but you will use more memory to do so.
   */
   return_value VARCHAR2 (500) :=
       LTRIM (RTRIM (SUBSTR (string_in, 1, 500)));

   /*---------------------- Local Module ----------------------------*/

   FUNCTION next_nonblank_loc (string_in IN VARCHAR2, start_loc_in IN NUMBER)
   RETURN NUMBER
   /*
   || Finds the location in a string of the next NON-BLANK character
   || from the starting location. Uses SUBSTR to examine the current
   || character and halts the loop if not a blank.
   */
   IS
      return_value NUMBER := start_loc_in;
   BEGIN
      WHILE SUBSTR (string_in, return_value, 1) = a_blank
      LOOP
         return_value := return_value + 1;
      END LOOP;
      RETURN return_value;
   END;

BEGIN
   /* The outer loop. */
   WHILE len_string < linesize_in
   LOOP
      /* Reset variables for inner loop. */
      start_loc := 1;
      keep_adding_blanks := TRUE;
      /*
      || Inner loop:
      || If I have filled the string or gone through the entire line,
      || halt this loop and go to the outer loop.
      */
      WHILE len_string < linesize_in AND keep_adding_blanks -- Inner loop
      LOOP
         /* Find the location of the next blank. */
         nextblank_loc := INSTR (return_value, a_blank, start_loc);

         /* If no more blanks, am done with line for this iteration. */
         keep_adding_blanks := nextblank_loc > 0;

         IF keep_adding_blanks
         THEN
            /*
            || Use SUBSTR to pull apart the string right where the
            || blank was found and then put the two pieces back together
            || after stuffing a space between them. Then add 1 to the
            || length and find the location of the next non-blank
            || character. This new starting location will be used by
            || INSTR to then find the next blank.
            */
            return_value :=
               SUBSTR (return_value, 1, nextblank_loc) ||
               a_blank ||
               SUBSTR (return_value, nextblank_loc+1);
            len_string := len_string + 1;
            start_loc := next_nonblank_loc (return_value, nextblank_loc);
         END IF;

      END LOOP;
   END LOOP;

   RETURN return_value;

END filled_text;

The filltext.sf file contains the function shown above as well as a procedure version of the function, which will show you through calls to DBMS_OUTPUT.PUT_LINE how the fill_text algorithm builds its string. After you run the script in that file, you will have a filled_text function and a fill_text procedure.

If you then execute the command:

SQL> exec fill_text ('this is not very long', 30)

You will see the following output from the procedure call:

123456789012345678901234567890
==============================
this is not very long
this  is not very long
this  is  not very long
this  is  not  very long
this  is  not  very  long
this   is  not  very  long
this   is   not  very  long
this   is   not   very  long
this   is   not   very   long
this    is   not   very   long
this    is   not   very   long
==============================
123456789012345678901234567890

You can see how the function dispersed the spaces evenly throughout the string. The final line is repeated twice because it was displayed from within the function and by the SQL*Plus script.

11.2.4 Counting Substring Occurrences in Strings

You now know how to use INSTR to find the location of a substring within a string. This means that you also know how to determine whether or not a certain substring is in that string (INSTR returns zero if the substring is not found). What if you want to figure out how many times a particular substring occurs within a string?

The first thing that you should do when asked to implement a request like this is to make sure that Oracle's SQL, PL/SQL, or some tool doesn't already provide what you need. I sure hate to spend an hour writing some convoluted program only to find out that a built-in program provided with the tool already does it. Become familiar with what is available, and don't be the least bit embarrassed to go back to the manuals for a refresher.

So, I ask myself, is there a character function that will return the number of times a substring occurs within a string? Afraid not -- but checking was the right thing to do. Fortunately, the flexibility of INSTR makes it easy to build such a utility for ourselves. In fact, I use INSTR to produce two different versions of this freq_instr ("frequency in string") function.

The first version, freq_instr1, counts the number of occurrences of a substring by changing the starting location of the INSTR-based search to just past the location of the last match. From that point on, it always searches for the first occurrence.

The second version, freq_instr2, also counts the number of occurrences of a substring using INSTR as well. However, in this function, the starting location of the search always remains 1, and the nth_occurrence argument to INSTR (the fourth argument) is bumped up after each occurrence is found.

Here's the first version:

/* Filename on companion disk: freqinst.sf */
FUNCTION freq_instr1
   (string_in IN VARCHAR2,
    substring_in IN VARCHAR2,
    match_case_in IN VARCHAR2 := 'IGNORE')
RETURN NUMBER
/*
|| Parameters:
||    string_in - the string in which frequency is checked.
||    substring_in - the substring we are counting in the string.
||    match_case_in - If "IGNORE" then count frequency of occurrences
||                    of substring regardless of case. If "MATCH" then
||                    only count occurrences if case matches.
||
|| Returns the number of times (frequency) a substring is found
|| by INSTR in the full string (string_in). If either string_in or
|| substring_in are NULL, then return 0.
*/
IS
   -- Starting location from which INSTR will search for a match.
   search_loc NUMBER := 1;

   -- The length of the incoming substring.
   substring_len NUMBER := LENGTH (substring_in);

   -- The Boolean variable which controls the loop.
   check_again BOOLEAN := TRUE;

   -- The return value for the function.
   return_value NUMBER := 0;
BEGIN

   IF string_in IS NOT NULL AND substring_in IS NOT NULL
   THEN
      /* Loop through string, moving forward the start of search. */
      WHILE check_again
      LOOP
         IF UPPER (match_case_in) = 'IGNORE'
         THEN
            -- Use UPPER to ignore case when performing the INSTR.
            search_loc :=
               INSTR (UPPER (string_in),
                      UPPER (substring_in), search_loc, 1);
         ELSE
            search_loc := INSTR (string_in, substring_in, search_loc, 1);
         END IF;

         /* Did I find another occurrence? */
         check_again := search_loc > 0;
         IF check_again
         THEN
            return_value := return_value + 1;

            /* Move start position past the substring. */
            search_loc := search_loc + substring_len;
         END IF;
      END LOOP;
   END IF;

   RETURN return_value;

END freq_instr1;

Now let's take a look at the second version of freq_instr, which keeps the starting location of the call to INSTR at a constant value of 1 and increments the number of the occurrence we wish to extract from the string. Both approaches do the job, but this second version uses a little less code and relies more directly and naturally on INSTR to do the job.

/* Filename on companion disk: freqinst.sf */
FUNCTION freq_instr2
   (string_in IN VARCHAR2,
    substring_in IN VARCHAR2,
    match_case_in IN VARCHAR2 := 'IGNORE') RETURN NUMBER
IS
   substring_loc NUMBER;
   return_value NUMBER := 1;
BEGIN
   /*
   || Use the last argument to INSTR to find the Nth occurrence of
   || substring, where N is incremented with each spin of the loop.
   || If INSTR returns 0 then have one too many in the return_value,
   || so when I RETURN it, I subtract 1 (see code following loop).
   */
   LOOP
      IF UPPER (match_case_in) = 'IGNORE'
      THEN
         substring_loc :=
            INSTR (UPPER (string_in), UPPER (substring_in), 1, return_value);
      ELSE
         substring_loc := INSTR (string_in, substring_in, 1, return_value);
      END IF;

      /* Terminate loop when no more occurrences are found. */
      EXIT WHEN substring_loc = 0;

      /* Found match, so add to total and continue. */
      return_value := return_value + 1;

   END LOOP;

   RETURN return_value - 1;

END freq_instr2;

In freq_instr2, I use INSTR to keep on getting the next occurrence of the substring until there are no more, letting INSTR determine the number of occurrences. In freq_instr1, I explicitly move the starting point of the search through the string and then use INSTR to get the next occurrence. Will both functions always return the same values? And if not, which one returns the correct values?

In fact, freq_instr1 and freq_instr2 do not count the same number of occurrences of a substring in a string. This discrepancy relates back to the way INSTR works -- consider the following results:

freq_instr1 ('abcdaaa', 'aa') ==> 1

freq_instr2 ('abcdaaa', 'aa') ==> 2

The first version, freq_instr1, only finds one match for aa because it moves the starting point for the next search for an aa past the second a in the string aaa, right to the last character of the string. The freq_instr2 function, on the other hand, uses the second a in the string aaa as the starting point for the second occurrence of aa and so returns 2 rather than 1. Which answer is correct? It depends on the detailed specifications for the function. Either answer could be right; it depends on the question.

11.2.5 Verifying String Formats with TRANSLATE

Oracle Forms allows you to specify a format mask on screen items that are numbers and dates. With a number mask of 999"-"99"-"9999, for example, a user could enter 123457890 and have Oracle Forms automatically convert the entry to 123-45-7890 for Social Security number formatting. With a date mask of MMDDYY, a user could enter 122094 and have Oracle Forms automatically convert that information to the internal date format for December 20, 1994. Unfortunately, you can use format masks only on number and date items. You cannot format a character item. You cannot use a mask, for example, to ensure that users enter the department code in the format AAANNN, where the first three characters are letters and the last three characters are numbers.

TRANSLATE offers an easy way to verify and enforce formats in character fields, which I will explore below. Suppose you need to make sure that users enter a value in the format AAANN99, where A is an uppercase letter (A through Z), N is a digit (0 through 9), and the two trailing nines are just that: two trailing nines. Given this format, the following entries would all be valid:

THR3399

ZYX1299

While these next entries violate the format:

1RR1299: First character is a digit, instead of a number
RRR8Q89: "Q" where number must be, second last digit not a nine
UUU1290: Last digit is not a nine

To validate and enforce this format in an Oracle Forms item, I create a When-Validate-Item trigger for that item. One approach I could take is to look at each character individually and make sure it has the right value. I could use SUBSTR to do this, as follows:

DECLARE
   /* The formatted value. */
   my_value VARCHAR2 (20) := UPPER (:company.industry_code);
BEGIN
   IF SUBSTR (my_value, 1, 1) NOT BETWEEN 'A' and 'Z' OR
      SUBSTR (my_value, 2, 1) NOT BETWEEN 'A' and 'Z' OR
      SUBSTR (my_value, 3, 1) NOT BETWEEN 'A' and 'Z' OR
      SUBSTR (my_value, 4, 1) NOT BETWEEN '0' and '9' OR
      SUBSTR (my_value, 5, 1) NOT BETWEEN '0' and '9' OR
      SUBSTR (my_value, 6, 2) != '99'
   THEN
      MESSAGE (' Please enter code in format AAANN99.');
      RAISE FORM_TRIGGER_FAILURE;
   END IF;
END;

This approach will work fine, in most cases, clearly stating the rules for each of the characters in the code item. Let's see how you would perform this same format check with TRANSLATE:

IF TRANSLATE (UPPER (:company.industry_code),
               '123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',
               'NNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAA') != 'AAANNNN'
   OR :company.industry_code NOT LIKE '_____99'
THEN
   MESSAGE (' Please enter code in format AAANN99.');
   RAISE FORM_TRIGGER_FAILURE;
END IF;

With TRANSLATE, I use the search and replace sets to replace all numbers with "N" and all letters with "A". If the user followed the specified format, then TRANSLATE will return AAANNNN. I then include a second check to see if the last two characters of the string are 99. Notice that I used five underscores as single-character wildcards to make sure the correct number of characters preceded the double nines.

I can also use TRANSLATE to support multiple formats on the same character item, and then determine action in the screen based on the format selected. Suppose that, if a user enters a string in the format NNNAAA, I need to execute a procedure to generate an industry profile. If the user enters a string in the format AAA, I need to execute a procedure to calculate annual sales for that specified department. TRANSLATE will easily sort through these formats as follows:

formatted_entry :=
   TRANSLATE (UPPER (:company.criteria),
               '123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',
               'NNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAA');

IF formatted_entry = 'NNNAAA'
THEN
   gen_ind_profile (:company.criteria);
ELSIF formatted_entry = 'AAA'
THEN
   calc_annual_sales (:company.criteria);
END IF;

If you need to perform pattern or format masking and analysis, TRANSLATE is generally the most efficient and effective solution.