11.2 Character Function ExamplesNow that you have a solid grounding in all the different character functions, you should find the extended examples in the following sections easy to follow. I also hope that you'll find that the concepts embodied by the examples come in handy in your own applications. 11.2.1 Parsing a NameHave you ever noticed that a person's name is a complicated data structure? You've got a first name, last name, middle initial, title, salutation, and who knows what else. It is also impossible to guarantee the uniqueness of a person's name, no matter how many attributes you plan to maintain. That's why I memorized my nine-digit Social Security number early in life. The name of a human being is made up of many components. To make things even more complicated, though, there are at least two common formats for displaying and specifying a person's name:
and:
But of course a name could also be formatted as follows:
as in Mr. Eli S. Hamilton Feuerstein, III, PhD Why should you care about these different formats? Suppose my application has a table of callers structured like this:
and that the caller_id column is a foreign key in a number of other tables. I strongly believe that users should never have to see or enter the actual foreign key values, unless there is a specific application requirement to do so. Instead, the user should be able to work with the name of the caller and leave it to the PL/SQL code to get the key. Users don't need to know that the caller named "Steven Feuerstein" has called ID number 14067. They just know the name and they should be allowed to enter the name in whatever format they choose: FIRST MIDDLE LAST LAST, FIRST MIDDLE LAST, FIRST LAST FIRST LAST and so on. My application should be smart enough to figure out the intended name. To do this, I would have to parse the name entered by the user into its component parts, dynamically interpreting the type of format the user has employed. This parsing process is performed using a variety of string functions. Now if you think I am going to provide you with code that will be able to make sense of a name like Mr. Eli S. Hamilton Feuerstein, III, PhD, then I would ask you to please lower your expectations. Instead, I will present several versions of a procedure that parses a name assuming a simpler format. I will then leave the more general and truly bewildering cases as, ahem, exercises for the reader. 11.2.1.1 Parsing first and last names onlyThe procedure that follows, parse_name, accepts a full name and returns the first and last names, separated out of that combined string. The first version of this procedure understands and interprets three different formats: FIRST LAST LAST, FIRST LAST The parse_name algorithm is straightforward, applying the following rules:
In all of these cases, the first and last names are returned in uppercase. The following table shows how parse_name will interpret various name strings:
The following version of parse_name doesn't check for middle names and uses LTRIM, RTRIM, INSTR, and SUBSTR to separate a full name into its first and last name components:
As you saw in the previous table, parse_name does not do such a great job with middle names when provided in the format FIRST MIDDLE LAST. In fact, it doesn't recognize middle names as a separate component in a person's name. In the next section we look at a version of parse_name that does handle middle names intelligently. 11.2.1.2 Parsing first, last, and middle namesIn order to handle potential middle names, I need to come up with a new set of possible formats which will be recognized by the procedure. I can think of the following: FIRST MIDDLE LAST LAST, FIRST MIDDLE LAST, FIRST LAST FIRST LAST I could also try to handle MIDDLE LAST, FIRST -- this would seem to me, however, to be a very unusual and awkward way to enter a name. I am going to ignore it. Given these formats, I will enhance the parse_name procedure shown in the previous section so that it applies the following additional rules:
We've already seen how to find the first space in a string. How can we find the last occurrence of the space in a string? We will make use of the negative starting point for the INSTR function as follows (the INSTR function is discussed earlier in the chapter): last_space_loc := INSTR (fullname_int, ' ', -1, 1); lname_out := SUBSTR (fullname_int, delim_space+1); The enhanced version of parse_name is shown below. It takes the full name and returns up to three different components of that name. The three component parameters are given modes of IN OUT, rather than just OUT, because I make reference to the parameters in the program as I manipulate the values. Because of this they cannot be simply OUT parameters. Here is the version of parse_name that checks for middle names:
You can see how the code grew considerably more complex just by adding the necessary parsing for middle name. You can see why I don't want to show you how to handle titles, suffixes, and salutations. Of course, in many data entry screens, these other elements of the name are isolated into separate fields or items on the screen. 11.2.2 Implementing Word Wrap for Long TextSuppose you have a very long line of text and you need to wrap it to fit within a certain line length. Instead of breaking apart words, you want to fit as many separate words as you can onto a single line and then start a new line. The wrap package contains the to_paragraph procedure,[ 2 ] which takes one long line of text and breaks it into separate lines, depositing those separate lines into a PL/SQL table. The procedure also contains calls to the built-in DBMS_OUTPUT package so it can display a trace of the way it performs the analysis to break apart the long text.
The following script tests the to_paragraph procedure and provides a trace of the lines that are being placed in the table. Notice that David uses RPAD to generate a "ruler," which shows clearly how the long text will need to be broken up into the paragraph format:
Here are two examples of the output from the test script. The first number in square brackets shows the position of the last space found before the line break. The second number in square brackets shows the starting position in the current line and location of the first character in the next line: ==================== This is a very long line of text which we will wrap to smaller lines 12345678901234567890123456789012345678901234567890123456789012345678 Wrapping to: 10 characters. ==================== "This is a v"[ 10][ 1] [ 11]This is a "very long l"[ 10][ 11] [ 21]very long "line of tex"[ 8][ 21] [ 29]line of "text which "[ 11][ 29] [ 40]text which "we will wra"[ 8][ 40] [ 48]we will "wrap to sma"[ 8][ 48] [ 56]wrap to "smaller lin"[ 8][ 56] [ 64]smaller "lines " [ 6][ 64] [ 70]lines ==================== ==================== This is a very long line of text which we will wrap to smaller lines 12345678901234567890123456789012345678901234567890123456789012345678 Wrapping to: 6 characters. ==================== "This is"[ 5][ 1] [ 6]This "is a ve"[ 5][ 6] [ 11]is a "very lo"[ 5][ 11] [ 16]very "long li"[ 5][ 16] [ 21]long "line of"[ 5][ 21] [ 26]line "of text"[ 3][ 26] [ 29]of "text wh"[ 5][ 29] [ 34]text "which w"[ 6][ 34] [ 40]which "we will"[ 3][ 40] [ 43]we "will wr"[ 5][ 43] [ 48]will "wrap to"[ 5][ 48] [ 53]wrap "to smal"[ 3][ 53] [ 56]to "smaller"[ 0][ 56] [ 62]smalle "r lines"[ 2][ 62] [ 64]r "lines " [ 6][ 64] [ 70]lines ==================== Why would you want to break up text into paragraphs? One reason is to be able to place that text into a formatted document, such as a newsletter or newspaper. Such text usually needs to be filled out to the margins. The output from a call to wrap.to_paragraph would very naturally then become input to the fill_text function, described in the next section. 11.2.3 Filling Text to Fit a LineIt's very easy to left-align a string in PL/SQL -- LTRIM it. It is also easy to right-align a string in a field or item -- LPAD it with spaces to the full length of the field or item; that will force the string all the way to the right. But how would you justify the string to make it look like a line in a newspaper column, with the first word on the left margin and the last letter of the last word on the right margin? To justify a string you need to add blanks to spaces between words until the line has been stretched to fill the field or item. You also want to space the blanks evenly across the column, so that it's as readable as possible. The fill_text function shown in the next example accepts a string and line size and returns a string of the specified length, with spaces distributed throughout the string. In order to achieve an even distribution of spaces, fill_text employs two, nested loops. The inner loop moves from word to word in the string and adds a single blank space after each word. The outer loop calls the inner loop again and again until the string is filled to the specified size. A local module skips over contiguous spaces to find the location of the next nonblank character:
The filltext.sf file contains the function shown above as well as a procedure version of the function, which will show you through calls to DBMS_OUTPUT.PUT_LINE how the fill_text algorithm builds its string. After you run the script in that file, you will have a filled_text function and a fill_text procedure. If you then execute the command:
You will see the following output from the procedure call: 123456789012345678901234567890 ============================== this is not very long this is not very long this is not very long this is not very long this is not very long this is not very long this is not very long this is not very long this is not very long this is not very long this is not very long ============================== 123456789012345678901234567890 You can see how the function dispersed the spaces evenly throughout the string. The final line is repeated twice because it was displayed from within the function and by the SQL*Plus script. 11.2.4 Counting Substring Occurrences in StringsYou now know how to use INSTR to find the location of a substring within a string. This means that you also know how to determine whether or not a certain substring is in that string (INSTR returns zero if the substring is not found). What if you want to figure out how many times a particular substring occurs within a string? The first thing that you should do when asked to implement a request like this is to make sure that Oracle's SQL, PL/SQL, or some tool doesn't already provide what you need. I sure hate to spend an hour writing some convoluted program only to find out that a built-in program provided with the tool already does it. Become familiar with what is available, and don't be the least bit embarrassed to go back to the manuals for a refresher. So, I ask myself, is there a character function that will return the number of times a substring occurs within a string? Afraid not -- but checking was the right thing to do. Fortunately, the flexibility of INSTR makes it easy to build such a utility for ourselves. In fact, I use INSTR to produce two different versions of this freq_instr ("frequency in string") function. The first version, freq_instr1, counts the number of occurrences of a substring by changing the starting location of the INSTR-based search to just past the location of the last match. From that point on, it always searches for the first occurrence. The second version, freq_instr2, also counts the number of occurrences of a substring using INSTR as well. However, in this function, the starting location of the search always remains 1, and the nth_occurrence argument to INSTR (the fourth argument) is bumped up after each occurrence is found. Here's the first version:
Now let's take a look at the second version of freq_instr, which keeps the starting location of the call to INSTR at a constant value of 1 and increments the number of the occurrence we wish to extract from the string. Both approaches do the job, but this second version uses a little less code and relies more directly and naturally on INSTR to do the job.
In freq_instr2, I use INSTR to keep on getting the next occurrence of the substring until there are no more, letting INSTR determine the number of occurrences. In freq_instr1, I explicitly move the starting point of the search through the string and then use INSTR to get the next occurrence. Will both functions always return the same values? And if not, which one returns the correct values? In fact, freq_instr1 and freq_instr2 do not count the same number of occurrences of a substring in a string. This discrepancy relates back to the way INSTR works -- consider the following results:
The first version, freq_instr1, only finds one match for aa because it moves the starting point for the next search for an aa past the second a in the string aaa, right to the last character of the string. The freq_instr2 function, on the other hand, uses the second a in the string aaa as the starting point for the second occurrence of aa and so returns 2 rather than 1. Which answer is correct? It depends on the detailed specifications for the function. Either answer could be right; it depends on the question. 11.2.5 Verifying String Formats with TRANSLATEOracle Forms allows you to specify a format mask on screen items that are numbers and dates. With a number mask of 999"-"99"-"9999, for example, a user could enter 123457890 and have Oracle Forms automatically convert the entry to 123-45-7890 for Social Security number formatting. With a date mask of MMDDYY, a user could enter 122094 and have Oracle Forms automatically convert that information to the internal date format for December 20, 1994. Unfortunately, you can use format masks only on number and date items. You cannot format a character item. You cannot use a mask, for example, to ensure that users enter the department code in the format AAANNN, where the first three characters are letters and the last three characters are numbers. TRANSLATE offers an easy way to verify and enforce formats in character fields, which I will explore below. Suppose you need to make sure that users enter a value in the format AAANN99, where A is an uppercase letter (A through Z), N is a digit (0 through 9), and the two trailing nines are just that: two trailing nines. Given this format, the following entries would all be valid:
While these next entries violate the format:
To validate and enforce this format in an Oracle Forms item, I create a When-Validate-Item trigger for that item. One approach I could take is to look at each character individually and make sure it has the right value. I could use SUBSTR to do this, as follows:
This approach will work fine, in most cases, clearly stating the rules for each of the characters in the code item. Let's see how you would perform this same format check with TRANSLATE:
With TRANSLATE, I use the search and replace sets to replace all numbers with "N" and all letters with "A". If the user followed the specified format, then TRANSLATE will return AAANNNN. I then include a second check to see if the last two characters of the string are 99. Notice that I used five underscores as single-character wildcards to make sure the correct number of characters preceded the double nines. I can also use TRANSLATE to support multiple formats on the same character item, and then determine action in the screen based on the format selected. Suppose that, if a user enters a string in the format NNNAAA, I need to execute a procedure to generate an industry profile. If the user enters a string in the format AAA, I need to execute a procedure to calculate annual sales for that specified department. TRANSLATE will easily sort through these formats as follows:
If you need to perform pattern or format masking and analysis, TRANSLATE is generally the most efficient and effective solution.
Copyright (c) 2000 O'Reilly & Associates. All rights reserved. | |||||||||||||||||||||||
|
|