Chapter 14. Extending MySQL
Within the spectrum of relational
database management systems, MySQL has always occupied a unique
position. Its open source nature and fast performance have made it
the most popular database server for small-to-medium web-based
applications. However, its lack of support for features such as
transactions, foreign keys, row-level locking, and stored procedures
have made many pass over it for other applications, though its
performance would otherwise be beneficial.
As MySQL has matured over time, various solutions have emerged for
all these problems. Through alternative table types such as Berkeley
DB and InnoDB, MySQL now supports transactions and row-level locking.
Thanks to the InnoDB table type, the MySQL team has added support for
foreign keys.
One of MySQL's virtues is that it is intended to be
extended. For example, stored procedures remain the most requested
Oracle/SQL Server/Sybase feature lacking in MySQL. Though MySQL
currently has no support for stored procedures (it probably will by
the time you read this book), you can extend MySQL by writing your
own user-defined functions (UDFs). In this chapter, we examine how to
write your own functions and add character sets to MySQL.
14.1. User-Defined Functions
UDFs are simply
native programming functions. MySQL links to the libraries containing
them and executes them as SQL functions in the same way you have used
the PASSWORD( ) and NOW( )
functions. MySQL gives these functions access to the internals of
MySQL and empowers them to manipulate data.
You will generally write UDFs in C. The examples in this chapter are
all in C. However, because MySQL accesses your functions through
shared libraries, you can write them in any language that you can
compile into a native-code shared library. In other words, you can
write a UDF in C, C++, Objective C, C#, Java, VisualBasic, or even
Pascal.
MySQL
supports two types of UDFs: standard
and aggregate. MySQL applies standard functions to each row in the
result set. PASSWORD( ) and DAYNAME(
) are examples of two common standard functions built into
MySQL. If you execute the SQL in the mysql
database:
SELECT UPPER(User) FROM user;
you will see the name of each user in your database in all capital
letters.
An aggregate function operates on
groups of rows. You generally execute aggregate functions in
conjunction with the SQL GROUP
BY clause. This clause causes MySQL to
group result sets into sections, in which each section is a single
row. An aggregate function operates on all of the values of each
group. AVG( ) and MAX( ) are
examples of two common aggregate functions built into MySQL:
SELECT AVG(age) FROM People GROUP BY city;
The result of this query is a single column containing the average of
all age values for people who live in the same city.
You create aggregate and standard functions in much the same
manner—aggregate functions simply have two extra routines.
14.1.1. Standard Functions
Create a standard UDF in three steps, two of
which are optional:
- init
-
When an SQL query calls a UDF, the
init routine for that function is called first, if
it exists. This routine is responsible for allocating data and
setting certain parameters for the main routine.
- main
-
After the init routine
is called, the main routine for the desired UDF is
called. This is the only routine required to exist to define a UDF.
This routine is responsible for processing the data and returning the
result of the function.
- deinit
-
Once the main routine
is finished and the result is returned to the user, MySQL will call a
deinit routine for the function, if it exists. The
purpose of this routine is to deallocate any memory allocated by the
init routine and clean up as necessary.
When an SQL query calls a standard function, these routines are
called in the following manner:
init
main
main
main
...
deinit
MySQL calls the main routine once for each row
returned by the SQL query. The return values of the
main routine comprise the results of the UDF as a
whole.
When creating a UDF, all three routines must reside in the same
library. You can, however, bundle multiple functions in the same
library.
TIP:
We use the terms "routine" and
"function" in a specific manner in
this section to reduce the possibility of confusion. The problem is
that multiple native functions make up a single UDF. When we refer to
a routine, we mean one of the native functions that make up the UDF.
When we refer to a function, we mean the UDF itself.
14.1.1.1. The init routine
The init
routine is the first routine called when a user attempts to execute a
UDF within a SQL query. For a function with the name
remove_html( ), the init
routine must have the name remove_html_init( ).
The init function must have the following
signature:
my_bool func_name_init(UDF_INIT *initid, UDF_ARGS *args, char *message);
The three parameters passed to the init routine
are all pointers to modifiable data that will be made available to
the main function.
- initid
-
This parameter is the main data structure
used by the UDF engine. This structure will be made available to all
three routines for a particular UDF. Any changes that need to be made
to the default values of this structure should be made here in the
init routine so they will be available for the
main routine. The structure has the following
members:
- maybe_null
-
This Boolean value indicates whether the function can return an SQL
NULL value. By default, this value is
false unless any of the function arguments are
nullable, in which case the default is true. To be
safe, if you want the function to return an SQL
NULL value, set the value of this attribute to
true within the init routine.
- decimals
-
This integer value indicates the maximum number of decimals in the
result returned by this function. This attribute is used only if the
function returns a numeric result. The default value of this
attribute is determined by the arguments passed into the function.
The number of decimals in the numeric argument with the most decimal
places is used as the default. Since there is no way of knowing which
values will be passed into the function, if you want to limit the
number of decimals returned, set this attribute within the
init routine.
- max_length
-
This integer value indicates the maximum number of characters (or
digits) in the result returned by this function. The default value
depends on the type of function and the arguments passed to it. For a
string function, the default value is the length of the longest
string argument. For an integer function, the default is 21 digits
(one of which will be a sign character, if necessary). For a decimal
function, the default is 13 digits or other characters (including any
sign character or decimal point) plus the value of the decimals
attribute (see above). There is no performance penalty for setting
this value to a large number (in case the function returns a
BLOB or other large value).
- ptr
-
This generic pointer can be used arbitrarily by the routines for a
UDF. It is most useful to pass some data between the three routines
of the function or to include a reference to some data from another
part of the application. If this pointer will be used for new data
(that is, not a reference to some existing data), the memory should
be allocated here in the init routine. The
main routine will then receive the allocated
pointer to use as it will.
- args
-
This array contains the arguments passed in
from the SQL query. Because the actual processing of the arguments is
performed in the main routine, this structure is
rarely touched during the init routine. There are
two attributes of this structure that are sometimes accessed within
the init routine:
- arg_type
-
This attribute is an array of values that correspond to the constants
STRING_RESULT, INT_RESULT, and
REAL_RESULT. Within the main
routine, this array indicates the types of arguments the routine is
receiving. These types may not always be what the routine is
expecting. Instead of returning an error, it is possible to force the
arguments passed by the user into the types that the
main routine will need. This is done by setting
the value within the arg_type array in the
init routine. For example, if the first argument
needs to be a STRING-type argument, set the first
element of this array to STRING_RESULT. This will
cause any value passed in by the user (even an integer or real value)
to be coerced into a string value before the main
routine is called.
- lengths
-
This attribute is an array of values that correspond to the length of
each argument passed in by the user. Within the
main routine, the attribute describes the actual
length values of the arguments being passed in, row by row. Within
the init routine, this attribute describes maximum
possible length values for string arguments.
- message
- This value is a character pointer used to
store a user-level message about any error that occurs during the
execution of the function. This preallocated character buffer is 200
characters wide by default. However, convention dictates keeping the
message short enough to fit on one line. The message string must be
terminated by a null byte.
The return value of the
init routine is a my_bool
(Boolean) type. If the initialization completes successfully, the
routine should return 0 (false). If an error
occurs, the routine should return 1 (true). In
addition, in the case of an error, a user-level error message should
be placed into the message argument.
14.1.1.2. The main routine
The main
routine is the heart of the UDF. In fact, if no special
initialization or cleanup is required, this routine is the only one
needed to create a UDF. For a function with the name
remove_html( ), the main
routine should have the same name: remove_html( ).
The signature of the main routine depends on the
return value of the UDF. There are
three different SQL return values possible for UDFs:
STRING,
INTEGER, and
REAL. The main
routine for each of these types has the following signatures.
For a STRING:
char *func_name(UDF_INIT *initid, UDF_ARGS *args,
char *result, unsigned long *length,
char *is_null, char *error)
For an INTEGER:
long long func_name(UDF_INIT *initid, UDF_ARGS *args,
char *is_null, char *error)
For a REAL:
double func_name(UDF_INIT *initid, UDF_ARGS *args,
char *is_null, char *error)
All the different routine signatures handle incoming arguments the
same way. The differences lie in how each routine returns the result.
We will consider the four parameters that are identical among the
signatures first:
- initid
-
This is the main data structure used by the
UDF engine. It is a UDF_INIT structure, which is
described in the previous section. Any of the attributes set during
the init routine (including the generic data
ptr attribute) are available during the
main routine. Although there are no hard rules on
the matter, the attributes of these structures should be accessed
read-only from within the main routine. This is
because all the initialization is done during the
init routine, and all the deinitialization is done
during the deinit routine. Therefore, in the
middle, there is no need to modify these values.
- args
-
These are the arguments passed in from the
SQL query as a UDF_ARGS structure with the
following attributes:
- arg_count
-
This is the number of arguments passed into the UDF from the SQL
query. If you want the UDF to have a fixed number of arguments, you
can check the value here and return an error if the number is not
correct. Otherwise, UDFs can support variable numbers of arguments.
- arg_type
-
This is an array of type identifiers that indicate the SQL type of
each argument. The number of elements in this array is equal to the
arg_count attribute. Checking the values of the
arg_type array in the main
routine allows strict type checking of the incoming arguments.
However, as mentioned in the previous section, it is generally more
flexible to set the desired types in the init
routine. This will transform them from whatever the user passed in to
whatever types the main routine is expecting. The
values of the arg_type attribute correspond to the
constants STRING_RESULT,
INT_RESULT, and REAL_RESULT,
defined in the MySQL development headers.
- args
-
This is an array of the actual arguments passed to the function. As
mentioned above, the number of elements in this array is given by the
arg_count property. The values of the arguments
depend on two factors: the type of the argument and whether the
argument was a constant value. Table 14-1 contains
a mapping of types to C types. To accommodate these different
possibilities, the elements of the args array are
passed around as generic void pointers. You must therefore cast them
to the appropriate C type.
Table 14-1. The mapping of MySQL result types to C types
|
Result type
|
C type
|
|
STRING_RESULT
|
char *
|
|
INT_RESULT
|
long long
|
|
REAL_RESULT
|
double
|
- lengths
-
This value is an array of the lengths of the arguments passed to the
UDF. As with the other array attributes (arg_type
and args), the length of the array is given by the
arg_count attribute. Each value of the array is
the length of that argument (and hence the length of the
corresponding element in the args array). For
string arguments, the value is the accurate length of the string for
that particular call to the function. Because the function will be
called once for each row in the SQL query, this value will change to
reflect the length of a string argument each time. Integer and
decimal arguments have fixed lengths that are either defaults or set
within the init routine.
- is_null
-
This argument is a pointer to an integer
value. If the UDF is returning a NULL value for a
particular call to the function, set the value of this integer to 1.
Otherwise, leave it at the default of 0.
- error
-
This argument is a pointer to an integer
value. If an error occurs during the main routine
of the UDF, set the value of this integer to 1. This will return a
NULL value for this particular call to the
function. In addition, the main routine will not
be called for any subsequent rows of data in the query that called
the UDF, and NULL will be returned for each row.
There are two additional arguments specific to UDF that return
STRING results:
- result
-
This argument is a pointer to a character
array. The return value of the UDF should be
placed in the string referenced by this pointer. In addition, the
pointer should be returned as the return value of the
main routine (explained later). The MySQL UDF
engine allocates 255 bytes as the buffer for a string return value.
If the result fits within this limit, simply place
the result into this buffer and return the
pointer. If the result is bigger, you must use a
string buffer that you have allocated. This allocation should be done
by using the generic ptr attribute within the
init routine. The result string
should then be placed into this pointer, which should be returned as
the return value of the function. If you take this route, the memory
allocated for the pointer within the init routine
should be deallocated in the deinit routine.
- length
-
This argument is a pointer to an integer
value. The value referenced by this pointer should be set to the
length of the return value of the UDF. This does not include any
trailing null characters. That is, if the return value of the
function is the string make, the integer
referenced by length should be set to 6.
The return value of the main routine depends upon
the return value of the UDF as a whole. For
INTEGER_RETURN functions, the
main routine should return the return value of the
function as a long long integer value. For
REAL_RETURN functions, the main
routine should return the return value of the function as a
double value. For STRING_RETURN
functions, the main routine should return the
return value of the function as a pointer to a character array. As
mentioned above, if the resulting value is less than 255 characters,
this pointer should be the result pointer passed
into the main routine. Otherwise, it should be a pointer allocated
within the init routine.
14.1.1.3. The deinit routine
The deinit
routine is responsible for freeing any memory allocated by the
init routine and performing any other necessary
cleanup. For a UDF with the name remove_html( ),
the deinit routine must have the name
remove_html_deinit( ). It has the following
signature:
void func_name_deinit(UDF_INIT *initid)
The deinit routine takes one argument, the main
UDF_INIT structure used by all three routines.
This structure contains the generic ptr pointer
that is available for generic use throughout the routines. If this
pointer was allocated during the init routine,
this deinit routine is responsible for freeing
that memory. The deinit routine returns no value.
Example 14-1 shows a UDF that returns the contents of
an image file, given the filename of that image. This function could
be used to efficiently access image data without storing that data as
a BLOB in the database.
Example 14-1. A UDF that returns the contents of an image file based on its filename
#include <mysql/my_global.h>
#include <mysql/my_sys.h>
#include <mysql/mysql.h>
#include <mysql/m_ctype.h>
#include <mysql/m_string.h>
#define MAX_IMAGE_SIZE 1048576
#define MAX_FILE_NAME 1024
/* Our 'init' routine is responsible for some simple set-up:
- Check the argument count. This function must have exactly
one argument, the filename, so we return an error
if there are any other number.
- Coerce the types of the arguments. Theoretically, the
image names could be stored as integer IDs
(and thus stored in files with simple numbers
for names in the filesystem). Therefore, we want
to allow any type of argument, and coerce it to a string.
- Allocate the data buffer to hold the image data. This is a fairly large
buffer to accommodate images of differing sizes.
*/
my_bool getImage_init(UDF_INIT *initid, UDF_ARGS *args, char *message) {
if (args->arg_count != 1) {
strmov(message,"Usage: getImage( fileName )");
return 1;
}
args->arg_type[0] = STRING_RESULT;
if ( !(initid->ptr =
(char *) malloc( sizeof(char) * MAX_IMAGE_SIZE ) ) ) {
strmov(message, "Couldn't allocate memory!");
return 1;
}
bzero( initid->ptr, sizeof(char) * MAX_IMAGE_SIZE );
return 0;
}
/* This routine frees the memory allocated by the image buffer */
void getImage_deinit(UDF_INIT *initid) {
if (initid->ptr)
free(initid->ptr);
}
/* The main routine does most of the work for this function. The argument
passed to the function is the filename of an image file. This function first
uses 'chdir' to set the directory to the directory of the image file. This
prevents misuse of this function by using it to read an arbitrary file on
the filesystem. (We also remember the old working directory so that we can
reset it at the end.)
Once we've set the current directory, we open the image file using the
Filename passed in as the argument. If the image opens successfully, we read
the contents of the file into the empty data buffer.
Before returning the data, we set the length pointer to tell MySQL how
Large our image buffer is. Finally, we return the pointer to the data
as the string result of this function.
*/
char *getImage(UDF_INIT *initid, UDF_ARGS *args, char *result,
unsigned long *length, char *is_null, char *error) {
FILE *img_file;
size_t img_size;
char filename[MAX_FILE_NAME];
char old_dir[MAX_FILE_NAME];
bzero(filename, MAX_FILE_NAME);
bzero(old_dir, MAX_FILE_NAME);
getcwd(old_dir, MAX_FILE_NAME);
chdir("/path/to/my/pictures");
strncpy( filename, args->args[0], args->lengths[0] );
img_file = fopen( filename, "r" );
if (! img_file ) {
*is_null = 1;
return 0;
}
img_size =
fread( (void *)initid->ptr, sizeof(char),
MAX_IMAGE_SIZE, img_file );
*length = (unsigned long)img_size;
chdir(old_dir);
fclose(img_file);
return initid->ptr;
}
14.1.2. Aggregate Functions
As mentioned earlier, aggregate
functions work just like standard functions except that they have a
couple of extra routines. Those routines drive the aggregation for
each group of data:
- reset
-
This routine is called at the beginning of
each group of data. It resets any local data used to aggregate the
data so that each group starts with an empty aggregation.
- add
-
This routine is called for each row of data
within a group. The purpose of the function is to perform any data
aggregation required for the group.
The order in which routines are called illustrates the differences
between standard and aggregate functions:
init
reset
add
add
add
...
main
reset
add
add
add
...
main
reset
add
add
add
...
main
...
deinit
While a standard function is comprised of a single loop, an aggregate
function is a loop of loops. For a standard function, the
main routine is called for each row of data, which
corresponds directly with each row of the result set. For an
aggregate function, the rows of the result set are made up of groups,
in which each row represents a group of rows of data. The
add routine is called for each row of data (except
the first), while the main routine is called only
once for each group (resulting in one row of the result set).
While the signatures of the routines called within a standard
function do not differ for an aggregate function, their roles are
somewhat different.
14.1.2.1. init
Unlike a standard function, the
init function is almost always used for aggregate
functions. The
initid->ptr generic pointer plays an
important role, which it does not always do for other UDFs. This
pointer is initialized to store data about a single group of data. As
the rows are processed by the add routine, the
data structure is populated. When the main routine
is finally called for each group, it accesses the populated structure
to perform the desired calculation.
All the other abilities of the init routine are
also available for aggregate functions, such as forcing the
arguments' types and sizes and indicating the return
type.
14.1.2.2. reset
The reset routine works as a sort of
init routine that is called once per group. For a
function named calc_interest( ), the
reset routine must be named
calc_interest_reset( ). It has the following
signature:
void func_name_reset(UDF_INIT* initid, UDF_ARGS* args,
char* is_null, char *error )
The arguments are those already covered under the
main routine. The reset routine
returns no value. At the beginning of each group, this routine is
called to clear out any data that is used on per-group data. This is
generally stored in the initid->ptr pointer as
a data structure that contains the aggregated group data. This
routine should zero out or otherwise clear this structure to prepare
for a new group of data.
Because of an oddity in the MySQL UDF engine, the
add routine does not get called for the first row
of data in every group. Therefore, if your aggregate function needs
to act upon every row of data (and most will), it is necessary to
explicitly call the add routine from the
reset routine. This is because this first row of
data is passed to the reset routine, which can
then pass it to the add routine. For each
subsequent row of the group, the add routine will
be called directly.
14.1.2.3. add
The add routine is called on every row of data
except the first within every group of the SQL query. For a function
named calc_interest( ), the add
routine must be named calc_interest_add( ). It has
the following signature:
void func_name_add(UDF_INIT* initid, UDF_ARGS* args,
char* is_null, char *error )
The arguments are those already covered under the
main routine. The add routine
returns no value. The purpose of the add routine
is to aggregate the data from the rows that comprise each group
specified in the SQL query. This is generally done by taking the data
from the rows and accumulating it in a data structure (in some matter
that makes sense to the operation). The data structure is usually
stored in the initid->ptr pointer, allocated by
the init routine.
14.1.2.4. main
As mentioned earlier, the purpose of the
main routine is significantly different from that
of a standard UDF. Instead of operating on single rows of data, the
main routine operates on aggregated data collected
by the add routine. This data is generally stored
in a data structure referenced by the
initid->ptr pointer, allocated by the
init routine. The main routine
takes the data from this structure and performs any calculation
needed on the aggregated data.
The main routine returns the result in the same
manner as a standard UDF.
14.1.2.5. deinit
The deinit
routine works identically to that of a standard function. It will
almost always exist for an aggregate function, because there is
almost always an allocated initid->ptr
structure to free.
14.1.2.6. Aggregate example
Example 14-2
is a
UDF that calculates the average distance from the center of a range
of points. This function takes two arguments, which correspond to the
x and y coordinates of points on a graph. The function then
aggregates all the points given to it by the SQL query and returns
the distance from the center of the graph (x=0, y=0) to the average
of all the points.
Example 14-2. An aggregate UDF that calculates the distance from the center of a graph
#include <mysql/my_global.h>
#include <mysql/my_sys.h>
#include <mysql/mysql.h>
#include <mysql/m_ctype.h>
#include <mysql/m_string.h>
/* This is the structure that defines the aggregate of all of the points
* It contains the sum of all 'x' values, the sum of all 'y' values,
* and the total number of points counted (for a single group).
*/
struct point {
double x;
double y;
long count;
};
/* Our 'init' routine has several duties, this time:
- Check the argument count. This function must have exactly
two arguments, x and y, so we return an error if there are
any other number.
- Coerce the types of the arguments. Since we are dealing with decimal
values, we don't want to penalize the user for passing in
integers, or even numeric values stored within a string, so
we coerce the incoming arguments into a 'real' type.
- Set the 'maybe_null' flag to zero. This function will never return
a null value. Maybe a 'zero' value, but not null.
- Set the 'decimals' flag to 5 and the 'max_length' flag to 10.
This will cause the results to be in the format +/-###.#####
with a guarantee of 5 decimal places and up to three integer
places.
- Allocate the point structure. This structure will be used to house
the aggregate point data as we accumulate it.
*/
my_bool distanceFromCenter_init( UDF_INIT* initid, UDF_ARGS* args,
char* message ) {
/* Check number of arguments */
if (args->arg_count != 2) {
strmov(message,"Usage: distanceFromCenter( x, y )");
return 1;
}
/* Coerce arguments to 'real' */
args->arg_type[0] = REAL_RESULT;
args->arg_type[1] = REAL_RESULT;
/* Set flags */
initid->maybe_null = 0;
initid->decimals = 5;
initid->max_length = 10;
/* Allocate pointer */
initid->ptr = malloc( sizeof( struct point ) );
if (! initid->ptr ) {
strmov(message, "Cannot allocate memory for 'point' structure");
return 1;
}
return 0;
}
/* The 'deinit' routine only has to de-allocate the 'point' structure */
void distanceFromCenter_deinit( UDF_INIT* initid ) {
free(initid->ptr);
}
/* The 'add' routine is called once for each row of data. It is responsible
for aggregating the point information. It simply adds the 'x' and 'y'
values to the totals stored in the point structure and then increments
the count in the same structure.
*/
void distanceFromCenter_add( UDF_INIT* initid, UDF_ARGS* args,
char* is_null, char *error ) {
((struct point *)initid->ptr)->x += *((double *)args->args[0]);
((struct point *)initid->ptr)->y += *((double *)args->args[0]);
((struct point *)initid->ptr)->count++;
}
/* The 'reset' routine is called at the beginning of each group. It simply
zeros out the information in the point structure in preperation for a new
group of data. As mentioned earlier, this is the only routine called on the
first row of data in a group. Therefore, we must also call the 'add' routine
from this routine, to include the first row of data in the aggregate set.
*/
void distanceFromCenter_reset( UDF_INIT* initid, UDF_ARGS* args,
char* is_null, char *error ) {
((struct point *)initid->ptr)->x = 0.0;
((struct point *)initid->ptr)->y = 0.0;
((struct point *)initid->ptr)->count = 0;
distanceFromCenter_add( initid, args, is_null, error );
}
/* The 'main' routine has a very simple role here. It just has to take the
aggregate data from the point structure and calculate a point (x,y) value
for that data. It then runs the classic a^2 + b^2 = c^2 formula to determine
the distance to the center of the grid for that point.
double distanceFromCenter( UDF_INIT* initid, UDF_ARGS* args,
char* is_null, char *error ) {
double avg_x, avg_y;
/* Calculate the average 'x' and the average 'y'
avg_x = ((struct point *)initid->ptr)->x /
(double)((struct point *)initid->ptr)->count;
avg_y = ((struct point *)initid->ptr)->y /
(double)((struct point *)initid->ptr)->count;
/* Return the square root of a^2 + b^2 */
return sqrt( avg_x * avg_x + avg_y * avg_y );
}
14.1.3. Calling a UDF
Now that we
have successfully created the routines necessary for a UDF, the next
step is to call those functions from an SQL query. This is
accomplished in three steps:
-
Compiling the function routines
-
Loading the UDF
-
Running the UDF
As discussed earlier, the routines that make up a UDF can be
implemented in any programming language that can be compiled into an
operating system-level shared library. On most Unix-based
systems, a shared library is indicated by a .so
extension. In Microsoft Windows, shared libraries have
.dll extensions.
Once the routines are written, use whatever procedure is generally
followed for compiling source code into a shared library. For
example, compiling a shared library from a C or C++ file using the
GNU C Compiler on a Unix-based system
can be accomplished with the following command:
gcc -shared -o myfunc.so myfunc.c
The -shared flag indicates that a shared library
should be created from the functions within the source file.
Different compilers, operating systems, and programming languages
will have different methods for creating a shared library.
As mentioned earlier, multiple UDFs can be defined and implemented
within a single shared library file. However, all the routines
necessary for each function must be present within that file. That
is, you can place multiple functions within a single shared library,
but you cannot split the routines necessary for a single function
between multiple libraries.
Once a shared library has been created, it should be placed somewhere
on the operating system's filesystem that is
designated for shared libraries. On Unix-based systems, this generally means
the /lib
and /usr/lib directories as well as any
directories specified in the
LD_LIBRARY_PATH environment variable
(although the name of the environment variable differs on some
Unix-based systems). For Microsoft Windows systems, this generally means the
WINDOWS\System32 directory (where
WINDOWS is the main Windows directory, often
C:\Winnt for Windows NT and 2000 systems).
Finally, it may be necessary to restart the MySQL server for it to
find the shared library file. This may not be necessary, though, so
if restarting the MySQL server is an issue, first try to load the
function library without restarting.
Once the shared library file is in a place that is accessible to the
MySQL server, it is possible to access the UDF directly from MySQL.
To do this, the UDF must be loaded, which generally means loading the
shared library into MySQL's memory so it can access
the function routines.
Loading a new
UDF is accomplished through the SQL CREATE
FUNCTION statement:
CREATE [AGGREGATE] FUNCTION function_name
RETURNS {STRING|REAL|INTEGER}
SONAME shared_library_name
Basically, this statement indicates the name of the UDF, its return
type, and the shared library where it is located. The optional
AGGREGATE keyword indicates that the function is
an aggregate function.
The name of the function must be the name of the
main routine used to implement the function. The
name is case sensitive, so it must be capitalized the same way it is
in the shared library.
The return type of the function is specified by one of the keywords
STRING, REAL, or
INTEGER. This value must match the return type of
the main routine.
The shared library is given as an SQL literal string. It should be
only the filename (not a directory path), and it is case sensitive.
The shared library given here must be accessible to MySQL.
Behind the scenes, loading a UDF involves adding and removing data
from the func table of the system
mysql database. Therefore, to execute the
CREATE FUNCTION statement, a user must have
INSERT and
DELETE privileges for that table (or for
the whole database). In general, this should be restricted to
administrative users, because loading arbitrary UDFs can give a user
almost complete control over the MySQL system. (Imagine loading a
function that kills the main MySQL process. Because the function is
run from MySQL itself, nothing will stop it from shutting down the
system abruptly.)
The following SQL calls add two UDFs, one standard and the other
aggregate:
CREATE FUNCTION getImage RETURNS STRING SONAME 'myfuncs.so';
CREATE AGGREGATE FUNCTION distanceFromCenter
RETURNS REAL SONAME 'myfuncs.so';
Once a UDF has been loaded into MySQL, it can be accessed just as if
it were a built-in MySQL function, because MySQL functions are case
insensitive. Even though it was necessary to specify the correct case
when loading the function, once loaded, the function can be
executed
with
any
case:
SELECT GETIMAGE( image_name ) AS image FROM Product;
SELECT distanceFromCenter( x, y ) FROM Point GROUP BY shape_id
 |  |  | | 13.4. A Guest Book Servlet |  | 14.2. Alternative Character Sets |
Copyright © 2003 O'Reilly & Associates. All rights reserved.
|