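For PA-RISC based systems, the performance of the malloc()
family of APIs can be tuned via the environment variables
_M_ARENA_OPTS and _M_SBA_OPTS.
For Itanium-based systems, in addition to the environment variables,
three global variables can be used for performance tuning:
__hp_malloc_maxfast, __hp_malloc_num_smallblocks, and __hp_malloc_grain.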
For threaded applications, malloc()
uses multiple arenas.
Memory requests from different threads are handled
by different arenas.
_M_ARENA_OPTS can be used to adjust the number of arenas and how many pages
an arena adds each time it expands (the expansion factor), assuming that the
page size is 4096 bytes.
In general, the more threads in an application, the more arenas
should be used for better performance.
The number of arenas can be
from 1 to 64 for threaded applications.
For non-threaded applications, only one arena is used.
If the environment
variable is not set, or the number of arenas is
out of range, the default of 8 is used.
The expansion factor can be from 1 to 4096; the default
value is 32.
Again, if the factor is out of range,
the default value is used.
Here is an example of how to use _M_ARENA_OPTS:
$ export _M_ARENA_OPTS=16:8
This means that the number of arenas is 16, and the expansion
size is 8*4096 bytes.
In general, the more arenas you use, the smaller the
expansion factor should be, and vice versa.
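The range and default rules above can be sketched in C. This is only an illustration: the allocator reads _M_ARENA_OPTS internally, and the names arena_opts, parse_arena_opts, and expansion_bytes are hypothetical helpers, not part of any HP-UX interface. The sketch also assumes that a malformed string is treated like an unset variable.

```c
#include <stdio.h>

/* Hypothetical illustration; not an HP-UX API. */
struct arena_opts {
    int num_arenas;        /* valid range 1..64, default 8 */
    int expansion_factor;  /* valid range 1..4096 pages, default 32 */
};

/* Parse an "arenas:factor" string and apply the documented
 * range checks. Out-of-range fields keep their defaults.
 * Assumption: a malformed or NULL string yields both defaults. */
struct arena_opts parse_arena_opts(const char *value)
{
    struct arena_opts o = { 8, 32 };
    int arenas, factor;

    if (value != NULL && sscanf(value, "%d:%d", &arenas, &factor) == 2) {
        if (arenas >= 1 && arenas <= 64)
            o.num_arenas = arenas;
        if (factor >= 1 && factor <= 4096)
            o.expansion_factor = factor;
    }
    return o;
}

/* Bytes added per arena expansion, assuming 4096-byte pages. */
long expansion_bytes(struct arena_opts o)
{
    return (long)o.expansion_factor * 4096L;
}
```

Under these assumptions, the string "16:8" gives 16 arenas and an expansion size of 8*4096 bytes, matching the example above.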
_M_SBA_OPTS is used to turn on the small block allocator, and to set up
its parameters, namely, the maxfast size, the number of small blocks,
and the grain size.
Applications with the small block allocator turned on usually run
faster than with it turned off.
The small block allocator can be
turned on through mallopt();
however, that is not early
enough for C++/Java applications.
The environment
variable turns it on before the application starts.
The mallopt()
call can still be used the same way.
If the environment variable is set, and no small block
allocator has been used, subsequent mallopt()
calls can still override whatever is set through _M_SBA_OPTS.
If the environment variable is set, and the
small block allocator has been used, then mallopt()
will have no effect.
To use this environment variable,
$ export _M_SBA_OPTS=512:100:16
This means that the maxfast
size is 512, the number of small blocks is 100, and the grain size is 16.
You have to supply all 3 values, in that order.
If not, the default
values will be used instead.
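The all-or-nothing rule for _M_SBA_OPTS can be sketched as follows. The names sba_opts and parse_sba_opts are hypothetical, and the defaults of 512, 100, and 16 are an assumption borrowed from the values this page gives for the Itanium global variables. The allocator's real parsing is internal and may differ.

```c
#include <stdio.h>

/* Hypothetical illustration; not an HP-UX API. */
struct sba_opts {
    int maxfast;          /* maxfast size */
    int num_smallblocks;  /* number of small blocks */
    int grain;            /* grain size */
};

/* Parse "maxfast:num_smallblocks:grain". Unless all three values
 * are present, return the defaults (assumed to be 512, 100, 16). */
struct sba_opts parse_sba_opts(const char *value)
{
    struct sba_opts defaults = { 512, 100, 16 };
    struct sba_opts o;

    if (value == NULL ||
        sscanf(value, "%d:%d:%d",
               &o.maxfast, &o.num_smallblocks, &o.grain) != 3)
        return defaults;
    return o;
}
```

With this sketch, "256:200:8" is accepted as given, while "256:200" falls back to the defaults because the third value is missing.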
Three new global variables,
__hp_malloc_maxfast, __hp_malloc_num_smallblocks, and __hp_malloc_grain,
are introduced for Itanium-based systems to override the
_M_SBA_OPTS environment variable.
When these three variables are initialized within an application,
_M_SBA_OPTS has no effect.
This way, a finely tuned application can lock in
performance across different user environments.
However, a subsequent call to mallopt()
before any block of memory has been allocated will
change the values.
By default, these three variables will be initialized
to zero at start up.
This is the same as setting them to
extern int __hp_malloc_maxfast=512;
extern int __hp_malloc_num_smallblocks=100;
extern int __hp_malloc_grain=16;
By default, SBA (Small Block Allocation) is turned on for Itanium-based systems.
This may contribute to better application performance.
A user can set
extern int __hp_malloc_maxfast=-1;
This will turn off SBA.
For all other possible values, please refer to mallopt().
_M_CACHE_OPTS is used to turn on the thread local cache.
Turning this option on sets up a private cache for each thread
to which access is effectively non-threaded,
so there is less contention on the arenas.
For some multi-threaded
applications this can give a significant performance improvement.
The thread local cache saves blocks of sizes that have previously been
used, and thus may be requested again.
The size of the cache is configurable.
The cache is organized in buckets of sizes that are powers of two; that
is, there is a bucket for all blocks in the size range 64-127 bytes,
another for 128-255 bytes, and so on.
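The power-of-two bucket layout can be illustrated with a small C sketch that maps a block size to the bounds of the bucket it would fall in. The function names are hypothetical, and the sketch only mirrors the size ranges described above; it does not claim to show how the allocator itself indexes its buckets.

```c
#include <stddef.h>

/* Hypothetical illustration; not an HP-UX API.
 * Lower bound of the power-of-two bucket containing `size`,
 * that is, the largest power of two <= size (0 for size 0). */
size_t bucket_lower_bound(size_t size)
{
    size_t bound = 1;

    if (size == 0)
        return 0;
    while (bound <= size / 2)
        bound *= 2;
    return bound;
}

/* Upper bound of the same bucket: one less than the next power of two. */
size_t bucket_upper_bound(size_t size)
{
    size_t lo = bucket_lower_bound(size);

    return lo == 0 ? 0 : lo * 2 - 1;
}
```

For a 100-byte block the bounds are 64 and 127, and for a 128-byte block they are 128 and 255, which matches the ranges given above.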
Thread Local Cache can be tuned by setting the _M_CACHE_OPTS
environment variable as follows:
The values must be supplied in the exact order indicated.
The first three parameters are mandatory, and the last two are optional.
bucket_size denotes the number of pointers cached per bucket.
If bucket_size is 0, then thread local cache is disabled.
The maximum value for
buckets is an indication of the number of buckets.
The maximum block size that will be cached is
buckets can range between 8 and 32.
The third parameter is an indication, in minutes, of how long blocks in an
unused cache are kept before they are released to the arena.
It is only a hint, so caches may or may not be retired after the
specified time period.
If it is 0, retirement is disabled.
The maximum value for this parameter
is 1440 (that is, 24 hours).
The fourth parameter enables cached blocks to be exchanged among threads.
A thread that heavily allocates blocks of a certain size is soon
bound to run out of blocks in its private cache.
The exchange allows the thread to borrow blocks from a
global pool if available.
This may be more efficient than going back to the arena.
The parameter is a hint to the caching algorithm indicating the number
of cache misses after which it will search the global pool for
appropriately sized blocks.
The algorithm also releases unused cache blocks to
the global pool.
If the parameter is 0 or not set, cache exchange is turned off.
The fifth parameter is an indication of the size of the global cache pool.
This parameter is valid only if
cache exchange (the fourth parameter) is turned on.
The default value for
The maximum number of blocks that will be cached for each thread is
bucket_size * buckets.
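Here are examples of how to use _M_CACHE_OPTS:
$ export _M_CACHE_OPTS=1024:32:20:4:8
This means that bucket_size is 1024, buckets is 32,
the third parameter (the retirement time) is 20 minutes,
the fourth parameter is 4, and the fifth parameter is 8.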
$ export _M_CACHE_OPTS=1024:32:20
This is a valid configuration where the global pool is not activated.
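The two example settings can be checked with a short C sketch that splits the value into its fields and computes the per-thread limit of bucket_size * buckets. The struct and function names are hypothetical, as are the field names retirement_minutes, exchange_hint, and global_pool_hint, which stand in for the third, fourth, and fifth parameters. The sketch does not validate value ranges.

```c
#include <stdio.h>

/* Hypothetical illustration; not an HP-UX API. */
struct cache_opts {
    long bucket_size;         /* pointers cached per bucket */
    long buckets;             /* number of buckets */
    long retirement_minutes;  /* third parameter (placeholder name) */
    long exchange_hint;       /* fourth parameter, 0 when omitted */
    long global_pool_hint;    /* fifth parameter, 0 when omitted */
    int  valid;               /* nonzero if the three mandatory values parsed */
};

/* Parse up to five colon-separated values. The first three are
 * mandatory; omitted optional fields stay 0. */
struct cache_opts parse_cache_opts(const char *value)
{
    struct cache_opts o = { 0, 0, 0, 0, 0, 0 };
    int n;

    if (value == NULL)
        return o;
    n = sscanf(value, "%ld:%ld:%ld:%ld:%ld",
               &o.bucket_size, &o.buckets, &o.retirement_minutes,
               &o.exchange_hint, &o.global_pool_hint);
    o.valid = (n >= 3);
    return o;
}

/* Maximum number of blocks cached per thread: bucket_size * buckets. */
long max_cached_blocks(struct cache_opts o)
{
    return o.bucket_size * o.buckets;
}
```

For "1024:32:20:4:8" the limit is 1024 * 32 blocks per thread. For "1024:32:20" the fourth field stays 0, which corresponds to cache exchange being turned off.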
_M_ARENA_OPTS and _M_CACHE_OPTS have no effect on non-threaded applications,
while _M_SBA_OPTS does.
NOTE: Modifying these variables increases the chances of surfacing
existing user memory defects such as buffer overrun.
malloc(): AES, SVID2, SVID3, XPG2, XPG3, XPG4, FIPS 151-2, POSIX.1, ANSI C
calloc(): AES, SVID2, SVID3, XPG2, XPG3, XPG4, FIPS 151-2, POSIX.1, ANSI C
free(): AES, SVID2, SVID3, XPG2, XPG3, XPG4, FIPS 151-2, POSIX.1, ANSI C
mallinfo(): SVID2, XPG2
mallopt(): SVID2, SVID3, XPG2
realloc(): AES, SVID2, SVID3, XPG2, XPG3, XPG4, FIPS 151-2, POSIX.1, ANSI C