EXTERNAL INFLUENCES
For PA-RISC based systems, the performance of the
malloc()
family of APIs can be tuned via the environment variables
_M_ARENA_OPTS,
_M_SBA_OPTS,
and
_M_CACHE_OPTS.
For
Itanium®-based
systems, in addition to
_M_ARENA_OPTS,
_M_SBA_OPTS,
and
_M_CACHE_OPTS,
three global variables can be used for performance tuning:
__hp_malloc_maxfast,
__hp_malloc_num_smallblocks,
and
__hp_malloc_grain.
For threaded applications,
malloc()
uses multiple arenas.
Memory requests from different threads are handled
by different arenas.
_M_ARENA_OPTS
can be used to adjust the number of arenas and the number of pages by which an
arena grows each time it expands (the expansion factor), assuming a page size of
4096 bytes.
In general, the more threads in an application, the more arenas
should be used for better performance.
The number of arenas can be
from 1 to 64 for threaded applications.
For non-threaded
applications, only one arena is used.
If the environment
variable is not set, or the number of arenas is
outside this range, the default of 8 is
used.
The expansion factor ranges from 1 to 4096; the default
value is 32.
Again, if the factor is out of range,
the default value is used.
Here is an example of how to use
_M_ARENA_OPTS:
$ export _M_ARENA_OPTS=16:8
This means that the number of arenas is 16, and the expansion
size is 8*4096 bytes.
In general, the more arenas you use, the smaller the
expansion factor should be, and vice versa.
_M_SBA_OPTS
is used to turn on the small block allocator, and to set up parameters
for the small block allocator, namely,
maxfast,
grain,
and
numlblks.
Applications usually run faster with the small block allocator
turned on than with it turned off.
The small block allocator can be
turned on through
mallopt();
however, that call does not come early
enough for C++/Java applications.
The environment
variable turns it on before the application starts.
The
mallopt()
call can still be used the same way.
If the
environment variable is set and the small block
allocator has not yet been used, subsequent
mallopt()
calls can still override whatever was set through
_M_SBA_OPTS.
If the environment variable is set and the
small block allocator has already been used, then
mallopt()
will have no effect.
To use this environment variable:
$ export _M_SBA_OPTS=512:100:16
This means that the
maxfast
size is 512, the number of small blocks is 100, and the grain size is 16.
All three values must be supplied, in that order.
Otherwise, the default
values are used instead.
Three new global variables,
__hp_malloc_maxfast,
__hp_malloc_num_smallblocks,
and
__hp_malloc_grain
are introduced for Itanium-based systems to override the
_M_SBA_OPTS
environment option.
When these three variables are initialized within an application,
_M_SBA_OPTS
has no effect.
This way, a finely tuned application can lock in
performance across different user environments.
However, as with
_M_SBA_OPTS,
a subsequent call to
mallopt()
made before any block of memory has been allocated will
change the
malloc()
behavior.
By default, these three variables are initialized
to zero at startup.
This is the same as setting them to:
extern int __hp_malloc_maxfast=512;
extern int __hp_malloc_num_smallblocks=100;
extern int __hp_malloc_grain=16;
By default, SBA (Small Block Allocation) is turned on for Itanium-based systems.
This may contribute to better
application performance.
To turn off SBA, a user can set:
extern int __hp_malloc_maxfast=-1;
For all other possible values, please refer to
mallopt().
_M_CACHE_OPTS
is used to turn on the thread local cache.
Turning this option on sets up a private cache for each thread
to which access is effectively non-threaded,
so there is less contention on the arenas.
For some multi-threaded
applications this can give a significant performance improvement.
The thread local cache saves blocks of sizes that have previously been
used, and thus may be requested again.
The size of the cache is configurable.
The cache is organized in buckets of sizes that are powers of two; that
is, there is a bucket for all blocks in the size range 64-127 bytes,
another for 128-255 bytes, and so on.
The thread local cache can be tuned by setting the
_M_CACHE_OPTS
environment variable as follows:
_M_CACHE_OPTS=bucket_size:buckets:retirement_age:max_cache_misses:num_global_slots
The values must be supplied in the exact order indicated.
The first three parameters are mandatory, and the last two are optional.
bucket_size
denotes the number of pointers cached per bucket.
If
bucket_size
is 0, then thread local cache is disabled.
The maximum value for
bucket_size
is 32768.
buckets
is an indication of the number of buckets.
The maximum block size that will be cached is
2^buckets.
buckets
can range between 8 and 32.
retirement_age
indicates, in minutes, how long blocks in an unused cache
are kept before they are released to the arena.
retirement_age
is only a hint, so caches may or may not be retired after the
specified time period.
If
retirement_age
is 0, retirement is disabled.
The maximum value for
retirement_age
is 1440 (that is, 24 hours).
max_cache_misses
enables cached blocks to be exchanged among threads.
A thread that heavily allocates blocks of a certain size is soon
bound to run out of blocks in its private cache.
The exchange allows the thread to borrow blocks from a
global pool if available.
This may be more efficient than going back to the arena.
max_cache_misses
is a hint to the caching algorithm indicating the number
of cache misses after which it will search the global pool for
appropriately sized blocks.
The algorithm also releases unused cache blocks to
the global pool.
If
max_cache_misses
is 0 or not set, cache exchange is turned off.
num_global_slots
is an indication of the size of the global cache pool.
This parameter is valid only if
max_cache_misses
is turned on.
The default value for
num_global_slots
is 8.
The maximum number of blocks that will be cached for each thread is
bucket_size * buckets.
Here are examples of how to use
_M_CACHE_OPTS:
$ export _M_CACHE_OPTS=1024:32:20:4:8
This means that
bucket_size
is 1024,
buckets
is 32,
retirement_age
is 20 minutes,
max_cache_misses
is 4, and
num_global_slots
is 8.
$ export _M_CACHE_OPTS=1024:32:20
This is a valid configuration where the global pool is not activated.
_M_ARENA_OPTS
and
_M_CACHE_OPTS
have no effect on non-threaded applications, whereas
_M_SBA_OPTS
does.
NOTE: Modifying these variables increases the chances of surfacing
existing user memory defects such as buffer overruns.
STANDARDS CONFORMANCE
malloc(): AES, SVID2, SVID3, XPG2, XPG3, XPG4, FIPS 151-2, POSIX.1, ANSI C
calloc(): AES, SVID2, SVID3, XPG2, XPG3, XPG4, FIPS 151-2, POSIX.1, ANSI C
free(): AES, SVID2, SVID3, XPG2, XPG3, XPG4, FIPS 151-2, POSIX.1, ANSI C
mallinfo(): SVID2, XPG2
mallopt(): SVID2, SVID3, XPG2
realloc(): AES, SVID2, SVID3, XPG2, XPG3, XPG4, FIPS 151-2, POSIX.1, ANSI C