Diffstat (limited to 'dep/jemalloc/ChangeLog')
 -rw-r--r--  dep/jemalloc/ChangeLog  262
 1 file changed, 257 insertions(+), 5 deletions(-)
diff --git a/dep/jemalloc/ChangeLog b/dep/jemalloc/ChangeLog
index d56ee999e69..8ed42cbeadb 100644
--- a/dep/jemalloc/ChangeLog
+++ b/dep/jemalloc/ChangeLog
@@ -1,10 +1,262 @@
Following are change highlights associated with official releases. Important
-bug fixes are all mentioned, but internal enhancements are omitted here for
-brevity (even though they are more fun to write about). Much more detail can be
-found in the git revision history:
+bug fixes are all mentioned, but some internal enhancements are omitted here for
+brevity. Much more detail can be found in the git revision history:
https://github.com/jemalloc/jemalloc
+* 4.0.4 (October 24, 2015)
+
+ This bugfix release fixes another xallocx() regression. No other regressions
+ have come to light in over a month, so this is likely a good starting point
+ for people who prefer to wait for "dot one" releases with all the major issues
+ shaken out.
+
+ Bug fixes:
+ - Fix xallocx(..., MALLOCX_ZERO) to zero the last full trailing page of large
+ allocations that have been randomly assigned an offset of 0 when the
+ --enable-cache-oblivious configure option is enabled. A sketch of the
+ affected call pattern appears below.
+
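+ A minimal sketch of the affected pattern, assuming a build configured with
+ --enable-cache-oblivious; the sizes are illustrative (large size classes,
+ assuming 4 KiB pages), not taken from the fix itself:
+
+     #include <jemalloc/jemalloc.h>
+
+     void grow_zeroed(void) {
+         void *p = mallocx(16384, MALLOCX_ZERO); /* large, zero-filled */
+         if (p == NULL)
+             return;
+         /* Try to grow in place; any trailing bytes gained, including the
+          * last full page, must come back zeroed. */
+         if (xallocx(p, 32768, 0, MALLOCX_ZERO) < 32768) {
+             /* Could not extend in place; p remains valid at its old size. */
+         }
+         dallocx(p, 0);
+     }
+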
+* 4.0.3 (September 24, 2015)
+
+ This bugfix release continues the trend of xallocx() and heap profiling fixes.
+
+ Bug fixes:
+ - Fix xallocx(..., MALLOCX_ZERO) to zero all trailing bytes of large
+ allocations when the --enable-cache-oblivious configure option is enabled.
+ - Fix xallocx(..., MALLOCX_ZERO) to zero trailing bytes of huge allocations
+ when resizing from/to a size class that is not a multiple of the chunk size.
+ - Fix prof_tctx_dump_iter() to filter out nodes that were created after heap
+ profile dumping started.
+ - Work around a potentially bad thread-specific data initialization
+ interaction with NPTL (glibc's pthreads implementation).
+
+* 4.0.2 (September 21, 2015)
+
+ This bugfix release addresses a few bugs specific to heap profiling.
+
+ Bug fixes:
+ - Fix ixallocx_prof_sample() to never modify nor create sampled small
+ allocations. xallocx() is in general incapable of moving small allocations,
+ so this fix removes buggy code without loss of generality.
+ - Fix irallocx_prof_sample() to always allocate large regions, even when
+ alignment is non-zero.
+ - Fix prof_alloc_rollback() to read tdata from thread-specific data rather
+ than dereferencing a potentially invalid tctx.
+
+* 4.0.1 (September 15, 2015)
+
+ This is a bugfix release that is somewhat high risk due to the amount of
+ refactoring required to address deep xallocx() problems. As a side effect of
+ these fixes, xallocx() now tries harder to partially fulfill requests for
+ optional extra space. Note that a couple of minor heap profiling
+ optimizations are included, but these are better thought of as performance
+ fixes that were integral to discovering most of the other bugs.
+
+ Optimizations:
+ - Avoid a chunk metadata read in arena_prof_tctx_set(), since it is in the
+ fast path when heap profiling is enabled. Additionally, split a special
+ case out into arena_prof_tctx_reset(), which also avoids chunk metadata
+ reads.
+ - Optimize irallocx_prof() to optimistically update the sampler state. The
+ prior implementation appears to have been a holdover from when
+ rallocx()/xallocx() functionality was combined as rallocm().
+
+ Bug fixes:
+ - Fix TLS configuration such that it is enabled by default for platforms on
+ which it works correctly.
+ - Fix arenas_cache_cleanup() and arena_get_hard() to handle
+ allocation/deallocation within the application's thread-specific data
+ cleanup functions even after arenas_cache is torn down.
+ - Fix xallocx() bugs related to size+extra exceeding HUGE_MAXCLASS.
+ - Fix chunk purge hook calls for in-place huge shrinking reallocation to
+ specify the old chunk size rather than the new chunk size. This bug caused
+ no correctness issues for the default chunk purge function, but was
+ visible to custom functions set via the "arena.<i>.chunk_hooks" mallctl.
+ - Fix heap profiling bugs:
+ + Fix heap profiling to distinguish among otherwise identical sample sites
+ with interposed resets (triggered via the "prof.reset" mallctl; see the
+ sketch after this list). This bug could cause data structure corruption
+ that would most likely result in a segfault.
+ + Fix irealloc_prof() to call prof_alloc_rollback() on OOM.
+ + Make one call to prof_active_get_unlocked() per allocation event, and use
+ the result throughout the relevant functions that handle an allocation
+ event. Also add a missing check in prof_realloc(). These fixes protect
+ allocation events against concurrent prof_active changes.
+ + Fix ixallocx_prof() to pass usize_max and zero to ixallocx_prof_sample()
+ in the correct order.
+ + Fix prof_realloc() to call prof_free_sampled_object() after calling
+ prof_malloc_sample_object(). Prior to this fix, if tctx and old_tctx were
+ the same, the tctx could have been prematurely destroyed.
+ - Fix portability bugs:
+ + Don't bitshift by negative amounts when encoding/decoding run sizes in
+ chunk header maps. This affected systems with page sizes greater than 8
+ KiB.
+ + Rename index_t to szind_t to avoid an existing type on Solaris.
+ + Add JEMALLOC_CXX_THROW to the memalign() function prototype, in order to
+ match glibc and avoid compilation errors when including both
+ jemalloc/jemalloc.h and malloc.h in C++ code.
+ + Don't assume that /bin/sh is appropriate when running size_classes.sh
+ during configuration.
+ + Consider __sparcv9 a synonym for __sparc64__ when defining LG_QUANTUM.
+ + Link tests to librt if it contains clock_gettime(2).
+
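+ A minimal sketch of triggering a reset via the "prof.reset" mallctl,
+ assuming a build configured with --enable-prof and profiling active:
+
+     #include <jemalloc/jemalloc.h>
+
+     void reset_heap_profile(void) {
+         /* Discard accumulated sample state; a non-NULL newp could instead
+          * supply a new lg_sample value (a size_t) to change the sampling
+          * rate at the same time. */
+         mallctl("prof.reset", NULL, NULL, NULL, 0);
+     }
+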
+* 4.0.0 (August 17, 2015)
+
+ This version contains many speed and space optimizations, both minor and
+ major. The major themes are generalization, unification, and simplification.
+ Although many of these optimizations cause no visible behavior change, their
+ cumulative effect is substantial.
+
+ New features:
+ - Normalize size class spacing to be consistent across the complete size
+ range. By default there are four size classes per size doubling, but this
+ is now configurable via the --with-lg-size-class-group option. Also add the
+ --with-lg-page, --with-lg-page-sizes, --with-lg-quantum, and
+ --with-lg-tiny-min options, which can be used to tweak page and size class
+ settings. Impacts:
+ + Worst case performance for incrementally growing/shrinking reallocation
+ is improved because there are far fewer size classes, and therefore
+ copying happens less often.
+ + Internal fragmentation is limited to 20% for all but the smallest size
+ classes (those less than four times the quantum). Requests of (1B + 4 KiB)
+ and (1B + 4 MiB) previously suffered nearly 50% internal fragmentation.
+ + Chunk fragmentation tends to be lower because there are fewer distinct run
+ sizes to pack.
+ - Add support for explicit tcaches. The "tcache.create", "tcache.flush", and
+ "tcache.destroy" mallctls control tcache lifetime and flushing, and the
+ MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to the *allocx() API
+ control which tcache is used for each operation; see the sketch after this
+ feature list.
+ - Implement per thread heap profiling, as well as the ability to
+ enable/disable heap profiling on a per thread basis. Add the "prof.reset",
+ "prof.lg_sample", "thread.prof.name", "opt.prof_thread_active_init",
+ "prof.thread_active_init", and "thread.prof.active" mallctls.
+ - Add support for per arena application-specified chunk allocators, configured
+ via the "arena.<i>.chunk_hooks" mallctl.
+ - Refactor huge allocation to be managed by arenas, so that arenas now
+ function as general purpose independent allocators. This is important in
+ the context of user-specified chunk allocators, aside from the scalability
+ benefits. Related new statistics:
+ + The "stats.arenas.<i>.huge.allocated", "stats.arenas.<i>.huge.nmalloc",
+ "stats.arenas.<i>.huge.ndalloc", and "stats.arenas.<i>.huge.nrequests"
+ mallctls provide high level per arena huge allocation statistics.
+ + The "arenas.nhchunks", "arenas.hchunk.<i>.size",
+ "stats.arenas.<i>.hchunks.<j>.nmalloc",
+ "stats.arenas.<i>.hchunks.<j>.ndalloc",
+ "stats.arenas.<i>.hchunks.<j>.nrequests", and
+ "stats.arenas.<i>.hchunks.<j>.curhchunks" mallctls provide per size class
+ statistics.
+ - Add the 'util' column to malloc_stats_print() output, which reports the
+ proportion of available regions that are currently in use for each small
+ size class.
+ - Add "alloc" and "free" modes for for junk filling (see the "opt.junk"
+ mallctl), so that it is possible to separately enable junk filling for
+ allocation versus deallocation.
+ - Add the jemalloc-config script, which provides information about how
+ jemalloc was configured, and how to integrate it into application builds.
+ - Add metadata statistics, which are accessible via the "stats.metadata",
+ "stats.arenas.<i>.metadata.mapped", and
+ "stats.arenas.<i>.metadata.allocated" mallctls.
+ - Add the "stats.resident" mallctl, which reports the upper limit of
+ physically resident memory mapped by the allocator.
+ - Add per arena control over unused dirty page purging, via the
+ "arenas.lg_dirty_mult", "arena.<i>.lg_dirty_mult", and
+ "stats.arenas.<i>.lg_dirty_mult" mallctls.
+ - Add the "prof.gdump" mallctl, which makes it possible to toggle the gdump
+ feature on/off during program execution.
+ - Add sdallocx(), which implements sized deallocation. The primary
+ optimization over dallocx() is the removal of a metadata read, which often
+ suffers an L1 cache miss; see the sketch after this feature list.
+ - Add missing header includes in jemalloc/jemalloc.h, so that applications
+ only have to #include <jemalloc/jemalloc.h>.
+ - Add support for additional platforms:
+ + Bitrig
+ + Cygwin
+ + DragonFlyBSD
+ + iOS
+ + OpenBSD
+ + OpenRISC/or1k
+
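+ A minimal sketch of the explicit-tcache and sized-deallocation APIs named
+ above; the function name and sizes are illustrative, and error handling is
+ abbreviated:
+
+     #include <jemalloc/jemalloc.h>
+
+     int use_explicit_tcache(void) {
+         unsigned tc;
+         size_t sz = sizeof(tc);
+         /* Reading "tcache.create" creates a tcache and returns its index. */
+         if (mallctl("tcache.create", &tc, &sz, NULL, 0) != 0)
+             return -1;
+         void *p = mallocx(128, MALLOCX_TCACHE(tc));
+         if (p != NULL) {
+             /* Sized deallocation through the same explicit tcache. */
+             sdallocx(p, 128, MALLOCX_TCACHE(tc));
+         }
+         mallctl("tcache.destroy", NULL, NULL, &tc, sizeof(tc));
+         return 0;
+     }
+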
+ Optimizations:
+ - Maintain dirty runs in per arena LRUs rather than in per arena trees of
+ dirty-run-containing chunks. In practice this change significantly reduces
+ dirty page purging volume.
+ - Integrate whole chunks into the unused dirty page purging machinery. This
+ reduces the cost of repeated huge allocation/deallocation, because it
+ effectively introduces a cache of chunks.
+ - Split the arena chunk map into two separate arrays, in order to increase
+ cache locality for the frequently accessed bits.
+ - Move small run metadata out of runs, into arena chunk headers. This reduces
+ run fragmentation; the smaller runs reduce external fragmentation for small
+ size classes, and the packed (less uniformly aligned) metadata layout
+ improves CPU cache set distribution.
+ - Randomly distribute large allocation base pointer alignment relative to page
+ boundaries in order to more uniformly utilize CPU cache sets. This can be
+ disabled via the --disable-cache-oblivious configure option, and queried via
+ the "config.cache_oblivious" mallctl.
+ - Micro-optimize the fast paths for the public API functions.
+ - Refactor thread-specific data to reside in a single structure. This assures
+ that only a single TLS read is necessary per call into the public API.
+ - Implement in-place huge allocation growing and shrinking.
+ - Refactor rtree (radix tree for chunk lookups) to be lock-free, and make
+ additional optimizations that reduce maximum lookup depth to one or two
+ levels. This resolves what was a concurrency bottleneck for per arena huge
+ allocation, because a global data structure is critical for determining
+ which arenas own which huge allocations.
+
+ Incompatible changes:
+ - Replace --enable-cc-silence with --disable-cc-silence to suppress spurious
+ warnings by default.
+ - Assure that the constness of malloc_usable_size()'s return type matches that
+ of the system implementation.
+ - Change the heap profile dump format to support per thread heap profiling,
+ rename pprof to jeprof, and enhance it with the --thread=<n> option. As a
+ result, the bundled jeprof must now be used rather than the upstream
+ (gperftools) pprof.
+ - Disable "opt.prof_final" by default, in order to avoid atexit(3), which can
+ internally deadlock on some platforms.
+ - Change the "arenas.nlruns" mallctl type from size_t to unsigned.
+ - Replace the "stats.arenas.<i>.bins.<j>.allocated" mallctl with
+ "stats.arenas.<i>.bins.<j>.curregs".
+ - Ignore MALLOC_CONF in set{uid,gid,cap} binaries.
+ - Ignore MALLOCX_ARENA(a) in dallocx(), in favor of using the
+ MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to control tcache usage.
+
+ Removed features:
+ - Remove the *allocm() API, which is superseded by the *allocx() API.
+ - Remove the --enable-dss option, and make dss non-optional on all platforms
+ which support sbrk(2).
+ - Remove the "arenas.purge" mallctl, which was obsoleted by the
+ "arena.<i>.purge" mallctl in 3.1.0.
+ - Remove the unnecessary "opt.valgrind" mallctl; jemalloc automatically
+ detects whether it is running inside Valgrind.
+ - Remove the "stats.huge.allocated", "stats.huge.nmalloc", and
+ "stats.huge.ndalloc" mallctls.
+ - Remove the --enable-mremap option.
+ - Remove the "stats.chunks.current", "stats.chunks.total", and
+ "stats.chunks.high" mallctls.
+
+ Bug fixes:
+ - Fix the cactive statistic to decrease (rather than increase) when active
+ memory decreases. This regression was first released in 3.5.0.
+ - Fix OOM handling in memalign() and valloc(). A variant of this bug existed
+ in all releases since 2.0.0, which introduced these functions.
+ - Fix an OOM-related regression in arena_tcache_fill_small(), which could
+ cause cache corruption on OOM. This regression was present in all releases
+ from 2.2.0 through 3.6.0.
+ - Fix size class overflow handling for malloc(), posix_memalign(), memalign(),
+ calloc(), and realloc() when profiling is enabled.
+ - Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
+ "secondary" precedence is specified, but sbrk(2) is not supported.
+ - Fix fallback lg_floor() implementations to handle extremely large inputs.
+ - Ensure the default purgeable zone is after the default zone on OS X.
+ - Fix latent bugs in atomic_*().
+ - Fix the "arena.<i>.dss" mallctl to handle read-only calls.
+ - Fix tls_model configuration to enable the initial-exec model when possible.
+ - Mark malloc_conf as a weak symbol so that the application can override it.
+ - Correctly detect glibc's adaptive pthread mutexes.
+ - Fix the --without-export configure option.
+
* 3.6.0 (March 31, 2014)
This version contains a critical bug fix for a regression present in 3.5.0 and
@@ -21,7 +273,7 @@ found in the git revision history:
backtracing to be reliable.
- Use dss allocation precedence for huge allocations as well as small/large
allocations.
- - Fix test assertion failure message formatting. This bug did not manifect on
+ - Fix test assertion failure message formatting. This bug did not manifest on
x86_64 systems because of implementation subtleties in va_list.
- Fix inconsequential test failures for hash and SFMT code.
@@ -516,7 +768,7 @@ found in the git revision history:
- Make it possible for the application to manually flush a thread's cache, via
the "tcache.flush" mallctl.
- Base maximum dirty page count on proportion of active memory.
- - Compute various addtional run-time statistics, including per size class
+ - Compute various additional run-time statistics, including per size class
statistics for large objects.
- Expose malloc_stats_print(), which can be called repeatedly by the
application.