Custom Memory Allocators in C++

Learn C++ custom memory allocators — why the default allocator is slow for some workloads, how to write STL-compatible allocators, pool allocators, arena allocators, and when to use each.


A custom memory allocator in C++ is a class that replaces the default new/delete (which call malloc/free) for a specific container or object type. The C++ standard library’s allocator interface requires two methods: allocate(n) (returns a pointer to storage for n objects of type T) and deallocate(ptr, n) (releases that storage). Custom allocators can dramatically improve performance in specific scenarios by using memory pools, arena allocation, or object recycling — eliminating the overhead of general-purpose heap management.

Introduction

Every time you use new, delete, vector::push_back, or map::insert, memory is allocated and deallocated through the general-purpose heap manager — malloc and free under the hood. The general-purpose allocator is an engineering marvel: it handles allocations of any size, from 1 byte to gigabytes, manages fragmentation, and works correctly in multithreaded programs.

But “general purpose” comes with costs. malloc must maintain complex free lists and metadata. It must search for a free block of the right size. In multithreaded programs, it must lock to prevent races. For programs that do many small, frequent allocations — a game with thousands of entities each containing several std::vector or std::map, a network server allocating and freeing request objects thousands of times per second, a parser allocating nodes for an AST — these costs accumulate into a measurable performance bottleneck.

Custom allocators solve this by replacing general-purpose allocation with specialized strategies tuned to specific usage patterns:

  • Pool allocators pre-allocate a block of memory and dole it out in fixed-size chunks — allocation is a single pointer increment, deallocation is near-free
  • Arena allocators allocate from a contiguous buffer and free the entire arena at once — perfect for temporary computations
  • Stack allocators treat memory as a stack — push on allocation, pop on deallocation
  • Object recyclers maintain a free list of same-sized objects — eliminate malloc/free overhead for frequently created/destroyed objects

C++ enables custom allocators at two levels: the global operator new/operator delete (affects everything), and the STL allocator interface (affects specific containers). This article covers both, with practical implementations and performance analysis.

Why the Default Allocator Can Be Slow

Before writing custom allocators, it helps to understand exactly what makes malloc/free expensive in certain workloads.

C++
#include <iostream>
#include <vector>
#include <map>
#include <chrono>
#include <memory>
using namespace std;

// Simulate a workload that does many small allocations
void benchmarkDefaultAllocator() {
    auto start = chrono::high_resolution_clock::now();

    const int ITERATIONS = 100'000;

    for (int i = 0; i < ITERATIONS; i++) {
        // Each vector allocation: malloc call + possible lock + metadata overhead
        vector<int> v;
        v.reserve(10);
        for (int j = 0; j < 10; j++) v.push_back(j);
        // Destructor: free call
    }

    auto end = chrono::high_resolution_clock::now();
    double ms = chrono::duration<double, milli>(end - start).count();
    cout << "Default allocator: " << ms << " ms for "
         << ITERATIONS << " vector allocations" << endl;
}

// The hidden costs of malloc:
void explainMallocCosts() {
    cout << "\n--- Hidden malloc costs ---\n";
    cout << "1. Thread safety: malloc locks a global mutex\n"
         << "   -> Contention in multithreaded programs\n\n"
         << "2. Metadata overhead: each allocation stores size/flags\n"
         << "   -> 8-16 bytes per allocation wasted\n\n"
         << "3. Free list search: finding a suitable free block\n"
         << "   -> O(n) search in fragmented heap\n\n"
         << "4. Cache unfriendliness: scattered small allocations\n"
         << "   -> Cache misses when iterating allocated objects\n\n"
         << "5. Fragmentation: interleaved alloc/free leaves gaps\n"
         << "   -> Memory grows despite most objects being free\n";
}

int main() {
    benchmarkDefaultAllocator();
    explainMallocCosts();
    return 0;
}

Output:

Plaintext
Default allocator: 47.3 ms for 100000 vector allocations

--- Hidden malloc costs ---
1. Thread safety: malloc locks a global mutex
   -> Contention in multithreaded programs

2. Metadata overhead: each allocation stores size/flags
   -> 8-16 bytes per allocation wasted

3. Free list search: finding a suitable free block
   -> O(n) search in fragmented heap

4. Cache unfriendliness: scattered small allocations
   -> Cache misses when iterating allocated objects

5. Fragmentation: interleaved alloc/free leaves gaps
   -> Memory grows despite most objects being free

These costs are acceptable for most code. But for hot paths — tight loops, per-frame game logic, high-throughput servers — they become a bottleneck. Custom allocators eliminate each of these costs by knowing more about the allocation pattern than malloc can.

The C++ Allocator Interface

The STL allocator interface defines the contract that any custom allocator must fulfill to work with standard containers. Since C++11, std::allocator_traits supplies sensible defaults for nearly everything (and C++17 simplified the requirements further), so the minimum is a value_type typedef plus two member functions: allocate and deallocate.

C++
#include <cstddef>   // size_t, ptrdiff_t
#include <iostream>  // cout (used for tracing)
#include <limits>
#include <new>       // bad_alloc
#include <stdexcept>
#include <string>    // used by the map example below
using namespace std;

// Minimal C++17 allocator — satisfies all STL requirements
template<typename T>
struct MinimalAllocator {
    // Required: the value type
    using value_type = T;

    // Required: constructor
    MinimalAllocator() noexcept = default;

    // Required: rebind constructor — allows converting Allocator<T> to Allocator<U>
    // (STL containers internally use different types, e.g., map uses Node<K,V> not K)
    template<typename U>
    MinimalAllocator(const MinimalAllocator<U>&) noexcept {}

    // Required: allocate n objects of type T
    // Must return properly aligned storage for n objects
    T* allocate(size_t n) {
        if (n > numeric_limits<size_t>::max() / sizeof(T))
            throw bad_alloc();

        T* ptr = static_cast<T*>(::operator new(n * sizeof(T)));
        cout << "  [allocate] " << n << " x " << sizeof(T)
             << " bytes = " << (void*)ptr << endl;
        return ptr;
    }

    // Required: deallocate storage that was allocated with allocate(n)
    void deallocate(T* ptr, size_t n) noexcept {
        cout << "  [deallocate] " << n << " x " << sizeof(T)
             << " bytes at " << (void*)ptr << endl;
        ::operator delete(ptr);
    }
};

// Required: equality comparison
// Two allocators are equal if memory allocated by one can be freed by the other
template<typename T, typename U>
bool operator==(const MinimalAllocator<T>&, const MinimalAllocator<U>&) noexcept {
    return true;  // Stateless allocators are always equal
}

template<typename T, typename U>
bool operator!=(const MinimalAllocator<T>&, const MinimalAllocator<U>&) noexcept {
    return false;
}

#include <vector>
#include <list>
#include <map>

int main() {
    cout << "=== vector with MinimalAllocator ===" << endl;
    {
        vector<int, MinimalAllocator<int>> v;
        v.push_back(1);
        v.push_back(2);
        v.push_back(3);
        cout << "Vector: ";
        for (int x : v) cout << x << " ";
        cout << endl;
    }  // Destructor deallocates

    cout << "\n=== list with MinimalAllocator ===" << endl;
    {
        list<int, MinimalAllocator<int>> l;
        l.push_back(10);
        l.push_back(20);
    }

    cout << "\n=== map with MinimalAllocator ===" << endl;
    {
        map<int, string, less<int>,
            MinimalAllocator<pair<const int, string>>> m;
        m[1] = "one";
        m[2] = "two";
    }

    return 0;
}

Output:

Plaintext
=== vector with MinimalAllocator ===
  [allocate] 1 x 4 bytes = 0x55f1a3b02eb0
  [allocate] 2 x 4 bytes = 0x55f1a3b02ed0
  [deallocate] 1 x 4 bytes at 0x55f1a3b02eb0
  [allocate] 4 x 4 bytes = 0x55f1a3b02ef0
  [deallocate] 2 x 4 bytes at 0x55f1a3b02ed0
Vector: 1 2 3 
  [deallocate] 4 x 4 bytes at 0x55f1a3b02ef0

=== list with MinimalAllocator ===
  [allocate] 1 x 24 bytes = 0x55f1a3b02eb0
  [allocate] 1 x 24 bytes = 0x55f1a3b02ed0
  [deallocate] 1 x 24 bytes at 0x55f1a3b02eb0
  [deallocate] 1 x 24 bytes at 0x55f1a3b02ed0

=== map with MinimalAllocator ===
  [allocate] 1 x 48 bytes = 0x55f1a3b02eb0
  [allocate] 1 x 48 bytes = 0x55f1a3b02ed0
  [deallocate] 1 x 48 bytes at 0x55f1a3b02eb0
  [deallocate] 1 x 48 bytes at 0x55f1a3b02ed0

Step-by-step explanation:

  1. allocate(n) receives the number of objects (not bytes). For vector<int>, n=1 means “storage for 1 int.” The method must return a pointer to aligned storage for n * sizeof(T) bytes.
  2. deallocate(ptr, n) receives the same pointer and count that was given to allocate. The count n allows allocators that maintain pools to return memory to the right size class.
  3. The converting constructor MinimalAllocator(const MinimalAllocator<U>&) is required because STL containers internally allocate different types. map<K,V> actually allocates node objects, not pair<const K, V> values — the rebind mechanism converts Allocator<pair<const K, V>> to Allocator<Node> automatically, and this constructor lets the container build the rebound allocator from yours.
  4. Two stateless allocators always compare equal via operator== — memory allocated by one can be freed by the other. Stateful allocators (that own a pool) may compare unequal between instances managing different pools.
  5. Notice vector allocates multiple times and copies as it grows — push_back(1) allocates for 1, push_back(2) reallocates for 2, push_back(3) reallocates for 4 (doubling). This is the geometric growth policy.

Pool Allocator: Fixed-Size Fast Allocation

A pool allocator pre-allocates a large block of memory and divides it into fixed-size chunks. Allocation is O(1): grab a free chunk from the free list. Deallocation is O(1): return the chunk to the free list. No malloc, no metadata, no locking.

C++
#include <iostream>
#include <vector>
#include <list>
#include <chrono>   // benchmark timing
#include <cassert>
#include <cstddef>
#include <cstdio>   // snprintf
#include <array>
#include <new>      // bad_alloc
using namespace std;

// Fixed-size pool: manages a pool of BlockSize-byte objects
template<size_t BlockSize, size_t PoolCapacity>
class FixedPool {
public:
    FixedPool() {
        // Initialize free list: each block points to the next
        for (size_t i = 0; i < PoolCapacity - 1; i++) {
            *reinterpret_cast<void**>(&storage_[i]) = &storage_[i + 1];
        }
        *reinterpret_cast<void**>(&storage_[PoolCapacity - 1]) = nullptr;
        freeHead_ = &storage_[0];
        freeCount_ = PoolCapacity;
    }

    // O(1) allocation: pop from free list
    void* allocate() {
        if (!freeHead_) return nullptr;  // Pool exhausted

        void* ptr = freeHead_;
        freeHead_ = *reinterpret_cast<void**>(freeHead_);
        --freeCount_;
        return ptr;
    }

    // O(1) deallocation: push to free list
    void deallocate(void* ptr) {
        *reinterpret_cast<void**>(ptr) = freeHead_;
        freeHead_  = ptr;
        ++freeCount_;
    }

    size_t freeBlocks()  const { return freeCount_; }
    size_t totalBlocks() const { return PoolCapacity; }
    size_t usedBlocks()  const { return PoolCapacity - freeCount_; }

private:
    // Each slot must hold either a T or a void* (the free-list link).
    // alignas(max_align_t) pads sizeof(Block) up to a multiple of the
    // alignment (typically 16 bytes), so the link fits even when
    // BlockSize is smaller than a pointer.
    struct alignas(max_align_t) Block {
        char data[BlockSize];
    };

    array<Block, PoolCapacity> storage_;
    void*  freeHead_;
    size_t freeCount_;
};

// STL-compatible pool allocator wrapping FixedPool
template<typename T, size_t PoolCapacity = 1024>
class PoolAllocator {
public:
    using value_type = T;

    // Required: containers rebind the allocator to internal node types, and
    // the default allocator_traits rebind machinery cannot handle the
    // non-type parameter PoolCapacity — so spell it out explicitly.
    template<typename U>
    struct rebind { using other = PoolAllocator<U, PoolCapacity>; };

    // Shared pool — all instances of PoolAllocator<T, N> share the same pool
    // (stateless relative to instance — pool is static)
    static FixedPool<sizeof(T), PoolCapacity> pool_;

    PoolAllocator() noexcept = default;

    template<typename U>
    PoolAllocator(const PoolAllocator<U, PoolCapacity>&) noexcept {}

    T* allocate(size_t n) {
        if (n != 1) {
            // Fall back to global new for multi-object allocations
            // (vector growth, etc.)
            return static_cast<T*>(::operator new(n * sizeof(T)));
        }
        T* ptr = static_cast<T*>(pool_.allocate());
        if (!ptr) throw bad_alloc();
        return ptr;
    }

    void deallocate(T* ptr, size_t n) noexcept {
        if (n != 1) {
            ::operator delete(ptr);
            return;
        }
        pool_.deallocate(ptr);
    }
};

template<typename T, size_t N>
FixedPool<sizeof(T), N> PoolAllocator<T, N>::pool_;

template<typename T, typename U, size_t N>
bool operator==(const PoolAllocator<T,N>&, const PoolAllocator<U,N>&) noexcept {
    return true;
}
template<typename T, typename U, size_t N>
bool operator!=(const PoolAllocator<T,N>&, const PoolAllocator<U,N>&) noexcept {
    return false;
}

// Test with a list (each node is a separate allocation — perfect for pools)
struct Task {
    int  id;
    int  priority;
    char description[32];
};

int main() {
    // The benchmark below inserts 1000 nodes per run, so the list's pool
    // must hold at least that many blocks
    using TaskList = list<Task, PoolAllocator<Task, 2048>>;

    // Note: list rebinds the allocator to its internal node type, so the
    // list draws from the node-type pool, not from PoolAllocator<Task, N>'s
    // pool. To show pool statistics directly, allocate Task objects through
    // the Task allocator itself:
    PoolAllocator<Task, 512> alloc;
    cout << "Pool capacity: " << alloc.pool_.totalBlocks() << " blocks" << endl;
    cout << "Free at start: " << alloc.pool_.freeBlocks() << " blocks" << endl;

    vector<Task*> tasks;

    // Allocate 100 tasks — each is an O(1) pool allocation
    // (Task is trivial, so writing fields into raw pool storage is fine)
    for (int i = 0; i < 100; i++) {
        Task* t = alloc.allocate(1);
        t->id = i;
        t->priority = i % 5;
        snprintf(t->description, sizeof(t->description), "Task-%03d", i);
        tasks.push_back(t);
    }

    cout << "After 100 allocations, free: " << alloc.pool_.freeBlocks() << " blocks" << endl;
    cout << "Used: " << alloc.pool_.usedBlocks() << " blocks" << endl;

    // Free 50 tasks — blocks return to the pool's free list
    for (int i = 0; i < 50; i++) {
        alloc.deallocate(tasks.back(), 1);
        tasks.pop_back();
    }

    cout << "After 50 frees, free:  " << alloc.pool_.freeBlocks() << " blocks" << endl;
    cout << "Used: " << alloc.pool_.usedBlocks() << " blocks" << endl;

    for (Task* t : tasks) alloc.deallocate(t, 1);

    // Benchmark: pool vs. default for list insertions
    auto benchPool = [&]() {
        TaskList poolList;
        for (int i = 0; i < 1000; i++) {
            poolList.push_back({i, 0, "bench"});
        }
    };

    auto benchDefault = [&]() {
        list<Task> defaultList;
        for (int i = 0; i < 1000; i++) {
            defaultList.push_back({i, 0, "bench"});
        }
    };

    auto time = [](auto f, const char* name) {
        auto t0 = chrono::high_resolution_clock::now();
        for (int r = 0; r < 1000; r++) f();
        auto t1 = chrono::high_resolution_clock::now();
        double ms = chrono::duration<double, milli>(t1 - t0).count();
        cout << name << ": " << ms << " ms (1000 runs x 1000 insertions)" << endl;
    };

    cout << "\n--- Benchmark ---" << endl;
    time(benchPool,    "Pool allocator  ");
    time(benchDefault, "Default allocator");

    return 0;
}

Output:

Plaintext
Pool capacity: 512 blocks
Free at start: 512 blocks
After 100 allocations, free: 412 blocks
Used: 100 blocks
After 50 frees, free:  462 blocks
Used: 50 blocks

--- Benchmark ---
Pool allocator  : 312 ms (1000 runs x 1000 insertions)
Default allocator: 891 ms (1000 runs x 1000 insertions)

Step-by-step explanation:

  1. FixedPool pre-allocates PoolCapacity blocks of BlockSize bytes each in a contiguous array. The free list is embedded in the unused blocks themselves — each free block stores a pointer to the next free block (the same memory that will hold an object when allocated).
  2. allocate() is truly O(1): read freeHead_, advance freeHead_ to the next free block, return the old head. No search, no metadata update, no system call.
  3. deallocate() is truly O(1): write freeHead_ into the freed block (storing the link), set freeHead_ to the freed block. The freed block is immediately available for the next allocation.
  4. The allocator falls back to ::operator new for multi-object allocations (n > 1) — these happen during vector reallocation. For list (which always allocates one node at a time), every allocation goes through the pool.
  5. The 2.8x speedup comes from eliminating malloc’s overhead: no thread-safety locking, no free-list search, no metadata management, no system calls. The pool’s contiguous memory also improves cache behavior — consecutive list nodes are close in memory.

Arena Allocator: Bump-Pointer Allocation

An arena allocator (also called a “bump-pointer” or “linear” allocator) is the simplest and fastest possible allocator. It maintains a large buffer and a pointer to the next free position. Allocation bumps the pointer forward. There is no deallocation of individual objects — the entire arena is reset at once.

C++
#include <iostream>
#include <cstddef>  // size_t, max_align_t
#include <cstdio>   // snprintf
#include <cstring>
#include <cassert>
#include <vector>
#include <list>
#include <new>
using namespace std;

class ArenaAllocator {
public:
    explicit ArenaAllocator(size_t capacity)
        : buffer_(new char[capacity])
        , capacity_(capacity)
        , used_(0) {
        cout << "Arena created: " << capacity << " bytes" << endl;
    }

    ~ArenaAllocator() {
        delete[] buffer_;
        cout << "Arena destroyed: " << used_ << " / " << capacity_
             << " bytes used" << endl;
    }

    // Allocate n bytes with given alignment — O(1)
    void* allocate(size_t n, size_t alignment = alignof(max_align_t)) {
        // Round up current position to required alignment
        size_t aligned = (used_ + alignment - 1) & ~(alignment - 1);

        if (aligned + n > capacity_) {
            throw bad_alloc();  // Arena is full
        }

        void* ptr = buffer_ + aligned;
        used_ = aligned + n;
        return ptr;
    }

    // Reset: O(1) deallocation of EVERYTHING
    void reset() {
        cout << "Arena reset: freed " << used_ << " bytes" << endl;
        used_ = 0;
    }

    size_t used()      const { return used_; }
    size_t capacity()  const { return capacity_; }
    size_t available() const { return capacity_ - used_; }

    // No individual deallocate — that's intentional

private:
    char*  buffer_;
    size_t capacity_;
    size_t used_;
};

// STL-compatible arena allocator wrapper
template<typename T>
class ArenaSTLAllocator {
public:
    using value_type = T;

    explicit ArenaSTLAllocator(ArenaAllocator& arena) : arena_(&arena) {}

    template<typename U>
    ArenaSTLAllocator(const ArenaSTLAllocator<U>& other) : arena_(other.arena_) {}

    T* allocate(size_t n) {
        return static_cast<T*>(arena_->allocate(n * sizeof(T), alignof(T)));
    }

    void deallocate(T* /*ptr*/, size_t /*n*/) noexcept {
        // No-op: arena is reset all at once
    }

    ArenaAllocator* arena_;
};

template<typename T, typename U>
bool operator==(const ArenaSTLAllocator<T>& a, const ArenaSTLAllocator<U>& b) noexcept {
    return a.arena_ == b.arena_;
}
template<typename T, typename U>
bool operator!=(const ArenaSTLAllocator<T>& a, const ArenaSTLAllocator<U>& b) noexcept {
    return a.arena_ != b.arena_;
}

// Simulated per-frame game logic
struct Particle {
    float x, y, z;
    float vx, vy, vz;
    float lifetime;
    int   type;
};

void simulateFrame(ArenaAllocator& arena, int frameNum) {
    ArenaSTLAllocator<Particle> alloc(arena);

    // Create temporary particles for this frame — all allocated from arena
    using ParticleVec = vector<Particle, ArenaSTLAllocator<Particle>>;
    ParticleVec particles(alloc);

    particles.reserve(500);
    for (int i = 0; i < 500; i++) {
        particles.push_back({
            float(i), float(i*2), 0.0f,
            0.1f, 0.2f, 0.0f,
            1.0f, i % 4
        });
    }

    // Process particles (simulate)
    int alive = 0;
    for (auto& p : particles) {
        p.x += p.vx;
        p.y += p.vy;
        p.lifetime -= 0.016f;
        if (p.lifetime > 0) alive++;
    }

    if (frameNum <= 3) {
        cout << "Frame " << frameNum << ": " << particles.size()
             << " particles, " << alive << " alive, arena used="
             << arena.used() << " bytes" << endl;
    }

    // Arena reset at end of frame — O(1) deallocation of all particles
    arena.reset();
}

int main() {
    // 1 MB arena for per-frame allocations
    ArenaAllocator frameArena(1024 * 1024);

    cout << "\n--- Per-frame simulation ---" << endl;
    for (int frame = 1; frame <= 5; frame++) {
        simulateFrame(frameArena, frame);
    }

    // Direct arena usage for temporary computations
    cout << "\n--- Direct arena usage ---" << endl;
    ArenaAllocator computeArena(64 * 1024);  // 64 KB scratch space

    // Allocate several temporary arrays
    int*   intArray   = static_cast<int*>(computeArena.allocate(100 * sizeof(int)));
    float* floatArray = static_cast<float*>(computeArena.allocate(100 * sizeof(float)));
    char*  strBuffer  = static_cast<char*>(computeArena.allocate(1024));

    for (int i = 0; i < 100; i++) intArray[i] = i * i;
    for (int i = 0; i < 100; i++) floatArray[i] = float(i) * 1.5f;
    snprintf(strBuffer, 1024, "Computed %d values", 100);

    cout << "First int: " << intArray[0] << ", last: " << intArray[99] << endl;
    cout << "First float: " << floatArray[0] << ", last: " << floatArray[99] << endl;
    cout << "String: " << strBuffer << endl;
    cout << "Arena used: " << computeArena.used() << " / "
         << computeArena.capacity() << " bytes" << endl;

    computeArena.reset();  // Free all temporary data in one instruction

    return 0;
}

Output:

Plaintext
Arena created: 1048576 bytes

--- Per-frame simulation ---
Frame 1: 500 particles, 500 alive, arena used=16000 bytes
Arena reset: freed 16000 bytes
Frame 2: 500 particles, 500 alive, arena used=16000 bytes
Arena reset: freed 16000 bytes
Frame 3: 500 particles, 500 alive, arena used=16000 bytes
Arena reset: freed 16000 bytes
Arena reset: freed 16000 bytes
Arena reset: freed 16000 bytes

--- Direct arena usage ---
Arena created: 65536 bytes
First int: 0, last: 9801
First float: 0, last: 148.5
String: Computed 100 values
Arena used: 1824 / 65536 bytes
Arena reset: freed 1824 bytes
Arena destroyed: 0 / 65536 bytes used
Arena destroyed: 0 / 1048576 bytes used

Step-by-step explanation:

  1. allocate() does four operations: align the current offset, check there is room, advance the offset, return the pointer. On most architectures this compiles to three or four instructions — essentially free compared to malloc.
  2. reset() sets used_ = 0. This is O(1) deallocation of all objects that were allocated from the arena. Destructors are not called — the objects are simply considered gone. This is why arena allocation is best for objects that don’t need destructors (POD types, or objects whose resources are managed elsewhere).
  3. The per-frame simulation shows a key pattern: create all temporary objects for a frame from the arena, use them, then reset the entire arena. No individual delete calls. No fragmentation. Each frame starts from a clean slate.
  4. The alignment computation (used_ + alignment - 1) & ~(alignment - 1) is the standard trick for rounding up to a power-of-two alignment — required for correct access of types like double (8-byte alignment) or SIMD types (16-byte or 32-byte alignment).
  5. The ArenaSTLAllocator’s deallocate is a no-op — individual deallocation doesn’t make sense for a bump-pointer allocator. The STL will call it (during vector destruction, for example) but it does nothing. The actual memory is freed when arena.reset() is called.

Stack Allocator: LIFO Memory Management

A stack allocator allocates in a stack (LIFO) discipline. Allocation bumps a pointer forward; deallocation rolls it back. This adds the ability to free groups of allocations (by saving and restoring the stack pointer) while keeping near-zero overhead.

C++
#include <iostream>
#include <cassert>
#include <cstring>
using namespace std;

class StackAllocator {
public:
    using Marker = size_t;  // Saved stack position

    explicit StackAllocator(size_t capacity)
        : buffer_(new char[capacity])
        , capacity_(capacity)
        , top_(0) {}

    ~StackAllocator() { delete[] buffer_; }

    // Allocate n bytes — O(1)
    void* allocate(size_t n, size_t alignment = alignof(max_align_t)) {
        size_t aligned = (top_ + alignment - 1) & ~(alignment - 1);
        if (aligned + n > capacity_) throw bad_alloc();
        void* ptr = buffer_ + aligned;
        top_ = aligned + n;
        return ptr;
    }

    // Save current stack position
    Marker getMarker() const { return top_; }

    // Free everything allocated after the marker — O(1)
    void freeToMarker(Marker marker) {
        assert(marker <= top_);
        top_ = marker;
    }

    // Free everything
    void clear() { top_ = 0; }

    size_t used()     const { return top_; }
    size_t capacity() const { return capacity_; }

private:
    char*  buffer_;
    size_t capacity_;
    size_t top_;
};

// Usage: function call stack that needs temporary scratch space
struct ParseNode {
    int   type;
    int   childCount;
    char  value[32];
};

ParseNode* parseExpression(StackAllocator& stack, const char* expr, int depth) {
    StackAllocator::Marker mark = stack.getMarker();  // Save position

    // Allocate a node from the stack
    ParseNode* node = static_cast<ParseNode*>(
        stack.allocate(sizeof(ParseNode)));
    node->type = depth;
    node->childCount = 0;
    strncpy(node->value, expr, 31);
    node->value[31] = '\0';

    if (depth < 3) {
        // Recursively parse sub-expressions (allocate deeper on stack)
        node->childCount = 2;
        ParseNode* left  = parseExpression(stack, "left",  depth + 1);
        ParseNode* right = parseExpression(stack, "right", depth + 1);
        (void)left; (void)right;  // Use them for processing

        cout << "Depth " << depth << ": parsed '" << expr
             << "' with 2 children, stack used=" << stack.used() << endl;
    }

    // Note: if we wanted to free temporary children after processing,
    // we could call stack.freeToMarker(afterChildrenMark) here
    return node;
}

int main() {
    StackAllocator stack(64 * 1024);  // 64 KB stack

    cout << "=== Stack allocator for recursive parsing ===" << endl;
    StackAllocator::Marker parseStart = stack.getMarker();

    ParseNode* root = parseExpression(stack, "root", 0);
    cout << "Root node: '" << root->value
         << "' depth=" << root->type << endl;
    cout << "Stack used after parse: " << stack.used() << " bytes" << endl;

    // Free entire parse tree at once
    stack.freeToMarker(parseStart);
    cout << "Stack used after free:  " << stack.used() << " bytes" << endl;

    // Demonstrate marker-based scoping
    cout << "\n=== Marker-based scoping ===" << endl;
    {
        auto* a = static_cast<int*>(stack.allocate(sizeof(int) * 100));
        cout << "After array A: " << stack.used() << " bytes" << endl;

        StackAllocator::Marker scopeStart = stack.getMarker();
        auto* b = static_cast<double*>(stack.allocate(sizeof(double) * 50));
        auto* c = static_cast<float*>(stack.allocate(sizeof(float) * 200));
        (void)b; (void)c;
        cout << "After B and C: " << stack.used() << " bytes" << endl;

        // Free B and C together — restore to before their allocation
        stack.freeToMarker(scopeStart);
        cout << "After freeing B, C: " << stack.used() << " bytes" << endl;

        // A is still valid — only B and C were freed
        for (int i = 0; i < 100; i++) a[i] = i;
        cout << "A[99] = " << a[99] << " (still accessible)" << endl;
    }

    return 0;
}

Output:

Plaintext
=== Stack allocator for recursive parsing ===
Depth 2: parsed 'left' with 2 children, stack used=232
Depth 2: parsed 'right' with 2 children, stack used=376
Depth 1: parsed 'left' with 2 children, stack used=376
Depth 2: parsed 'left' with 2 children, stack used=568
Depth 2: parsed 'right' with 2 children, stack used=712
Depth 1: parsed 'right' with 2 children, stack used=712
Depth 0: parsed 'root' with 2 children, stack used=712
Root node: 'root' depth=0
Stack used after parse: 712 bytes
Stack used after free:  0 bytes

=== Marker-based scoping ===
After array A: 400 bytes
After B and C: 1600 bytes
After freeing B, C: 400 bytes
A[99] = 99 (still accessible)

The stack allocator is ideal for temporary computations with natural LIFO lifetimes: recursive algorithms, function-call-frame scratch space, and pipeline stages where earlier-stage data is freed before later-stage data is needed.

Overriding Global new and delete

For the broadest impact — affecting all allocations in your program, not just specific containers — you can override the global operator new and operator delete.

C++
#include <iostream>
#include <cstdlib>
#include <atomic>
#include <new>
#include <vector>
#include <map>
#include <string>
#include <memory>   // make_unique
using namespace std;

// Global allocation tracking (simplified example)
atomic<size_t> totalAllocated{0};
atomic<size_t> totalFreed{0};
atomic<size_t> allocationCount{0};
atomic<size_t> deallocationCount{0};

// Override global operator new
void* operator new(size_t size) {
    totalAllocated += size;
    ++allocationCount;

    void* ptr = malloc(size);
    if (!ptr) throw bad_alloc();
    return ptr;
}

// Override global operator new[] (array form)
void* operator new[](size_t size) {
    return ::operator new(size);  // Delegate to scalar new
}

// Override global operator delete (sized form, available since C++14)
void operator delete(void* ptr, size_t size) noexcept {
    totalFreed += size;
    ++deallocationCount;
    free(ptr);
}

// Unsized form — some call sites cannot supply the size, so replacing
// global new without also replacing this form is incomplete
void operator delete(void* ptr) noexcept {
    ++deallocationCount;  // Size unknown here; byte stats may undercount
    free(ptr);
}

// Override global operator delete[]
void operator delete[](void* ptr, size_t size) noexcept {
    ::operator delete(ptr, size);
}

void operator delete[](void* ptr) noexcept {
    ::operator delete(ptr);
}

// Override aligned versions (C++17)
void* operator new(size_t size, align_val_t alignment) {
    totalAllocated += size;
    ++allocationCount;
    size_t a = static_cast<size_t>(alignment);
    // aligned_alloc requires the size to be a multiple of the alignment
    size_t rounded = (size + a - 1) & ~(a - 1);
    void* ptr = aligned_alloc(a, rounded);
    if (!ptr) throw bad_alloc();
    return ptr;
}

void operator delete(void* ptr, size_t size, align_val_t) noexcept {
    totalFreed += size;
    ++deallocationCount;
    free(ptr);
}

void operator delete(void* ptr, align_val_t) noexcept {
    ++deallocationCount;
    free(ptr);
}

void printStats(const char* label) {
    cout << "\n[" << label << "]\n"
         << "  Allocations:   " << allocationCount.load() << "\n"
         << "  Deallocations: " << deallocationCount.load() << "\n"
         << "  Total allocated: " << totalAllocated.load() << " bytes\n"
         << "  Total freed:     " << totalFreed.load() << " bytes\n"
         << "  Currently held: "
         << (totalAllocated.load() - totalFreed.load()) << " bytes\n";
}

int main() {
    printStats("Before any allocations");

    {
        vector<int> v = {1, 2, 3, 4, 5};
        printStats("After vector<int> {1,2,3,4,5}");

        auto s = make_unique<string>("Hello, World!");
        printStats("After unique_ptr<string>");

        map<int, string> m;
        m[1] = "one"; m[2] = "two"; m[3] = "three";
        printStats("After map with 3 entries");
    }

    printStats("After all destructors");

    return 0;
}

Output:

Plaintext
[Before any allocations]
  Allocations:   0
  Deallocations: 0
  Total allocated: 0 bytes
  Total freed:     0 bytes
  Currently held: 0 bytes

[After vector<int> {1,2,3,4,5}]
  Allocations:   1
  Deallocations: 0
  Total allocated: 20 bytes
  Total freed:     0 bytes
  Currently held: 20 bytes

[After unique_ptr<string>]
  Allocations:   3
  Deallocations: 0
  Total allocated: 53 bytes
  Total freed:     0 bytes
  Currently held: 53 bytes

[After map with 3 entries]
  Allocations:   6
  Deallocations: 0
  Total allocated: 245 bytes
  Total freed:     0 bytes
  Currently held: 245 bytes

[After all destructors]
  Allocations:   6
  Deallocations: 6
  Total allocated: 245 bytes
  Total freed:     245 bytes
  Currently held: 0 bytes

The exact allocation counts and byte totals above depend on your standard library implementation (small-string optimization, node sizes, growth policy). Global operator new/delete overrides are useful for: memory tracking and leak detection, replacing malloc/free with a better general-purpose allocator (like jemalloc or tcmalloc), adding guards for detecting buffer overflows, and integrating with custom memory managers in embedded or game systems.

pmr: Polymorphic Memory Resources (C++17)

C++17 introduced std::pmr (polymorphic memory resources), which provides a more flexible alternative to the templated allocator approach. Instead of making the allocator a template parameter (changing the container’s type), pmr containers take a runtime pointer to a memory_resource base class.

C++
#include <array>
#include <cstddef>
#include <iostream>
#include <list>
#include <memory_resource>
#include <string>
#include <vector>
using namespace std;

void demonstratePMR() {
    cout << "=== std::pmr monotonic_buffer_resource ===" << endl;

    // Stack-allocated buffer for the arena
    array<byte, 1024> buffer;
    pmr::monotonic_buffer_resource arena{buffer.data(), buffer.size(),
                                          pmr::null_memory_resource()};
    // null_memory_resource: throws bad_alloc if arena runs out (no fallback)

    // pmr::vector uses the arena — no template parameter change needed
    pmr::vector<int> v{&arena};
    for (int i = 0; i < 10; i++) v.push_back(i * i);

    cout << "Vector: ";
    for (int x : v) cout << x << " ";
    cout << "\n";

    // pmr::string also uses the arena
    pmr::string s{"Hello from pmr", &arena};
    cout << "String: " << s << "\n";

    // When arena goes out of scope, all memory is freed at once

    cout << "\n=== std::pmr synchronized_pool_resource ===" << endl;
    pmr::synchronized_pool_resource pool;  // Thread-safe pool allocator

    pmr::list<int> l{&pool};
    for (int i = 0; i < 5; i++) l.push_back(i * 10);

    cout << "List: ";
    for (int x : l) cout << x << " ";
    cout << "\n";

    cout << "\n=== unsynchronized_pool_resource (single-thread) ===" << endl;
    pmr::unsynchronized_pool_resource fastPool;  // Faster, not thread-safe

    pmr::vector<double> dv{&fastPool};
    dv.reserve(100);
    for (int i = 0; i < 100; i++) dv.push_back(i * 0.1);
    cout << "First: " << dv.front() << " Last: " << dv.back() << "\n";
}

// Custom pmr memory resource
class TrackingResource : public pmr::memory_resource {
    pmr::memory_resource* upstream_;
    size_t allocated_ = 0;
    size_t count_ = 0;

public:
    explicit TrackingResource(pmr::memory_resource* up = pmr::get_default_resource())
        : upstream_(up) {}

    size_t bytesAllocated() const { return allocated_; }
    size_t allocationCount() const { return count_; }

protected:
    void* do_allocate(size_t bytes, size_t alignment) override {
        void* p = upstream_->allocate(bytes, alignment);  // may throw; count only on success
        allocated_ += bytes;
        ++count_;
        return p;
    }

    void do_deallocate(void* p, size_t bytes, size_t alignment) override {
        allocated_ -= bytes;
        upstream_->deallocate(p, bytes, alignment);
    }

    bool do_is_equal(const pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }
};

int main() {
    demonstratePMR();

    cout << "\n=== Custom tracking resource ===" << endl;
    TrackingResource tracker;

    {
        pmr::vector<int> v{&tracker};
        v.reserve(50);
        for (int i = 0; i < 50; i++) v.push_back(i);

        pmr::string s{"Testing PMR tracking", &tracker};

        cout << "During use: "
             << tracker.bytesAllocated() << " bytes allocated, "
             << tracker.allocationCount() << " allocations" << endl;
    }  // v and s destroyed

    cout << "After destruction: " << tracker.bytesAllocated() << " bytes held" << endl;

    return 0;
}

Output:

Plaintext
=== std::pmr monotonic_buffer_resource ===
Vector: 0 1 4 9 16 25 36 49 64 81 
String: Hello from pmr

=== std::pmr synchronized_pool_resource ===
List: 0 10 20 30 40 

=== unsynchronized_pool_resource (single-thread) ===
First: 0 Last: 9.9

=== Custom tracking resource ===
During use: 232 bytes allocated, 2 allocations
After destruction: 0 bytes held

Step-by-step explanation:

  1. pmr::vector<int> has the same type regardless of the allocator used — it is always vector<int, pmr::polymorphic_allocator<int>>. The allocator strategy is a runtime parameter (a pointer to a memory_resource), not a compile-time template parameter. This means you can store pmr::vector<int> in the same data structure regardless of which memory resource it uses.
  2. monotonic_buffer_resource is the PMR equivalent of an arena allocator — bump-pointer, no individual deallocation, reset or destroy to free all.
  3. synchronized_pool_resource is a thread-safe pool allocator. unsynchronized_pool_resource is faster for single-threaded use.
  4. Custom memory_resource classes inherit from pmr::memory_resource and implement do_allocate, do_deallocate, and do_is_equal. The TrackingResource wraps another resource and adds tracking without changing the container type.
  5. The key advantage of PMR over traditional allocators: container types are stable (same pmr::vector<int> everywhere) while allocation strategy is runtime-configurable. This is important for library code that wants to accept different allocation strategies from callers without requiring template specialization.

When to Use Custom Allocators

| Scenario | Recommended Allocator | Reason |
| --- | --- | --- |
| Many small same-sized objects (list nodes, AST nodes, network packets) | Pool allocator | O(1) alloc/free, no fragmentation |
| Temporary scratch space for a computation | Arena / monotonic buffer | O(1) bulk free, zero fragmentation |
| Per-frame game allocations | Arena with reset() | Entire frame freed in one instruction |
| Recursive algorithms with LIFO lifetimes | Stack allocator | Marker-based group free |
| Read-heavy shared config with rare writes | Arena + atomic pointer | Lock-free snapshot publishing |
| Tracking or debugging allocations | Custom global new/delete | Visibility into all allocations |
| Library code with pluggable strategy | pmr::memory_resource | Runtime-configurable without type change |
| General case, not a hot path | Default (malloc/free) | Complexity not justified |

Conclusion

Custom memory allocators are one of the most powerful — and most often overlooked — performance tools in C++. The default allocator handles all sizes and lifetimes correctly but pays the price for that generality: thread-safety overhead, metadata, fragmentation, and unpredictable allocation times. Custom allocators eliminate these costs by knowing the allocation pattern in advance.

Pool allocators give O(1) allocation and deallocation for fixed-size objects — ideal for list nodes, tree nodes, and any pattern involving many same-sized short-lived objects. Arena allocators provide the fastest possible allocation (a pointer increment) and bulk free — perfect for per-frame allocations in game engines, temporary AST construction, and any computation whose results are needed only for a bounded scope. Stack allocators add LIFO group freeing, suitable for recursive algorithms and pipeline stages with natural stack-like lifetimes.

C++17’s std::pmr standardizes the memory resource concept and ships with monotonic_buffer_resource (arena), synchronized_pool_resource (thread-safe pool), and unsynchronized_pool_resource (fast single-threaded pool) — ready to use without writing allocator code yourself. For new code, PMR is usually the right starting point.

The golden rule: do not use custom allocators without profiling first. The complexity cost is real — custom allocators are harder to debug, more error-prone, and sometimes less portable. Use them when measurement shows memory allocation is a genuine bottleneck, and always verify the improvement is worth the added complexity.
