Multithreading in C++: std::thread Basics

Learn C++ multithreading from scratch — how to create threads with std::thread, pass arguments, join and detach, avoid data races, and use std::jthread in C++20.

By Techietory on May 10, 2026


std::thread is the C++11 standard library class for creating and managing threads. You create a thread by constructing a std::thread object and passing it a callable (a function, lambda, or function object) along with any arguments. The new thread is launched during construction, and the operating system's scheduler decides exactly when it gets CPU time. Before a std::thread object that still owns a thread is destroyed, you must either call join() (wait for the thread to finish) or detach() (let it run independently). Otherwise its destructor calls std::terminate() and the program ends.

Introduction

Modern software demands concurrency. Whether you are processing multiple network requests simultaneously, performing background computation while keeping a user interface responsive, or parallelizing a computation across multiple CPU cores to finish faster, the ability to run code concurrently is a fundamental requirement for high-performance, responsive applications.

Before C++11, multithreading in C++ required platform-specific APIs: pthreads on POSIX systems (Linux, macOS), CreateThread on Windows, or third-party libraries like Boost.Thread. Each had different interfaces, different semantics, and code written for one platform would not compile on another.

C++11 changed this by adding a portable, standardized threading library to the C++ standard library. The language also gained a memory model that defines how threads interact through shared memory. The centerpiece of the library is std::thread, which provides a clean, object-oriented interface for creating and managing threads. Alongside std::thread, C++11 introduced std::mutex, std::condition_variable, std::atomic, and other synchronization primitives. Together they form a complete concurrency toolkit.

This article teaches std::thread from the ground up. You will learn to create threads, pass arguments to them, retrieve results, coordinate multiple threads, understand the critical danger of data races, and use the modern std::jthread introduced in C++20. Every concept is demonstrated with practical, runnable code followed by a step-by-step explanation.

What Is a Thread?

A thread is the smallest unit of execution within a process. Every program has at least one thread: the main thread that starts at main(). Any additional threads you create run concurrently with the main thread. They share the process's address space and resources (global variables, heap memory, open file handles), but each keeps its own call stack, program counter, and register state.

The operating system’s scheduler decides when each thread gets CPU time. On a multi-core system, multiple threads can execute truly simultaneously on different cores. On a single-core system, the OS rapidly switches between threads, creating the illusion of parallelism.

The shared memory space is both the great advantage and the great danger of threads. Sharing is efficient — no expensive inter-process communication needed. But if two threads read and write the same memory location without coordination, you get data races — one of the most insidious categories of bugs in software engineering.

Creating Your First Thread

C++

#include <iostream>
#include <string>
#include <thread>
using namespace std;

// Function to run in a new thread
void greet(const string& name) {
    cout << "Hello from thread! Name: " << name << endl;
    cout << "Thread ID: " << this_thread::get_id() << endl;
}

int main() {
    cout << "Main thread ID: " << this_thread::get_id() << endl;

    // Create a thread — starts executing greet("Alice") immediately
    thread t(greet, "Alice");

    cout << "Main thread continues while t runs..." << endl;

    // Wait for t to finish before proceeding
    t.join();

    cout << "Thread t has finished. Main continues." << endl;
    return 0;
}

Output (one possible run; thread IDs are implementation-specific, and the lines printed by the two threads may appear in a different order):

Plaintext

Main thread ID: 140234567890112
Main thread continues while t runs...
Hello from thread! Name: Alice
Thread ID: 140234567890368
Thread t has finished. Main continues.

Step-by-step explanation:

thread t(greet, "Alice") constructs a std::thread object that immediately starts executing greet("Alice") in a new OS thread. The constructor takes a callable (a function pointer here) followed by any arguments to pass to it.
After t is constructed, both threads run concurrently: the main thread proceeds to the next statement while t executes greet. The exact interleaving of their output is non-deterministic — on different runs or different machines, the lines may appear in different orders.
t.join() blocks the main thread until t finishes executing. Suppose t were destroyed while still joinable (neither joined nor detached), for example by reaching the end of main() without calling join(). The std::thread destructor would then call std::terminate() and abort the program, even if greet had already finished.
this_thread::get_id() returns the ID of the calling thread. No two running threads share an ID, although an ID may be reused after a thread has finished and been joined or detached. The main thread and the new thread therefore print different IDs.
After join() returns, t is no longer joinable. Its thread of execution has finished, and t no longer represents any thread. Calling join() on t again does not wait a second time; it throws std::system_error.

The join() and detach() Contract

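A std::thread object is joinable while it is still associated with a thread of execution, whether that thread is running or has already finished. Every joinable std::thread must be either joined or detached before it is destroyed. This is a firm rule: destroying a joinable std::thread calls std::terminate().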

C++

#include <iostream>
#include <thread>
#include <chrono>
using namespace std;

void slowTask(int id, int seconds) {
    cout << "Task " << id << " starting" << endl;
    this_thread::sleep_for(chrono::seconds(seconds));
    cout << "Task " << id << " done" << endl;
}

int main() {
    // --- join(): wait for the thread to complete ---
    cout << "=== join() demo ===" << endl;
    {
        thread t1(slowTask, 1, 1);
        cout << "Waiting for task 1..." << endl;
        t1.join();  // Blocks until t1 is done
        cout << "Task 1 complete, main continues" << endl;
    }

    // --- detach(): fire and forget ---
    cout << "\n=== detach() demo ===" << endl;
    {
        thread t2(slowTask, 2, 2);
        t2.detach();  // t2 runs independently; we no longer own it
        cout << "Task 2 detached — main doesn't wait" << endl;
        // t2 is still running, but the thread object t2 is now empty.
        // main returns long before t2's 2-second sleep ends; the process then exits,
        // the detached thread is terminated with it, and "Task 2 done" never prints.
    }

    // --- joinable(): checking before join/detach ---
    cout << "\n=== joinable() demo ===" << endl;
    thread t3(slowTask, 3, 0);
    cout << "t3 joinable before join: " << t3.joinable() << endl;
    t3.join();
    cout << "t3 joinable after join:  " << t3.joinable() << endl;

    // Default-constructed thread: not joinable
    thread empty;
    cout << "empty joinable: " << empty.joinable() << endl;

    this_thread::sleep_for(chrono::milliseconds(100));
    cout << "\nMain thread ending" << endl;
    return 0;
}

Output (one possible run; lines printed by different threads may appear in a different order):

Plaintext

=== join() demo ===
Waiting for task 1...
Task 1 starting
Task 1 done
Task 1 complete, main continues

=== detach() demo ===
Task 2 detached — main doesn't wait
Task 2 starting

=== joinable() demo ===
Task 3 starting
Task 3 done
t3 joinable before join: 1
t3 joinable after join:  0
empty joinable: 0

Main thread ending

Step-by-step explanation:

t1.join() blocks the main thread until t1 finishes. This is the standard pattern when you need the result or need to know the task is done before proceeding.
t2.detach() releases ownership of the underlying OS thread. The thread continues running independently, managed by the OS. After detach(), the std::thread object t2 no longer represents a running thread — t2.joinable() returns false. You lose the ability to join it.
Detached threads are used for “fire and forget” background tasks — logging, monitoring, or tasks that should outlive the scope that created them. But beware: a detached thread must not access local variables or objects from the scope that created it — those may be destroyed while the thread is still running.
t3.joinable() returns true while the std::thread object is associated with a thread of execution (running, or finished but not yet joined). It returns false after join() or detach() and for a default-constructed std::thread. Calling join() or detach() on a non-joinable thread throws std::system_error, so check joinable() first whenever the state is uncertain.
If t goes out of scope while t.joinable() is true — neither joined nor detached — the std::thread destructor calls std::terminate(). This is by design: forgetting to join is a programming error, and C++ makes it immediately visible rather than silently leaking threads.

Passing Arguments to Threads

Arguments passed to std::thread’s constructor are copied into storage owned by the new thread before the thread function runs. Rvalue arguments are moved rather than copied. This matters because, by the time the thread function runs, the original argument may have gone out of scope.

C++

#include <iostream>
#include <thread>
#include <string>
using namespace std;

// Takes value by value — gets its own copy
void processValue(int value, string label) {
    cout << label << ": processing " << value << endl;
    value *= 2;  // Modifies the thread's local copy — doesn't affect original
    cout << label << ": done, local value = " << value << endl;
}

// Takes by reference — modifies the caller's variable
void increment(int& counter) {
    counter++;
    cout << "Thread incremented counter to: " << counter << endl;
}

// Takes a pointer
void processPointer(int* data, int size) {
    for (int i = 0; i < size; i++) data[i] *= 2;
    cout << "Pointer processing done" << endl;
}

int main() {
    // --- Passing by value: arguments are copied ---
    int x = 10;
    thread t1(processValue, x, string("Task-A"));
    // x is copied at thread creation — even if x changes, thread sees original value
    x = 999;  // This does NOT affect what t1 sees
    t1.join();
    cout << "x in main: " << x << endl;  // Still 999

    cout << endl;

    // --- Passing by reference: use std::ref ---
    int counter = 0;
    // thread t2(increment, counter);  // ERROR: can't implicitly pass reference
    thread t2(increment, ref(counter));  // ref() wraps counter in a reference_wrapper
    t2.join();
    cout << "counter after thread: " << counter << endl;  // 1 — modified by thread

    cout << endl;

    // --- Passing a pointer ---
    int data[] = {1, 2, 3, 4, 5};
    thread t3(processPointer, data, 5);
    t3.join();
    cout << "data after thread: ";
    for (int v : data) cout << v << " ";
    cout << endl;

    return 0;
}

Output:

Plaintext

Task-A: processing 10
Task-A: done, local value = 20
x in main: 999

Thread incremented counter to: 1
counter after thread: 1

Pointer processing done
data after thread: 2 4 6 8 10

Step-by-step explanation:

thread t1(processValue, x, string("Task-A")) copies both x (an int) and the string into the thread’s internal argument storage at construction time. When x is later changed to 999, t1’s copy of x is unaffected: it still sees 10.
Attempting thread t2(increment, counter) would fail to compile. std::thread stores its own copy of each argument and passes those copies to the function as rvalues, and an rvalue cannot bind to increment’s non-const int& parameter. This is deliberate. If it compiled, increment would silently modify the thread’s private copy instead of counter.
std::ref(counter) wraps counter in a std::reference_wrapper<int>, which is copyable but behaves like a reference. The thread function therefore receives a true reference to counter and modifies it. After join() returns, counter is 1. main can read it safely because join() guarantees that everything the thread wrote is visible to the joining thread.
Raw pointers are passed as-is — they are trivially copyable. The thread receives a pointer to the same data array. When the thread doubles each element, it modifies the array in the main thread’s memory.
Lifetime danger with references and pointers: If the variable being referenced or pointed to goes out of scope before the thread finishes, the thread holds a dangling reference or pointer. This is undefined behavior. Always ensure that the lifetime of referenced/pointed-to data exceeds the thread’s lifetime.

Launching Multiple Threads

Splitting a problem into independent parts and running each part on its own thread is a fundamental concurrency pattern.

C++

#include <iostream>
#include <thread>
#include <vector>
#include <numeric>
#include <chrono>
using namespace std;

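// Compute the sum of data[start, end) and store it in result.
// Accumulating in a local variable and writing result once avoids threads
// repeatedly writing to neighboring elements of partials (false sharing).
void partialSum(const vector<int>& data, int start, int end, long long& result) {
    long long sum = 0;
    for (int i = start; i < end; i++) sum += data[i];
    result = sum;
}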

int main() {
    // Build a large dataset
    const int N = 10'000'000;
    vector<int> data(N);
    iota(data.begin(), data.end(), 1);  // Fill with 1, 2, 3, ..., N

    auto startTime = chrono::high_resolution_clock::now();

    // Single-threaded sum for comparison
    long long singleSum = 0;
    for (int x : data) singleSum += x;

    auto singleEnd = chrono::high_resolution_clock::now();
    double singleMs = chrono::duration<double, milli>(singleEnd - startTime).count();

    // Multi-threaded sum: divide work among threads
    const int numThreads = 4;
    vector<thread>   threads(numThreads);
    vector<long long> partials(numThreads, 0);

    int chunkSize = N / numThreads;

    auto multiStart = chrono::high_resolution_clock::now();

    // Launch threads
    for (int i = 0; i < numThreads; i++) {
        int start = i * chunkSize;
        int end   = (i == numThreads - 1) ? N : start + chunkSize;
        threads[i] = thread(partialSum,
                            cref(data),   // const ref: read-only access
                            start, end,
                            ref(partials[i]));  // Each thread writes to its own slot
    }

    // Wait for all threads to finish
    for (thread& t : threads) t.join();

    // Combine partial results
    long long multiSum = 0;
    for (long long p : partials) multiSum += p;

    auto multiEnd = chrono::high_resolution_clock::now();
    double multiMs = chrono::duration<double, milli>(multiEnd - multiStart).count();

    cout << "Single-threaded sum: " << singleSum << " in " << singleMs << " ms" << endl;
    cout << "Multi-threaded sum:  " << multiSum  << " in " << multiMs  << " ms" << endl;
    cout << "Results match: " << (singleSum == multiSum ? "YES" : "NO") << endl;
    cout << "Speedup: " << singleMs / multiMs << "x" << endl;

    return 0;
}

Output (typical, actual times vary):

Plaintext

Single-threaded sum: 50000005000000 in 18.4 ms
Multi-threaded sum:  50000005000000 in 5.1 ms
Results match: YES
Speedup: 3.6x

Step-by-step explanation:

The work is divided into numThreads chunks. Each thread processes a contiguous subrange and writes its partial sum into its own element of partials. Since each thread writes only to its own element, there is no data race: no memory location is written by one thread while another thread is accessing it.
cref(data) passes a const reference to the data vector. All threads read from the same vector simultaneously — this is safe because they are all reading, not writing.
ref(partials[i]) passes a reference to each thread’s dedicated result slot. After all threads complete (after the join() loop), the main thread sums the partial results.
The speedup is not 4x even with 4 threads due to thread creation overhead, memory bandwidth limits (all four threads compete for the same memory bus), and the serial final summation step. In practice, memory-bound operations see less-than-linear speedup; CPU-bound operations with less memory access can approach linear speedup.
The for (thread& t : threads) t.join() loop is the standard idiom for waiting on a collection of threads. It joins the threads one at a time, in order. If a thread has already finished by the time the loop reaches it, join() returns immediately, so the loop as a whole waits only about as long as the slowest thread.

Data Races: The Core Danger of Shared State

A data race occurs when two or more threads access the same memory location, at least one access is a write, and the accesses are not synchronized. Data races cause undefined behavior — the program may produce wrong results, crash, or behave inconsistently across runs.

C++

#include <iostream>
#include <thread>
#include <vector>
using namespace std;

// Unsafe counter — data race!
int unsafeCounter = 0;

void incrementUnsafe(int times) {
    for (int i = 0; i < times; i++) {
        unsafeCounter++;  // NOT ATOMIC: read-modify-write, multiple threads conflict
    }
}

int main() {
    const int numThreads = 10;
    const int timesEach  = 100000;
    const int expected   = numThreads * timesEach;

    cout << "Expected final counter: " << expected << endl;

    // Run the unsafe version multiple times to show non-determinism
    for (int trial = 0; trial < 3; trial++) {
        unsafeCounter = 0;

        vector<thread> threads;
        for (int i = 0; i < numThreads; i++) {
            threads.emplace_back(incrementUnsafe, timesEach);
        }
        for (thread& t : threads) t.join();

        cout << "Trial " << (trial+1) << " result: " << unsafeCounter
             << " (off by " << (expected - unsafeCounter) << ")" << endl;
    }

    return 0;
}

Output (typical — results vary every run):

Plaintext

Expected final counter: 1000000
Trial 1 result: 724531 (off by 275469)
Trial 2 result: 811204 (off by 188796)
Trial 3 result: 763847 (off by 236153)

Step-by-step explanation:

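unsafeCounter++ is not atomic. It is a read-modify-write operation: load the current value, add 1, and store the result back to memory. Compilers typically emit three separate instructions for it, or on x86 a single add-to-memory instruction without a lock prefix. In both cases another thread can read or write the variable between the load and the store.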
When two threads execute these three steps concurrently, they can interleave: Thread A loads 100, Thread B also loads 100 (before A stores), Thread A stores 101, Thread B stores 101. Two increments happened but the counter only increased by 1. This is called a lost update.
The final value is typically less than 1,000,000 because updates are being lost. It varies between runs because the exact thread interleaving is non-deterministic.
This is a data race, and the C++ standard specifies that a data race results in undefined behavior for the entire program, not just for the racy variable. In practice, x86 hardware usually produces lost updates. On other architectures or at other optimization levels the results can differ, and nothing about the outcome is guaranteed. For example, an optimizer may combine the loop's increments into a single addition.
The fix is synchronization: std::atomic<int> for simple counters, std::mutex for more complex operations, or redesigning the algorithm to avoid shared state entirely (as the partial-sum example above does). These are covered in the next articles in this series.

Returning Values from Threads

std::thread does not directly support return values. There are three common patterns for getting a result back from a thread:

C++

#include <iostream>
#include <thread>
#include <future>
#include <utility>   // std::move
#include <vector>
using namespace std;

// Pattern 1: Output parameter via reference
void computeSquare_ref(int input, int& output) {
    output = input * input;
}

// Pattern 2: std::promise / std::future (the C++ standard approach)
void computeSquare_promise(int input, promise<int> prom) {
    prom.set_value(input * input);
}

// Pattern 3: std::async (simplest — returns a future directly)
int computeSquare_sync(int input) {
    return input * input;  // Regular function — async wraps it
}

int main() {
    // --- Pattern 1: output reference ---
    int result1 = 0;
    thread t1(computeSquare_ref, 7, ref(result1));
    t1.join();
    cout << "Pattern 1 (ref):     7^2 = " << result1 << endl;

    // --- Pattern 2: promise/future ---
    promise<int> prom;
    future<int>  fut = prom.get_future();

    thread t2(computeSquare_promise, 8, move(prom));
    // prom is moved into the thread — we keep the future end
    int result2 = fut.get();  // Blocks until value is set
    t2.join();
    cout << "Pattern 2 (promise): 8^2 = " << result2 << endl;

    // --- Pattern 3: std::async (preferred for simple cases) ---
    future<int> fut3 = async(launch::async, computeSquare_sync, 9);
    // async spawns a thread and wraps everything; future.get() retrieves result
    int result3 = fut3.get();
    cout << "Pattern 3 (async):   9^2 = " << result3 << endl;

    // Multiple async tasks running in parallel
    vector<future<int>> futures;
    for (int i = 1; i <= 5; i++) {
        futures.push_back(async(launch::async, computeSquare_sync, i));
    }
    cout << "Squares 1-5: ";
    for (auto& f : futures) cout << f.get() << " ";
    cout << endl;

    return 0;
}

Output:

Plaintext

Pattern 1 (ref):     7^2 = 49
Pattern 2 (promise): 8^2 = 64
Pattern 3 (async):   9^2 = 81
Squares 1-5: 1 4 9 16 25

Step-by-step explanation:

Pattern 1 (output reference): Pass a reference to a result variable using std::ref. The thread writes its result there. Simple but low-level — you have no built-in exception propagation.
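Pattern 2 (promise/future): std::promise<T> is a one-shot channel for sending a value. The thread receives the promise (moved in, since promises are move-only), computes the result, and calls set_value(). The main thread holds the std::future<T> connected to the same channel and calls fut.get() to retrieve the value, blocking until it is ready. Exceptions are not forwarded automatically. If the thread function throws and does not catch the exception, it escapes the thread and std::terminate() is called. To deliver the error to the waiting thread, catch it and call prom.set_exception(std::current_exception()); fut.get() then rethrows it. If the promise is destroyed without a value or an exception, fut.get() throws std::future_error (broken_promise).
Pattern 3 (async): std::async(launch::async, func, args...) is the highest-level option. It runs the function as if on a new thread and returns a future that holds the result, so there is no thread object, join(), or promise to manage. future.get() blocks until the result is available and rethrows any exception the function threw. One caveat: the destructor of a future returned by std::async waits for the task to finish, so discarding that future makes the call effectively synchronous. For simple parallelism, std::async is often the most convenient approach.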
The five parallel async tasks each run in their own thread (with launch::async). All five compute concurrently, and f.get() retrieves each result in order. The results are always 1 4 9 16 25 regardless of which thread finishes first, because we retrieve them in order.

Thread Local Storage

Sometimes you want each thread to have its own private copy of a variable — not shared between threads. The thread_local keyword provides this.

C++

#include <iostream>
#include <string>
#include <thread>
using namespace std;

thread_local int threadCounter = 0;  // Each thread has its own copy

void workerThread(int id) {
    // Each thread modifies its own threadCounter independently
    for (int i = 0; i < 5; i++) {
        threadCounter++;
    }
    cout << "Thread " << id << " counter: " << threadCounter << endl;
}

// Thread-local logger: each thread has its own log buffer
thread_local string logBuffer;

void appendLog(const string& message) {
    logBuffer += "[" + message + "]";
}

void workerWithLog(int id) {
    appendLog("start");
    appendLog("work" + to_string(id));
    appendLog("end");
    cout << "Thread " << id << " log: " << logBuffer << endl;
}

int main() {
    cout << "=== thread_local counter ===" << endl;
    thread t1(workerThread, 1);
    thread t2(workerThread, 2);
    thread t3(workerThread, 3);
    t1.join(); t2.join(); t3.join();
    // Main thread's threadCounter is still 0 — unaffected
    cout << "Main thread counter: " << threadCounter << endl;

    cout << "\n=== thread_local log buffer ===" << endl;
    thread t4(workerWithLog, 1);
    thread t5(workerWithLog, 2);
    t4.join(); t5.join();
    cout << "Main thread log: '" << logBuffer << "' (empty)" << endl;

    return 0;
}

Output (the order of the worker lines may vary between runs):

Plaintext

=== thread_local counter ===
Thread 1 counter: 5
Thread 2 counter: 5
Thread 3 counter: 5
Main thread counter: 0

=== thread_local log buffer ===
Thread 1 log: [start][work1][end]
Thread 2 log: [start][work2][end]
Main thread log: '' (empty)

Step-by-step explanation:

thread_local int threadCounter = 0 declares a variable that exists once per thread. Each thread gets its own zero-initialized copy at the start. Incrementing in one thread does not affect others.
The main thread’s threadCounter remains 0 even after the worker threads increment theirs to 5. These are truly separate storage locations.
thread_local string logBuffer gives each thread its own log buffer. appendLog writes to the calling thread’s buffer. The two worker threads each build their own independent log strings without any synchronization needed.
Thread-local storage is ideal for per-thread caches, per-thread random number generators, per-thread connection pools, and any data that should be private to a thread’s execution context.

C++20 std::jthread: RAII Thread Management

C++20 introduced std::jthread — a “joining thread” that automatically calls join() in its destructor, making it a proper RAII wrapper. It also supports cooperative cancellation via std::stop_token.

C++

#include <iostream>
#include <thread>
#include <stop_token>
#include <chrono>
using namespace std;

// Regular work function
void countUp(int id, int count) {
    for (int i = 1; i <= count; i++) {
        cout << "Thread " << id << ": count = " << i << endl;
        this_thread::sleep_for(chrono::milliseconds(100));
    }
}

// Cancellable work function — checks stop_token
void cancellableWork(stop_token stopToken, int id) {
    int step = 0;
    while (!stopToken.stop_requested()) {
        cout << "Thread " << id << ": step " << ++step << endl;
        this_thread::sleep_for(chrono::milliseconds(150));
    }
    cout << "Thread " << id << ": cancellation requested, stopping." << endl;
}

int main() {
    cout << "=== std::jthread auto-join ===" << endl;
    {
        jthread t1(countUp, 1, 3);
        jthread t2(countUp, 2, 3);
        cout << "Threads running..." << endl;
        // Scope ends: t1 and t2 destructors call join() automatically
        // No explicit join() needed — and no risk of std::terminate()
    }
    cout << "Both threads finished (auto-joined)" << endl;

    cout << "\n=== std::jthread with stop_token ===" << endl;
    {
        jthread t3(cancellableWork, 3);
        // Let it run for a bit
        this_thread::sleep_for(chrono::milliseconds(500));
        // Request cancellation — the thread will notice and exit cleanly
        t3.request_stop();
        // Destructor still joins automatically
    }
    cout << "Cancellable thread stopped and joined" << endl;

    return 0;
}

Output (timing-dependent):

Plaintext

=== std::jthread auto-join ===
Threads running...
Thread 1: count = 1
Thread 2: count = 1
Thread 1: count = 2
Thread 2: count = 2
Thread 1: count = 3
Thread 2: count = 3
Both threads finished (auto-joined)

=== std::jthread with stop_token ===
Thread 3: step 1
Thread 3: step 2
Thread 3: step 3
Thread 3: step 4
Thread 3: cancellation requested, stopping.
Cancellable thread stopped and joined

Step-by-step explanation:

jthread (joining thread) automatically calls join() in its destructor. This makes it a proper RAII type — you cannot forget to join a jthread. The scope-based lifetime management ensures threads are always cleaned up properly.
t3.request_stop() issues a stop request through t3’s internal std::stop_source. The thread function receives the matching std::stop_token as its first parameter. jthread passes it automatically when the callable accepts one in that position. The function checks stopToken.stop_requested() on each loop iteration.
Cooperative cancellation is the safe, clean way to stop a thread. Forcibly terminating a thread (which C++ does not support directly for good reason) would leave resources in undefined states and skip destructors. With stop_token, the thread can clean up properly before exiting.
When t3’s scope ends, the destructor calls request_stop() (a repeated request is harmless) and then join(). As long as the thread function checks its stop_token, the thread therefore stops cleanly whenever its owning jthread object is destroyed.
Prefer std::jthread over std::thread in new C++20 code. It eliminates the most common threading bug (forgetting to join) and provides a standard cancellation mechanism.

Thread Lifecycle and Common Patterns

Here is a summary of the complete std::thread lifecycle and the patterns to follow:

C++

#include <iostream>
#include <thread>
#include <vector>
#include <algorithm>
using namespace std;

void task(int id) {
    cout << "Task " << id << " running on thread "
         << this_thread::get_id() << endl;
}

int main() {
    // Pattern 1: Simple join
    thread t1(task, 1);
    t1.join();

    // Pattern 2: Move thread into a container, join all
    vector<thread> workers;
    for (int i = 2; i <= 5; i++) {
        workers.emplace_back(task, i);  // emplace_back constructs in-place
    }
    for (auto& t : workers) t.join();

    // Pattern 3: Conditional join (safe with joinable check)
    thread t6;
    bool shouldLaunch = true;
    if (shouldLaunch) t6 = thread(task, 6);
    if (t6.joinable()) t6.join();

    // Pattern 4: Move semantics — transfer thread ownership
    thread t7(task, 7);
    thread t8 = move(t7);  // t7 is now empty (not joinable)
    cout << "t7 joinable: " << t7.joinable() << endl;  // 0
    cout << "t8 joinable: " << t8.joinable() << endl;  // 1
    t8.join();

    // Pattern 5: Hardware concurrency hint
    unsigned int cores = thread::hardware_concurrency();
    cout << "Hardware concurrency: " << cores << " threads" << endl;

    return 0;
}

Output (thread IDs, task order, and the core count vary by run and machine):

Plaintext

Task 1 running on thread 140...1
Task 2 running on thread 140...2
Task 3 running on thread 140...3
Task 4 running on thread 140...4
Task 5 running on thread 140...5
Task 6 running on thread 140...6
t7 joinable: 0
t8 joinable: 1
Task 7 running on thread 140...7
Hardware concurrency: 8 threads

Step-by-step explanation:

workers.emplace_back(task, i) constructs the thread directly inside the vector from the thread constructor arguments, so no temporary thread is created and moved. push_back(thread(task, i)) also works. Since moving a std::thread is cheap, the difference is mostly one of style.
std::thread is movable but not copyable — you cannot copy a thread object. Move semantics allow threads to be transferred: thread t8 = move(t7) transfers ownership of the running OS thread from t7 to t8. After the move, t7 is empty (not joinable) and t8 owns the thread.
thread::hardware_concurrency() returns a hint (not a guarantee) of how many threads the hardware can run concurrently, typically the number of logical CPU cores. It may return 0 if that number cannot be determined. Use it to decide how many threads to create for CPU-bound parallelism, with a fallback for the 0 case.

Common Mistakes and How to Avoid Them

Mistake 1: Forgetting to join or detach.

C++

void bad() {
    thread t(someWork);
    // t goes out of scope without join or detach — std::terminate()!
}
void good() {
    thread t(someWork);
    t.join();  // Or: t.detach(), or use jthread
}

Mistake 2: Accessing a local variable from a detached thread after the variable’s scope ends.

C++

void bad() {
    int localVar = 42;
    thread t([&localVar]() {
        this_thread::sleep_for(chrono::seconds(1));
        cout << localVar;  // DANGER: localVar may be destroyed
    });
    t.detach();
    // bad() returns and localVar is destroyed; the thread now holds a dangling reference (UB)
    // Fix: capture by value ([localVar]) or join() before returning
}

Mistake 3: Race condition on shared state.

C++

int shared = 0;
// Two threads doing shared++ without synchronization = data race = UB
// Fix: use std::atomic<int> or std::mutex

Mistake 4: Joining from the wrong thread or joining twice.

C++

thread t(work);
t.join();   // OK
t.join();   // ERROR: t is no longer joinable, so join() throws std::system_error
// A thread calling join() on its own std::thread object also throws std::system_error
// Always check t.joinable() if the join state is uncertain

Mistake 5: Creating too many threads.

C++

// Creating one thread per task is expensive for short tasks
// Prefer a thread pool or std::async for task-based parallelism
for (int i = 0; i < 100000; i++) {
    thread t(tinyTask, i);  // BAD: 100000 threads
    t.detach();
}
// Better: use a thread pool or std::async with a reasonable limit

std::thread at a Glance

Operation	Syntax	Notes
Create thread	`thread t(func, args...)`	Starts immediately
Join	`t.join()`	Blocks until thread finishes
Detach	`t.detach()`	Releases ownership — fire and forget
Check joinable	`t.joinable()`	True if the object owns a thread (not default-constructed, moved-from, joined, or detached)
Move thread	`thread t2 = move(t1)`	Transfers ownership; t1 becomes empty
Thread ID	`this_thread::get_id()`	Returns current thread’s ID
Sleep	`this_thread::sleep_for(duration)`	Pause the calling thread
Yield	`this_thread::yield()`	Hint to OS to reschedule
Hardware hint	`thread::hardware_concurrency()`	Usually the logical CPU core count; may be 0
RAII thread (C++20)	`jthread t(func, args...)`	Auto-joins in destructor
Pass reference	`ref(variable)`	Required for lvalue references
Pass const ref	`cref(variable)`	Const reference
Return value	`std::async` / `promise`+`future`	thread itself has no return value

Conclusion

std::thread brings portable, standardized multithreading to C++, replacing the era of platform-specific threading APIs with a clean, consistent interface. Creating a thread is as simple as constructing a std::thread with a callable and its arguments. Managing it correctly requires understanding the join/detach contract: every joinable thread must be joined or detached before its std::thread object is destroyed.

The most important concept to internalize is data races: concurrent access to shared mutable state without synchronization produces undefined behavior. The examples in this article deliberately avoided shared state (each thread worked on its own data) — in the next article, you will learn std::mutex and std::lock_guard for safely sharing data between threads.

C++20’s std::jthread improves on std::thread significantly: it auto-joins in its destructor (preventing the most common threading bug) and supports cooperative cancellation via stop_token. For new C++20 code, prefer jthread over thread.

Multithreading is a powerful tool that demands careful thinking about shared state, synchronization, and thread lifetimes. Once you master these fundamentals, you unlock the ability to write software that fully leverages modern multi-core hardware — processing more data, responding faster, and doing more work in less time.