std::thread is the C++11 standard library class for creating and managing threads. You create a thread by constructing a std::thread object and passing it a callable (a function, lambda, or function object) along with any arguments. The thread begins executing immediately upon construction. Before the std::thread object is destroyed, you must either call join() (wait for the thread to finish) or detach() (let it run independently) — failing to do so terminates the program.
Introduction
Modern software demands concurrency. Whether you are processing multiple network requests simultaneously, performing background computation while keeping a user interface responsive, or parallelizing a computation across multiple CPU cores to finish faster, the ability to run code in parallel is a fundamental requirement for high-performance, high-responsiveness applications.
Before C++11, multithreading in C++ required platform-specific APIs: pthreads on POSIX systems (Linux, macOS), CreateThread on Windows, or third-party libraries like Boost.Thread. Each had different interfaces, different semantics, and code written for one platform would not compile on another.
C++11 changed this by introducing a portable, standardized threading library as part of the language itself. The centerpiece is std::thread, which provides a clean, object-oriented interface for creating and managing threads. Alongside std::thread, C++11 also introduced std::mutex, std::condition_variable, std::atomic, and other synchronization primitives — together forming a complete concurrency toolkit.
This article teaches std::thread from the ground up. You will learn to create threads, pass arguments to them, retrieve results, coordinate multiple threads, understand the critical danger of data races, and use the modern std::jthread introduced in C++20. Every concept is demonstrated with practical, runnable code with thorough step-by-step explanations.
What Is a Thread?
A thread is the smallest unit of execution within a process. Every program has at least one thread — the main thread that starts at main(). When you create additional threads, they run concurrently with the main thread, sharing the same process memory space (global variables, heap memory, open file handles) while each maintaining its own call stack, program counter, and register state.
The operating system’s scheduler decides when each thread gets CPU time. On a multi-core system, multiple threads can execute truly simultaneously on different cores. On a single-core system, the OS rapidly switches between threads, creating the illusion of parallelism.
The shared memory space is both the great advantage and the great danger of threads. Sharing is efficient — no expensive inter-process communication needed. But if two threads read and write the same memory location without coordination, you get data races — one of the most insidious categories of bugs in software engineering.
Creating Your First Thread
#include <iostream>
#include <string>
#include <thread>
using namespace std;
// Function to run in a new thread
void greet(const string& name) {
cout << "Hello from thread! Name: " << name << endl;
cout << "Thread ID: " << this_thread::get_id() << endl;
}
int main() {
cout << "Main thread ID: " << this_thread::get_id() << endl;
// Create a thread — starts executing greet("Alice") immediately
thread t(greet, "Alice");
cout << "Main thread continues while t runs..." << endl;
// Wait for t to finish before proceeding
t.join();
cout << "Thread t has finished. Main continues." << endl;
return 0;
}
Output (the order of the middle three lines may vary):
Main thread ID: 140234567890112
Main thread continues while t runs...
Hello from thread! Name: Alice
Thread ID: 140234567890368
Thread t has finished. Main continues.
Step-by-step explanation:
- thread t(greet, "Alice") constructs a std::thread object that immediately starts executing greet("Alice") in a new OS thread. The constructor takes a callable (a function pointer here) followed by any arguments to pass to it.
- After t is constructed, both threads run concurrently: the main thread proceeds to the next statement while t executes greet. The exact interleaving of their output is non-deterministic; on different runs or different machines, the lines may appear in different orders.
- t.join() blocks the main thread until t finishes executing. Without join(), the main thread would reach the end of main() and destroy t while it is still joinable, which calls std::terminate() and aborts the program.
- this_thread::get_id() returns an ID unique to the calling thread. The main thread and the new thread have different IDs.
- After join() returns, t is no longer joinable: its associated OS thread has finished and t no longer represents a thread of execution. Calling join() on it a second time throws std::system_error; that is not undefined behavior, but it is still a bug.
The join() and detach() Contract
Every std::thread object that represents a running thread must be either joined or detached before it is destroyed. This is a firm rule — violating it calls std::terminate().
#include <iostream>
#include <thread>
#include <chrono>
using namespace std;
void slowTask(int id, int seconds) {
cout << "Task " << id << " starting" << endl;
this_thread::sleep_for(chrono::seconds(seconds));
cout << "Task " << id << " done" << endl;
}
int main() {
// --- join(): wait for the thread to complete ---
cout << "=== join() demo ===" << endl;
{
thread t1(slowTask, 1, 1);
cout << "Waiting for task 1..." << endl;
t1.join(); // Blocks until t1 is done
cout << "Task 1 complete, main continues" << endl;
}
// --- detach(): fire and forget ---
cout << "\n=== detach() demo ===" << endl;
{
thread t2(slowTask, 2, 2);
t2.detach(); // t2 runs independently; we no longer own it
cout << "Task 2 detached — main doesn't wait" << endl;
// t2 is still running, but the thread object t2 is now empty
// When main exits shortly, the OS will clean up the detached thread
}
// --- joinable(): checking before join/detach ---
cout << "\n=== joinable() demo ===" << endl;
thread t3(slowTask, 3, 0);
cout << "t3 joinable before join: " << t3.joinable() << endl;
t3.join();
cout << "t3 joinable after join: " << t3.joinable() << endl;
// Default-constructed thread: not joinable
thread empty;
cout << "empty joinable: " << empty.joinable() << endl;
this_thread::sleep_for(chrono::milliseconds(100));
cout << "\nMain thread ending" << endl;
return 0;
}
Output (typical; the exact interleaving depends on timing):
=== join() demo ===
Waiting for task 1...
Task 1 starting
Task 1 done
Task 1 complete, main continues
=== detach() demo ===
Task 2 detached — main doesn't wait
Task 2 starting
=== joinable() demo ===
Task 3 starting
Task 3 done
t3 joinable before join: 1
t3 joinable after join: 0
empty joinable: 0
Main thread ending
Step-by-step explanation:
- t1.join() blocks the main thread until t1 finishes. This is the standard pattern when you need the result or need to know the task is done before proceeding.
- t2.detach() releases ownership of the underlying OS thread. The thread continues running independently, managed by the OS. After detach(), the std::thread object t2 no longer represents a running thread, so t2.joinable() returns false. You lose the ability to join it.
- Detached threads are used for “fire and forget” background tasks: logging, monitoring, or tasks that should outlive the scope that created them. But beware: a detached thread must not access local variables or objects from the scope that created it, because those may be destroyed while the thread is still running. A detached thread is also cut off when main() returns, which is why “Task 2 done” never appears in the output above.
- t3.joinable() returns true when the object owns a thread of execution that has not yet been joined or detached (even if that thread has already finished running), and false after join(), after detach(), after being moved from, or for default-constructed threads. Check joinable() before calling join() or detach() whenever the state is uncertain.
- If t goes out of scope while t.joinable() is true (neither joined nor detached), the std::thread destructor calls std::terminate(). This is by design: forgetting to join is a programming error, and C++ makes it immediately visible rather than silently leaking threads.
Passing Arguments to Threads
Arguments passed to std::thread's constructor are copied (or moved) into the thread's internal storage before the thread starts. This is important: by the time the thread function runs, the original argument may have gone out of scope.
#include <iostream>
#include <thread>
#include <string>
using namespace std;
// Takes value by value — gets its own copy
void processValue(int value, string label) {
cout << label << ": processing " << value << endl;
value *= 2; // Modifies the thread's local copy — doesn't affect original
cout << label << ": done, local value = " << value << endl;
}
// Takes by reference — modifies the caller's variable
void increment(int& counter) {
counter++;
cout << "Thread incremented counter to: " << counter << endl;
}
// Takes a pointer
void processPointer(int* data, int size) {
for (int i = 0; i < size; i++) data[i] *= 2;
cout << "Pointer processing done" << endl;
}
int main() {
// --- Passing by value: arguments are copied ---
int x = 10;
thread t1(processValue, x, string("Task-A"));
// x is copied at thread creation — even if x changes, thread sees original value
x = 999; // This does NOT affect what t1 sees
t1.join();
cout << "x in main: " << x << endl; // Still 999
cout << endl;
// --- Passing by reference: use std::ref ---
int counter = 0;
// thread t2(increment, counter); // ERROR: can't implicitly pass reference
thread t2(increment, ref(counter)); // ref() wraps counter in a reference_wrapper
t2.join();
cout << "counter after thread: " << counter << endl; // 1 — modified by thread
cout << endl;
// --- Passing a pointer ---
int data[] = {1, 2, 3, 4, 5};
thread t3(processPointer, data, 5);
t3.join();
cout << "data after thread: ";
for (int v : data) cout << v << " ";
cout << endl;
return 0;
}
Output:
Task-A: processing 10
Task-A: done, local value = 20
x in main: 999
Thread incremented counter to: 1
counter after thread: 1
Pointer processing done
data after thread: 2 4 6 8 10
Step-by-step explanation:
- thread t1(processValue, x, string("Task-A")) copies both x (an int) and the string into the thread's internal argument storage at construction time. When x is later changed to 999, t1's copy of x is unaffected; it still sees 10.
- Attempting thread t2(increment, counter) fails to compile. The constructor copies counter into the thread's internal storage and then passes that copy to the function as an rvalue, and an rvalue cannot bind to the non-const lvalue reference int& that increment expects. Rejecting the call is deliberate: if it compiled, the function would modify a hidden copy rather than counter itself.
- std::ref(counter) wraps counter in a std::reference_wrapper<int>, which is copyable but carries reference semantics. The thread function receives a true reference to counter and modifies it. After join(), counter is 1.
- Raw pointers are passed as-is: the pointer value is copied, not the data it points to. The thread receives a pointer to the same data array. When the thread doubles each element, it modifies the array in the main thread's memory.
- Lifetime danger with references and pointers: if the variable being referenced or pointed to goes out of scope before the thread finishes, the thread holds a dangling reference or pointer, and using it is undefined behavior. Always ensure that the referenced or pointed-to data outlives the thread's use of it.
Launching Multiple Threads
Creating a pool of threads to work on parts of a problem in parallel is a fundamental concurrency pattern.
#include <iostream>
#include <thread>
#include <vector>
#include <numeric>
#include <chrono>
using namespace std;
// Compute sum of a subrange
void partialSum(const vector<int>& data, int start, int end, long long& result) {
result = 0;
for (int i = start; i < end; i++) result += data[i];
}
int main() {
// Build a large dataset
const int N = 10'000'000;
vector<int> data(N);
iota(data.begin(), data.end(), 1); // Fill with 1, 2, 3, ..., N
auto startTime = chrono::high_resolution_clock::now();
// Single-threaded sum for comparison
long long singleSum = 0;
for (int x : data) singleSum += x;
auto singleEnd = chrono::high_resolution_clock::now();
double singleMs = chrono::duration<double, milli>(singleEnd - startTime).count();
// Multi-threaded sum: divide work among threads
const int numThreads = 4;
vector<thread> threads(numThreads);
vector<long long> partials(numThreads, 0);
int chunkSize = N / numThreads;
auto multiStart = chrono::high_resolution_clock::now();
// Launch threads
for (int i = 0; i < numThreads; i++) {
int start = i * chunkSize;
int end = (i == numThreads - 1) ? N : start + chunkSize;
threads[i] = thread(partialSum,
cref(data), // const ref: read-only access
start, end,
ref(partials[i])); // Each thread writes to its own slot
}
// Wait for all threads to finish
for (thread& t : threads) t.join();
// Combine partial results
long long multiSum = 0;
for (long long p : partials) multiSum += p;
auto multiEnd = chrono::high_resolution_clock::now();
double multiMs = chrono::duration<double, milli>(multiEnd - multiStart).count();
cout << "Single-threaded sum: " << singleSum << " in " << singleMs << " ms" << endl;
cout << "Multi-threaded sum: " << multiSum << " in " << multiMs << " ms" << endl;
cout << "Results match: " << (singleSum == multiSum ? "YES" : "NO") << endl;
cout << "Speedup: " << singleMs / multiMs << "x" << endl;
return 0;
}
Output (typical, actual times vary):
Single-threaded sum: 50000005000000 in 18.4 ms
Multi-threaded sum: 50000005000000 in 5.1 ms
Results match: YES
Speedup: 3.6x
Step-by-step explanation:
- The work is divided into numThreads chunks. Each thread processes a contiguous subrange and writes its partial sum into its own element of partials. Since no thread writes to memory that another thread reads or writes, there is no data race.
- cref(data) passes a const reference to the data vector. All threads read from the same vector simultaneously, which is safe because they are all reading, not writing.
- ref(partials[i]) passes a reference to each thread's dedicated result slot. After all threads complete (after the join() loop), the main thread sums the partial results.
- The speedup is not 4x even with 4 threads due to thread creation overhead, memory bandwidth limits (all four threads compete for the same memory bus), and the serial final summation step. In practice, memory-bound operations see less-than-linear speedup; CPU-bound operations with less memory access can approach linear speedup.
- The for (thread& t : threads) t.join() pattern is the standard idiom for waiting on a collection of threads. It iterates over all threads and joins each one in order. If a thread has already finished by the time the loop reaches it, its join() returns immediately.
Data Races: The Core Danger of Shared State
A data race occurs when two or more threads access the same memory location, at least one access is a write, and the accesses are not synchronized. Data races cause undefined behavior — the program may produce wrong results, crash, or behave inconsistently across runs.
#include <iostream>
#include <thread>
#include <vector>
using namespace std;
// Unsafe counter — data race!
int unsafeCounter = 0;
void incrementUnsafe(int times) {
for (int i = 0; i < times; i++) {
unsafeCounter++; // NOT ATOMIC: read-modify-write, multiple threads conflict
}
}
int main() {
const int numThreads = 10;
const int timesEach = 100000;
const int expected = numThreads * timesEach;
cout << "Expected final counter: " << expected << endl;
// Run the unsafe version multiple times to show non-determinism
for (int trial = 0; trial < 3; trial++) {
unsafeCounter = 0;
vector<thread> threads;
for (int i = 0; i < numThreads; i++) {
threads.emplace_back(incrementUnsafe, timesEach);
}
for (thread& t : threads) t.join();
cout << "Trial " << (trial+1) << " result: " << unsafeCounter
<< " (off by " << (expected - unsafeCounter) << ")" << endl;
}
return 0;
}
Output (typical; results vary on every run):
Expected final counter: 1000000
Trial 1 result: 724531 (off by 275469)
Trial 2 result: 811204 (off by 188796)
Trial 3 result: 763847 (off by 236153)
Step-by-step explanation:
- unsafeCounter++ is not atomic. Conceptually it is three steps: load the current value into a register, add 1 to the register, store the register back to memory.
- When two threads execute these three steps concurrently, they can interleave: Thread A loads 100, Thread B also loads 100 (before A stores), Thread A stores 101, Thread B stores 101. Two increments happened but the counter only increased by 1. This is called a lost update.
- The final value is typically less than 1,000,000 because updates are being lost. It varies between runs because the exact thread interleaving is non-deterministic.
- This is a data race, and the C++ standard states that any data race results in undefined behavior, so nothing about the program's behavior is guaranteed. In practice on x86 hardware you usually see lost updates; on other architectures or with different compiler optimizations, you might see crashes, stranger results, or even the correct total by coincidence.
- The fix is synchronization: std::atomic<int> for simple counters, std::mutex for more complex operations, or redesigning the algorithm to avoid shared state entirely (as the partial-sum example above does). These are covered in the next articles in this series.
Returning Values from Threads
std::thread does not directly support return values. There are three common patterns for getting a result back from a thread:
#include <iostream>
#include <thread>
#include <future>
#include <vector>
using namespace std;
// Pattern 1: Output parameter via reference
void computeSquare_ref(int input, int& output) {
output = input * input;
}
// Pattern 2: std::promise / std::future (the C++ standard approach)
void computeSquare_promise(int input, promise<int> prom) {
prom.set_value(input * input);
}
// Pattern 3: std::async (simplest — returns a future directly)
int computeSquare_sync(int input) {
return input * input; // Regular function — async wraps it
}
int main() {
// --- Pattern 1: output reference ---
int result1 = 0;
thread t1(computeSquare_ref, 7, ref(result1));
t1.join();
cout << "Pattern 1 (ref): 7^2 = " << result1 << endl;
// --- Pattern 2: promise/future ---
promise<int> prom;
future<int> fut = prom.get_future();
thread t2(computeSquare_promise, 8, move(prom));
// prom is moved into the thread — we keep the future end
int result2 = fut.get(); // Blocks until value is set
t2.join();
cout << "Pattern 2 (promise): 8^2 = " << result2 << endl;
// --- Pattern 3: std::async (preferred for simple cases) ---
future<int> fut3 = async(launch::async, computeSquare_sync, 9);
// async spawns a thread and wraps everything; future.get() retrieves result
int result3 = fut3.get();
cout << "Pattern 3 (async): 9^2 = " << result3 << endl;
// Multiple async tasks running in parallel
vector<future<int>> futures;
for (int i = 1; i <= 5; i++) {
futures.push_back(async(launch::async, computeSquare_sync, i));
}
cout << "Squares 1-5: ";
for (auto& f : futures) cout << f.get() << " ";
cout << endl;
return 0;
}
Output:
Pattern 1 (ref): 7^2 = 49
Pattern 2 (promise): 8^2 = 64
Pattern 3 (async): 9^2 = 81
Squares 1-5: 1 4 9 16 25
Step-by-step explanation:
- Pattern 1 (output reference): Pass a reference to a result variable using std::ref. The thread writes its result there. Simple but low-level; there is no built-in exception propagation.
- Pattern 2 (promise/future): std::promise<T> is the sending end of a one-shot channel. The thread receives the promise (moved in, since promises are non-copyable), computes the result, and calls set_value(). The main thread holds a std::future<T> connected to the same channel and calls fut.get() to retrieve the value, blocking if the value is not yet ready. Exceptions are not forwarded automatically: the thread must catch them and call set_exception(), after which fut.get() rethrows. An exception that escapes the thread function calls std::terminate().
- Pattern 3 (async): std::async(launch::async, func, args...) is the highest-level option. It runs the function as if on a new thread and returns a future that holds the result. No thread, join, or promise management is needed. future.get() blocks until the result is available and propagates any exception the function threw. For simple parallelism, std::async is usually the most convenient approach.
- The five parallel async tasks each run in their own thread (because of launch::async). All five compute concurrently, and f.get() retrieves each result in order. The results are always 1 4 9 16 25 regardless of which thread finishes first, because we retrieve them in order.
Thread Local Storage
Sometimes you want each thread to have its own private copy of a variable — not shared between threads. The thread_local keyword provides this.
#include <iostream>
#include <string>
#include <thread>
using namespace std;
thread_local int threadCounter = 0; // Each thread has its own copy
void workerThread(int id) {
// Each thread modifies its own threadCounter independently
for (int i = 0; i < 5; i++) {
threadCounter++;
}
cout << "Thread " << id << " counter: " << threadCounter << endl;
}
// Thread-local logger: each thread has its own log buffer
thread_local string logBuffer;
void appendLog(const string& message) {
logBuffer += "[" + message + "]";
}
void workerWithLog(int id) {
appendLog("start");
appendLog("work" + to_string(id));
appendLog("end");
cout << "Thread " << id << " log: " << logBuffer << endl;
}
int main() {
cout << "=== thread_local counter ===" << endl;
thread t1(workerThread, 1);
thread t2(workerThread, 2);
thread t3(workerThread, 3);
t1.join(); t2.join(); t3.join();
// Main thread's threadCounter is still 0 — unaffected
cout << "Main thread counter: " << threadCounter << endl;
cout << "\n=== thread_local log buffer ===" << endl;
thread t4(workerWithLog, 1);
thread t5(workerWithLog, 2);
t4.join(); t5.join();
cout << "Main thread log: '" << logBuffer << "' (empty)" << endl;
return 0;
}
Output (the worker lines may appear in a different order):
=== thread_local counter ===
Thread 1 counter: 5
Thread 2 counter: 5
Thread 3 counter: 5
Main thread counter: 0
=== thread_local log buffer ===
Thread 1 log: [start][work1][end]
Thread 2 log: [start][work2][end]
Main thread log: '' (empty)
Step-by-step explanation:
- thread_local int threadCounter = 0 declares a variable that exists once per thread. Each thread gets its own copy, initialized to 0. Incrementing in one thread does not affect others.
- The main thread's threadCounter remains 0 even after the worker threads increment theirs to 5. These are truly separate storage locations.
- thread_local string logBuffer gives each thread its own log buffer. appendLog writes to the calling thread's buffer. The two worker threads each build their own independent log strings without any synchronization needed.
- Thread-local storage is ideal for per-thread caches, per-thread random number generators, per-thread connection pools, and any data that should be private to a thread's execution context.
C++20 std::jthread: RAII Thread Management
C++20 introduced std::jthread — a “joining thread” that automatically calls join() in its destructor, making it a proper RAII wrapper. It also supports cooperative cancellation via std::stop_token.
#include <iostream>
#include <thread>
#include <stop_token>
#include <chrono>
using namespace std;
// Regular work function
void countUp(int id, int count) {
for (int i = 1; i <= count; i++) {
cout << "Thread " << id << ": count = " << i << endl;
this_thread::sleep_for(chrono::milliseconds(100));
}
}
// Cancellable work function — checks stop_token
void cancellableWork(stop_token stopToken, int id) {
int step = 0;
while (!stopToken.stop_requested()) {
cout << "Thread " << id << ": step " << ++step << endl;
this_thread::sleep_for(chrono::milliseconds(150));
}
cout << "Thread " << id << ": cancellation requested, stopping." << endl;
}
int main() {
cout << "=== std::jthread auto-join ===" << endl;
{
jthread t1(countUp, 1, 3);
jthread t2(countUp, 2, 3);
cout << "Threads running..." << endl;
// Scope ends: t1 and t2 destructors call join() automatically
// No explicit join() needed — and no risk of std::terminate()
}
cout << "Both threads finished (auto-joined)" << endl;
cout << "\n=== std::jthread with stop_token ===" << endl;
{
jthread t3(cancellableWork, 3);
// Let it run for a bit
this_thread::sleep_for(chrono::milliseconds(500));
// Request cancellation — the thread will notice and exit cleanly
t3.request_stop();
// Destructor still joins automatically
}
cout << "Cancellable thread stopped and joined" << endl;
return 0;
}
Output (timing-dependent):
=== std::jthread auto-join ===
Threads running...
Thread 1: count = 1
Thread 2: count = 1
Thread 1: count = 2
Thread 2: count = 2
Thread 1: count = 3
Thread 2: count = 3
Both threads finished (auto-joined)
=== std::jthread with stop_token ===
Thread 3: step 1
Thread 3: step 2
Thread 3: step 3
Thread 3: step 4
Thread 3: cancellation requested, stopping.
Cancellable thread stopped and joined
Step-by-step explanation:
- jthread (joining thread) automatically calls join() in its destructor. This makes it a proper RAII type: you cannot forget to join a jthread. The scope-based lifetime management ensures threads are always cleaned up properly.
- t3.request_stop() signals the stop token associated with t3. The thread function receives a stop_token as its first parameter (automatically passed by jthread when the callable accepts one). It checks stopToken.stop_requested() in its loop.
- Cooperative cancellation is the safe, clean way to stop a thread. Forcibly terminating a thread (which C++ does not support directly for good reason) would leave resources in undefined states and skip destructors. With stop_token, the thread can clean up properly before exiting.
- When t3's scope ends, the destructor calls request_stop() and then join() if the thread is still joinable. This stops the thread cleanly when its owning jthread object is destroyed, provided the thread function actually checks its stop token; t1 and t2 above ignore it and simply run to completion.
- Prefer std::jthread over std::thread in new C++20 code. It eliminates the most common threading bug (forgetting to join) and provides a standard cancellation mechanism.
Thread Lifecycle and Common Patterns
Here is a summary of the complete std::thread lifecycle and the patterns to follow:
#include <iostream>
#include <thread>
#include <vector>
#include <utility>
using namespace std;
void task(int id) {
cout << "Task " << id << " running on thread "
<< this_thread::get_id() << endl;
}
int main() {
// Pattern 1: Simple join
thread t1(task, 1);
t1.join();
// Pattern 2: Move thread into a container, join all
vector<thread> workers;
for (int i = 2; i <= 5; i++) {
workers.emplace_back(task, i); // emplace_back constructs in-place
}
for (auto& t : workers) t.join();
// Pattern 3: Conditional join (safe with joinable check)
thread t6;
bool shouldLaunch = true;
if (shouldLaunch) t6 = thread(task, 6);
if (t6.joinable()) t6.join();
// Pattern 4: Move semantics — transfer thread ownership
thread t7(task, 7);
thread t8 = move(t7); // t7 is now empty (not joinable)
cout << "t7 joinable: " << t7.joinable() << endl; // 0
cout << "t8 joinable: " << t8.joinable() << endl; // 1
t8.join();
// Pattern 5: Hardware concurrency hint
unsigned int cores = thread::hardware_concurrency();
cout << "Hardware concurrency: " << cores << " threads" << endl;
return 0;
}
Output (typical; thread IDs are abbreviated and the task lines may be reordered):
Task 1 running on thread 140...1
Task 2 running on thread 140...2
Task 3 running on thread 140...3
Task 4 running on thread 140...4
Task 5 running on thread 140...5
Task 6 running on thread 140...6
t7 joinable: 0
t8 joinable: 1
Task 7 running on thread 140...7
Hardware concurrency: 8 threads
Step-by-step explanation:
- workers.emplace_back(task, i) constructs a thread directly inside the vector using the thread constructor arguments, so no temporary thread is created and moved. This is slightly more efficient than push_back(thread(task, i)).
- std::thread is movable but not copyable: you cannot copy a thread object. Move semantics allow threads to be transferred: thread t8 = move(t7) transfers ownership of the running OS thread from t7 to t8. After the move, t7 is empty (not joinable) and t8 owns the thread. The move does not start a new thread, which is why only Task 7 runs.
- thread::hardware_concurrency() returns a hint (not a guarantee) of how many threads the hardware supports concurrently, typically the number of logical CPU cores. It may return 0 when the value cannot be determined, so treat 0 as “unknown”. Use the hint to decide how many threads to create for CPU-bound parallelism.
Common Mistakes and How to Avoid Them
Mistake 1: Forgetting to join or detach.
void bad() {
thread t(someWork);
// t goes out of scope without join or detach — std::terminate()!
}
void good() {
thread t(someWork);
t.join(); // Or: t.detach(), or use jthread
}
Mistake 2: Accessing a local variable from a detached thread after the variable's scope ends.
void bad() {
int localVar = 42;
thread t([&localVar]() {
this_thread::sleep_for(chrono::seconds(1));
cout << localVar; // DANGER: localVar may be destroyed
});
t.detach();
// bad() returns, localVar is destroyed — thread has dangling reference
}
Mistake 3: Race condition on shared state.
int shared = 0;
// Two threads doing shared++ without synchronization = data race = UB
// Fix: use std::atomic<int> or std::mutex
Mistake 4: Joining from the wrong thread or joining twice.
thread t(work);
t.join(); // OK
t.join(); // ERROR: throws std::system_error because t is no longer joinable
// Always check t.joinable() if the join state is uncertain
Mistake 5: Creating too many threads.
// Creating one thread per task is expensive for short tasks
// Prefer a thread pool or std::async for task-based parallelism
for (int i = 0; i < 100000; i++) {
thread t(tinyTask, i); // BAD: 100000 threads
t.detach();
}
// Better: use a thread pool or std::async with a reasonable limit
std::thread at a Glance
| Operation | Syntax | Notes |
|---|---|---|
| Create thread | thread t(func, args...) | Starts immediately |
| Join | t.join() | Blocks until thread finishes |
| Detach | t.detach() | Releases ownership — fire and forget |
| Check joinable | t.joinable() | True if t owns a thread that has not been joined or detached |
| Move thread | thread t2 = move(t1) | Transfers ownership; t1 becomes empty |
| Thread ID | this_thread::get_id() | Returns current thread’s ID |
| Sleep | this_thread::sleep_for(duration) | Pause the calling thread |
| Yield | this_thread::yield() | Hint to OS to reschedule |
| Hardware hint | thread::hardware_concurrency() | Hint for the logical CPU core count; may return 0 |
| RAII thread (C++20) | jthread t(func, args...) | Auto-joins in destructor |
| Pass reference | ref(variable) | Required for lvalue references |
| Pass const ref | cref(variable) | Const reference |
| Return value | std::async / promise+future | thread itself has no return value |
Conclusion
std::thread brings portable, standardized multithreading to C++, replacing the era of platform-specific threading APIs with a clean, consistent interface. Creating a thread is as simple as constructing a std::thread with a callable and its arguments. Managing it correctly requires understanding the join/detach contract: every joinable thread must be joined or detached before its std::thread object is destroyed.
The most important concept to internalize is data races: concurrent access to shared mutable state without synchronization produces undefined behavior. Apart from the deliberately broken counter, the examples in this article avoided shared mutable state (each thread worked on its own data). In the next article, you will learn std::mutex and std::lock_guard for safely sharing data between threads.
C++20’s std::jthread improves on std::thread significantly: it auto-joins in its destructor (preventing the most common threading bug) and supports cooperative cancellation via stop_token. For new C++20 code, prefer jthread over thread.
Multithreading is a powerful tool that demands careful thinking about shared state, synchronization, and thread lifetimes. Once you master these fundamentals, you unlock the ability to write software that fully leverages modern multi-core hardware — processing more data, responding faster, and doing more work in less time.








