Understanding Eigenvalues and Eigenvectors in PCA

Understand eigenvalues and eigenvectors from scratch with geometric intuition and Python code. Learn how they power PCA, what they mean, and how to compute them — no linear algebra background…

Understanding Eigenvalues and Eigenvectors in PCA

An eigenvector of a matrix A is a special vector that only gets stretched (not rotated) when multiplied by A: Av = λv. The scalar λ is the eigenvalue — how much the vector gets stretched. In PCA, the eigenvectors of the data’s covariance matrix are the principal component directions (the axes along which the data varies most), and the eigenvalues are the amount of variance along each direction. The eigenvector with the largest eigenvalue points in the direction of maximum variance — PC1.

Introduction

“Eigenvalue” and “eigenvector” are German mathematical terms — “eigen” means “own” or “characteristic.” They describe the self-similar directions of a linear transformation: vectors that the transformation stretches or shrinks but never rotates. Every matrix transformation has its own set of these characteristic directions, and finding them reveals the fundamental geometry of the transformation.

For PCA, this geometric meaning translates directly into data science: the eigenvectors of the covariance matrix are the directions in which the data varies most strongly, and the eigenvalues quantify exactly how much variation exists in each direction. The first eigenvector (with the largest eigenvalue) is PC1 — the single axis that captures the most variance. The second eigenvector is PC2 — perpendicular to PC1 and capturing the most remaining variance. And so on for each subsequent component.

This article builds understanding of eigenvalues and eigenvectors from the ground up: geometric intuition through visual examples, the mathematical definition, how to compute them (power iteration, QR algorithm, and numpy), the connection to covariance matrices, and all the properties that make eigenvectors the natural coordinate system for PCA.

The Geometric Intuition: Vectors That Don’t Rotate

Most vectors change direction when multiplied by a matrix. Eigenvectors are the special ones that only change length — their direction is preserved.

Python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch
import warnings
warnings.filterwarnings('ignore')

np.random.seed(42)


def visualize_eigenvector_property(figsize=(16, 6)):
    """
    Show that eigenvectors only get scaled (not rotated) by the matrix,
    while ordinary vectors change direction.

    A = [[3, 1], [1, 2]] (symmetric 2x2 with known eigenvectors)
    """
    A = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

    eigenvalues, eigenvectors = np.linalg.eigh(A)
    # eigh returns ascending; reverse
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues  = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    v1 = eigenvectors[:, 0]  # Eigenvector for largest eigenvalue
    v2 = eigenvectors[:, 1]  # Eigenvector for smaller eigenvalue
    lam1, lam2 = eigenvalues

    fig, axes = plt.subplots(1, 3, figsize=figsize)

    def draw_vector(ax, vec, color, label, scale=1.0, style='->', lw=2.5):
        ax.annotate('', xy=vec * scale, xytext=(0, 0),
                    arrowprops=dict(arrowstyle=style, color=color,
                                    lw=lw, mutation_scale=18))
        ax.text(vec[0] * scale * 1.15, vec[1] * scale * 1.15,
                label, color=color, fontsize=10, fontweight='bold',
                ha='center')

    # ── Panel 1: Before transformation ──────────────────────────
    ax = axes[0]
    # A regular vector (not an eigenvector)
    v_regular = np.array([1.0, 0.0])  # Standard basis vector

    draw_vector(ax, v1,        'coral',         f'v₁ (eigvec)', scale=1.5)
    draw_vector(ax, v2,        'steelblue',     f'v₂ (eigvec)', scale=1.5)
    draw_vector(ax, v_regular, 'goldenrod',     'e₁ (regular)', scale=1.5, lw=2)

    ax.set_xlim(-2.5, 2.5); ax.set_ylim(-2.5, 2.5)
    ax.axhline(0, color='black', lw=0.8, alpha=0.4)
    ax.axvline(0, color='black', lw=0.8, alpha=0.4)
    ax.set_title('Before: A × v\n(Before transformation)',
                 fontsize=10, fontweight='bold')
    ax.set_aspect('equal'); ax.grid(True, alpha=0.2)

    # ── Panel 2: After transformation ─────────────────────────────
    ax = axes[1]
    Av1      = A @ v1
    Av2      = A @ v2
    Av_reg   = A @ v_regular

    draw_vector(ax, Av1,    'coral',     f'A·v₁ = {lam1:.2f}·v₁', scale=0.6)
    draw_vector(ax, Av2,    'steelblue', f'A·v₂ = {lam2:.2f}·v₂', scale=0.6)
    draw_vector(ax, Av_reg, 'goldenrod', 'A·e₁ (rotated!)',        scale=0.6)
    draw_vector(ax, v_regular, 'goldenrod', 'e₁ (original)', scale=1.0,
                style='->', lw=1.2)  # Show original for comparison

    ax.set_xlim(-2.5, 2.5); ax.set_ylim(-2.5, 2.5)
    ax.axhline(0, color='black', lw=0.8, alpha=0.4)
    ax.axvline(0, color='black', lw=0.8, alpha=0.4)
    ax.set_title('After: A applied to each vector\n'
                 'Eigenvectors only stretched, NOT rotated',
                 fontsize=10, fontweight='bold')
    ax.set_aspect('equal'); ax.grid(True, alpha=0.2)

    # ── Panel 3: Visual verification Av = λv ─────────────────────
    ax = axes[2]
    ax.text(0.5, 0.88,
            f'Matrix A = [[3, 1], [1, 2]]\n\n'
            f'Eigenvalue λ₁ = {lam1:.4f}\n'
            f'Eigenvector v₁ = [{v1[0]:.4f}, {v1[1]:.4f}]\n\n'
            f'Verify: A·v₁ = [{Av1[0]:.4f}, {Av1[1]:.4f}]\n'
            f'        λ₁·v₁ = [{lam1*v1[0]:.4f}, {lam1*v1[1]:.4f}]\n'
            f'        Equal? {np.allclose(Av1, lam1*v1)}\n\n'
            f'Eigenvalue λ₂ = {lam2:.4f}\n'
            f'Eigenvector v₂ = [{v2[0]:.4f}, {v2[1]:.4f}]\n\n'
            f'Verify: A·v₂ = [{Av2[0]:.4f}, {Av2[1]:.4f}]\n'
            f'        λ₂·v₂ = [{lam2*v2[0]:.4f}, {lam2*v2[1]:.4f}]\n'
            f'        Equal? {np.allclose(Av2, lam2*v2)}\n\n'
            f'v₁ · v₂ = {v1 @ v2:.6f} (orthogonal ✓)',
            transform=ax.transAxes, fontsize=9,
            verticalalignment='top', ha='center',
            fontfamily='monospace',
            bbox=dict(boxstyle='round', fc='lightyellow',
                      ec='steelblue', alpha=0.9))
    ax.set_title('Algebraic Verification\n'
                 'A·v = λ·v (definition of eigenvector)',
                 fontsize=10, fontweight='bold')
    ax.axis('off')

    plt.suptitle('Eigenvectors: The Directions a Matrix Does NOT Rotate',
                 fontsize=13, fontweight='bold', y=1.02)
    plt.tight_layout()
    plt.savefig('eigenvector_geometric.png', dpi=150, bbox_inches='tight')
    plt.show()
    print("Saved: eigenvector_geometric.png")


visualize_eigenvector_property()

The Characteristic Equation: Finding Eigenvalues

To find eigenvalues, solve the characteristic equation:

det(AλI)=0\det(A – \lambda I) = 0

For a 2×2 matrix, this is a quadratic equation. For larger matrices, it is a polynomial of degree n — which is why numerical methods are needed for n > 4 or so.

Python
import numpy as np
import sympy as sp


def solve_characteristic_equation_symbolic():
    """
    Use SymPy to derive eigenvalues analytically for a 2×2 matrix.
    Shows the full algebraic derivation step-by-step.
    """
    # Define symbolic matrix
    lam = sp.Symbol('lambda')

    A_sym = sp.Matrix([[3, 1],
                        [1, 2]])

    print("=== Characteristic Equation: Step-by-Step ===\n")
    print(f"  Matrix A = {A_sym.tolist()}\n")

    # A - λI
    I      = sp.eye(2)
    A_lam  = A_sym - lam * I
    print(f"  A - λI =")
    sp.pprint(A_lam)
    print()

    # Determinant
    det_expr = A_lam.det()
    det_expr_expanded = sp.expand(det_expr)
    print(f"  det(A - λI) = {det_expr}")
    print(f"  Expanded:    {det_expr_expanded}")
    print()

    # Solve for eigenvalues
    eigenvalues_sym = sp.solve(det_expr_expanded, lam)
    print(f"  Solving det(A - λI) = 0:")
    print(f"  λ = {eigenvalues_sym}\n")

    # For each eigenvalue, find eigenvector
    for lv in sorted(eigenvalues_sym, reverse=True):
        print(f"  For λ = {lv}:")
        A_minus_lI = A_sym - lv * sp.eye(2)
        print(f"    (A - λI) =")
        sp.pprint(A_minus_lI)
        # Row reduce to find null space
        null_sp = A_minus_lI.nullspace()
        if null_sp:
            v = null_sp[0]
            v_normalized = v / v.norm()
            print(f"    Eigenvector (normalized) = {[float(x) for x in v_normalized]}")
        print()


def solve_by_hand_3x3():
    """
    Show the characteristic polynomial for a 3×3 symmetric matrix
    and its eigenvalues computed numerically.
    """
    A = np.array([[4.0, 2.0, 1.0],
                  [2.0, 3.0, 1.0],
                  [1.0, 1.0, 2.0]])

    print("=== 3×3 Symmetric Matrix ===\n")
    print(f"  A = \n{A}\n")

    eigenvalues, eigenvectors = np.linalg.eigh(A)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues  = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    print(f"  Eigenvalues (descending): {eigenvalues.round(4)}")
    print(f"  Sum = trace(A): {eigenvalues.sum():.4f} == {A.trace():.4f} ✓")
    print(f"  Product = det(A): {eigenvalues.prod():.4f} == {np.linalg.det(A):.4f}\n")

    print(f"  Eigenvectors (columns):")
    for i, (ev, lam) in enumerate(zip(eigenvectors.T, eigenvalues)):
        print(f"    v{i+1} (λ={lam:.4f}): [{', '.join(f'{x:.4f}' for x in ev)}]")
        # Verify orthogonality
        if i > 0:
            dot = eigenvectors[:, 0] @ ev
            print(f"      v1 · v{i+1} = {dot:.8f} (orthogonal ✓)")

    print(f"\n  Key properties verified:")
    print(f"    1. Av = λv: max error = "
          f"{max(np.linalg.norm(A @ eigenvectors[:,i] - eigenvalues[i]*eigenvectors[:,i]) for i in range(3)):.2e}")
    print(f"    2. All eigenvectors orthogonal: "
          f"||VᵀV - I|| = {np.linalg.norm(eigenvectors.T @ eigenvectors - np.eye(3)):.2e}")
    print(f"    3. Trace = sum(eigenvalues): "
          f"{abs(A.trace() - eigenvalues.sum()):.2e}")
    print(f"    4. Det = product(eigenvalues): "
          f"{abs(np.linalg.det(A) - eigenvalues.prod()):.2e}")


solve_characteristic_equation_symbolic()
solve_by_hand_3x3()

Key Properties of Eigenvalues and Eigenvectors

Understanding these properties deepens the geometric and algebraic intuition, and they appear throughout PCA derivations.

Python
import numpy as np


def demonstrate_eigenvalue_properties():
    """
    Demonstrate five fundamental eigenvalue/eigenvector properties
    with numerical verification.
    """
    np.random.seed(42)

    # Use a symmetric positive semi-definite matrix (like a covariance matrix)
    X = np.random.randn(50, 4)
    A = (X.T @ X) / (50 - 1)  # Sample covariance matrix

    eigenvalues, eigenvectors = np.linalg.eigh(A)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues  = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    V = eigenvectors  # Columns are eigenvectors

    print("=== Five Key Eigenvalue Properties ===\n")
    print(f"  A = random 4×4 covariance matrix\n")

    # Property 1: Trace = sum of eigenvalues
    print(f"  1. Trace = sum of eigenvalues")
    print(f"     trace(A)        = {A.trace():.8f}")
    print(f"     sum(eigenvalues) = {eigenvalues.sum():.8f}")
    print(f"     Equal? {np.allclose(A.trace(), eigenvalues.sum())}\n")

    # Property 2: Determinant = product of eigenvalues
    print(f"  2. Det = product of eigenvalues")
    print(f"     det(A)            = {np.linalg.det(A):.8f}")
    print(f"     prod(eigenvalues) = {eigenvalues.prod():.8f}")
    print(f"     Equal? {np.allclose(np.linalg.det(A), eigenvalues.prod(), rtol=1e-5)}\n")

    # Property 3: Eigenvectors are orthogonal (for symmetric A)
    print(f"  3. Eigenvectors are orthogonal (symmetric matrix → orthonormal set)")
    VtV = V.T @ V
    print(f"     ||VᵀV - I|| = {np.linalg.norm(VtV - np.eye(4)):.2e}")
    print(f"     VᵀV ≈ I? {np.allclose(VtV, np.eye(4))}\n")

    # Property 4: Eigendecomposition A = V Λ V^T
    print(f"  4. Eigendecomposition: A = V·diag(λ)·Vᵀ")
    A_reconstructed = V @ np.diag(eigenvalues) @ V.T
    print(f"     ||A - VΛVᵀ|| = {np.linalg.norm(A - A_reconstructed):.2e}")
    print(f"     Exact reconstruction? {np.allclose(A, A_reconstructed)}\n")

    # Property 5: All eigenvalues ≥ 0 for positive semi-definite matrices
    print(f"  5. Positive semi-definite matrix → non-negative eigenvalues")
    print(f"     Eigenvalues: {eigenvalues.round(4)}")
    print(f"     All ≥ 0? {all(eigenvalues >= -1e-10)}")
    print(f"     (Covariance matrices are always PSD — variance cannot be negative)\n")

    # Bonus: Rank = number of nonzero eigenvalues
    rank = np.linalg.matrix_rank(A)
    n_nonzero_eig = (eigenvalues > 1e-10).sum()
    print(f"  Bonus: rank(A) = {rank}, nonzero eigenvalues = {n_nonzero_eig}")


demonstrate_eigenvalue_properties()

Computing Eigenvalues: Power Iteration and QR Algorithm

Understanding the algorithms that compute eigenvalues reveals why they work and what can go wrong.

Python
import numpy as np


def power_iteration(A, n_iterations=100, random_state=42):
    """
    Power iteration: finds the DOMINANT eigenvector (largest eigenvalue).

    Algorithm:
    1. Start with a random vector v
    2. Repeatedly multiply: v ← A·v / ||A·v||
    3. The vector converges to the dominant eigenvector
    4. λ ≈ vᵀ·A·v (Rayleigh quotient) after convergence

    Convergence rate: proportional to (λ₂/λ₁)ⁿ
    Faster when the dominant eigenvalue is much larger than the second.
    """
    np.random.seed(random_state)
    n = A.shape[0]
    v = np.random.randn(n)
    v = v / np.linalg.norm(v)

    true_eigenvalues, true_eigenvectors = np.linalg.eigh(A)
    idx = np.argsort(np.abs(true_eigenvalues))[::-1]
    lam_true = true_eigenvalues[idx[0]]
    v_true   = true_eigenvectors[:, idx[0]]

    history = []
    for i in range(n_iterations):
        v_new   = A @ v
        lam_est = v @ v_new                # Rayleigh quotient
        v       = v_new / np.linalg.norm(v_new)

        # Track error (account for sign ambiguity)
        angle_error = min(
            np.linalg.norm(v - v_true),
            np.linalg.norm(v + v_true)
        )
        lam_error = abs(lam_est - lam_true)
        history.append((i + 1, lam_est, angle_error, lam_error))

    return v, lam_est, history


# Demonstrate power iteration on a 4×4 symmetric matrix
np.random.seed(42)
X_power = np.random.randn(100, 4)
A_power = (X_power.T @ X_power) / 99

print("=== Power Iteration: Finding the Dominant Eigenvector ===\n")
v_dom, lam_dom, history = power_iteration(A_power, n_iterations=30)

true_eig, true_vec = np.linalg.eigh(A_power)
lam_true = true_eig.max()

print(f"  A = 4×4 covariance matrix")
print(f"  True dominant eigenvalue: {lam_true:.8f}\n")
print(f"  {'Iter':>5} | {'λ estimate':>14} | {'λ error':>12} | {'Vec angle err':>14}")
print(f"  {'-'*52}")
for it, lam_est, angle_err, lam_err in history[:10]:
    print(f"  {it:>5} | {lam_est:>14.8f} | {lam_err:>12.2e} | {angle_err:>14.2e}")
print(f"  ...")
for it, lam_est, angle_err, lam_err in history[-3:]:
    print(f"  {it:>5} | {lam_est:>14.8f} | {lam_err:>12.2e} | {angle_err:>14.2e}")

print(f"\n  Converged eigenvalue: {lam_dom:.8f}")
print(f"  True eigenvalue:      {lam_true:.8f}")
print(f"  Error:                {abs(lam_dom - lam_true):.2e}")


def deflation_all_components(A, n_components=None):
    """
    Deflation: extract all eigenvectors by iteratively removing
    the dominant component after each power iteration.

    After finding dominant eigenvector v₁ with eigenvalue λ₁:
    A' = A - λ₁ · v₁ · v₁ᵀ   (deflated matrix — PC1 removed)
    Repeat on A' to get λ₂, v₂, etc.

    This is how early iterative PCA algorithms worked.
    Modern algorithms use QR decomposition for better stability.
    """
    n = A.shape[0]
    if n_components is None:
        n_components = n

    eigenvalues  = []
    eigenvectors = []
    A_def        = A.copy()

    for k in range(n_components):
        v_k, lam_k, _ = power_iteration(A_def, n_iterations=200,
                                          random_state=k)
        eigenvalues.append(lam_k)
        eigenvectors.append(v_k)
        # Deflate
        A_def = A_def - lam_k * np.outer(v_k, v_k)

    return np.array(eigenvalues), np.column_stack(eigenvectors)


print("\n=== Deflation: All Eigenvectors via Power Iteration ===\n")
eig_vals_defl, eig_vecs_defl = deflation_all_components(A_power)
eig_vals_true, eig_vecs_true = np.linalg.eigh(A_power)
idx = np.argsort(eig_vals_true)[::-1]
eig_vals_true  = eig_vals_true[idx]

print(f"  {'Component':>10} | {'Deflation λ':>14} | {'True λ':>12} | {'Error':>10}")
print(f"  {'-'*52}")
for i, (lv_d, lv_t) in enumerate(zip(eig_vals_defl, eig_vals_true)):
    print(f"  {i+1:>10} | {lv_d:>14.6f} | {lv_t:>12.6f} | "
          f"{abs(lv_d-lv_t):>10.2e}")

The Covariance Matrix: Bridge Between Data and PCA

The covariance matrix is the central object connecting raw data to PCA. Its eigendecomposition directly produces the principal components.

Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler


def covariance_to_pca():
    """
    Complete walkthrough: raw data → covariance matrix → eigendecomposition → PCA.
    Shows every intermediate step and validates against sklearn.
    """
    iris   = load_iris()
    X      = iris.data
    X_s    = StandardScaler().fit_transform(X)
    n, d   = X_s.shape

    print("=== From Covariance Matrix to PCA ===\n")
    print(f"  Data: {n} samples × {d} features (Iris, standardized)\n")

    # Step 1: Covariance matrix
    C = np.cov(X_s.T)
    print(f"  Covariance matrix C ({d}×{d}):")
    print(f"  (All diagonal = 1.0 because data is standardized = correlation matrix)\n")
    print(f"  {'':>22}", end='')
    for name in iris.feature_names:
        print(f"  {name[:8]:>10}", end='')
    print()
    for i, row_name in enumerate(iris.feature_names):
        print(f"  {row_name[:22]:>22}", end='')
        for j in range(d):
            print(f"  {C[i,j]:>10.4f}", end='')
        print()
    print()

    # Step 2: Eigendecomposition
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues  = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    total_var    = eigenvalues.sum()
    var_explained = eigenvalues / total_var * 100
    cum_var       = np.cumsum(var_explained)

    print(f"  Eigendecomposition results:")
    print(f"  {'PC':>4} | {'Eigenvalue':>12} | {'Var%':>8} | {'Cum%':>8} | "
          f"{'PC Direction (loadings)'}")
    print(f"  {'-'*85}")
    for i in range(d):
        loadings = ', '.join(f'{x:+.4f}' for x in eigenvectors[:, i])
        print(f"  {i+1:>4} | {eigenvalues[i]:>12.6f} | {var_explained[i]:>8.2f} | "
              f"{cum_var[i]:>8.2f} | [{loadings}]")

    # Step 3: Project data
    X_pca = X_s @ eigenvectors
    print(f"\n  Projected data shape: {X_pca.shape}")
    print(f"  PC1-PC2 variance explained: {cum_var[1]:.2f}%")

    # Verify: columns of X_pca should have variances ≈ eigenvalues
    print(f"\n  Variance of projected data (should equal eigenvalues):")
    for i in range(d):
        actual_var = X_pca[:, i].var(ddof=1)
        print(f"    PC{i+1}: actual={actual_var:.6f}, eigenvalue={eigenvalues[i]:.6f}, "
              f"match={np.isclose(actual_var, eigenvalues[i])}")

    # Verify: projected components are uncorrelated
    corr_matrix = np.corrcoef(X_pca.T)
    off_diag = corr_matrix[np.triu_indices(d, k=1)]
    print(f"\n  Correlation between components (should all be ~0):")
    print(f"    Max |off-diagonal correlation|: "
          f"{np.abs(off_diag).max():.2e} ≈ 0 ✓")

    return eigenvalues, eigenvectors, X_pca


eigenvalues, eigenvectors, X_pca = covariance_to_pca()

Eigenvalues in PCA: Visualizing What They Mean

The eigenvalue of a principal component tells you the variance of the data along that component’s direction. Visualizing this concretely connects the abstract math to data intuition.

Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler


def visualize_eigenvalue_meaning(figsize=(16, 10)):
    """
    Visual demonstration of what eigenvalues mean in PCA:
    1. Show data projected onto each PC
    2. Each projection has variance = corresponding eigenvalue
    3. Total variance is the sum of all eigenvalues (= trace of cov matrix)
    """
    wine  = load_wine()
    X_s   = StandardScaler().fit_transform(wine.data)
    n, d  = X_s.shape

    C = np.cov(X_s.T)
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues  = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]
    X_pca = X_s @ eigenvectors

    var_pct = eigenvalues / eigenvalues.sum() * 100

    fig = plt.figure(figsize=figsize)
    gs  = plt.GridSpec(2, 2, figure=fig, hspace=0.4, wspace=0.35)

    # ── Panel 1: Scree plot with eigenvalues ─────────────────────
    ax1 = fig.add_subplot(gs[0, 0])
    colors_bar = plt.cm.Blues(np.linspace(0.4, 0.9, d))[::-1]
    bars = ax1.bar(range(1, d+1), eigenvalues, color=colors_bar,
                    edgecolor='white', linewidth=0.5)
    for bar, ev, pct in zip(bars, eigenvalues, var_pct):
        if pct > 3:
            ax1.text(bar.get_x() + bar.get_width()/2,
                     bar.get_height() + 0.02,
                     f'{pct:.1f}%', ha='center', fontsize=7, color='black')
    ax1.set_title('Eigenvalues = Variance per Component\n'
                   '(Height = how much variance the PC captures)',
                   fontsize=10, fontweight='bold')
    ax1.set_xlabel('Principal Component', fontsize=10)
    ax1.set_ylabel('Eigenvalue (= variance)', fontsize=10)
    ax1.set_xticks(range(1, d+1))
    ax1.grid(True, alpha=0.3, axis='y')

    # ── Panel 2: Distribution of each PC (width = eigenvalue) ────
    ax2 = fig.add_subplot(gs[0, 1])
    for i in range(min(5, d)):
        ax2.hist(X_pca[:, i], bins=20, alpha=0.6,
                  label=f'PC{i+1} (var={eigenvalues[i]:.2f})',
                  density=True)
    ax2.set_title('Data Distribution Along Each PC\n'
                   '(Wider spread = larger eigenvalue = more variance)',
                   fontsize=10, fontweight='bold')
    ax2.set_xlabel('Projection value', fontsize=10)
    ax2.set_ylabel('Density', fontsize=10)
    ax2.legend(fontsize=7)
    ax2.grid(True, alpha=0.3)

    # ── Panel 3: 2D scatter with ellipse ─────────────────────────
    ax3 = fig.add_subplot(gs[1, 0])
    colors_wine = ['coral', 'steelblue', 'mediumseagreen']
    for cls, color in enumerate(colors_wine):
        mask = wine.target == cls
        ax3.scatter(X_pca[mask, 0], X_pca[mask, 1], c=color,
                     s=30, edgecolors='white', linewidth=0.3, alpha=0.8,
                     label=f'Class {cls}')

    # Draw ellipse showing ±1 std (= ±√eigenvalue) along each PC
    theta = np.linspace(0, 2*np.pi, 100)
    std1, std2 = np.sqrt(eigenvalues[0]), np.sqrt(eigenvalues[1])
    ax3.plot(std1 * np.cos(theta), std2 * np.sin(theta),
              'k--', lw=2, alpha=0.5, label='±1σ ellipse')
    ax3.axhline(0, color='black', lw=0.8, alpha=0.4)
    ax3.axvline(0, color='black', lw=0.8, alpha=0.4)
    ax3.set_title(f'PC1 vs PC2 Scatter\n'
                   f'σ₁={std1:.2f}, σ₂={std2:.2f} (≈ spread in each direction)',
                   fontsize=10, fontweight='bold')
    ax3.set_xlabel(f'PC1 ({var_pct[0]:.1f}% variance)', fontsize=10)
    ax3.set_ylabel(f'PC2 ({var_pct[1]:.1f}% variance)', fontsize=10)
    ax3.legend(fontsize=8); ax3.grid(True, alpha=0.2)

    # ── Panel 4: Cumulative variance ─────────────────────────────
    ax4 = fig.add_subplot(gs[1, 1])
    cum_var = np.cumsum(var_pct)
    ax4.fill_between(range(1, d+1), cum_var, alpha=0.2, color='steelblue')
    ax4.plot(range(1, d+1), cum_var, 'o-', color='steelblue',
              lw=2.5, markersize=8)
    for threshold, color in [(80, 'goldenrod'), (95, 'coral'), (99, 'steelblue')]:
        k = np.searchsorted(cum_var, threshold) + 1
        ax4.axhline(y=threshold, color=color, linestyle='--', lw=1.5, alpha=0.8,
                     label=f'{threshold}% at k={k}')
        ax4.axvline(x=k, color=color, linestyle=':', lw=1, alpha=0.6)
    ax4.set_xlabel('Number of Components Kept', fontsize=10)
    ax4.set_ylabel('Cumulative Variance (%)', fontsize=10)
    ax4.set_title('Cumulative Variance vs k\n'
                   'How much information is retained?',
                   fontsize=10, fontweight='bold')
    ax4.legend(fontsize=8); ax4.grid(True, alpha=0.3)

    plt.suptitle('Eigenvalues in PCA: Measuring Variance Along Each Direction\n'
                 'Wine Dataset (13 features)',
                 fontsize=13, fontweight='bold')
    plt.savefig('eigenvalue_meaning_pca.png', dpi=150, bbox_inches='tight')
    plt.show()
    print("Saved: eigenvalue_meaning_pca.png")


visualize_eigenvalue_meaning()

Eigendecomposition vs SVD in Practice

Python
import numpy as np
import time
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_digits


def compare_eigendecomposition_svd():
    """
    Compare eigendecomposition and SVD numerically:
    - Accuracy on a realistic dataset
    - Runtime comparison
    - Numerical stability with near-singular matrices
    """
    digits = load_digits()
    X_s    = StandardScaler().fit_transform(digits.data)
    n, d   = X_s.shape

    print("=== Eigendecomposition vs SVD: Practical Comparison ===\n")

    # Method 1: Eigendecomposition of covariance matrix
    t0 = time.perf_counter()
    C  = (X_s.T @ X_s) / (n - 1)
    eig_vals, eig_vecs = np.linalg.eigh(C)
    idx = np.argsort(eig_vals)[::-1]
    eig_vals = eig_vals[idx]; eig_vecs = eig_vecs[:, idx]
    t_eig = time.perf_counter() - t0

    # Method 2: SVD of data matrix
    t0 = time.perf_counter()
    U, s, Vt = np.linalg.svd(X_s, full_matrices=False)
    svd_vals = s**2 / (n - 1)
    svd_vecs = Vt.T  # Principal component directions
    t_svd = time.perf_counter() - t0

    print(f"  {'Method':<25} | {'Time (s)':>10} | Notes")
    print(f"  {'-'*55}")
    print(f"  {'Eigendecomp (eigh)':<25} | {t_eig:>10.6f} | Builds n×n cov matrix")
    print(f"  {'SVD (full)':<25} | {t_svd:>10.6f} | Works on n×d data matrix")

    # Accuracy comparison
    top_k = 5
    max_diff_vals = np.max(np.abs(eig_vals[:top_k] - svd_vals[:top_k]))

    print(f"\n  Top {top_k} eigenvalues comparison:")
    print(f"  {'PC':>4} | {'Eigh':>12} | {'SVD':>12} | {'Diff':>10}")
    print(f"  {'-'*42}")
    for i in range(top_k):
        print(f"  {i+1:>4} | {eig_vals[i]:>12.6f} | {svd_vals[i]:>12.6f} | "
              f"{abs(eig_vals[i]-svd_vals[i]):>10.2e}")

    print(f"\n  Max difference: {max_diff_vals:.2e} (numerical precision)")

    # Stability test: near-singular matrix
    print(f"\n  Stability test: near-singular covariance matrix")
    np.random.seed(42)
    X_sing = np.random.randn(50, 100)  # n < d: rank-deficient
    X_sing_s = StandardScaler().fit_transform(X_sing)

    C_sing = (X_sing_s.T @ X_sing_s) / 49

    # Check for negative eigenvalues (instability indicator)
    eig_sing, _ = np.linalg.eigh(C_sing)
    min_eig = eig_sing.min()
    n_negative = (eig_sing < -1e-10).sum()

    print(f"  Data shape: {X_sing_s.shape} (rank-deficient: n < d)")
    print(f"  Min eigenvalue (eigh): {min_eig:.6e}")
    print(f"  Negative eigenvalues: {n_negative}")
    print(f"  Note: SVD handles this naturally; eigh may show tiny negatives")


compare_eigendecomposition_svd()

Connecting Everything: Eigenvalues, PCA, and Data Science

Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline


def eigenvalue_guided_pca():
    """
    Use eigenvalue analysis to make an informed k choice,
    then measure the impact on a downstream classifier.

    The 'eigenvalue > 1' Kaiser criterion: keep components
    whose eigenvalue > 1 (they explain more variance than
    a single original standardized feature).
    """
    cancer = load_breast_cancer()
    X_s    = StandardScaler().fit_transform(cancer.data)
    n, d   = X_s.shape

    C = np.cov(X_s.T)
    eigenvalues, _ = np.linalg.eigh(C)
    eigenvalues    = eigenvalues[::-1]  # Descending

    # Kaiser criterion: eigenvalue > 1
    k_kaiser = (eigenvalues > 1).sum()
    # 95% variance
    cum_var = np.cumsum(eigenvalues) / eigenvalues.sum()
    k_95    = np.searchsorted(cum_var, 0.95) + 1

    print("=== Eigenvalue-Guided k Selection ===\n")
    print(f"  Breast Cancer dataset: {d} features\n")
    print(f"  Kaiser criterion (λ > 1):     k = {k_kaiser}")
    print(f"  95% variance threshold:       k = {k_95}")
    print(f"  Total features:               d = {d}\n")

    # Evaluate logistic regression at different k values
    test_ks = sorted(set([1, 2, k_kaiser, k_95, 10, 20, d]))
    cv = 5

    print(f"  Logistic Regression: accuracy at different k values")
    print(f"  {'k':>5} | {'Var%':>8} | {'CV Acc':>8} | Notes")
    print(f"  {'-'*45}")

    for k in test_ks:
        if k > d:
            continue
        pca_k = PCA(n_components=k)
        X_pca = pca_k.fit_transform(X_s)
        var_pct = pca_k.explained_variance_ratio_.sum() * 100

        pipe = Pipeline([
            ('scaler', StandardScaler()),
            ('pca',    PCA(n_components=k)),
            ('clf',    LogisticRegression(max_iter=1000))
        ])
        acc = cross_val_score(pipe, cancer.data, cancer.target, cv=cv).mean()

        flag = ""
        if k == k_kaiser:
            flag = " ← Kaiser"
        elif k == k_95:
            flag = " ← 95% var"
        elif k == d:
            flag = " ← All features"
        print(f"  {k:>5} | {var_pct:>8.1f} | {acc:>8.4f} |{flag}")

    print(f"\n  Insight: Often k_Kaiser or k_95 gives near-optimal accuracy")
    print(f"  with much fewer features than the full set.")


eigenvalue_guided_pca()

Visualizing Linear Transformations and Their Eigenvectors

A matrix multiplication is a linear transformation — it maps every vector in the plane to a new position. Visualizing what a transformation does to a grid of points makes the eigenvector concept concrete: eigenvectors are the arrows that point along the “seams” of the transformation.

Python
import numpy as np
import matplotlib.pyplot as plt


def visualize_transformation_eigenvectors(figsize=(16, 7)):
    """
    Show three different 2×2 matrix transformations and their eigenvectors.
    Each transformation deforms the unit grid differently;
    eigenvectors point along the invariant directions.
    """
    matrices = {
        'Scaling\n[[3,0],[0,1]]':
            np.array([[3.0, 0.0], [0.0, 1.0]]),
        'Shear+Scale\n[[2,1],[0,1]]':
            np.array([[2.0, 1.0], [0.0, 1.0]]),
        'Covariance\n[[2,1.5],[1.5,2]]':
            np.array([[2.0, 1.5], [1.5, 2.0]]),
    }

    fig, axes = plt.subplots(1, 3, figsize=figsize)

    for ax, (name, A) in zip(axes, matrices.items()):
        # Grid before transformation
        grid_range = np.linspace(-2, 2, 9)
        X_grid, Y_grid = np.meshgrid(grid_range, grid_range)
        pts = np.column_stack([X_grid.ravel(), Y_grid.ravel()])

        # Apply transformation
        pts_transformed = pts @ A.T

        # Draw original grid (faint)
        for x in grid_range:
            line_x = np.array([[x, y] for y in np.linspace(-2, 2, 30)])
            line_t  = line_x @ A.T
            ax.plot(line_t[:, 0], line_t[:, 1], 'lightblue', lw=0.8, alpha=0.5)
        for y in grid_range:
            line_y = np.array([[x, y] for x in np.linspace(-2, 2, 30)])
            line_t  = line_y @ A.T
            ax.plot(line_t[:, 0], line_t[:, 1], 'lightblue', lw=0.8, alpha=0.5)

        # Draw eigenvectors
        try:
            eigenvalues, eigenvectors = np.linalg.eig(A)
            # Only use real eigenvectors
            real_idx = np.where(np.abs(eigenvalues.imag) < 1e-10)[0]
            eigenvalues  = eigenvalues[real_idx].real
            eigenvectors = eigenvectors[:, real_idx].real

            eigvec_colors = ['coral', 'mediumseagreen']
            for i, (ev, lam, color) in enumerate(
                    zip(eigenvectors.T, eigenvalues, eigvec_colors)):
                ev_norm = ev / np.linalg.norm(ev)
                scale   = abs(lam) * 1.5
                # Original direction (dashed)
                ax.annotate('', xy=ev_norm * 2, xytext=-ev_norm * 2,
                            arrowprops=dict(arrowstyle='->', color=color,
                                            lw=1.5, linestyle='dashed',
                                            mutation_scale=12, alpha=0.5))
                # Transformed direction
                ev_t = A @ ev_norm
                ax.annotate('', xy=ev_t / np.linalg.norm(ev_t) * scale,
                            xytext=(0, 0),
                            arrowprops=dict(arrowstyle='->', color=color,
                                            lw=2.5, mutation_scale=15))
                ax.text(ev_t[0]/np.linalg.norm(ev_t)*scale*1.15,
                        ev_t[1]/np.linalg.norm(ev_t)*scale*1.15,
                        f'v{i+1}\nλ={lam:.2f}',
                        color=color, fontsize=8, fontweight='bold', ha='center')
        except np.linalg.LinAlgError:
            pass

        ax.set_xlim(-5, 5); ax.set_ylim(-5, 5)
        ax.axhline(0, color='black', lw=0.8, alpha=0.4)
        ax.axvline(0, color='black', lw=0.8, alpha=0.4)
        ax.set_aspect('equal')
        ax.set_title(name, fontsize=10, fontweight='bold')
        ax.set_xlabel('x', fontsize=9); ax.set_ylabel('y', fontsize=9)
        ax.grid(True, alpha=0.15)

    plt.suptitle('Linear Transformations and Their Eigenvectors\n'
                 'Grid shows deformed space; arrows = eigenvectors (scale = eigenvalue)',
                 fontsize=12, fontweight='bold', y=1.02)
    plt.tight_layout()
    plt.savefig('transformation_eigenvectors.png', dpi=150, bbox_inches='tight')
    plt.show()
    print("Saved: transformation_eigenvectors.png")

    print("\n  For the covariance matrix [[2, 1.5], [1.5, 2]]:")
    A_cov = np.array([[2.0, 1.5], [1.5, 2.0]])
    eig_v, eig_vec = np.linalg.eigh(A_cov)
    eig_v   = eig_v[::-1]; eig_vec = eig_vec[:, ::-1]
    print(f"    λ₁ = {eig_v[0]:.4f}, v₁ = [{eig_vec[0,0]:.4f}, {eig_vec[1,0]:.4f}]")
    print(f"    λ₂ = {eig_v[1]:.4f}, v₂ = [{eig_vec[0,1]:.4f}, {eig_vec[1,1]:.4f}]")
    print(f"    v₁ · v₂ = {eig_vec[:,0] @ eig_vec[:,1]:.6f} (orthogonal ✓)")
    print(f"    For a covariance matrix, v₁ points in the direction of max variance")


visualize_transformation_eigenvectors()

The Spectral Theorem: Why PCA Works

The mathematical guarantee that PCA works as described comes from the Spectral Theorem: every real symmetric matrix has a complete set of real, orthogonal eigenvectors that form a basis for the entire space.

Covariance matrices are always symmetric (C = C^T) and positive semi-definite. The Spectral Theorem therefore guarantees:

  1. All eigenvalues are real (no complex numbers)
  2. All eigenvalues are ≥ 0 (variances are non-negative)
  3. Eigenvectors for different eigenvalues are orthogonal (PCs are perpendicular)
  4. There are exactly d eigenvectors for a d-dimensional covariance matrix (complete basis)
Python
import numpy as np


def verify_spectral_theorem():
    """
    Verify all Spectral Theorem guarantees numerically for a random
    covariance matrix.
    """
    np.random.seed(42)
    d = 6
    X = np.random.randn(200, d)
    C = np.cov(X.T)  # Guaranteed symmetric PSD

    eigenvalues, eigenvectors = np.linalg.eigh(C)
    idx = np.argsort(eigenvalues)[::-1]
    eigenvalues  = eigenvalues[idx]
    eigenvectors = eigenvectors[:, idx]

    print("=== Spectral Theorem Verification ===\n")
    print(f"  C = {d}×{d} covariance matrix from random {d}D data\n")

    # Check 1: Real eigenvalues
    eig_real = np.linalg.eigvals(C)
    all_real = np.all(np.abs(eig_real.imag) < 1e-10)
    print(f"  1. All eigenvalues real?  {all_real} ✓")
    print(f"     Max |imaginary part|: {np.abs(eig_real.imag).max():.2e}")

    # Check 2: Non-negative eigenvalues
    all_nonneg = np.all(eigenvalues >= -1e-10)
    print(f"\n  2. All eigenvalues ≥ 0?  {all_nonneg} ✓")
    print(f"     Eigenvalues: {eigenvalues.round(4)}")

    # Check 3: Orthogonal eigenvectors
    VtV = eigenvectors.T @ eigenvectors
    orth_err = np.linalg.norm(VtV - np.eye(d))
    print(f"\n  3. Eigenvectors orthonormal? ||VᵀV - I|| = {orth_err:.2e}")
    print(f"     (Should be ~0) ✓")

    # Check 4: Complete basis (d eigenvectors)
    rank_V = np.linalg.matrix_rank(eigenvectors)
    print(f"\n  4. Complete basis (rank = d)?  rank(V) = {rank_V} = d = {d} ✓")

    # Check 5: Spectral decomposition A = VΛVᵀ
    C_reconstructed = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T
    recon_err = np.linalg.norm(C - C_reconstructed)
    print(f"\n  5. Spectral decomposition C = VΛVᵀ?")
    print(f"     ||C - VΛVᵀ|| = {recon_err:.2e} ✓")

    print(f"\n  All Spectral Theorem guarantees hold numerically.")
    print(f"  These properties are what make PCA mathematically sound.")


verify_spectral_theorem()

Special Cases: Zero, Negative, and Complex Eigenvalues

Not all matrices have the nice non-negative real eigenvalues of covariance matrices. Understanding edge cases deepens understanding and helps debug numerical issues.

Python
import numpy as np


def special_eigenvalue_cases():
    """
    Demonstrate what zero, negative, and complex eigenvalues mean,
    and why covariance matrices are always safe from these issues.
    """
    print("=== Special Eigenvalue Cases ===\n")

    # Case 1: Zero eigenvalue — rank-deficient matrix
    A_sing = np.array([[2.0, 2.0], [1.0, 1.0]])
    eig_s  = np.linalg.eigvals(A_sing)
    print(f"  1. Zero eigenvalue (singular matrix, rank < n)")
    print(f"     A = [[2,2],[1,1]]; eigenvalues = {eig_s.round(6)}")
    print(f"     Meaning: one direction gets collapsed to zero length")
    print(f"     In PCA: zero-eigenvalue components have no variance → discard\n")

    # Case 2: Negative eigenvalue — indefinite matrix
    A_indef = np.array([[1.0, 3.0], [3.0, 1.0]])
    eig_i   = np.linalg.eigvals(A_indef)
    print(f"  2. Negative eigenvalue (indefinite, non-PSD matrix)")
    print(f"     A = [[1,3],[3,1]]; eigenvalues = {eig_i.round(6)}")
    print(f"     Meaning: the transformation REFLECTS along that eigenvector")
    print(f"     In PCA: covariance matrices are ALWAYS PSD")
    print(f"     Proof: vᵀCv = ||Xv||²/(n-1) ≥ 0 for any vector v\n")

    # Case 3: Complex eigenvalues — rotation
    A_rot = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90° rotation
    eig_r = np.linalg.eigvals(A_rot)
    print(f"  3. Complex eigenvalues (rotation matrix)")
    print(f"     A = [[0,-1],[1,0]] (90° rotation); eigenvalues = {eig_r}")
    print(f"     Meaning: no real fixed direction — rotation has none")
    print(f"     In PCA: symmetric matrices always have REAL eigenvalues\n")

    print(f"  Summary: covariance matrices are:")
    print(f"    - Symmetric (C = Cᵀ) → real eigenvalues guaranteed")
    print(f"    - Positive semi-definite → non-negative eigenvalues guaranteed")
    print(f"    - Both → PCA is always well-defined and interpretable")


special_eigenvalue_cases()

The safety of covariance matrices for PCA stems from a simple algebraic identity: for any vector v, v^T C v = v^T (X^T X/(n-1)) v = ‖Xv‖²/(n-1) ≥ 0. This proves the covariance matrix is always positive semi-definite, guaranteeing all eigenvalues are ≥ 0 and real. The tiny negative eigenvalues that occasionally appear in practice (e.g., −10^{-15}) are purely floating-point rounding errors and should be clipped to zero.

Summary

Eigenvalues and eigenvectors are the mathematical heart of PCA. An eigenvector of a matrix is a direction that the matrix only stretches (by factor λ, the eigenvalue) — never rotates. For the data’s covariance matrix, the eigenvectors are the principal component directions, and the eigenvalues are the amount of variance along each direction.

The characteristic equation det(A − λI) = 0 is the algebraic tool for finding eigenvalues: it produces a polynomial whose roots are the eigenvalues. For symmetric matrices (like covariance matrices), all eigenvalues are real and non-negative, and all eigenvectors are orthogonal — these properties make covariance eigendecomposition the perfect tool for PCA.

Five key properties of eigenvalues connect to PCA intuition: their sum equals the matrix trace (total variance); their product equals the determinant; eigenvectors of symmetric matrices are always orthogonal; together they reconstruct the matrix exactly (A = VΛV^T); and for positive semi-definite matrices (covariances), all eigenvalues are ≥ 0.

Power iteration is the conceptually simplest algorithm for finding the dominant eigenvector — repeatedly multiply by A and normalize. The QR algorithm (used in numpy.linalg.eigh) finds all eigenvalues simultaneously. SVD is the numerically preferred method for PCA specifically because it avoids squaring the data matrix and handles rank-deficient matrices more gracefully.

The Kaiser criterion (keep components with eigenvalue > 1) provides a fast heuristic for choosing k: components with eigenvalue > 1 explain more variance than a single original feature, making them worth keeping. Combined with the 95% cumulative variance threshold, these two criteria typically agree and provide a principled data-driven choice of k without any downstream evaluation.

The mathematical guarantee comes from the Spectral Theorem: every real symmetric matrix has a complete set of real, orthogonal eigenvectors. Covariance matrices are always symmetric and positive semi-definite, so all eigenvalues are real and non-negative, all eigenvectors are orthogonal, and together they span the entire feature space — exactly the properties PCA requires.

Understanding eigenvalues and eigenvectors makes every PCA decision more principled: the k-selection problem becomes “how many eigenvalues are large enough to be meaningful?”, the Kaiser criterion (λ > 1) gives a data-driven answer grounded in the mathematics, and debugging unexpected PCA behavior (near-zero eigenvalues, poor explained variance) becomes a matter of inspecting the eigenspectrum rather than guessing at parameters.

Share:
Subscribe
Notify of
0 Comments
Inline Feedbacks
View all comments

Discover More

Qualcomm Snapdragon X2 Elite Targets Premium Laptop Market with 5GHz Performance

Qualcomm unveils Snapdragon X2 Elite processor at CES 2026, delivering 5GHz performance and 80 TOPS…

Voltage Regulators Explained: Protecting Sensitive Electronics

Voltage Regulators Explained: Protecting Sensitive Electronics

Understand voltage regulators for robotics—learn how linear LDOs and switching buck converters work, when to…

Why LEDs Need Resistors: Calculating Your First Current-Limiting Resistor

Learn why LEDs require current-limiting resistors, understand the physics behind LED behavior, and master the…

AI-Powered Human Trafficking Detection Tools Enter Operational Law Enforcement Deployment

AI-Powered Human Trafficking Detection Tools Enter Operational Law Enforcement Deployment

Advanced data science models trained to identify trafficking networks through pattern recognition across communications, financial,…

Function Overloading in C++: Same Name, Different Behavior

Learn C++ function overloading with this complete guide. Understand how to create multiple functions with…

Debugging Python Code: Tips for AI Beginners

Master Python debugging for AI projects. Learn to read error messages, use print debugging, leverage…

Click For More
0
Would love your thoughts, please comment.x
()
x