An eigenvector of a matrix A is a special vector that only gets stretched (not rotated) when multiplied by A: Av = λv. The scalar λ is the eigenvalue — how much the vector gets stretched. In PCA, the eigenvectors of the data’s covariance matrix are the principal component directions (the axes along which the data varies most), and the eigenvalues are the amount of variance along each direction. The eigenvector with the largest eigenvalue points in the direction of maximum variance — PC1.
Introduction
“Eigenvalue” and “eigenvector” are German mathematical terms — “eigen” means “own” or “characteristic.” They describe the self-similar directions of a linear transformation: vectors that the transformation stretches or shrinks but never rotates. Every matrix transformation has its own set of these characteristic directions, and finding them reveals the fundamental geometry of the transformation.
For PCA, this geometric meaning translates directly into data science: the eigenvectors of the covariance matrix are the directions in which the data varies most strongly, and the eigenvalues quantify exactly how much variation exists in each direction. The first eigenvector (with the largest eigenvalue) is PC1 — the single axis that captures the most variance. The second eigenvector is PC2 — perpendicular to PC1 and capturing the most remaining variance. And so on for each subsequent component.
This article builds understanding of eigenvalues and eigenvectors from the ground up: geometric intuition through visual examples, the mathematical definition, how to compute them (power iteration, QR algorithm, and numpy), the connection to covariance matrices, and all the properties that make eigenvectors the natural coordinate system for PCA.
The Geometric Intuition: Vectors That Don’t Rotate
Most vectors change direction when multiplied by a matrix. Eigenvectors are the special ones that only change length — their direction is preserved.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch
import warnings
warnings.filterwarnings('ignore')
np.random.seed(42)
def visualize_eigenvector_property(figsize=(16, 6)):
"""
Show that eigenvectors only get scaled (not rotated) by the matrix,
while ordinary vectors change direction.
A = [[3, 1], [1, 2]] (symmetric 2x2 with known eigenvectors)
"""
A = np.array([[3.0, 1.0],
[1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eigh(A)
# eigh returns ascending; reverse
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
v1 = eigenvectors[:, 0] # Eigenvector for largest eigenvalue
v2 = eigenvectors[:, 1] # Eigenvector for smaller eigenvalue
lam1, lam2 = eigenvalues
fig, axes = plt.subplots(1, 3, figsize=figsize)
def draw_vector(ax, vec, color, label, scale=1.0, style='->', lw=2.5):
ax.annotate('', xy=vec * scale, xytext=(0, 0),
arrowprops=dict(arrowstyle=style, color=color,
lw=lw, mutation_scale=18))
ax.text(vec[0] * scale * 1.15, vec[1] * scale * 1.15,
label, color=color, fontsize=10, fontweight='bold',
ha='center')
# ── Panel 1: Before transformation ──────────────────────────
ax = axes[0]
# A regular vector (not an eigenvector)
v_regular = np.array([1.0, 0.0]) # Standard basis vector
draw_vector(ax, v1, 'coral', f'v₁ (eigvec)', scale=1.5)
draw_vector(ax, v2, 'steelblue', f'v₂ (eigvec)', scale=1.5)
draw_vector(ax, v_regular, 'goldenrod', 'e₁ (regular)', scale=1.5, lw=2)
ax.set_xlim(-2.5, 2.5); ax.set_ylim(-2.5, 2.5)
ax.axhline(0, color='black', lw=0.8, alpha=0.4)
ax.axvline(0, color='black', lw=0.8, alpha=0.4)
ax.set_title('Before: A × v\n(Before transformation)',
fontsize=10, fontweight='bold')
ax.set_aspect('equal'); ax.grid(True, alpha=0.2)
# ── Panel 2: After transformation ─────────────────────────────
ax = axes[1]
Av1 = A @ v1
Av2 = A @ v2
Av_reg = A @ v_regular
draw_vector(ax, Av1, 'coral', f'A·v₁ = {lam1:.2f}·v₁', scale=0.6)
draw_vector(ax, Av2, 'steelblue', f'A·v₂ = {lam2:.2f}·v₂', scale=0.6)
draw_vector(ax, Av_reg, 'goldenrod', 'A·e₁ (rotated!)', scale=0.6)
draw_vector(ax, v_regular, 'goldenrod', 'e₁ (original)', scale=1.0,
style='->', lw=1.2) # Show original for comparison
ax.set_xlim(-2.5, 2.5); ax.set_ylim(-2.5, 2.5)
ax.axhline(0, color='black', lw=0.8, alpha=0.4)
ax.axvline(0, color='black', lw=0.8, alpha=0.4)
ax.set_title('After: A applied to each vector\n'
'Eigenvectors only stretched, NOT rotated',
fontsize=10, fontweight='bold')
ax.set_aspect('equal'); ax.grid(True, alpha=0.2)
# ── Panel 3: Visual verification Av = λv ─────────────────────
ax = axes[2]
ax.text(0.5, 0.88,
f'Matrix A = [[3, 1], [1, 2]]\n\n'
f'Eigenvalue λ₁ = {lam1:.4f}\n'
f'Eigenvector v₁ = [{v1[0]:.4f}, {v1[1]:.4f}]\n\n'
f'Verify: A·v₁ = [{Av1[0]:.4f}, {Av1[1]:.4f}]\n'
f' λ₁·v₁ = [{lam1*v1[0]:.4f}, {lam1*v1[1]:.4f}]\n'
f' Equal? {np.allclose(Av1, lam1*v1)}\n\n'
f'Eigenvalue λ₂ = {lam2:.4f}\n'
f'Eigenvector v₂ = [{v2[0]:.4f}, {v2[1]:.4f}]\n\n'
f'Verify: A·v₂ = [{Av2[0]:.4f}, {Av2[1]:.4f}]\n'
f' λ₂·v₂ = [{lam2*v2[0]:.4f}, {lam2*v2[1]:.4f}]\n'
f' Equal? {np.allclose(Av2, lam2*v2)}\n\n'
f'v₁ · v₂ = {v1 @ v2:.6f} (orthogonal ✓)',
transform=ax.transAxes, fontsize=9,
verticalalignment='top', ha='center',
fontfamily='monospace',
bbox=dict(boxstyle='round', fc='lightyellow',
ec='steelblue', alpha=0.9))
ax.set_title('Algebraic Verification\n'
'A·v = λ·v (definition of eigenvector)',
fontsize=10, fontweight='bold')
ax.axis('off')
plt.suptitle('Eigenvectors: The Directions a Matrix Does NOT Rotate',
fontsize=13, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig('eigenvector_geometric.png', dpi=150, bbox_inches='tight')
plt.show()
print("Saved: eigenvector_geometric.png")
visualize_eigenvector_property()The Characteristic Equation: Finding Eigenvalues
To find eigenvalues, solve the characteristic equation:
For a 2×2 matrix, this is a quadratic equation. For larger matrices, it is a polynomial of degree n — which is why numerical methods are needed for n > 4 or so.
import numpy as np
import sympy as sp
def solve_characteristic_equation_symbolic():
"""
Use SymPy to derive eigenvalues analytically for a 2×2 matrix.
Shows the full algebraic derivation step-by-step.
"""
# Define symbolic matrix
lam = sp.Symbol('lambda')
A_sym = sp.Matrix([[3, 1],
[1, 2]])
print("=== Characteristic Equation: Step-by-Step ===\n")
print(f" Matrix A = {A_sym.tolist()}\n")
# A - λI
I = sp.eye(2)
A_lam = A_sym - lam * I
print(f" A - λI =")
sp.pprint(A_lam)
print()
# Determinant
det_expr = A_lam.det()
det_expr_expanded = sp.expand(det_expr)
print(f" det(A - λI) = {det_expr}")
print(f" Expanded: {det_expr_expanded}")
print()
# Solve for eigenvalues
eigenvalues_sym = sp.solve(det_expr_expanded, lam)
print(f" Solving det(A - λI) = 0:")
print(f" λ = {eigenvalues_sym}\n")
# For each eigenvalue, find eigenvector
for lv in sorted(eigenvalues_sym, reverse=True):
print(f" For λ = {lv}:")
A_minus_lI = A_sym - lv * sp.eye(2)
print(f" (A - λI) =")
sp.pprint(A_minus_lI)
# Row reduce to find null space
null_sp = A_minus_lI.nullspace()
if null_sp:
v = null_sp[0]
v_normalized = v / v.norm()
print(f" Eigenvector (normalized) = {[float(x) for x in v_normalized]}")
print()
def solve_by_hand_3x3():
"""
Show the characteristic polynomial for a 3×3 symmetric matrix
and its eigenvalues computed numerically.
"""
A = np.array([[4.0, 2.0, 1.0],
[2.0, 3.0, 1.0],
[1.0, 1.0, 2.0]])
print("=== 3×3 Symmetric Matrix ===\n")
print(f" A = \n{A}\n")
eigenvalues, eigenvectors = np.linalg.eigh(A)
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
print(f" Eigenvalues (descending): {eigenvalues.round(4)}")
print(f" Sum = trace(A): {eigenvalues.sum():.4f} == {A.trace():.4f} ✓")
print(f" Product = det(A): {eigenvalues.prod():.4f} == {np.linalg.det(A):.4f} ✓\n")
print(f" Eigenvectors (columns):")
for i, (ev, lam) in enumerate(zip(eigenvectors.T, eigenvalues)):
print(f" v{i+1} (λ={lam:.4f}): [{', '.join(f'{x:.4f}' for x in ev)}]")
# Verify orthogonality
if i > 0:
dot = eigenvectors[:, 0] @ ev
print(f" v1 · v{i+1} = {dot:.8f} (orthogonal ✓)")
print(f"\n Key properties verified:")
print(f" 1. Av = λv: max error = "
f"{max(np.linalg.norm(A @ eigenvectors[:,i] - eigenvalues[i]*eigenvectors[:,i]) for i in range(3)):.2e}")
print(f" 2. All eigenvectors orthogonal: "
f"||VᵀV - I|| = {np.linalg.norm(eigenvectors.T @ eigenvectors - np.eye(3)):.2e}")
print(f" 3. Trace = sum(eigenvalues): "
f"{abs(A.trace() - eigenvalues.sum()):.2e}")
print(f" 4. Det = product(eigenvalues): "
f"{abs(np.linalg.det(A) - eigenvalues.prod()):.2e}")
solve_characteristic_equation_symbolic()
solve_by_hand_3x3()Key Properties of Eigenvalues and Eigenvectors
Understanding these properties deepens the geometric and algebraic intuition, and they appear throughout PCA derivations.
import numpy as np
def demonstrate_eigenvalue_properties():
"""
Demonstrate five fundamental eigenvalue/eigenvector properties
with numerical verification.
"""
np.random.seed(42)
# Use a symmetric positive semi-definite matrix (like a covariance matrix)
X = np.random.randn(50, 4)
A = (X.T @ X) / (50 - 1) # Sample covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(A)
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
V = eigenvectors # Columns are eigenvectors
print("=== Five Key Eigenvalue Properties ===\n")
print(f" A = random 4×4 covariance matrix\n")
# Property 1: Trace = sum of eigenvalues
print(f" 1. Trace = sum of eigenvalues")
print(f" trace(A) = {A.trace():.8f}")
print(f" sum(eigenvalues) = {eigenvalues.sum():.8f}")
print(f" Equal? {np.allclose(A.trace(), eigenvalues.sum())}\n")
# Property 2: Determinant = product of eigenvalues
print(f" 2. Det = product of eigenvalues")
print(f" det(A) = {np.linalg.det(A):.8f}")
print(f" prod(eigenvalues) = {eigenvalues.prod():.8f}")
print(f" Equal? {np.allclose(np.linalg.det(A), eigenvalues.prod(), rtol=1e-5)}\n")
# Property 3: Eigenvectors are orthogonal (for symmetric A)
print(f" 3. Eigenvectors are orthogonal (symmetric matrix → orthonormal set)")
VtV = V.T @ V
print(f" ||VᵀV - I|| = {np.linalg.norm(VtV - np.eye(4)):.2e}")
print(f" VᵀV ≈ I? {np.allclose(VtV, np.eye(4))}\n")
# Property 4: Eigendecomposition A = V Λ V^T
print(f" 4. Eigendecomposition: A = V·diag(λ)·Vᵀ")
A_reconstructed = V @ np.diag(eigenvalues) @ V.T
print(f" ||A - VΛVᵀ|| = {np.linalg.norm(A - A_reconstructed):.2e}")
print(f" Exact reconstruction? {np.allclose(A, A_reconstructed)}\n")
# Property 5: All eigenvalues ≥ 0 for positive semi-definite matrices
print(f" 5. Positive semi-definite matrix → non-negative eigenvalues")
print(f" Eigenvalues: {eigenvalues.round(4)}")
print(f" All ≥ 0? {all(eigenvalues >= -1e-10)}")
print(f" (Covariance matrices are always PSD — variance cannot be negative)\n")
# Bonus: Rank = number of nonzero eigenvalues
rank = np.linalg.matrix_rank(A)
n_nonzero_eig = (eigenvalues > 1e-10).sum()
print(f" Bonus: rank(A) = {rank}, nonzero eigenvalues = {n_nonzero_eig}")
demonstrate_eigenvalue_properties()Computing Eigenvalues: Power Iteration and QR Algorithm
Understanding the algorithms that compute eigenvalues reveals why they work and what can go wrong.
import numpy as np
def power_iteration(A, n_iterations=100, random_state=42):
"""
Power iteration: finds the DOMINANT eigenvector (largest eigenvalue).
Algorithm:
1. Start with a random vector v
2. Repeatedly multiply: v ← A·v / ||A·v||
3. The vector converges to the dominant eigenvector
4. λ ≈ vᵀ·A·v (Rayleigh quotient) after convergence
Convergence rate: proportional to (λ₂/λ₁)ⁿ
Faster when the dominant eigenvalue is much larger than the second.
"""
np.random.seed(random_state)
n = A.shape[0]
v = np.random.randn(n)
v = v / np.linalg.norm(v)
true_eigenvalues, true_eigenvectors = np.linalg.eigh(A)
idx = np.argsort(np.abs(true_eigenvalues))[::-1]
lam_true = true_eigenvalues[idx[0]]
v_true = true_eigenvectors[:, idx[0]]
history = []
for i in range(n_iterations):
v_new = A @ v
lam_est = v @ v_new # Rayleigh quotient
v = v_new / np.linalg.norm(v_new)
# Track error (account for sign ambiguity)
angle_error = min(
np.linalg.norm(v - v_true),
np.linalg.norm(v + v_true)
)
lam_error = abs(lam_est - lam_true)
history.append((i + 1, lam_est, angle_error, lam_error))
return v, lam_est, history
# Demonstrate power iteration on a 4×4 symmetric matrix
np.random.seed(42)
X_power = np.random.randn(100, 4)
A_power = (X_power.T @ X_power) / 99
print("=== Power Iteration: Finding the Dominant Eigenvector ===\n")
v_dom, lam_dom, history = power_iteration(A_power, n_iterations=30)
true_eig, true_vec = np.linalg.eigh(A_power)
lam_true = true_eig.max()
print(f" A = 4×4 covariance matrix")
print(f" True dominant eigenvalue: {lam_true:.8f}\n")
print(f" {'Iter':>5} | {'λ estimate':>14} | {'λ error':>12} | {'Vec angle err':>14}")
print(f" {'-'*52}")
for it, lam_est, angle_err, lam_err in history[:10]:
print(f" {it:>5} | {lam_est:>14.8f} | {lam_err:>12.2e} | {angle_err:>14.2e}")
print(f" ...")
for it, lam_est, angle_err, lam_err in history[-3:]:
print(f" {it:>5} | {lam_est:>14.8f} | {lam_err:>12.2e} | {angle_err:>14.2e}")
print(f"\n Converged eigenvalue: {lam_dom:.8f}")
print(f" True eigenvalue: {lam_true:.8f}")
print(f" Error: {abs(lam_dom - lam_true):.2e}")
def deflation_all_components(A, n_components=None):
"""
Deflation: extract all eigenvectors by iteratively removing
the dominant component after each power iteration.
After finding dominant eigenvector v₁ with eigenvalue λ₁:
A' = A - λ₁ · v₁ · v₁ᵀ (deflated matrix — PC1 removed)
Repeat on A' to get λ₂, v₂, etc.
This is how early iterative PCA algorithms worked.
Modern algorithms use QR decomposition for better stability.
"""
n = A.shape[0]
if n_components is None:
n_components = n
eigenvalues = []
eigenvectors = []
A_def = A.copy()
for k in range(n_components):
v_k, lam_k, _ = power_iteration(A_def, n_iterations=200,
random_state=k)
eigenvalues.append(lam_k)
eigenvectors.append(v_k)
# Deflate
A_def = A_def - lam_k * np.outer(v_k, v_k)
return np.array(eigenvalues), np.column_stack(eigenvectors)
print("\n=== Deflation: All Eigenvectors via Power Iteration ===\n")
eig_vals_defl, eig_vecs_defl = deflation_all_components(A_power)
eig_vals_true, eig_vecs_true = np.linalg.eigh(A_power)
idx = np.argsort(eig_vals_true)[::-1]
eig_vals_true = eig_vals_true[idx]
print(f" {'Component':>10} | {'Deflation λ':>14} | {'True λ':>12} | {'Error':>10}")
print(f" {'-'*52}")
for i, (lv_d, lv_t) in enumerate(zip(eig_vals_defl, eig_vals_true)):
print(f" {i+1:>10} | {lv_d:>14.6f} | {lv_t:>12.6f} | "
f"{abs(lv_d-lv_t):>10.2e}")The Covariance Matrix: Bridge Between Data and PCA
The covariance matrix is the central object connecting raw data to PCA. Its eigendecomposition directly produces the principal components.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
def covariance_to_pca():
"""
Complete walkthrough: raw data → covariance matrix → eigendecomposition → PCA.
Shows every intermediate step and validates against sklearn.
"""
iris = load_iris()
X = iris.data
X_s = StandardScaler().fit_transform(X)
n, d = X_s.shape
print("=== From Covariance Matrix to PCA ===\n")
print(f" Data: {n} samples × {d} features (Iris, standardized)\n")
# Step 1: Covariance matrix
C = np.cov(X_s.T)
print(f" Covariance matrix C ({d}×{d}):")
print(f" (All diagonal = 1.0 because data is standardized = correlation matrix)\n")
print(f" {'':>22}", end='')
for name in iris.feature_names:
print(f" {name[:8]:>10}", end='')
print()
for i, row_name in enumerate(iris.feature_names):
print(f" {row_name[:22]:>22}", end='')
for j in range(d):
print(f" {C[i,j]:>10.4f}", end='')
print()
print()
# Step 2: Eigendecomposition
eigenvalues, eigenvectors = np.linalg.eigh(C)
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
total_var = eigenvalues.sum()
var_explained = eigenvalues / total_var * 100
cum_var = np.cumsum(var_explained)
print(f" Eigendecomposition results:")
print(f" {'PC':>4} | {'Eigenvalue':>12} | {'Var%':>8} | {'Cum%':>8} | "
f"{'PC Direction (loadings)'}")
print(f" {'-'*85}")
for i in range(d):
loadings = ', '.join(f'{x:+.4f}' for x in eigenvectors[:, i])
print(f" {i+1:>4} | {eigenvalues[i]:>12.6f} | {var_explained[i]:>8.2f} | "
f"{cum_var[i]:>8.2f} | [{loadings}]")
# Step 3: Project data
X_pca = X_s @ eigenvectors
print(f"\n Projected data shape: {X_pca.shape}")
print(f" PC1-PC2 variance explained: {cum_var[1]:.2f}%")
# Verify: columns of X_pca should have variances ≈ eigenvalues
print(f"\n Variance of projected data (should equal eigenvalues):")
for i in range(d):
actual_var = X_pca[:, i].var(ddof=1)
print(f" PC{i+1}: actual={actual_var:.6f}, eigenvalue={eigenvalues[i]:.6f}, "
f"match={np.isclose(actual_var, eigenvalues[i])}")
# Verify: projected components are uncorrelated
corr_matrix = np.corrcoef(X_pca.T)
off_diag = corr_matrix[np.triu_indices(d, k=1)]
print(f"\n Correlation between components (should all be ~0):")
print(f" Max |off-diagonal correlation|: "
f"{np.abs(off_diag).max():.2e} ≈ 0 ✓")
return eigenvalues, eigenvectors, X_pca
eigenvalues, eigenvectors, X_pca = covariance_to_pca()Eigenvalues in PCA: Visualizing What They Mean
The eigenvalue of a principal component tells you the variance of the data along that component’s direction. Visualizing this concretely connects the abstract math to data intuition.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
def visualize_eigenvalue_meaning(figsize=(16, 10)):
"""
Visual demonstration of what eigenvalues mean in PCA:
1. Show data projected onto each PC
2. Each projection has variance = corresponding eigenvalue
3. Total variance is the sum of all eigenvalues (= trace of cov matrix)
"""
wine = load_wine()
X_s = StandardScaler().fit_transform(wine.data)
n, d = X_s.shape
C = np.cov(X_s.T)
eigenvalues, eigenvectors = np.linalg.eigh(C)
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
X_pca = X_s @ eigenvectors
var_pct = eigenvalues / eigenvalues.sum() * 100
fig = plt.figure(figsize=figsize)
gs = plt.GridSpec(2, 2, figure=fig, hspace=0.4, wspace=0.35)
# ── Panel 1: Scree plot with eigenvalues ─────────────────────
ax1 = fig.add_subplot(gs[0, 0])
colors_bar = plt.cm.Blues(np.linspace(0.4, 0.9, d))[::-1]
bars = ax1.bar(range(1, d+1), eigenvalues, color=colors_bar,
edgecolor='white', linewidth=0.5)
for bar, ev, pct in zip(bars, eigenvalues, var_pct):
if pct > 3:
ax1.text(bar.get_x() + bar.get_width()/2,
bar.get_height() + 0.02,
f'{pct:.1f}%', ha='center', fontsize=7, color='black')
ax1.set_title('Eigenvalues = Variance per Component\n'
'(Height = how much variance the PC captures)',
fontsize=10, fontweight='bold')
ax1.set_xlabel('Principal Component', fontsize=10)
ax1.set_ylabel('Eigenvalue (= variance)', fontsize=10)
ax1.set_xticks(range(1, d+1))
ax1.grid(True, alpha=0.3, axis='y')
# ── Panel 2: Distribution of each PC (width = eigenvalue) ────
ax2 = fig.add_subplot(gs[0, 1])
for i in range(min(5, d)):
ax2.hist(X_pca[:, i], bins=20, alpha=0.6,
label=f'PC{i+1} (var={eigenvalues[i]:.2f})',
density=True)
ax2.set_title('Data Distribution Along Each PC\n'
'(Wider spread = larger eigenvalue = more variance)',
fontsize=10, fontweight='bold')
ax2.set_xlabel('Projection value', fontsize=10)
ax2.set_ylabel('Density', fontsize=10)
ax2.legend(fontsize=7)
ax2.grid(True, alpha=0.3)
# ── Panel 3: 2D scatter with ellipse ─────────────────────────
ax3 = fig.add_subplot(gs[1, 0])
colors_wine = ['coral', 'steelblue', 'mediumseagreen']
for cls, color in enumerate(colors_wine):
mask = wine.target == cls
ax3.scatter(X_pca[mask, 0], X_pca[mask, 1], c=color,
s=30, edgecolors='white', linewidth=0.3, alpha=0.8,
label=f'Class {cls}')
# Draw ellipse showing ±1 std (= ±√eigenvalue) along each PC
theta = np.linspace(0, 2*np.pi, 100)
std1, std2 = np.sqrt(eigenvalues[0]), np.sqrt(eigenvalues[1])
ax3.plot(std1 * np.cos(theta), std2 * np.sin(theta),
'k--', lw=2, alpha=0.5, label='±1σ ellipse')
ax3.axhline(0, color='black', lw=0.8, alpha=0.4)
ax3.axvline(0, color='black', lw=0.8, alpha=0.4)
ax3.set_title(f'PC1 vs PC2 Scatter\n'
f'σ₁={std1:.2f}, σ₂={std2:.2f} (≈ spread in each direction)',
fontsize=10, fontweight='bold')
ax3.set_xlabel(f'PC1 ({var_pct[0]:.1f}% variance)', fontsize=10)
ax3.set_ylabel(f'PC2 ({var_pct[1]:.1f}% variance)', fontsize=10)
ax3.legend(fontsize=8); ax3.grid(True, alpha=0.2)
# ── Panel 4: Cumulative variance ─────────────────────────────
ax4 = fig.add_subplot(gs[1, 1])
cum_var = np.cumsum(var_pct)
ax4.fill_between(range(1, d+1), cum_var, alpha=0.2, color='steelblue')
ax4.plot(range(1, d+1), cum_var, 'o-', color='steelblue',
lw=2.5, markersize=8)
for threshold, color in [(80, 'goldenrod'), (95, 'coral'), (99, 'steelblue')]:
k = np.searchsorted(cum_var, threshold) + 1
ax4.axhline(y=threshold, color=color, linestyle='--', lw=1.5, alpha=0.8,
label=f'{threshold}% at k={k}')
ax4.axvline(x=k, color=color, linestyle=':', lw=1, alpha=0.6)
ax4.set_xlabel('Number of Components Kept', fontsize=10)
ax4.set_ylabel('Cumulative Variance (%)', fontsize=10)
ax4.set_title('Cumulative Variance vs k\n'
'How much information is retained?',
fontsize=10, fontweight='bold')
ax4.legend(fontsize=8); ax4.grid(True, alpha=0.3)
plt.suptitle('Eigenvalues in PCA: Measuring Variance Along Each Direction\n'
'Wine Dataset (13 features)',
fontsize=13, fontweight='bold')
plt.savefig('eigenvalue_meaning_pca.png', dpi=150, bbox_inches='tight')
plt.show()
print("Saved: eigenvalue_meaning_pca.png")
visualize_eigenvalue_meaning()Eigendecomposition vs SVD in Practice
import numpy as np
import time
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_digits
def compare_eigendecomposition_svd():
"""
Compare eigendecomposition and SVD numerically:
- Accuracy on a realistic dataset
- Runtime comparison
- Numerical stability with near-singular matrices
"""
digits = load_digits()
X_s = StandardScaler().fit_transform(digits.data)
n, d = X_s.shape
print("=== Eigendecomposition vs SVD: Practical Comparison ===\n")
# Method 1: Eigendecomposition of covariance matrix
t0 = time.perf_counter()
C = (X_s.T @ X_s) / (n - 1)
eig_vals, eig_vecs = np.linalg.eigh(C)
idx = np.argsort(eig_vals)[::-1]
eig_vals = eig_vals[idx]; eig_vecs = eig_vecs[:, idx]
t_eig = time.perf_counter() - t0
# Method 2: SVD of data matrix
t0 = time.perf_counter()
U, s, Vt = np.linalg.svd(X_s, full_matrices=False)
svd_vals = s**2 / (n - 1)
svd_vecs = Vt.T # Principal component directions
t_svd = time.perf_counter() - t0
print(f" {'Method':<25} | {'Time (s)':>10} | Notes")
print(f" {'-'*55}")
print(f" {'Eigendecomp (eigh)':<25} | {t_eig:>10.6f} | Builds n×n cov matrix")
print(f" {'SVD (full)':<25} | {t_svd:>10.6f} | Works on n×d data matrix")
# Accuracy comparison
top_k = 5
max_diff_vals = np.max(np.abs(eig_vals[:top_k] - svd_vals[:top_k]))
print(f"\n Top {top_k} eigenvalues comparison:")
print(f" {'PC':>4} | {'Eigh':>12} | {'SVD':>12} | {'Diff':>10}")
print(f" {'-'*42}")
for i in range(top_k):
print(f" {i+1:>4} | {eig_vals[i]:>12.6f} | {svd_vals[i]:>12.6f} | "
f"{abs(eig_vals[i]-svd_vals[i]):>10.2e}")
print(f"\n Max difference: {max_diff_vals:.2e} (numerical precision)")
# Stability test: near-singular matrix
print(f"\n Stability test: near-singular covariance matrix")
np.random.seed(42)
X_sing = np.random.randn(50, 100) # n < d: rank-deficient
X_sing_s = StandardScaler().fit_transform(X_sing)
C_sing = (X_sing_s.T @ X_sing_s) / 49
# Check for negative eigenvalues (instability indicator)
eig_sing, _ = np.linalg.eigh(C_sing)
min_eig = eig_sing.min()
n_negative = (eig_sing < -1e-10).sum()
print(f" Data shape: {X_sing_s.shape} (rank-deficient: n < d)")
print(f" Min eigenvalue (eigh): {min_eig:.6e}")
print(f" Negative eigenvalues: {n_negative}")
print(f" Note: SVD handles this naturally; eigh may show tiny negatives")
compare_eigendecomposition_svd()Connecting Everything: Eigenvalues, PCA, and Data Science
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
def eigenvalue_guided_pca():
"""
Use eigenvalue analysis to make an informed k choice,
then measure the impact on a downstream classifier.
The 'eigenvalue > 1' Kaiser criterion: keep components
whose eigenvalue > 1 (they explain more variance than
a single original standardized feature).
"""
cancer = load_breast_cancer()
X_s = StandardScaler().fit_transform(cancer.data)
n, d = X_s.shape
C = np.cov(X_s.T)
eigenvalues, _ = np.linalg.eigh(C)
eigenvalues = eigenvalues[::-1] # Descending
# Kaiser criterion: eigenvalue > 1
k_kaiser = (eigenvalues > 1).sum()
# 95% variance
cum_var = np.cumsum(eigenvalues) / eigenvalues.sum()
k_95 = np.searchsorted(cum_var, 0.95) + 1
print("=== Eigenvalue-Guided k Selection ===\n")
print(f" Breast Cancer dataset: {d} features\n")
print(f" Kaiser criterion (λ > 1): k = {k_kaiser}")
print(f" 95% variance threshold: k = {k_95}")
print(f" Total features: d = {d}\n")
# Evaluate logistic regression at different k values
test_ks = sorted(set([1, 2, k_kaiser, k_95, 10, 20, d]))
cv = 5
print(f" Logistic Regression: accuracy at different k values")
print(f" {'k':>5} | {'Var%':>8} | {'CV Acc':>8} | Notes")
print(f" {'-'*45}")
for k in test_ks:
if k > d:
continue
pca_k = PCA(n_components=k)
X_pca = pca_k.fit_transform(X_s)
var_pct = pca_k.explained_variance_ratio_.sum() * 100
pipe = Pipeline([
('scaler', StandardScaler()),
('pca', PCA(n_components=k)),
('clf', LogisticRegression(max_iter=1000))
])
acc = cross_val_score(pipe, cancer.data, cancer.target, cv=cv).mean()
flag = ""
if k == k_kaiser:
flag = " ← Kaiser"
elif k == k_95:
flag = " ← 95% var"
elif k == d:
flag = " ← All features"
print(f" {k:>5} | {var_pct:>8.1f} | {acc:>8.4f} |{flag}")
print(f"\n Insight: Often k_Kaiser or k_95 gives near-optimal accuracy")
print(f" with much fewer features than the full set.")
eigenvalue_guided_pca()Visualizing Linear Transformations and Their Eigenvectors
A matrix multiplication is a linear transformation — it maps every vector in the plane to a new position. Visualizing what a transformation does to a grid of points makes the eigenvector concept concrete: eigenvectors are the arrows that point along the “seams” of the transformation.
import numpy as np
import matplotlib.pyplot as plt
def visualize_transformation_eigenvectors(figsize=(16, 7)):
"""
Show three different 2×2 matrix transformations and their eigenvectors.
Each transformation deforms the unit grid differently;
eigenvectors point along the invariant directions.
"""
matrices = {
'Scaling\n[[3,0],[0,1]]':
np.array([[3.0, 0.0], [0.0, 1.0]]),
'Shear+Scale\n[[2,1],[0,1]]':
np.array([[2.0, 1.0], [0.0, 1.0]]),
'Covariance\n[[2,1.5],[1.5,2]]':
np.array([[2.0, 1.5], [1.5, 2.0]]),
}
fig, axes = plt.subplots(1, 3, figsize=figsize)
for ax, (name, A) in zip(axes, matrices.items()):
# Grid before transformation
grid_range = np.linspace(-2, 2, 9)
X_grid, Y_grid = np.meshgrid(grid_range, grid_range)
pts = np.column_stack([X_grid.ravel(), Y_grid.ravel()])
# Apply transformation
pts_transformed = pts @ A.T
# Draw original grid (faint)
for x in grid_range:
line_x = np.array([[x, y] for y in np.linspace(-2, 2, 30)])
line_t = line_x @ A.T
ax.plot(line_t[:, 0], line_t[:, 1], 'lightblue', lw=0.8, alpha=0.5)
for y in grid_range:
line_y = np.array([[x, y] for x in np.linspace(-2, 2, 30)])
line_t = line_y @ A.T
ax.plot(line_t[:, 0], line_t[:, 1], 'lightblue', lw=0.8, alpha=0.5)
# Draw eigenvectors
try:
eigenvalues, eigenvectors = np.linalg.eig(A)
# Only use real eigenvectors
real_idx = np.where(np.abs(eigenvalues.imag) < 1e-10)[0]
eigenvalues = eigenvalues[real_idx].real
eigenvectors = eigenvectors[:, real_idx].real
eigvec_colors = ['coral', 'mediumseagreen']
for i, (ev, lam, color) in enumerate(
zip(eigenvectors.T, eigenvalues, eigvec_colors)):
ev_norm = ev / np.linalg.norm(ev)
scale = abs(lam) * 1.5
# Original direction (dashed)
ax.annotate('', xy=ev_norm * 2, xytext=-ev_norm * 2,
arrowprops=dict(arrowstyle='->', color=color,
lw=1.5, linestyle='dashed',
mutation_scale=12, alpha=0.5))
# Transformed direction
ev_t = A @ ev_norm
ax.annotate('', xy=ev_t / np.linalg.norm(ev_t) * scale,
xytext=(0, 0),
arrowprops=dict(arrowstyle='->', color=color,
lw=2.5, mutation_scale=15))
ax.text(ev_t[0]/np.linalg.norm(ev_t)*scale*1.15,
ev_t[1]/np.linalg.norm(ev_t)*scale*1.15,
f'v{i+1}\nλ={lam:.2f}',
color=color, fontsize=8, fontweight='bold', ha='center')
except np.linalg.LinAlgError:
pass
ax.set_xlim(-5, 5); ax.set_ylim(-5, 5)
ax.axhline(0, color='black', lw=0.8, alpha=0.4)
ax.axvline(0, color='black', lw=0.8, alpha=0.4)
ax.set_aspect('equal')
ax.set_title(name, fontsize=10, fontweight='bold')
ax.set_xlabel('x', fontsize=9); ax.set_ylabel('y', fontsize=9)
ax.grid(True, alpha=0.15)
plt.suptitle('Linear Transformations and Their Eigenvectors\n'
'Grid shows deformed space; arrows = eigenvectors (scale = eigenvalue)',
fontsize=12, fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig('transformation_eigenvectors.png', dpi=150, bbox_inches='tight')
plt.show()
print("Saved: transformation_eigenvectors.png")
print("\n For the covariance matrix [[2, 1.5], [1.5, 2]]:")
A_cov = np.array([[2.0, 1.5], [1.5, 2.0]])
eig_v, eig_vec = np.linalg.eigh(A_cov)
eig_v = eig_v[::-1]; eig_vec = eig_vec[:, ::-1]
print(f" λ₁ = {eig_v[0]:.4f}, v₁ = [{eig_vec[0,0]:.4f}, {eig_vec[1,0]:.4f}]")
print(f" λ₂ = {eig_v[1]:.4f}, v₂ = [{eig_vec[0,1]:.4f}, {eig_vec[1,1]:.4f}]")
print(f" v₁ · v₂ = {eig_vec[:,0] @ eig_vec[:,1]:.6f} (orthogonal ✓)")
print(f" For a covariance matrix, v₁ points in the direction of max variance")
visualize_transformation_eigenvectors()The Spectral Theorem: Why PCA Works
The mathematical guarantee that PCA works as described comes from the Spectral Theorem: every real symmetric matrix has a complete set of real, orthogonal eigenvectors that form a basis for the entire space.
Covariance matrices are always symmetric (C = C^T) and positive semi-definite. The Spectral Theorem therefore guarantees:
- All eigenvalues are real (no complex numbers)
- All eigenvalues are ≥ 0 (variances are non-negative)
- Eigenvectors for different eigenvalues are orthogonal (PCs are perpendicular)
- There are exactly d eigenvectors for a d-dimensional covariance matrix (complete basis)
import numpy as np
def verify_spectral_theorem():
"""
Verify all Spectral Theorem guarantees numerically for a random
covariance matrix.
"""
np.random.seed(42)
d = 6
X = np.random.randn(200, d)
C = np.cov(X.T) # Guaranteed symmetric PSD
eigenvalues, eigenvectors = np.linalg.eigh(C)
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
print("=== Spectral Theorem Verification ===\n")
print(f" C = {d}×{d} covariance matrix from random {d}D data\n")
# Check 1: Real eigenvalues
eig_real = np.linalg.eigvals(C)
all_real = np.all(np.abs(eig_real.imag) < 1e-10)
print(f" 1. All eigenvalues real? {all_real} ✓")
print(f" Max |imaginary part|: {np.abs(eig_real.imag).max():.2e}")
# Check 2: Non-negative eigenvalues
all_nonneg = np.all(eigenvalues >= -1e-10)
print(f"\n 2. All eigenvalues ≥ 0? {all_nonneg} ✓")
print(f" Eigenvalues: {eigenvalues.round(4)}")
# Check 3: Orthogonal eigenvectors
VtV = eigenvectors.T @ eigenvectors
orth_err = np.linalg.norm(VtV - np.eye(d))
print(f"\n 3. Eigenvectors orthonormal? ||VᵀV - I|| = {orth_err:.2e}")
print(f" (Should be ~0) ✓")
# Check 4: Complete basis (d eigenvectors)
rank_V = np.linalg.matrix_rank(eigenvectors)
print(f"\n 4. Complete basis (rank = d)? rank(V) = {rank_V} = d = {d} ✓")
# Check 5: Spectral decomposition A = VΛVᵀ
C_reconstructed = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T
recon_err = np.linalg.norm(C - C_reconstructed)
print(f"\n 5. Spectral decomposition C = VΛVᵀ?")
print(f" ||C - VΛVᵀ|| = {recon_err:.2e} ✓")
print(f"\n All Spectral Theorem guarantees hold numerically.")
print(f" These properties are what make PCA mathematically sound.")
verify_spectral_theorem()Special Cases: Zero, Negative, and Complex Eigenvalues
Not all matrices have the nice non-negative real eigenvalues of covariance matrices. Understanding edge cases deepens understanding and helps debug numerical issues.
import numpy as np
def special_eigenvalue_cases():
"""
Demonstrate what zero, negative, and complex eigenvalues mean,
and why covariance matrices are always safe from these issues.
"""
print("=== Special Eigenvalue Cases ===\n")
# Case 1: Zero eigenvalue — rank-deficient matrix
A_sing = np.array([[2.0, 2.0], [1.0, 1.0]])
eig_s = np.linalg.eigvals(A_sing)
print(f" 1. Zero eigenvalue (singular matrix, rank < n)")
print(f" A = [[2,2],[1,1]]; eigenvalues = {eig_s.round(6)}")
print(f" Meaning: one direction gets collapsed to zero length")
print(f" In PCA: zero-eigenvalue components have no variance → discard\n")
# Case 2: Negative eigenvalue — indefinite matrix
A_indef = np.array([[1.0, 3.0], [3.0, 1.0]])
eig_i = np.linalg.eigvals(A_indef)
print(f" 2. Negative eigenvalue (indefinite, non-PSD matrix)")
print(f" A = [[1,3],[3,1]]; eigenvalues = {eig_i.round(6)}")
print(f" Meaning: the transformation REFLECTS along that eigenvector")
print(f" In PCA: covariance matrices are ALWAYS PSD")
print(f" Proof: vᵀCv = ||Xv||²/(n-1) ≥ 0 for any vector v\n")
# Case 3: Complex eigenvalues — rotation
A_rot = np.array([[0.0, -1.0], [1.0, 0.0]]) # 90° rotation
eig_r = np.linalg.eigvals(A_rot)
print(f" 3. Complex eigenvalues (rotation matrix)")
print(f" A = [[0,-1],[1,0]] (90° rotation); eigenvalues = {eig_r}")
print(f" Meaning: no real fixed direction — rotation has none")
print(f" In PCA: symmetric matrices always have REAL eigenvalues\n")
print(f" Summary: covariance matrices are:")
print(f" - Symmetric (C = Cᵀ) → real eigenvalues guaranteed")
print(f" - Positive semi-definite → non-negative eigenvalues guaranteed")
print(f" - Both → PCA is always well-defined and interpretable")
special_eigenvalue_cases()The safety of covariance matrices for PCA stems from a simple algebraic identity: for any vector v, v^T C v = v^T (X^T X/(n-1)) v = ‖Xv‖²/(n-1) ≥ 0. This proves the covariance matrix is always positive semi-definite, guaranteeing all eigenvalues are ≥ 0 and real. The tiny negative eigenvalues that occasionally appear in practice (e.g., −10^{-15}) are purely floating-point rounding errors and should be clipped to zero.
Summary
Eigenvalues and eigenvectors are the mathematical heart of PCA. An eigenvector of a matrix is a direction that the matrix only stretches (by factor λ, the eigenvalue) — never rotates. For the data’s covariance matrix, the eigenvectors are the principal component directions, and the eigenvalues are the amount of variance along each direction.
The characteristic equation det(A − λI) = 0 is the algebraic tool for finding eigenvalues: it produces a polynomial whose roots are the eigenvalues. For symmetric matrices (like covariance matrices), all eigenvalues are real and non-negative, and all eigenvectors are orthogonal — these properties make covariance eigendecomposition the perfect tool for PCA.
Five key properties of eigenvalues connect to PCA intuition: their sum equals the matrix trace (total variance); their product equals the determinant; eigenvectors of symmetric matrices are always orthogonal; together they reconstruct the matrix exactly (A = VΛV^T); and for positive semi-definite matrices (covariances), all eigenvalues are ≥ 0.
Power iteration is the conceptually simplest algorithm for finding the dominant eigenvector — repeatedly multiply by A and normalize. The QR algorithm (used in numpy.linalg.eigh) finds all eigenvalues simultaneously. SVD is the numerically preferred method for PCA specifically because it avoids squaring the data matrix and handles rank-deficient matrices more gracefully.
The Kaiser criterion (keep components with eigenvalue > 1) provides a fast heuristic for choosing k: components with eigenvalue > 1 explain more variance than a single original feature, making them worth keeping. Combined with the 95% cumulative variance threshold, these two criteria typically agree and provide a principled data-driven choice of k without any downstream evaluation.
The mathematical guarantee comes from the Spectral Theorem: every real symmetric matrix has a complete set of real, orthogonal eigenvectors. Covariance matrices are always symmetric and positive semi-definite, so all eigenvalues are real and non-negative, all eigenvectors are orthogonal, and together they span the entire feature space — exactly the properties PCA requires.
Understanding eigenvalues and eigenvectors makes every PCA decision more principled: the k-selection problem becomes “how many eigenvalues are large enough to be meaningful?”, the Kaiser criterion (λ > 1) gives a data-driven answer grounded in the mathematics, and debugging unexpected PCA behavior (near-zero eigenvalues, poor explained variance) becomes a matter of inspecting the eigenspectrum rather than guessing at parameters.








