Working with Image Data in Python

Learn to work with image data in Python. Master Pillow, OpenCV, image arrays, preprocessing, augmentation, feature extraction, and when to use CNNs vs. pre-trained models for computer vision.

Working with Image Data in Python

Images in Python are represented as NumPy arrays: a grayscale image is a 2D array of shape (height, width), an RGB color image is a 3D array of shape (height, width, 3), and a batch of images for deep learning is a 4D array of shape (batch_size, height, width, channels). The Pillow library handles loading, saving, and basic manipulation; OpenCV adds computer vision operations like edge detection and feature extraction; and deep learning frameworks (PyTorch, TensorFlow) use these arrays as model inputs. The most practical approach for most data scientists is to use a pre-trained CNN model from torchvision or Hugging Face as a feature extractor, then train a simple classifier on top — this avoids training from scratch and works well even with small datasets.

Introduction

Images are one of the most information-dense data types humans work with, and they’re increasingly central to data science. An e-commerce company analyzes product photos to automatically tag categories and detect defects. A healthcare system uses chest X-rays to assist radiologists. A manufacturing plant uses camera feeds to detect assembly line defects in real time. A retail chain counts in-store foot traffic from security camera footage. Satellite imagery tracks crop health, deforestation, and urban growth. Social media platforms classify billions of uploaded photos every day.

Unlike text (which has a clear unit — the word) or tabular data (which has a clear unit — the row), images are fundamentally spatial and continuous. The meaning of a pixel depends on its context — the surrounding pixels at the same scale and across multiple scales simultaneously. This is why deep learning, and convolutional neural networks in particular, has been so transformative for image analysis: CNNs are architecturally designed to exploit spatial structure.

This article introduces image data for data science practitioners: what images are as data, how to load and manipulate them in Python, essential preprocessing and augmentation techniques, feature extraction approaches from classical histograms to pre-trained deep networks, and a practical guide to when and how to use each tool.

What Is an Image as Data?

An image is a 2D grid of pixels (picture elements), where each pixel holds an intensity value. The key concepts:

Grayscale vs. Color Images

Grayscale: Each pixel holds a single intensity value from 0 (black) to 255 (white). Shape: (height, width).

RGB color: Each pixel holds three values — Red, Green, Blue channels — each from 0 to 255. Shape: (height, width, 3). The combination of the three channel values determines the perceived color.

RGBA: RGB plus an Alpha (transparency) channel. Shape: (height, width, 4). Common in PNG files.

BGR: OpenCV stores color images as Blue-Green-Red (not RGB). This is a common source of bugs when mixing PIL/Pillow (RGB) with OpenCV (BGR).

Data Types and Value Ranges

Plaintext
uint8  (0–255):      Standard image format, 1 byte per channel
float32 (0.0–1.0):   Common for neural network inputs
float32 (-1.0–1.0):  Normalized range used by some pretrained models
uint16 (0–65535):    Medical imaging (CT, MRI, X-ray) and raw camera sensor data
Python
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# A 4×4 grayscale image (handcrafted as a numpy array)
gray_array = np.array([
    [0,   64,  128, 192],
    [32,  96,  160, 224],
    [64,  128, 192, 255],
    [0,   0,   0,   0  ]
], dtype=np.uint8)

print(f"Grayscale array shape: {gray_array.shape}")  # (4, 4)
print(f"Dtype: {gray_array.dtype}")                  # uint8
print(f"Value range: [{gray_array.min()}, {gray_array.max()}]")

# A 4×4 RGB image (3 channels)
rgb_array = np.zeros((4, 4, 3), dtype=np.uint8)
rgb_array[:, :, 0] = 200    # Red channel
rgb_array[:, :, 1] = 50     # Green channel
rgb_array[:, :, 2] = 50     # Blue channel
rgb_array[2, 2] = [255, 255, 0]  # One yellow pixel

print(f"\nRGB array shape: {rgb_array.shape}")       # (4, 4, 3)
print(f"Channel 0 (red): {rgb_array[:,:,0]}")

Loading and Saving Images

Pillow (PIL)

Pillow is the standard Python imaging library — the right tool for loading, saving, and basic manipulation.

Python
from PIL import Image, ImageFilter, ImageEnhance, ImageDraw
import numpy as np
import matplotlib.pyplot as plt

# ── Load an image ─────────────────────────────────────────────────
img = Image.open("data/images/sample.jpg")

print(f"Format:    {img.format}")       # JPEG
print(f"Mode:      {img.mode}")         # RGB
print(f"Size:      {img.size}")         # (width, height) in pixels, e.g. (1920, 1080)

# Important: PIL uses (width, height), NumPy uses (height, width)
# Always be aware of this axis convention difference!

# ── Convert to numpy array ────────────────────────────────────────
img_array = np.array(img)
print(f"Array shape: {img_array.shape}")  # (height, width, 3)
print(f"Dtype:       {img_array.dtype}")  # uint8
print(f"Value range: [{img_array.min()}, {img_array.max()}]")

# ── Convert between modes ─────────────────────────────────────────
img_gray = img.convert("L")          # RGB → Grayscale
img_rgba = img.convert("RGBA")       # RGB → RGBA (add alpha channel)
img_rgb  = img_gray.convert("RGB")   # Grayscale → RGB (3 identical channels)

print(f"\nGrayscale shape: {np.array(img_gray).shape}")  # (height, width)
print(f"RGBA shape:      {np.array(img_rgba).shape}")    # (height, width, 4)

# ── Convert numpy array back to PIL ──────────────────────────────
# Note: ensure correct dtype and value range
array_uint8 = (img_array * 0.5).astype(np.uint8)  # Darken: multiply by 0.5
img_from_array = Image.fromarray(array_uint8)

# ── Save images ───────────────────────────────────────────────────
img.save("output/sample_copy.jpg", quality=90)     # JPEG with quality setting
img.save("output/sample_copy.png")                 # PNG (lossless)
img_gray.save("output/sample_gray.jpg")

# ── Display images ─────────────────────────────────────────────────
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].imshow(img)
axes[0].set_title(f"Original RGB {img.size}")
axes[0].axis("off")
axes[1].imshow(img_gray, cmap="gray")
axes[1].set_title("Grayscale")
axes[1].axis("off")
plt.tight_layout()
plt.savefig("output/original_vs_gray.png", dpi=100, bbox_inches="tight")
plt.show()

OpenCV

OpenCV (cv2) is the industry-standard computer vision library — faster than Pillow for many operations, with a much broader range of image processing functions.

Python
import cv2
import numpy as np

# ── Load an image ─────────────────────────────────────────────────
# WARNING: OpenCV loads images in BGR order, not RGB!
img_bgr = cv2.imread("data/images/sample.jpg")    # Returns BGR array
img_bgr_gray = cv2.imread("data/images/sample.jpg", cv2.IMREAD_GRAYSCALE)

print(f"OpenCV shape: {img_bgr.shape}")   # (height, width, 3) in BGR

# ── Convert between color spaces ─────────────────────────────────
# Always convert BGR → RGB when displaying with matplotlib or PIL
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
img_gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
img_hsv  = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)  # Hue-Saturation-Value
img_lab  = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)  # L*a*b* perceptual color

# ── Save with OpenCV ──────────────────────────────────────────────
cv2.imwrite("output/sample_cv.jpg", img_bgr)   # Saves in BGR (correct for CV files)
cv2.imwrite("output/sample_gray_cv.jpg", img_gray)

# ── A critical helper: always convert OpenCV images before matplotlib ──
def cv2_to_display(img_bgr: np.ndarray) -> np.ndarray:
    """Convert OpenCV BGR image to RGB for display with matplotlib."""
    if len(img_bgr.shape) == 3 and img_bgr.shape[2] == 3:
        return cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    return img_bgr  # Grayscale: no conversion needed

plt.imshow(cv2_to_display(img_bgr))
plt.axis("off")
plt.title("Correctly displayed OpenCV image")
plt.show()

Essential Image Operations

Resizing

Python
from PIL import Image
import cv2
import numpy as np

# Resizing is critical for ML — models expect fixed-size inputs

# ── Pillow resizing ────────────────────────────────────────────────
img = Image.open("data/images/sample.jpg")
original_size = img.size  # (width, height)

# Resize to exact dimensions (may distort aspect ratio)
img_224 = img.resize((224, 224), Image.LANCZOS)  # Best quality resampling

# Resize to fit within a max dimension (preserve aspect ratio)
def resize_with_aspect(img: Image.Image, max_dim: int) -> Image.Image:
    """Resize to fit within max_dim × max_dim, preserving aspect ratio."""
    w, h = img.size
    if w > h:
        new_w, new_h = max_dim, int(max_dim * h / w)
    else:
        new_w, new_h = int(max_dim * w / h), max_dim
    return img.resize((new_w, new_h), Image.LANCZOS)

img_maxed = resize_with_aspect(img, 512)

# Thumbnail: PIL's built-in aspect-ratio-preserving resize
img_thumb = img.copy()
img_thumb.thumbnail((128, 128))  # Modifies in-place

# ── Center crop: resize then crop to exact square ─────────────────
def resize_and_center_crop(img: Image.Image, target_size: int) -> Image.Image:
    """
    ImageNet-style preprocessing: resize shortest edge to target_size,
    then crop a centered square.
    """
    w, h = img.size
    # Resize so the shortest edge equals target_size
    if w < h:
        new_w, new_h = target_size, int(target_size * h / w)
    else:
        new_w, new_h = int(target_size * w / h), target_size

    img_resized = img.resize((new_w, new_h), Image.LANCZOS)

    # Center crop
    left = (new_w - target_size) // 2
    top  = (new_h - target_size) // 2
    return img_resized.crop((left, top, left + target_size, top + target_size))

img_224_cropped = resize_and_center_crop(img, 224)
print(f"After center crop: {img_224_cropped.size}")  # (224, 224)

# ── OpenCV resizing ────────────────────────────────────────────────
img_cv = cv2.imread("data/images/sample.jpg")
img_cv_resized = cv2.resize(img_cv, (224, 224),
                              interpolation=cv2.INTER_LANCZOS4)

Image Normalization for ML

Python
import numpy as np
from PIL import Image

def normalize_image(
    img_array: np.ndarray,
    method: str = "imagenet"
) -> np.ndarray:
    """
    Normalize an image array for neural network input.

    Parameters
    ----------
    img_array : np.ndarray
        Image array of shape (H, W, 3), dtype uint8 (0-255).
    method : str
        'imagenet': subtract ImageNet mean, divide by std (standard for pretrained models)
        'zero_one': scale to [0, 1]
        'minus_one_one': scale to [-1, 1]

    Returns
    -------
    np.ndarray
        Normalized array of dtype float32.
    """
    arr = img_array.astype(np.float32)

    if method == "zero_one":
        return arr / 255.0

    elif method == "minus_one_one":
        return (arr / 127.5) - 1.0

    elif method == "imagenet":
        # ImageNet statistics (computed over the full ImageNet dataset)
        IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
        IMAGENET_STD  = np.array([0.229, 0.224, 0.225])
        arr = arr / 255.0
        arr = (arr - IMAGENET_MEAN) / IMAGENET_STD
        return arr.astype(np.float32)

    else:
        raise ValueError(f"Unknown normalization method: {method}")


img = Image.open("data/images/sample.jpg").resize((224, 224))
img_array = np.array(img)

print(f"Original range: [{img_array.min()}, {img_array.max()}]")

img_01 = normalize_image(img_array, "zero_one")
print(f"[0,1] range: [{img_01.min():.3f}, {img_01.max():.3f}]")

img_imagenet = normalize_image(img_array, "imagenet")
print(f"ImageNet normalized range: [{img_imagenet.min():.3f}, {img_imagenet.max():.3f}]")

Image Augmentation: Expanding Small Datasets

Data augmentation artificially expands a training dataset by applying random transformations that preserve the semantic label. A photo of a dog is still a dog when flipped horizontally.

Python
from PIL import Image, ImageFilter, ImageEnhance
import numpy as np
import random

def augment_image(
    img: Image.Image,
    horizontal_flip_prob: float = 0.5,
    brightness_range: tuple = (0.7, 1.3),
    contrast_range: tuple = (0.8, 1.2),
    rotation_range: tuple = (-15, 15),
    noise_std: float = 0.02,
    random_crop_frac: float = 0.9
) -> Image.Image:
    """
    Apply random augmentations to a PIL image.

    Each augmentation is applied with the given probability or
    drawn from the specified range. Safe for training classifiers.

    Parameters
    ----------
    img : PIL.Image
        Input image to augment.
    horizontal_flip_prob : float
        Probability of horizontal flip (0 = never, 1 = always).
    brightness_range : tuple
        (min, max) brightness multiplier.
    contrast_range : tuple
        (min, max) contrast multiplier.
    rotation_range : tuple
        (min_degrees, max_degrees) rotation.
    noise_std : float
        Standard deviation of Gaussian noise (0 = no noise).
    random_crop_frac : float
        Fraction of original size to crop to (then resize back).

    Returns
    -------
    PIL.Image
        Augmented image (same size as input).
    """
    original_size = img.size

    # Horizontal flip
    if random.random() < horizontal_flip_prob:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)

    # Brightness adjustment
    brightness_factor = random.uniform(*brightness_range)
    img = ImageEnhance.Brightness(img).enhance(brightness_factor)

    # Contrast adjustment
    contrast_factor = random.uniform(*contrast_range)
    img = ImageEnhance.Contrast(img).enhance(contrast_factor)

    # Random rotation
    angle = random.uniform(*rotation_range)
    img = img.rotate(angle, fillcolor=(128, 128, 128), expand=False)

    # Random crop (then resize back to original)
    w, h = img.size
    crop_w = int(w * random_crop_frac)
    crop_h = int(h * random_crop_frac)
    left  = random.randint(0, w - crop_w)
    top   = random.randint(0, h - crop_h)
    img   = img.crop((left, top, left + crop_w, top + crop_h))
    img   = img.resize(original_size, Image.LANCZOS)

    # Gaussian noise
    if noise_std > 0:
        arr = np.array(img).astype(np.float32)
        noise = np.random.normal(0, noise_std * 255, arr.shape)
        arr = np.clip(arr + noise, 0, 255).astype(np.uint8)
        img = Image.fromarray(arr)

    return img


# Visualize augmentations
img = Image.open("data/images/sample.jpg").resize((224, 224))

fig, axes = plt.subplots(2, 5, figsize=(15, 7))
axes[0, 0].imshow(img)
axes[0, 0].set_title("Original")
axes[0, 0].axis("off")

for i in range(1, 10):
    row, col = divmod(i, 5)
    augmented = augment_image(img)
    axes[row, col].imshow(augmented)
    axes[row, col].set_title(f"Augmented #{i}")
    axes[row, col].axis("off")

plt.suptitle("Data Augmentation Examples", fontsize=14, fontweight="bold")
plt.tight_layout()
plt.savefig("output/augmentation_examples.png", dpi=100)
plt.show()

Loading a Dataset of Images

For real projects, you need to load many images efficiently:

Python
import os
import numpy as np
from PIL import Image
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor
import pandas as pd

def load_image_dataset(
    image_dir: str,
    target_size: tuple = (224, 224),
    max_images: int = None,
    extensions: tuple = (".jpg", ".jpeg", ".png", ".bmp", ".webp")
) -> tuple:
    """
    Load all images from a directory into a numpy array.

    Assumes directory structure:
        image_dir/
            class_a/
                img001.jpg
                img002.jpg
            class_b/
                img003.jpg

    Parameters
    ----------
    image_dir : str
        Root directory containing class subdirectories.
    target_size : tuple
        (width, height) to resize all images to.
    max_images : int, optional
        Maximum images to load per class (for balanced sampling).
    extensions : tuple
        Accepted file extensions.

    Returns
    -------
    tuple
        (images_array, labels_array, class_names, file_paths)
    """
    image_dir = Path(image_dir)
    class_dirs = sorted([d for d in image_dir.iterdir() if d.is_dir()])
    class_names = [d.name for d in class_dirs]

    print(f"Found {len(class_names)} classes: {class_names}")

    images, labels, paths = [], [], []

    for class_idx, class_dir in enumerate(class_dirs):
        # Find all image files
        class_files = [
            f for f in class_dir.iterdir()
            if f.suffix.lower() in extensions
        ]

        if max_images:
            class_files = class_files[:max_images]

        print(f"  {class_dir.name}: {len(class_files)} images")

        for filepath in class_files:
            try:
                img = Image.open(filepath).convert("RGB")
                img = img.resize(target_size, Image.LANCZOS)
                images.append(np.array(img, dtype=np.uint8))
                labels.append(class_idx)
                paths.append(str(filepath))
            except Exception as e:
                print(f"    Skipped {filepath.name}: {e}")

    images_array = np.stack(images) if images else np.array([])
    labels_array = np.array(labels)

    print(f"\nDataset loaded: {len(images_array)} images")
    print(f"Array shape: {images_array.shape}")   # (N, H, W, 3)
    print(f"Class distribution: {pd.Series(labels_array).value_counts().to_dict()}")

    return images_array, labels_array, class_names, paths


# Example usage
# images, labels, class_names, paths = load_image_dataset(
#     "data/images/product_categories/",
#     target_size=(224, 224),
#     max_images=500  # 500 per class
# )

Classical Feature Extraction

Before deep learning, computer vision relied on hand-crafted features. These are still useful for small datasets, interpretability, and non-deep learning pipelines.

Color Histograms

Python
import cv2
import numpy as np
from sklearn.preprocessing import normalize

def extract_color_histogram(
    img_array: np.ndarray,
    bins: int = 32,
    color_space: str = "RGB"
) -> np.ndarray:
    """
    Extract a normalized color histogram as a fixed-length feature vector.

    For each color channel, compute a histogram of pixel values.
    Concatenate all channel histograms into one vector.

    Parameters
    ----------
    img_array : np.ndarray
        Image array of shape (H, W, 3), uint8.
    bins : int
        Number of histogram bins per channel.
    color_space : str
        'RGB', 'HSV', or 'LAB'.

    Returns
    -------
    np.ndarray
        Normalized feature vector of length (3 × bins).
    """
    if color_space == "HSV":
        img_cv = cv2.cvtColor(img_array, cv2.COLOR_RGB2HSV)
    elif color_space == "LAB":
        img_cv = cv2.cvtColor(img_array, cv2.COLOR_RGB2LAB)
    else:
        img_cv = img_array

    features = []
    for channel_idx in range(3):
        hist = cv2.calcHist([img_cv], [channel_idx], None,
                             [bins], [0, 256])
        features.append(hist.flatten())

    feature_vector = np.concatenate(features)
    # L2 normalize so magnitude doesn't dominate
    return feature_vector / (feature_vector.sum() + 1e-10)


# Extract histograms from a batch of images
def batch_color_histograms(images: np.ndarray, bins: int = 32) -> np.ndarray:
    features = np.array([
        extract_color_histogram(img, bins=bins) for img in images
    ])
    print(f"Color histogram features: {features.shape}")  # (N, 3*bins)
    return features

Edge Detection Features

Python
import cv2
import numpy as np

def extract_edge_features(img_array: np.ndarray) -> np.ndarray:
    """
    Extract Canny edge detection features.
    Returns edge density and directional statistics.
    """
    # Convert to grayscale if color
    if len(img_array.shape) == 3:
        gray = cv2.cvtColor(img_array, cv2.COLOR_RGB2GRAY)
    else:
        gray = img_array

    # Apply Gaussian blur to reduce noise before edge detection
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)

    # Canny edge detector
    edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

    # Feature: fraction of pixels that are edges
    edge_density = edges.mean() / 255.0

    # Gradient magnitudes and directions via Sobel operators
    grad_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    grad_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = np.sqrt(grad_x**2 + grad_y**2)

    return np.array([
        edge_density,
        magnitude.mean(),
        magnitude.std(),
        magnitude.max() / 255.0
    ])


def extract_texture_features(img_array: np.ndarray) -> np.ndarray:
    """Extract Local Binary Pattern (LBP) texture features."""
    if len(img_array.shape) == 3:
        gray = cv2.cvtColor(img_array, cv2.COLOR_RGB2GRAY)
    else:
        gray = img_array

    # GLCM-based texture: variance and entropy of pixel blocks
    h, w = gray.shape
    block_size = 16
    variances = []
    for i in range(0, h - block_size, block_size):
        for j in range(0, w - block_size, block_size):
            block = gray[i:i+block_size, j:j+block_size].astype(float)
            variances.append(block.var())

    return np.array([
        np.mean(variances),
        np.std(variances),
        np.percentile(variances, 75),
        np.percentile(variances, 25)
    ])

Deep Learning Feature Extraction with Pre-Trained Models

The most powerful and practical approach for most data science image tasks is to extract features using a pre-trained CNN (trained on ImageNet’s 1.2M images) and then train a simple classifier on those features. This approach — called transfer learning — works exceptionally well even with small datasets (hundreds to a few thousand images).

Python
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
import numpy as np
import pandas as pd
from pathlib import Path

def build_feature_extractor(model_name: str = "resnet50") -> tuple:
    """
    Build a pre-trained CNN feature extractor by removing the final classification layer.

    Parameters
    ----------
    model_name : str
        Model architecture: 'resnet50', 'resnet18', 'efficientnet_b0', 'mobilenet_v3_small'.

    Returns
    -------
    tuple
        (model, transform, feature_dim)
    """
    if model_name == "resnet50":
        model  = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        # Remove the final fully connected layer (classifier)
        model  = torch.nn.Sequential(*list(model.children())[:-1])
        feat_dim = 2048

    elif model_name == "resnet18":
        model  = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        model  = torch.nn.Sequential(*list(model.children())[:-1])
        feat_dim = 512

    elif model_name == "efficientnet_b0":
        model  = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
        model.classifier = torch.nn.Identity()
        feat_dim = 1280

    elif model_name == "mobilenet_v3_small":
        model  = models.mobilenet_v3_small(
            weights=models.MobileNet_V3_Small_Weights.IMAGENET1K_V1
        )
        model.classifier = torch.nn.Identity()
        feat_dim = 576

    else:
        raise ValueError(f"Unknown model: {model_name}")

    model.eval()  # Set to evaluation mode (disables dropout/batchnorm training)

    # ImageNet preprocessing transform (must match what the model was trained on)
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),                         # (H,W,C) uint8 → (C,H,W) float32 [0,1]
        transforms.Normalize(                          # ImageNet normalization
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]
        )
    ])

    print(f"Feature extractor: {model_name} | Feature dim: {feat_dim}")
    return model, transform, feat_dim


def extract_features_batch(
    image_paths: list,
    model,
    transform,
    feat_dim: int,
    batch_size: int = 32,
    device: str = None
) -> np.ndarray:
    """
    Extract deep features from a list of image file paths.

    Parameters
    ----------
    image_paths : list
        Paths to image files.
    model : torch.nn.Module
        Pre-trained feature extractor (final layer removed).
    transform : transforms.Compose
        Preprocessing transform matching the model's training.
    feat_dim : int
        Output feature dimension.
    batch_size : int
        Images per batch for GPU efficiency.
    device : str, optional
        'cuda', 'mps', or 'cpu'. Auto-detected if None.

    Returns
    -------
    np.ndarray
        Feature matrix of shape (n_images, feat_dim).
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else \
                 "mps"  if torch.backends.mps.is_available() else "cpu"

    model = model.to(device)
    all_features = []

    n_batches = (len(image_paths) + batch_size - 1) // batch_size

    for batch_idx in range(n_batches):
        batch_paths = image_paths[batch_idx * batch_size :
                                   (batch_idx + 1) * batch_size]

        tensors = []
        for path in batch_paths:
            try:
                img = Image.open(path).convert("RGB")
                tensors.append(transform(img))
            except Exception as e:
                print(f"  Warning: could not load {path}: {e}")
                tensors.append(torch.zeros(3, 224, 224))  # Placeholder

        batch_tensor = torch.stack(tensors).to(device)

        with torch.no_grad():               # No gradients needed for inference
            features = model(batch_tensor)

        # Flatten to (batch_size, feat_dim)
        features = features.squeeze().cpu().numpy()
        if features.ndim == 1:
            features = features.reshape(1, -1)

        all_features.append(features)

        if (batch_idx + 1) % 10 == 0 or batch_idx == n_batches - 1:
            n_done = min((batch_idx + 1) * batch_size, len(image_paths))
            print(f"  Extracted features: {n_done:,}/{len(image_paths):,}")

    return np.vstack(all_features)

Training a Classifier on Extracted Features

Python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import numpy as np

def train_image_classifier(
    features: np.ndarray,
    labels: np.ndarray,
    class_names: list,
    classifier: str = "logistic",
    cv_folds: int = 5
) -> dict:
    """
    Train a classifier on pre-extracted deep features.

    This is the transfer learning pattern: powerful pretrained
    CNN extracts features, simple linear model classifies.

    Parameters
    ----------
    features : np.ndarray
        Feature matrix (N, feat_dim) from extract_features_batch.
    labels : np.ndarray
        Class indices.
    class_names : list
        Human-readable class names.
    classifier : str
        'logistic', 'svm', or 'knn'.
    cv_folds : int
        Number of cross-validation folds.

    Returns
    -------
    dict
        Results including CV accuracy and trained model.
    """
    print(f"Training {classifier} classifier on {len(features):,} images "
          f"({len(class_names)} classes)...")

    if classifier == "logistic":
        clf = Pipeline([
            ("scaler",     StandardScaler()),
            ("classifier", LogisticRegression(
                C=1.0, max_iter=1000,
                multi_class="auto",
                random_state=42
            ))
        ])
    elif classifier == "svm":
        clf = Pipeline([
            ("scaler",     StandardScaler()),
            ("classifier", SVC(
                C=1.0, kernel="rbf",
                probability=True,
                random_state=42
            ))
        ])

    cv_scores = cross_val_score(clf, features, labels, cv=cv_folds,
                                  scoring="accuracy", n_jobs=-1)

    clf.fit(features, labels)  # Fit on all data after CV evaluation

    print(f"\nCross-validation accuracy: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")
    print(f"CV scores by fold: {np.round(cv_scores, 3)}")

    return {
        "model":     clf,
        "cv_scores": cv_scores,
        "cv_mean":   cv_scores.mean(),
        "cv_std":    cv_scores.std()
    }


# Complete pipeline for a new image classification task
def run_transfer_learning_pipeline(
    image_dir: str,
    model_name: str = "resnet50",
    target_size: int = 224,
    batch_size: int = 32,
    max_images_per_class: int = None
) -> dict:
    """
    Complete image classification pipeline using transfer learning.

    1. Load images from directory (class folders)
    2. Extract deep features with pre-trained CNN
    3. Train logistic regression classifier on features
    4. Evaluate with cross-validation

    Returns trained model and evaluation metrics.
    """
    # Step 1: Gather image paths and labels
    image_dir_path = Path(image_dir)
    class_dirs     = sorted([d for d in image_dir_path.iterdir() if d.is_dir()])
    class_names    = [d.name for d in class_dirs]

    all_paths, all_labels = [], []
    for class_idx, class_dir in enumerate(class_dirs):
        paths = list(class_dir.glob("*.jpg")) + list(class_dir.glob("*.png"))
        if max_images_per_class:
            paths = paths[:max_images_per_class]
        all_paths.extend(str(p) for p in paths)
        all_labels.extend([class_idx] * len(paths))

    labels = np.array(all_labels)
    print(f"Dataset: {len(all_paths)} images, {len(class_names)} classes")

    # Step 2: Build feature extractor
    model, transform, feat_dim = build_feature_extractor(model_name)

    # Step 3: Extract features
    print("\nExtracting CNN features...")
    features = extract_features_batch(all_paths, model, transform,
                                       feat_dim, batch_size=batch_size)

    # Step 4: Train classifier
    print("\nTraining classifier...")
    results = train_image_classifier(features, labels, class_names)

    print(f"\nPipeline complete!")
    print(f"  Feature dim: {feat_dim}")
    print(f"  CV accuracy: {results['cv_mean']:.3f}")

    return {
        "features":    features,
        "labels":      labels,
        "class_names": class_names,
        "model":       results["model"],
        "cv_accuracy": results["cv_mean"]
    }

Image Analysis Without Classification

Not all image tasks are about classification. Common analytical tasks include:

Image Similarity Search

Python
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def find_similar_images(
    query_feature: np.ndarray,
    database_features: np.ndarray,
    image_paths: list,
    top_k: int = 5
) -> list:
    """
    Find the most similar images to a query image using feature cosine similarity.

    Parameters
    ----------
    query_feature : np.ndarray
        Feature vector for the query image (1D or 1×D).
    database_features : np.ndarray
        Feature matrix for all database images (N×D).
    image_paths : list
        File paths corresponding to database_features rows.
    top_k : int
        Number of most similar images to return.

    Returns
    -------
    list of (similarity_score, image_path) tuples
    """
    if query_feature.ndim == 1:
        query_feature = query_feature.reshape(1, -1)

    similarities = cosine_similarity(query_feature, database_features)[0]
    top_indices  = np.argsort(similarities)[::-1][:top_k]

    return [(float(similarities[i]), image_paths[i]) for i in top_indices]


# Usage: find products visually similar to a query product image
# query_feat = features[query_idx]  # Feature vector from extract_features_batch
# similar = find_similar_images(query_feat, features, all_paths, top_k=10)
# print("Most similar images:")
# for score, path in similar:
#     print(f"  [{score:.3f}] {path}")

Anomaly Detection in Images

Python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest

def detect_image_anomalies(
    features: np.ndarray,
    image_paths: list,
    method: str = "isolation_forest",
    contamination: float = 0.05
) -> pd.DataFrame:
    """
    Detect anomalous images (defective products, corrupted files, outliers)
    using outlier detection on extracted CNN features.

    Parameters
    ----------
    features : np.ndarray
        CNN feature matrix (N, feat_dim).
    image_paths : list
        File paths corresponding to rows.
    method : str
        'isolation_forest' or 'elliptic_envelope'.
    contamination : float
        Expected fraction of anomalies (0.05 = 5%).

    Returns
    -------
    pd.DataFrame
        Results with anomaly scores and labels.
    """
    if method == "isolation_forest":
        detector = IsolationForest(
            contamination=contamination,
            random_state=42,
            n_jobs=-1
        )
    else:
        detector = EllipticEnvelope(contamination=contamination)

    anomaly_labels  = detector.fit_predict(features)  # -1=anomaly, 1=normal
    anomaly_scores  = detector.score_samples(features)  # Higher = more normal

    results = pd.DataFrame({
        "path":          image_paths,
        "anomaly_score": anomaly_scores,
        "is_anomaly":    anomaly_labels == -1
    }).sort_values("anomaly_score")

    n_anomalies = results["is_anomaly"].sum()
    print(f"Detected {n_anomalies} anomalous images "
          f"({n_anomalies/len(results)*100:.1f}%)")
    print("\nTop 5 most anomalous:")
    print(results.head(5)[["path", "anomaly_score"]].to_string(index=False))

    return results

Dealing with Image Datasets at Scale

Python
import pandas as pd
import numpy as np
from pathlib import Path
from PIL import Image
import hashlib
import json

def audit_image_dataset(image_dir: str) -> pd.DataFrame:
    """
    Audit an image dataset for quality issues:
    - Missing / corrupted files
    - Duplicates (by file hash)
    - Extreme sizes (too small, too large)
    - Wrong aspect ratios
    - Corrupt files (can't be opened)
    """
    image_dir  = Path(image_dir)
    all_files  = list(image_dir.rglob("*.jpg")) + \
                 list(image_dir.rglob("*.jpeg")) + \
                 list(image_dir.rglob("*.png"))

    print(f"Found {len(all_files):,} image files")

    records = []
    for filepath in all_files:
        record = {
            "path":      str(filepath),
            "filename":  filepath.name,
            "class":     filepath.parent.name,
            "size_bytes":filepath.stat().st_size,
            "width":     None,
            "height":    None,
            "mode":      None,
            "file_hash": None,
            "is_corrupt": False,
            "issue":     None
        }

        try:
            img = Image.open(filepath)
            record["width"]  = img.width
            record["height"] = img.height
            record["mode"]   = img.mode

            # Detect issues
            if img.width < 32 or img.height < 32:
                record["issue"] = "too_small"
            elif img.width > 4096 or img.height > 4096:
                record["issue"] = "very_large"
            elif record["size_bytes"] < 1000:
                record["issue"] = "tiny_file"

            # Hash for duplicate detection (hash of raw bytes)
            record["file_hash"] = hashlib.md5(
                filepath.read_bytes()
            ).hexdigest()

        except Exception as e:
            record["is_corrupt"] = True
            record["issue"]      = f"corrupt: {type(e).__name__}"

        records.append(record)

    df = pd.DataFrame(records)

    # Flag duplicates
    hash_counts     = df["file_hash"].value_counts()
    df["is_duplicate"] = df["file_hash"].map(hash_counts) > 1

    # Summary
    print(f"\n{'='*50}")
    print(f"Image Dataset Audit: {image_dir}")
    print(f"{'='*50}")
    print(f"  Total files:     {len(df):,}")
    print(f"  Corrupt files:   {df['is_corrupt'].sum():,}")
    print(f"  Duplicates:      {df['is_duplicate'].sum():,}")
    print(f"  With issues:     {df['issue'].notna().sum():,}")

    if df["width"].notna().any():
        print(f"  Width  range: [{df['width'].min():.0f}, {df['width'].max():.0f}]")
        print(f"  Height range: [{df['height'].min():.0f}, {df['height'].max():.0f}]")

    issue_counts = df["issue"].value_counts()
    if len(issue_counts) > 0:
        print(f"\n  Issues found:")
        for issue, count in issue_counts.items():
            print(f"    {issue}: {count}")

    return df

Summary

Image data in Python is fundamentally NumPy arrays — understanding the shape conventions (H, W) for grayscale and (H, W, 3) for RGB, the dtype conventions (uint8 for raw images, float32 for normalized inputs), and the BGR vs. RGB difference between OpenCV and PIL are the foundational concepts that prevent the majority of bugs.

The practical hierarchy for image data science tasks: use Pillow for loading, saving, and basic manipulation; OpenCV for more sophisticated image processing operations; and PyTorch with pre-trained models for feature extraction and classification. The transfer learning pattern — extract features with a pre-trained ResNet or EfficientNet, train a simple logistic regression classifier on those features — is the most important technique to internalize: it works with small datasets, trains in minutes, and achieves strong performance across a wide range of visual recognition tasks without requiring a GPU for training (only for feature extraction).

For applications beyond classification — image similarity search, anomaly detection in product photos, or dataset auditing — the same CNN feature extraction approach applies, combined with standard sklearn tools for the downstream task.

Key Takeaways

  • Images are NumPy arrays: grayscale has shape (H, W), color RGB has shape (H, W, 3), batches have shape (N, H, W, 3) — all values are uint8 (0–255) for raw images and float32 for neural network inputs
  • Pillow uses RGB; OpenCV uses BGR — always call cv2.cvtColor(img, cv2.COLOR_BGR2RGB) before displaying or passing an OpenCV image to Pillow or matplotlib
  • Pillow uses (width, height); NumPy uses (height, width) — these flipped axis conventions cause silent bugs; always verify with .shape and .size when converting
  • Normalize images before passing to neural networks: divide by 255 for [0,1] range, then subtract ImageNet mean [0.485, 0.456, 0.406] and divide by std [0.229, 0.224, 0.225] for pre-trained models
  • Transfer learning (pre-trained CNN + linear classifier) is the most practical approach for image classification: build a feature extractor by removing the final layer from a pre-trained ResNet or EfficientNet, extract 512–2048 dimensional features from your images, then train a logistic regression or SVM on those features
  • Data augmentation (random flips, rotations, brightness/contrast jitter, random crops) is essential for preventing overfitting when training on small image datasets — always apply augmentation only to the training set, never to validation or test
  • torch.no_grad() context manager is required during feature extraction and inference to disable gradient computation and reduce memory usage by 2–3×
  • For datasets of thousands of images, always audit before training: check for corrupted files, duplicates (by file hash), extreme sizes, and class imbalance — silent data quality problems in image datasets are as common as in tabular data
Share:
Subscribe
Notify of
0 Comments

Discover More

Introduction to Gradient Descent Optimization

Introduction to Gradient Descent Optimization

Learn gradient descent, the optimization algorithm that trains machine learning models. Understand batch, stochastic, and…

Starlink Provides Emergency Internet Access to Venezuela Following Political Crisis

SpaceX’s Starlink provides free satellite internet access across Venezuela through February 3, 2026, bypassing infrastructure…

5 Types of Data Scientists and What They Actually Do

Discover the 5 main types of data scientists, from analytics-focused to engineering-focused roles. Learn what…

How Does Artificial Intelligence Work?

Explore how AI works, from training and learning techniques to ethical implications and industry applications.…

Alteryx and Collibra Integrate to Deliver Full Data Lineage Traceability for AI Analytics

Alteryx and Collibra Integrate to Deliver Full Data Lineage Traceability for AI Analytics

Alteryx and Collibra have launched a deep integration that gives enterprises complete data lineage visibility…

Why Arduino Changed Hobbyist Robotics Forever

Discover how Arduino revolutionized hobbyist robotics by making microcontrollers accessible to everyone. Learn why Arduino…

Click For More
0
Would love your thoughts, please comment.x
()
x