Python handles dates and times through three layers: the standard library’s datetime module (for creating, comparing, and doing arithmetic with individual date/time objects), the dateutil library (for flexible parsing of almost any date string format and relative date arithmetic), and pandas (for vectorized datetime operations on entire columns via pd.to_datetime() and the .dt accessor). The single most important rule: always work in UTC internally and convert to local time only for display, because naive datetimes — those without timezone information — cause silent bugs when your data spans multiple time zones or crosses a daylight saving time boundary.
Introduction
Dates and times are among the most common data types in real-world datasets, and they cause a disproportionate share of bugs, silent errors, and data quality problems in data science pipelines. An e-commerce company analyzing customer behavior needs to correctly attribute purchases to the right day in each customer’s local time zone. A financial analyst computing trading signals must handle market holidays and business day offsets precisely. A sensor data pipeline must align readings across devices in different time zones to the same UTC clock. A retention analyst computing “days since first purchase” must correctly handle the varying lengths of months and years.
Getting dates and times right requires understanding several distinct concepts: the difference between a date, a time, and a datetime; the crucial difference between timezone-aware and timezone-naive objects; how to parse dates from strings (the most common real-world task); how to perform arithmetic on dates (what is 30 days after this date? how many business days between these two dates?); and how to format dates back into strings for output.
This article covers all of it — from the fundamental Python datetime module through dateutil, pandas datetime operations, timezone handling, and the common pitfalls that trip up even experienced practitioners.
Python’s datetime Module: The Foundation
Python’s standard library datetime module provides four key classes:
| Class | Stores | Example |
|---|---|---|
date | Year, month, day | date(2024, 9, 15) |
time | Hour, minute, second, microsecond | time(14, 30, 0) |
datetime | Date + time combined | datetime(2024, 9, 15, 14, 30, 0) |
timedelta | A duration (difference between two datetimes) | timedelta(days=30, hours=6) |
Creating Date and Datetime Objects
from datetime import date, time, datetime, timedelta
# ── date objects ───────────────────────────────────────────────────
today = date.today()
print(today) # 2024-09-15
print(type(today)) # <class 'datetime.date'>
specific_date = date(2024, 9, 15) # year, month, day
print(specific_date.year) # 2024
print(specific_date.month) # 9
print(specific_date.day) # 15
print(specific_date.weekday()) # 6 (0=Monday, 6=Sunday)
print(specific_date.isoweekday()) # 7 (1=Monday, 7=Sunday — ISO standard)
print(specific_date.strftime("%A")) # "Sunday"
print(specific_date.isoformat()) # "2024-09-15"
# ── time objects ───────────────────────────────────────────────────
t = time(14, 30, 45, 123456) # hour, minute, second, microsecond
print(t) # 14:30:45.123456
print(t.hour) # 14
print(t.minute) # 30
print(t.second) # 45
print(t.microsecond) # 123456
# ── datetime objects ───────────────────────────────────────────────
now = datetime.now() # Local time (no timezone info)
utc_now = datetime.utcnow() # UTC time (also no timezone info — use with caution)
dt = datetime(2024, 9, 15, 14, 30, 45)
print(dt) # 2024-09-15 14:30:45
print(dt.date()) # 2024-09-15 (extracts the date component)
print(dt.time()) # 14:30:45 (extracts the time component)
print(dt.year) # 2024
print(dt.month) # 9
print(dt.day) # 15
print(dt.hour) # 14
print(dt.minute) # 30
print(dt.second) # 45
# Combined from date + time
d = date(2024, 9, 15)
t = time(14, 30, 0)
combined = datetime.combine(d, t)
print(combined) # 2024-09-15 14:30:00Parsing Dates from Strings: strptime
strptime() (“string parse time”) converts a string to a datetime using a format code:
from datetime import datetime
# strptime(string, format)
dt = datetime.strptime("2024-09-15", "%Y-%m-%d")
print(dt) # 2024-09-15 00:00:00
dt = datetime.strptime("15/09/2024 14:30:45", "%d/%m/%Y %H:%M:%S")
print(dt) # 2024-09-15 14:30:45
dt = datetime.strptime("Sep 15, 2024", "%b %d, %Y")
print(dt) # 2024-09-15 00:00:00
dt = datetime.strptime("Sunday, September 15, 2024", "%A, %B %d, %Y")
print(dt) # 2024-09-15 00:00:00
dt = datetime.strptime("2024-09-15T14:30:45", "%Y-%m-%dT%H:%M:%S")
print(dt) # 2024-09-15 14:30:45The Format Code Reference
| Code | Meaning | Example |
|---|---|---|
%Y | 4-digit year | 2024 |
%y | 2-digit year | 24 |
%m | Month (01–12) | 09 |
%d | Day (01–31) | 15 |
%H | Hour 24h (00–23) | 14 |
%I | Hour 12h (01–12) | 02 |
%M | Minute (00–59) | 30 |
%S | Second (00–59) | 45 |
%f | Microseconds | 123456 |
%p | AM or PM | PM |
%A | Full weekday name | Sunday |
%a | Abbreviated weekday | Sun |
%B | Full month name | September |
%b | Abbreviated month | Sep |
%j | Day of year (001–366) | 259 |
%W | Week number (Mon start) | 37 |
%Z | Timezone name | UTC, EST |
%z | UTC offset | +0000, -0500 |
Formatting Dates to Strings: strftime
strftime() (“string format time”) converts a datetime to a string:
from datetime import datetime
dt = datetime(2024, 9, 15, 14, 30, 45)
print(dt.strftime("%Y-%m-%d")) # "2024-09-15"
print(dt.strftime("%d/%m/%Y")) # "15/09/2024"
print(dt.strftime("%B %d, %Y")) # "September 15, 2024"
print(dt.strftime("%A, %B %d, %Y")) # "Sunday, September 15, 2024"
print(dt.strftime("%Y-%m-%dT%H:%M:%S")) # "2024-09-15T14:30:45" (ISO 8601)
print(dt.strftime("%m/%d/%Y %I:%M %p")) # "09/15/2024 02:30 PM"
print(dt.isoformat()) # "2024-09-15T14:30:45" (ISO 8601 shortcut)Date Arithmetic with timedelta
timedelta represents a duration — the difference between two datetimes, or an offset to add or subtract from a datetime.
from datetime import datetime, date, timedelta
# Creating timedelta objects
one_day = timedelta(days=1)
one_week = timedelta(weeks=1)
ninety_days = timedelta(days=90)
one_hour = timedelta(hours=1)
ninety_minutes = timedelta(minutes=90)
mixed = timedelta(days=2, hours=6, minutes=30)
# timedelta components
delta = timedelta(days=2, hours=6, minutes=30, seconds=15)
print(delta.days) # 2
print(delta.seconds) # 23415 (total seconds in the sub-day part: 6*3600 + 30*60 + 15)
print(delta.total_seconds()) # 196215.0 (total seconds including days)
# ── Arithmetic on dates ────────────────────────────────────────────
today = date.today()
tomorrow = today + timedelta(days=1)
yesterday = today - timedelta(days=1)
next_month = today + timedelta(days=30) # NOTE: not always the same month-day
next_year = today + timedelta(days=365) # NOTE: not always same calendar date
# ── Arithmetic on datetimes ────────────────────────────────────────
now = datetime(2024, 9, 15, 14, 30, 0)
later = now + timedelta(hours=3, minutes=45)
print(later) # 2024-09-15 18:15:00
deadline = datetime(2024, 12, 31, 23, 59, 59)
time_remaining = deadline - now
print(f"Days remaining: {time_remaining.days}")
print(f"Total hours remaining: {time_remaining.total_seconds() / 3600:.1f}")
# ── Difference between two dates ──────────────────────────────────
signup_date = date(2022, 3, 14)
today = date(2024, 9, 15)
days_active = (today - signup_date).days
print(f"Days since signup: {days_active}") # 916
# Months and years — timedelta doesn't do these directly
# Use relativedelta from dateutil (covered in next section)Comparing Dates and Datetimes
from datetime import datetime
dt1 = datetime(2024, 9, 15, 10, 0, 0)
dt2 = datetime(2024, 9, 15, 14, 30, 0)
dt3 = datetime(2024, 9, 16, 10, 0, 0)
print(dt1 < dt2) # True
print(dt1 > dt2) # False
print(dt1 == dt2) # False
# Comparing date to datetime: TypeError — they're different types
# Convert to same type first
from datetime import date
d = date(2024, 9, 15)
# dt1 == d → TypeError: can't compare datetime.datetime to datetime.date
# Fix:
print(dt1.date() == d) # Truedateutil: Flexible Parsing and Relative Deltas
The dateutil library fills two major gaps in the standard datetime module: intelligent parsing of almost any date string format without specifying the format, and relative date arithmetic (add 1 month, add 1 year, find next Friday).
pip install python-dateutildateutil.parser: Parse Any Date String
from dateutil import parser
# Parse virtually any date format without specifying format codes
dates = [
"2024-09-15",
"15/09/2024",
"September 15, 2024",
"Sep 15, 2024",
"15 Sep 2024",
"09/15/2024",
"2024.09.15",
"Sunday, September 15, 2024",
"2024-09-15T14:30:45",
"2024-09-15T14:30:45Z",
"2024-09-15T14:30:45+05:30",
"15-Sep-24",
"20240915", # Compact ISO format
]
for ds in dates:
try:
dt = parser.parse(ds)
print(f"{ds:45s} → {dt}")
except Exception as e:
print(f"{ds:45s} → ERROR: {e}")Important caveat: dateutil.parser.parse() uses heuristics and can guess wrong. The string "01/02/03" could be Jan 2, 2003 or Feb 1, 2003 or Feb 3, 2001 depending on locale assumptions. Use the dayfirst and yearfirst parameters to disambiguate:
from dateutil import parser
ambiguous = "01/02/03"
# Default (US convention: MM/DD/YY)
print(parser.parse(ambiguous)) # 2003-01-02
# European convention: DD/MM/YY
print(parser.parse(ambiguous, dayfirst=True)) # 2003-02-01
# Year first: YY/MM/DD
print(parser.parse(ambiguous, yearfirst=True)) # 2001-02-03For production code, prefer datetime.strptime() with an explicit format for maximum predictability. Use dateutil.parser.parse() for interactive work or when format varies.
relativedelta: Calendar-Aware Arithmetic
timedelta can’t add “one month” because months have variable lengths. relativedelta handles this correctly:
from dateutil.relativedelta import relativedelta
from datetime import datetime, date
dt = datetime(2024, 1, 31) # January 31
# Add 1 month with timedelta (WRONG — goes into March)
# timedelta(days=31) → 2024-03-02 (overshoots!)
# Add 1 month with relativedelta (CORRECT — snaps to valid date)
next_month = dt + relativedelta(months=1)
print(next_month) # 2024-02-29 (snaps to last day of Feb in leap year 2024)
# Add 1 year
next_year = dt + relativedelta(years=1)
print(next_year) # 2025-01-31
# Add 6 months and 15 days
future = dt + relativedelta(months=6, days=15)
print(future) # 2024-08-15
# Subtract 3 months
past = dt - relativedelta(months=3)
print(past) # 2023-10-31
# Calculate age from birthdate
birthdate = date(1990, 7, 4)
today = date(2024, 9, 15)
age = relativedelta(today, birthdate)
print(f"Age: {age.years} years, {age.months} months, {age.days} days")
# Age: 34 years, 2 months, 11 days
# Exact months between two dates
months_apart = (age.years * 12) + age.months
print(f"Months: {months_apart}")
# Jump to relative calendar positions
from dateutil.relativedelta import MO, TU, WE, TH, FR, SA, SU
# Next Friday from a given date
today = datetime(2024, 9, 15) # A Sunday
next_friday = today + relativedelta(weekday=FR)
print(next_friday) # 2024-09-20
# Last Monday on or before a date
last_monday = today + relativedelta(weekday=MO(-1))
print(last_monday) # 2024-09-09
# First day of next month
first_next_month = today + relativedelta(months=1, day=1)
print(first_next_month) # 2024-10-01
# Last day of current month
last_of_month = today + relativedelta(day=31)
print(last_of_month) # 2024-09-30Timezone Handling: The Critical Topic
Timezone bugs are among the most subtle and damaging errors in data systems. They produce results that look correct but are silently wrong by hours — sometimes placing events in the wrong day entirely.
Naive vs. Aware Datetimes
- Naive datetime: Has no timezone information.
datetime(2024, 9, 15, 14, 30, 0)— you don’t know if this is UTC, New York, Tokyo, or anywhere else. - Aware datetime: Has timezone information attached.
datetime(2024, 9, 15, 14, 30, 0, tzinfo=timezone.utc)— unambiguously UTC.
The golden rule: never mix naive and aware datetimes in the same computation. Python raises TypeError when you try, but pandas silently does the wrong thing in some cases.
Working with Timezones in Python
from datetime import datetime, timezone, timedelta
import zoneinfo # Python 3.9+ standard library
# ── Creating timezone-aware datetimes ──────────────────────────────
# UTC
utc_now = datetime.now(timezone.utc)
print(utc_now) # 2024-09-15 14:30:00+00:00
print(utc_now.tzinfo) # UTC
# Specific timezone with zoneinfo (Python 3.9+)
eastern_tz = zoneinfo.ZoneInfo("America/New_York")
pacific_tz = zoneinfo.ZoneInfo("America/Los_Angeles")
london_tz = zoneinfo.ZoneInfo("Europe/London")
tokyo_tz = zoneinfo.ZoneInfo("Asia/Tokyo")
mumbai_tz = zoneinfo.ZoneInfo("Asia/Kolkata")
# Create a datetime in a specific timezone
ny_now = datetime.now(eastern_tz)
print(ny_now) # 2024-09-15 10:30:00-04:00 (during EDT)
# Convert UTC datetime to a local timezone
utc_event = datetime(2024, 9, 15, 18, 0, 0, tzinfo=timezone.utc)
ny_time = utc_event.astimezone(eastern_tz)
tokyo_time = utc_event.astimezone(tokyo_tz)
print(f"UTC: {utc_event}") # 2024-09-15 18:00:00+00:00
print(f"NYC: {ny_time}") # 2024-09-15 14:00:00-04:00 (EDT)
print(f"Tokyo: {tokyo_time}") # 2024-09-16 03:00:00+09:00 (JST — next day!)
# Make a naive datetime timezone-aware (localize it)
naive_dt = datetime(2024, 9, 15, 14, 0, 0)
# Method: replace with tzinfo (OK for UTC, careful with DST for local zones)
utc_dt = naive_dt.replace(tzinfo=timezone.utc)
# Better for local timezones: use zoneinfo
ny_dt = naive_dt.replace(tzinfo=eastern_tz) # Assumes the time IS in Eastern
# Strip timezone info from an aware datetime (make it naive)
naive_again = utc_dt.replace(tzinfo=None)The Daylight Saving Time Trap
DST (Daylight Saving Time) is where timezone handling becomes truly dangerous:
from datetime import datetime
import zoneinfo
eastern = zoneinfo.ZoneInfo("America/New_York")
# 2024: clocks spring forward on March 10 at 2:00 AM
# The time 2:30 AM on March 10, 2024 in Eastern time doesn't exist!
# And 1:30 AM occurs TWICE on November 3, 2024 (fall back)
# A transaction timestamped 2:30 AM ET on March 10
# might actually be stored as 3:30 AM or 1:30 AM depending on the system
# Always store in UTC to avoid this ambiguity!
# ── The correct pattern: Store UTC, display local ─────────────────
# Store in data.base as UTC
event_utc = datetime(2024, 3, 10, 7, 30, 0, tzinfo=zoneinfo.ZoneInfo("UTC"))
print(f"Stored (UTC): {event_utc}") # 2024-03-10 07:30:00+00:00
# Display to a New York user
event_ny = event_utc.astimezone(eastern)
print(f"Display (NY): {event_ny}") # 2024-03-10 03:30:00-04:00 (after spring forward)
# Display to a Tokyo user
event_tk = event_utc.astimezone(zoneinfo.ZoneInfo("Asia/Tokyo"))
print(f"Display (TK): {event_tk}") # 2024-03-10 16:30:00+09:00Using pytz (Legacy — Pre-Python 3.9)
Many existing codebases use pytz instead of zoneinfo. The key difference: pytz requires using localize() instead of replace() for attaching timezones:
# pip install pytz (or use zoneinfo for Python 3.9+)
import pytz
from datetime import datetime
eastern_pytz = pytz.timezone("America/New_York")
utc = pytz.utc
# !! WRONG with pytz: replace() doesn't handle DST correctly
wrong = datetime(2024, 9, 15, 14, 0, 0).replace(tzinfo=eastern_pytz)
# CORRECT with pytz: use localize() for local timezones
correct = eastern_pytz.localize(datetime(2024, 9, 15, 14, 0, 0))
print(correct) # 2024-09-15 14:00:00-04:00
# Convert between timezones
utc_time = correct.astimezone(utc)
print(utc_time) # 2024-09-15 18:00:00+00:00
# For Python 3.9+, prefer zoneinfo over pytz — it's in the standard library,
# more accurate, and doesn't have the localize() footgun.pandas Datetime Operations
pandas is where most data scientists actually work with dates at scale. Its datetime support is built on NumPy’s datetime64 type and provides vectorized operations across entire columns.
pd.to_datetime(): Converting to Datetime
import pandas as pd
import numpy as np
# Convert a column of strings to datetime
df = pd.DataFrame({
"date_str": ["2024-01-15", "2024-02-20", "2024-03-08", "invalid", None],
"amount": [150.0, 89.5, 210.0, 45.0, 99.0]
})
# Basic conversion
df["date"] = pd.to_datetime(df["date_str"])
# With error handling — invalid values become NaT (Not a Time)
df["date"] = pd.to_datetime(df["date_str"], errors="coerce")
print(df["date"])
# 0 2024-01-15
# 1 2024-02-20
# 2 2024-03-08
# 3 NaT ← invalid string becomes NaT
# 4 NaT ← None becomes NaT
# Specify format for faster parsing and stricter validation
df["date"] = pd.to_datetime(df["date_str"], format="%Y-%m-%d", errors="coerce")
# Parse from multiple columns (year, month, day as separate columns)
df_parts = pd.DataFrame({
"year": [2024, 2024, 2023],
"month": [9, 10, 12],
"day": [15, 1, 31]
})
df_parts["date"] = pd.to_datetime(df_parts[["year", "month", "day"]])
# Parse Unix timestamps (seconds since epoch)
df_ts = pd.DataFrame({"unix_ts": [1726358400, 1726444800, 1726531200]})
df_ts["date"] = pd.to_datetime(df_ts["unix_ts"], unit="s")
# unit="ms" for milliseconds, "us" for microseconds, "ns" for nanoseconds
# Check for NaT values (like NaN but for datetimes)
print(df["date"].isna().sum()) # Count of NaT values
print(pd.isna(df["date"])) # Boolean maskThe .dt Accessor: Vectorized Datetime Operations
The .dt accessor provides vectorized access to datetime properties on a pandas Series:
import pandas as pd
# Create a Series of datetime values
dates = pd.Series(pd.date_range("2024-01-01", periods=365, freq="D"))
# ── Extract components ─────────────────────────────────────────────
dates.dt.year # 2024, 2024, ...
dates.dt.month # 1, 1, ..., 12
dates.dt.day # 1, 2, 3, ...
dates.dt.hour # 0, 0, ... (all 0 since no time component)
dates.dt.minute
dates.dt.second
dates.dt.microsecond
dates.dt.dayofweek # 0=Mon, 6=Sun
dates.dt.day_name() # "Monday", "Tuesday", ...
dates.dt.month_name() # "January", "February", ...
dates.dt.dayofyear # 1, 2, ..., 366
dates.dt.weekofyear # Deprecated; use .dt.isocalendar().week
dates.dt.isocalendar() # Returns DataFrame with year, week, day
dates.dt.quarter # 1, 1, ..., 4
dates.dt.days_in_month # 31, 29, 31, ... (days in each month)
# ── Boolean properties ─────────────────────────────────────────────
dates.dt.is_month_start # True on Jan 1, Feb 1, ...
dates.dt.is_month_end # True on Jan 31, Feb 29, ...
dates.dt.is_quarter_start # True on Jan 1, Apr 1, Jul 1, Oct 1
dates.dt.is_quarter_end # True on Mar 31, Jun 30, Sep 30, Dec 31
dates.dt.is_year_start # True on Jan 1 only
dates.dt.is_year_end # True on Dec 31 only
# ── Formatting ─────────────────────────────────────────────────────
dates.dt.strftime("%Y-%m-%d") # Format as strings
dates.dt.strftime("%B %Y") # "January 2024", "January 2024", ...
dates.dt.date # Extract just the date component
dates.dt.normalize() # Zero out the time component (floor to day)
# ── Useful derived features for ML ────────────────────────────────
df = pd.DataFrame({"date": pd.date_range("2024-01-01", periods=365, freq="D"),
"sales": np.random.normal(1000, 150, 365)})
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day_of_week"] = df["date"].dt.dayofweek
df["day_of_year"] = df["date"].dt.dayofyear
df["week_of_year"] = df["date"].dt.isocalendar().week.astype(int)
df["quarter"] = df["date"].dt.quarter
df["is_weekend"] = df["date"].dt.dayofweek.isin([5, 6])
df["is_month_end"] = df["date"].dt.is_month_end
# Cyclical encoding (prevents "December is far from January" problem)
df["month_sin"] = np.sin(2 * np.pi * df["month"] / 12)
df["month_cos"] = np.cos(2 * np.pi * df["month"] / 12)
df["dow_sin"] = np.sin(2 * np.pi * df["day_of_week"] / 7)
df["dow_cos"] = np.cos(2 * np.pi * df["day_of_week"] / 7)Date Arithmetic in pandas
import pandas as pd
df = pd.DataFrame({
"customer_id": ["CUST_001", "CUST_002", "CUST_003"],
"signup_date": pd.to_datetime(["2022-03-14", "2021-11-28", "2023-01-05"]),
"last_order": pd.to_datetime(["2024-09-10", "2024-09-01", "2024-09-15"])
})
reference_date = pd.Timestamp("2024-09-15")
# Days between two date columns
df["days_since_signup"] = (reference_date - df["signup_date"]).dt.days
df["days_since_last_order"] = (reference_date - df["last_order"]).dt.days
df["tenure_months"] = ((reference_date - df["signup_date"]).dt.days / 30.44).round(1)
print(df[["customer_id", "days_since_signup", "days_since_last_order", "tenure_months"]])
# Add a fixed timedelta to a column
df["trial_end"] = df["signup_date"] + pd.Timedelta(days=14)
df["next_renewal"] = df["last_order"] + pd.DateOffset(months=1) # Calendar month
# DateOffset for calendar-aware arithmetic (months, quarters, years)
df["signup_plus_1yr"] = df["signup_date"] + pd.DateOffset(years=1)
df["signup_plus_6mo"] = df["signup_date"] + pd.DateOffset(months=6)
# Period-over-period comparison
df["prev_year_same_day"] = df["last_order"] - pd.DateOffset(years=1)
# Clip dates to a valid range
min_date = pd.Timestamp("2022-01-01")
max_date = pd.Timestamp("2024-12-31")
df["signup_clipped"] = df["signup_date"].clip(lower=min_date, upper=max_date)Timezone Handling in pandas
import pandas as pd
# Create timezone-aware Series
ts = pd.Series(pd.date_range("2024-01-01", periods=5, freq="h", tz="UTC"))
print(ts.dt.tz) # UTC
# Convert to another timezone
ts_ny = ts.dt.tz_convert("America/New_York")
ts_tk = ts.dt.tz_convert("Asia/Tokyo")
# Localize a naive datetime Series (attach timezone to naive datetimes)
ts_naive = pd.Series(pd.date_range("2024-01-01", periods=5, freq="h"))
ts_utc = ts_naive.dt.tz_localize("UTC") # These naive times ARE UTC
ts_ny = ts_naive.dt.tz_localize("America/New_York") # These naive times ARE Eastern
# Remove timezone info (convert aware → naive)
ts_stripped = ts_utc.dt.tz_localize(None) # Remove tz info
# In a DataFrame column
df["event_time"] = pd.to_datetime(df["event_time_str"])
df["event_time"] = df["event_time"].dt.tz_localize("UTC")
df["event_time_ny"] = df["event_time"].dt.tz_convert("America/New_York")
# Group by date in a specific timezone
df["local_date"] = df["event_time"].dt.tz_convert("America/New_York").dt.date
daily_counts = df.groupby("local_date").size()Business Day Arithmetic
Data science work often requires business day awareness — excluding weekends and sometimes holidays from date calculations.
import pandas as pd
from pandas.tseries.offsets import BDay, BMonthEnd, BusinessDay
from pandas.tseries.holiday import USFederalHolidayCalendar
# ── Business days with pandas offsets ─────────────────────────────
date = pd.Timestamp("2024-09-13") # Friday
print(date + BDay(1)) # 2024-09-16 (Monday — skips weekend)
print(date + BDay(5)) # 2024-09-20 (Friday — 5 business days later)
print(date - BDay(3)) # 2024-09-10 (Tuesday — 3 business days back)
# Count business days between two dates
start = pd.Timestamp("2024-09-01")
end = pd.Timestamp("2024-09-30")
bdays = pd.bdate_range(start, end)
print(f"Business days in September 2024: {len(bdays)}") # 21
# Business day range (pandas equivalent of date_range for business days)
bday_range = pd.bdate_range("2024-09-01", "2024-09-30", freq="B")
print(bday_range[:5])
# ── Custom holiday calendar ────────────────────────────────────────
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday
from pandas.tseries.offsets import CustomBusinessDay
# US Federal Holidays
us_bd = CustomBusinessDay(calendar=USFederalHolidayCalendar())
# Custom company calendar
class CompanyCalendar(AbstractHolidayCalendar):
rules = [
Holiday("New Year", month=1, day=1),
Holiday("Company Day", month=6, day=15), # Company anniversary
Holiday("Thanksgiving", month=11, day=28), # Approximate
Holiday("Christmas", month=12, day=25),
]
company_bd = CustomBusinessDay(calendar=CompanyCalendar())
# Add 10 business days (excluding company holidays)
date = pd.Timestamp("2024-12-20")
deadline = date + 10 * company_bd
print(f"Deadline (10 company business days): {deadline.date()}")
# Count business days between dates with custom calendar
range_with_holidays = pd.date_range("2024-12-20", "2025-01-15", freq=company_bd)
print(f"Company business days Dec 20 – Jan 15: {len(range_with_holidays)}")Common Date/Time Operations in Data Science
Computing Cohort Labels
import pandas as pd
df = pd.DataFrame({
"customer_id": [f"CUST_{i:03d}" for i in range(1, 11)],
"signup_date": pd.date_range("2022-01-15", periods=10, freq="45D")
})
# Monthly cohort (YYYY-MM)
df["cohort_month"] = df["signup_date"].dt.to_period("M").astype(str)
# Quarterly cohort
df["cohort_quarter"] = df["signup_date"].dt.to_period("Q").astype(str)
# Annual cohort
df["cohort_year"] = df["signup_date"].dt.year.astype(str)
print(df[["customer_id", "signup_date", "cohort_month", "cohort_quarter"]])Computing Time-Since Features
import pandas as pd
import numpy as np
def compute_recency_features(df: pd.DataFrame,
date_col: str,
reference_date: pd.Timestamp = None) -> pd.DataFrame:
"""
Add recency features: days/months/years since an event date.
Parameters
----------
df : pd.DataFrame
DataFrame containing the date column.
date_col : str
Name of the datetime column.
reference_date : pd.Timestamp, optional
Reference point for computing recency. Defaults to today.
Returns
-------
pd.DataFrame
Original DataFrame with added recency columns.
"""
df = df.copy()
if reference_date is None:
reference_date = pd.Timestamp.today().normalize()
df[f"{date_col}_days_ago"] = (reference_date - df[date_col]).dt.days
df[f"{date_col}_weeks_ago"] = df[f"{date_col}_days_ago"] / 7
df[f"{date_col}_months_ago"] = df[f"{date_col}_days_ago"] / 30.44
df[f"{date_col}_years_ago"] = df[f"{date_col}_days_ago"] / 365.25
# Bin into categorical recency buckets
df[f"{date_col}_recency_bucket"] = pd.cut(
df[f"{date_col}_days_ago"],
bins=[0, 7, 30, 90, 180, 365, float("inf")],
labels=["0-7d", "8-30d", "31-90d", "91-180d", "181-365d", "365d+"]
)
return df
# Usage
df = pd.DataFrame({
"customer_id": ["CUST_001", "CUST_002", "CUST_003"],
"last_purchase": pd.to_datetime(["2024-08-15", "2024-06-01", "2023-12-25"])
})
df = compute_recency_features(df, "last_purchase",
reference_date=pd.Timestamp("2024-09-15"))
print(df[["customer_id", "last_purchase_days_ago", "last_purchase_recency_bucket"]])Detecting and Handling Timezone Inconsistencies
import pandas as pd
def audit_datetime_column(df: pd.DataFrame, col: str) -> dict:
"""
Audit a datetime column for timezone consistency and data quality issues.
Returns a summary of findings to guide cleaning decisions.
"""
series = df[col]
is_datetime = pd.api.types.is_datetime64_any_dtype(series)
if not is_datetime:
series = pd.to_datetime(series, errors="coerce")
null_count = series.isna().sum()
non_null = series.dropna()
tz_info = series.dt.tz if hasattr(series.dt, "tz") else None
min_date = non_null.min()
max_date = non_null.max()
date_range_days = (max_date - min_date).days if len(non_null) > 1 else 0
future_count = (non_null > pd.Timestamp.now(tz=tz_info)).sum()
ancient_count = (non_null < pd.Timestamp("1970-01-01", tz=tz_info)).sum()
report = {
"column": col,
"dtype": str(series.dtype),
"timezone": str(tz_info) if tz_info else "None (naive)",
"null_count": int(null_count),
"null_pct": round(null_count / len(series) * 100, 2),
"min_date": str(min_date.date()) if not pd.isna(min_date) else None,
"max_date": str(max_date.date()) if not pd.isna(max_date) else None,
"date_span_days": date_range_days,
"future_dates": int(future_count),
"pre_1970_dates": int(ancient_count),
}
print(f"\nDatetime Column Audit: {col}")
for k, v in report.items():
print(f" {k:20s}: {v}")
return reportParsing Mixed Date Formats in a Column
Real-world datasets often have inconsistent date formats within the same column:
import pandas as pd
from dateutil import parser as date_parser
mixed_dates = pd.Series([
"2024-09-15",
"15/09/2024",
"September 15, 2024",
"2024.09.15",
"09-15-2024",
None,
"not a date",
"2024-09-16T00:00:00"
])
def parse_mixed_dates(series: pd.Series) -> pd.Series:
"""
Parse a Series with mixed date formats using dateutil's flexible parser.
Invalid values become NaT.
"""
def try_parse(val):
if pd.isna(val) or not isinstance(val, str):
return pd.NaT
try:
return pd.Timestamp(date_parser.parse(str(val)))
except (ValueError, OverflowError):
return pd.NaT
result = series.apply(try_parse)
success = result.notna().sum()
failed = result.isna().sum() - series.isna().sum()
print(f"Parsed: {success} | Failed: {failed} | Already NaN: {series.isna().sum()}")
return result
parsed = parse_mixed_dates(mixed_dates)
print(parsed)Formatting Dates for Output
import pandas as pd
df = pd.DataFrame({
"event_time": pd.to_datetime([
"2024-09-15 14:30:00",
"2024-09-16 09:15:30",
"2024-09-17 22:45:00"
])
})
# Format for display in reports
df["date_display"] = df["event_time"].dt.strftime("%B %d, %Y") # "September 15, 2024"
df["date_iso"] = df["event_time"].dt.strftime("%Y-%m-%d") # "2024-09-15"
df["datetime_display"]= df["event_time"].dt.strftime("%b %d, %Y %I:%M %p") # "Sep 15, 2024 02:30 PM"
df["month_year"] = df["event_time"].dt.strftime("%b %Y") # "Sep 2024"
df["year_week"] = df["event_time"].dt.strftime("%Y-W%V") # "2024-W37"
print(df)
# Convert to Period for grouping/display
df["period_month"] = df["event_time"].dt.to_period("M") # Period('2024-09', 'M')
df["period_quarter"] = df["event_time"].dt.to_period("Q") # Period('2024Q3', 'Q-DEC')
df["period_year"] = df["event_time"].dt.to_period("A") # Period('2024', 'A-DEC')
# Period objects sort correctly and display as "2024-09", "2024Q3", "2024"
monthly_revenue = df.groupby("period_month")["event_time"].count()
print(monthly_revenue)Common Pitfalls and How to Avoid Them
Pitfall 1: Comparing date to datetime
from datetime import date, datetime
d = date(2024, 9, 15)
dt = datetime(2024, 9, 15, 0, 0, 0)
# WRONG: TypeError — can't compare date to datetime
# print(d == dt)
# CORRECT: Convert to the same type
print(d == dt.date()) # True
print(datetime.combine(d, datetime.min.time()) == dt) # TruePitfall 2: Mixing Naive and Aware Datetimes in pandas
import pandas as pd
naive_series = pd.to_datetime(["2024-01-01", "2024-02-01"])
aware_series = pd.to_datetime(["2024-01-15", "2024-02-15"]).tz_localize("UTC")
# WRONG: Cannot compare naive and aware
# naive_series < aware_series # Raises TypeError
# CORRECT: Make both aware or both naive first
naive_as_utc = naive_series.dt.tz_localize("UTC")
result = naive_as_utc < aware_series # Now comparablePitfall 3: Assuming “Local Time” is Consistent
import pandas as pd
# WRONG: Using datetime.now() for timestamps in a pipeline
# (gives different results depending on where/when code runs)
df["processed_at"] = pd.Timestamp.now() # Local time — ambiguous!
# CORRECT: Always use UTC for system timestamps
df["processed_at"] = pd.Timestamp.utcnow().tz_localize("UTC")
# Or equivalently:
import datetime
df["processed_at"] = datetime.datetime.now(datetime.timezone.utc)Pitfall 4: Integer Division Truncates Timedeltas
import pandas as pd
delta = pd.Timedelta(days=15, hours=12)
# WRONG: Integer division truncates
months_approx = delta.days // 30 # 0 (truncated!)
# CORRECT: Use float division
months_approx = delta.days / 30.44 # 0.51 months
# For precise calendar months, use relativedelta
from dateutil.relativedelta import relativedelta
from datetime import date
start = date(2024, 1, 15)
end = date(2024, 9, 15)
diff = relativedelta(end, start)
print(f"Exact difference: {diff.years} years, {diff.months} months, {diff.days} days")
# Exact difference: 0 years, 8 months, 0 daysPitfall 5: Year 2000/2038 / Unix Timestamp Overflow
import pandas as pd
# Unix timestamps stored as int32 overflow after January 19, 2038
# Always use int64 for Unix timestamps
# pandas uses nanosecond resolution by default:
# Maximum datetime pandas can represent (nanosecond resolution)
print(pd.Timestamp.max) # 2262-04-11 23:47:16.854775807
# If you see timestamps far in the future, check the unit
unix_ms = 1726358400000 # Milliseconds since epoch
dt_wrong = pd.to_datetime(unix_ms, unit="s") # Wrong: treats as seconds → year 56000+
dt_right = pd.to_datetime(unix_ms, unit="ms") # Right: treats as milliseconds → 2024
print(dt_right) # 2024-09-14 22:40:00Quick Reference: Choosing the Right Tool
| Task | Best Tool |
|---|---|
| Create a date/datetime from components | datetime(y, m, d, h, min, s) |
| Get today’s date | date.today() |
| Get current UTC time | datetime.now(timezone.utc) |
| Parse a known format string | datetime.strptime(s, fmt) |
| Parse any date string flexibly | dateutil.parser.parse(s) |
| Format datetime to string | dt.strftime(fmt) or dt.isoformat() |
| Add/subtract days/hours | dt + timedelta(days=N) |
| Add/subtract months or years | dt + relativedelta(months=N) |
| Find next weekday | dt + relativedelta(weekday=FR) |
| Convert string column to datetime | pd.to_datetime(col, errors="coerce") |
| Extract year/month/day from column | col.dt.year, col.dt.month, etc. |
| Difference between two date columns | (col_a - col_b).dt.days |
| Add months to a date column | col + pd.DateOffset(months=N) |
| Convert timezone | col.dt.tz_convert("America/New_York") |
| Localize naive datetime column | col.dt.tz_localize("UTC") |
| Business day arithmetic | date + BDay(N) |
| Count business days in a range | len(pd.bdate_range(start, end)) |
| Group by calendar period | col.dt.to_period("M") |
Summary
Working correctly with dates and times requires mastering three layers: the standard library datetime module for individual objects and arithmetic, dateutil for flexible parsing and calendar-aware relative deltas, and pandas for vectorized operations on entire columns. The fundamental rules that prevent the most bugs are: always store timestamps in UTC and convert to local time only for display, always use timezone-aware datetime objects rather than naive ones, use dateutil.relativedelta rather than timedelta for month/year arithmetic, and use pd.to_datetime() with errors="coerce" to safely handle invalid dates in real-world data.
The .dt accessor is the workhorse of date feature engineering in pandas — it provides fast, vectorized access to every component of a datetime, from .dt.year through .dt.is_month_end, and combined with cyclical sin/cos encoding for periodic features, it enables you to extract the full temporal signal from any date column for machine learning.
Key Takeaways
- Python’s
datetimemodule provides four classes:date(year/month/day),time(hour/min/sec),datetime(combined), andtimedelta(duration) — most data science work usesdatetimeandtimedelta strptime(string, format)parses a string to datetime using format codes;strftime(format)formats a datetime to string —%Y-%m-%dis ISO 8601 and should be your default format for storagedateutil.parser.parse()flexibly parses almost any date string without specifying a format;dateutil.relativedeltacorrectly adds months and years (unliketimedeltawhich only handles fixed-length durations)- Always use UTC for storing and computing timestamps; convert to local time only for display — use
zoneinfo.ZoneInfo("America/New_York")(Python 3.9+) orpytzfor timezone-aware operations - Naive datetimes (no timezone info) cause silent bugs when data spans multiple time zones — always check whether a datetime column is timezone-aware before computing differences or comparisons
pd.to_datetime(col, errors="coerce")converts string columns to datetime, turning invalid values intoNaT(Not a Time) instead of raising errors — essential for real-world dirty data- The
.dtaccessor provides vectorized extraction of every datetime component (col.dt.year,col.dt.dayofweek,col.dt.is_month_end) and formatting (col.dt.strftime()) on pandas Series col.dt.tz_localize("UTC")attaches timezone info to a naive column;col.dt.tz_convert("America/New_York")converts an aware column to a different timezone — in that order, never reversed








