There are only two hard things in computer science: cache invalidation and naming things. This skill covers the first one at every layer of the stack.

HTTP caching

Set proper headers so browsers and CDNs cache for you:

# Static assets with content-hash filenames
Cache-Control: public, max-age=31536000, immutable

# API responses that change occasionally
Cache-Control: public, max-age=60, stale-while-revalidate=300

# Never cache
Cache-Control: no-store

stale-while-revalidate is the most underused header. It serves the cached response immediately while fetching the fresh version in the background. Users get instant responses, data stays reasonably fresh.

CDN caching

Cache static assets at the edge:

Use content hashes in filenames (app.3f8a2c.js) for cache-busting
Set long max-age with immutable for hashed assets
Use Surrogate-Key headers for selective purge on dynamic content

For API responses at the edge:

Cache GET responses with consistent query string ordering
Use Vary: Authorization for personalized content
Purge by tag/key when data changes

Application-level caching

Cache-aside (lazy loading):

def get_user(user_id):
    # Check cache first
    cached = redis.get(f"user:{user_id}")
    if cached:
        return deserialize(cached)
    
    # Cache miss — fetch from DB
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)
    redis.setex(f"user:{user_id}", 300, serialize(user))
    return user

Write-through:

def update_user(user_id, data):
    db.update("UPDATE users SET ... WHERE id = ?", user_id, data)
    redis.setex(f"user:{user_id}", 300, serialize(data))

Write-behind (write-back): Buffer writes and flush to the database asynchronously. Use when write throughput matters more than immediate consistency.

Cache invalidation strategies

TTL-based: Set an expiration time. Simple but may serve stale data up to the TTL duration.

Event-based: Invalidate when the source data changes:

def on_user_update(user_id):
    redis.delete(f"user:{user_id}")

Version-based: Include a version in the cache key:

cache_key = f"user:{user_id}:v{user.version}"

Database query caching

Use materialized views for expensive, read-heavy queries:

CREATE MATERIALIZED VIEW user_stats AS
SELECT user_id, COUNT(*) as order_count, SUM(total) as total_spent
FROM orders
GROUP BY user_id;

Refresh on a schedule or after specific events, not on every write.

What NOT to cache

Data that changes every request (real-time stock prices)
Data with very low read volume (caching costs more than re-fetching)
User-specific data in shared caches without proper key namespacing
Auth tokens or sensitive data in caches without encryption

See references/cache-patterns.md for language-specific implementations.

When it triggers

adding caching to an application
cache is serving stale data
improving response times
deciding what to cache

Caching Strategies