← Catalog

No. 012 · databases

SQL Optimization

Queries that scale instead of stall

Version 1.0.0 License MIT Format SKILL.md

Most database performance problems are indexing problems. A missing index on a frequently queried column turns a 2ms query into a 2-second full table scan.

Indexing fundamentals

Create indexes for WHERE clauses:

-- Query filters on email
SELECT * FROM users WHERE email = 'jane@example.com';

-- Add an index
CREATE INDEX idx_users_email ON users (email);

Create composite indexes for multi-column queries:

-- Query filters on status AND created_at
SELECT * FROM orders WHERE status = 'pending' AND created_at > NOW() - INTERVAL '7 days';

-- Composite index (order matters!)
CREATE INDEX idx_orders_status_created ON orders (status, created_at);

Index column order rule: Put the most selective (filtered) column first. If you always query by status, the status column goes first.

Don’t over-index:

  • Each index slows down INSERT/UPDATE/DELETE
  • A table with 15 indexes will be slow to write
  • Monitor with pg_stat_user_indexes to find unused indexes:
    SELECT indexrelname, idx_scan 
    FROM pg_stat_user_indexes 
    WHERE idx_scan = 0 AND indexrelname NOT LIKE 'pg_%';

Query plan analysis

Use EXPLAIN ANALYZE to see what the database actually does:

EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id;

Red flags in query plans:

  • Seq Scan on large tables → missing index
  • Sort using filesort → missing index for ORDER BY
  • Nested Loop with high row counts → consider a hash join
  • Hash Join with high memory usage → consider adding indexes

N+1 detection

The N+1 problem: one query to fetch a list, then N queries to fetch related data for each item.

# Bad — 1 + N queries
users = db.query("SELECT * FROM users")
for user in users:
    orders = db.query("SELECT * FROM orders WHERE user_id = ?", user.id)

# Good — 2 queries
users = db.query("SELECT * FROM users")
user_ids = [u.id for u in users]
orders = db.query("SELECT * FROM orders WHERE user_id IN (?, ?)", user_ids)

# Best — 1 query with JOIN
results = db.query("""
    SELECT u.*, o.id as order_id, o.total
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
""")

Use ORMs carefully — they often hide N+1 queries behind convenient syntax. Enable query logging in development.

Connection pooling

Don’t create a new connection per query:

# Bad — connection leak risk, slow
def get_user(user_id):
    conn = psycopg2.connect(DATABASE_URL)
    result = conn.execute("SELECT ...", user_id)
    conn.close()
    return result

# Good — use a pool
from psycopg2 import pool
connection_pool = pool.SimpleConnectionPool(1, 20, DATABASE_URL)

def get_user(user_id):
    conn = connection_pool.getconn()
    try:
        return conn.execute("SELECT ...", user_id)
    finally:
        connection_pool.putconn(conn)

Most ORMs and query builders handle pooling. Configure pool size based on your database’s max_connections setting.

Migration safety

Safe migrations:

  • Adding a column with a default
  • Adding an index (use CONCURRENTLY in PostgreSQL)
  • Renaming a column (in three steps: add new, copy data, drop old)

Dangerous migrations:

  • Dropping a column (check for active queries first)
  • Changing column type (can lock the table)
  • Adding a NOT NULL column to a large table (add as nullable, backfill, then add constraint)

Always test migrations against a copy of production data, not an empty database.

See references/indexing-guide.md for database-specific indexing.

When it triggers

  • slow database query
  • adding an index
  • query plan analysis
  • database migration strategy