Most database performance problems are indexing problems. A missing index on a frequently queried column turns a 2ms query into a 2-second full table scan.
Indexing fundamentals
Create indexes for WHERE clauses:
-- Query filters on email
SELECT * FROM users WHERE email = 'jane@example.com';
-- Add an index
CREATE INDEX idx_users_email ON users (email);
Create composite indexes for multi-column queries:
-- Query filters on status AND created_at
SELECT * FROM orders WHERE status = 'pending' AND created_at > NOW() - INTERVAL '7 days';
-- Composite index (order matters!)
CREATE INDEX idx_orders_status_created ON orders (status, created_at);
Index column order rule: Put the most selective (filtered) column
first. If you always query by status, the status column goes first.
Don’t over-index:
- Each index slows down INSERT/UPDATE/DELETE
- A table with 15 indexes will be slow to write
- Monitor with
pg_stat_user_indexesto find unused indexes:SELECT indexrelname, idx_scan FROM pg_stat_user_indexes WHERE idx_scan = 0 AND indexrelname NOT LIKE 'pg_%';
Query plan analysis
Use EXPLAIN ANALYZE to see what the database actually does:
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id;
Red flags in query plans:
Seq Scanon large tables → missing indexSortusing filesort → missing index for ORDER BYNested Loopwith high row counts → consider a hash joinHash Joinwith high memory usage → consider adding indexes
N+1 detection
The N+1 problem: one query to fetch a list, then N queries to fetch related data for each item.
# Bad — 1 + N queries
users = db.query("SELECT * FROM users")
for user in users:
orders = db.query("SELECT * FROM orders WHERE user_id = ?", user.id)
# Good — 2 queries
users = db.query("SELECT * FROM users")
user_ids = [u.id for u in users]
orders = db.query("SELECT * FROM orders WHERE user_id IN (?, ?)", user_ids)
# Best — 1 query with JOIN
results = db.query("""
SELECT u.*, o.id as order_id, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
""")
Use ORMs carefully — they often hide N+1 queries behind convenient syntax. Enable query logging in development.
Connection pooling
Don’t create a new connection per query:
# Bad — connection leak risk, slow
def get_user(user_id):
conn = psycopg2.connect(DATABASE_URL)
result = conn.execute("SELECT ...", user_id)
conn.close()
return result
# Good — use a pool
from psycopg2 import pool
connection_pool = pool.SimpleConnectionPool(1, 20, DATABASE_URL)
def get_user(user_id):
conn = connection_pool.getconn()
try:
return conn.execute("SELECT ...", user_id)
finally:
connection_pool.putconn(conn)
Most ORMs and query builders handle pooling. Configure pool size based
on your database’s max_connections setting.
Migration safety
Safe migrations:
- Adding a column with a default
- Adding an index (use
CONCURRENTLYin PostgreSQL) - Renaming a column (in three steps: add new, copy data, drop old)
Dangerous migrations:
- Dropping a column (check for active queries first)
- Changing column type (can lock the table)
- Adding a NOT NULL column to a large table (add as nullable, backfill, then add constraint)
Always test migrations against a copy of production data, not an empty database.
See references/indexing-guide.md for database-specific indexing.
When it triggers
- slow database query
- adding an index
- query plan analysis
- database migration strategy