MIGRATIONS

The migration that took six months

We thought it would take two weeks. We were off by roughly a factor of twelve.

The Plan

Move from PostgreSQL to... PostgreSQL. Different host. Better hardware. Should be simple: dump, restore, flip DNS.

The First Problem

The dump was 2TB. Restore took 18 hours. We couldn't afford 18 hours of downtime.

The Second Problem

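Logical replication. Set it up. Worked great. Except for the tables with no primary key: without a primary key or some other replica identity, logical replication can't carry their UPDATEs and DELETEs. We had 340 tables with no primary key.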
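Finding the tables without a primary key is a catalog query. Here is a minimal sketch, assuming a DB-API cursor from whatever driver you use (psycopg, say; the original setup isn't specified). The function name and the schema filter are illustrative, not what we actually ran.

```python
# Ordinary tables (relkind 'r') in user schemas with no PRIMARY KEY constraint.
# Assumption: partitioned parents (relkind 'p') are out of scope for this sketch.
TABLES_WITHOUT_PK_SQL = """
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (
        SELECT 1
        FROM pg_constraint con
        WHERE con.conrelid = c.oid
          AND con.contype = 'p'
  )
ORDER BY n.nspname, c.relname
"""


def tables_without_primary_key(cursor):
    """Return ['schema.table', ...] for tables lacking a primary key.

    `cursor` is any DB-API 2.0 cursor connected to the source database.
    """
    cursor.execute(TABLES_WITHOUT_PK_SQL)
    return [f"{schema}.{table}" for schema, table in cursor.fetchall()]
```

A table that already has REPLICA IDENTITY set to a unique index or to FULL will still show up here, so treat the result as a worklist to review, not a verdict.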

The Third Problem

Adding primary keys to 340 tables in production. Each one a migration. Each migration a risk. Each risk a conversation.
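One common way to keep each of those migrations low-risk is to build the unique index concurrently and then promote it to a primary key. The sketch below only generates the two statements; it assumes the chosen columns are already unique and NOT NULL, and the helper names and the `<table>_pkey` naming are hypothetical.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def add_primary_key_statements(schema, table, columns):
    """Return the two SQL statements, in order, for a low-lock primary key.

    1. CREATE UNIQUE INDEX CONCURRENTLY builds the index without blocking
       writes. It cannot run inside a transaction block.
    2. ALTER TABLE ... PRIMARY KEY USING INDEX promotes that index. It takes
       a brief exclusive lock, and scans the table if any column is not
       already NOT NULL.
    """
    index_name = quote_ident(f"{table}_pkey")
    qualified = f"{quote_ident(schema)}.{quote_ident(table)}"
    column_list = ", ".join(quote_ident(c) for c in columns)
    return [
        f"CREATE UNIQUE INDEX CONCURRENTLY {index_name} "
        f"ON {qualified} ({column_list})",
        f"ALTER TABLE {qualified} ADD CONSTRAINT {index_name} "
        f"PRIMARY KEY USING INDEX {index_name}",
    ]
```

For example, `add_primary_key_statements("public", "events", ["id"])` yields one CREATE UNIQUE INDEX statement and one ALTER TABLE statement. Run them separately, and check that the concurrent build did not leave an invalid index before promoting it.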

The Fourth Problem

Replication lag. During peak hours, we'd fall behind. During off-peak, we'd catch up. We needed to cut over during off-peak. Our off-peak was someone else's peak.
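"Caught up" is easier to argue about when it's a number. A minimal sketch of a cutover gate, assuming you sample lag in bytes from the source's replication slots on a schedule; the threshold and sample count are made-up parameters, not the values we used.

```python
# Lag per logical slot, in bytes of WAL not yet confirmed by the subscriber.
# Run on the source database (PostgreSQL 10 or later).
LOGICAL_LAG_SQL = """
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical'
"""


def ready_to_cut_over(lag_samples, max_lag_bytes, required_consecutive):
    """Return True if the latest samples show lag staying under the limit.

    `lag_samples` is a list of lag readings in bytes, oldest first.
    The most recent `required_consecutive` readings must all be at or
    below `max_lag_bytes`; a single good reading is not enough.
    """
    if required_consecutive <= 0 or len(lag_samples) < required_consecutive:
        return False
    recent = lag_samples[-required_consecutive:]
    return all(lag <= max_lag_bytes for lag in recent)
```

Requiring several consecutive good samples guards against cutting over in a brief lull in the middle of peak traffic.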

The Solution

Six months of: adding primary keys, optimizing queries, reducing write load, testing failover, practicing recovery, documenting everything, and waiting for the right moment.

The Lesson

Every migration is a negotiation with your past self. And your past self made some questionable decisions.