Migrating a PostgreSQL Enum using SQLAlchemy and Alembic

Jorge Lopez, Senior Software Engineer

Published August 1st, 2023

Database migrations... even after 5+ years together, some days it still feels like we just met. When following normal routines, like adding and removing unrelated columns, I can forget their capriciousness and at-times volatile temper. Using alembic, I can usually:

  1. Run

    alembic revision -m "Add checkbox column"
  2. Make mild tweaks to the revision file, and

  3. Finish with a flourish

    alembic upgrade head

Then I finish my coffee, smile at my database migration, and naively assume it will always have my back.

But it doesn't always have my back. I was recently blasted by a hot dose of erratic database migration silliness when I tried something outside our normal routine. I boldly decided to migrate... a PostgreSQL enum. Of my trials, tribulations, and epiphanies, let me enumerate...

Background

My recent pain involves using Alembic to migrate an enum column in PostgreSQL in a database managed by the SQLAlchemy ORM. Let's first break down PostgreSQL enums, the SQLAlchemy ORM, and alembic.

PostgreSQL Enums

According to the PostgreSQL documentation, an enum "comprises a static, ordered set of values". This datatype is useful for representing values that you can enumerate, like size categories ('small', 'medium', and 'large') or temperature feel ('cold', 'mild', 'hot'). Kepler is a digital marketing agency, so Enums are particularly useful for abstracting concepts like ad placement types and copy size for display ads. An example enum for ad type size category might look like this:

CREATE size_category AS ENUM ('RECTANGLE', 'POPUP', 'BANNER', 'BUTTON', 'SKYSCRAPER');

Don't worry too much about what those particular size categories mean, we're more interested in their usage in database operations. By assigning size_category as an enum, we restrict database users (eg, our api server) from uploading a value that is not in that enum. This protects our database integrity at the database level and clarifies the purpose of size_category more than an unstructured String type.

SQLAlchemy ORM

At Kepler, we write most of our API servers with Python and manage our databases using the SQLAlchemy ORM. A simple table might be described (and created) like this:

from enum import Enum from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column class SizeCategory(Enum): RECTANGLE = "RECTANGLE" POPUP = "POPUP" BANNER = "BANNER" BUTTON = "BUTTON" SKYSCRAPER = "SKYSCRAPER" class Base(DeclarativeBase): """The SQLAlchemy base model.""" class Ad(Base): """Ad model, stripped down for clarity.""" __tablename__ = "ad" id: Mapped[int] = mapped_column(primary_key=True) name: Mapped[str] = mapped_column() size_category: Mapped[SizeCategory] = mapped_column()

Under the hood, SQLAlchemy will generate the relevant queries for a particular database engine to help us create, query, and populate the table.

Alembic

While SQLAlchemy helps us create, query, and populate our database tables, it does not directly help us manage changes to our table structure. Eg, if we add/remove/modify a column from our database, SQLAlchemy has no opinion (or tooling) to help us manage the migration of our data. For this purpose, we turn to Alembic. Let's say we wanted to modify our Ad table to include a new column called "description". An alembic-based workflow might include the following steps:

  1. Update the database model.

    class Ad(Base): """Ad model, stripped down for clarity.""" __tablename__ = "ad" id: Mapped[int] = mapped_column(primary_key=True) name: Mapped[str] = mapped_column() description: Mapped[str] = mapped_column(nullable=True) # <- new size_category: Mapped[SizeCategory] = mapped_column()
  2. Generate an alembic migration.

    alembic revision -m "Add description column to ad database model."
  3. Examine / update the generated alembic file to ensure the migration runs smoothly.

    """Add description column to ad database model. Revision ID: <revision-hash> Revises: <previous-revision-hash> Create Date: <created date> """ import sqlalchemy as sa from alembic import op # revision identifiers, used by Alembic. revision = "68ec69f61faa" down_revision = "ab96b4b96c49" branch_labels = None depends_on = None def upgrade(): op.add_column("ad", sa.Column("description", sa.String(), nullable=True)) def downgrade(): op.drop_column("ad", "description")
  4. Run the migration

    alembic upgrade head

As you can see, with alembic and SQLAlchemy, adding a new column with a simple type is fairly straightforward. Things get a bit more exciting when modifying a new column.

For brevity, I will remove the downgrade() function from the remainder of this blog post.

The Problem

tl;dr An enum column's fields were mistakenly created as lower-case values. They needed to be made upper-case. I was tasked with making this change. The database migration required more creativity than I anticipated.

The orignal SQL Alchemy code looked like this

class CampaignPlacement(Enum): """Placement Options.""" placement_top = "placement_top" placement_product_page = "placement_product_page" placement_rest_of_search = "placement_rest_of_search" class CampaignTable(Base): __tablename__ = "campaign" placement = Column(Enum(CampaignPlacement), nullable=True)

Its accompanying Alembic upward migration looked like this:

def upgrade(): campaign_placement = postgresql.ENUM( "placement_top", "placement_product_page", "placement_rest_of_search", name="campaignplacement", ) campaign_placement.create(op.get_bind()) op.add_column( "campaign_table", sa.Column( "placement", campaign_placement, nullable=True, ), )

We then populated the database column with lowercase values. Sometime later, we learned that placement values should be upper-case. One option: we could always perform transformations after reading from the database, but that's not very elegant. Instead, we decided to migrate our database to reflect the values we actually want to work with. My first approach to solve this problem:

def upgrade(): # 1. Alter the enum by adding the uppercase values op.execute( "ALTER TYPE campaignplacement ADD VALUE 'PLACEMENT_TOP';" "ALTER TYPE campaignplacement ADD VALUE 'PLACEMENT_PRODUCT_PAGE';" "ALTER TYPE campaignplacement ADD VALUE 'PLACEMENT_REST_OF_SEARCH';" ) op.execute("COMMIT;") # <- NON-TRANSACTIONAL!!! # 2. Migrate any lowercase value that is already # in the table to its uppercase value op.execute( "UPDATE campaign_table set placement = 'PLACEMENT_TOP'" " WHERE placement = 'placement_top';" ) op.execute( "UPDATE campaign_table set placement = 'PLACEMENT_PRODUCT_PAGE'" " WHERE placement = 'placement_product_page';" ) op.execute( "UPDATE campaign_table set placement = 'PLACEMENT_REST_OF_SEARCH'" " WHERE placement = 'placement_rest_of_search';" ) # 3. Rename the enum op.execute("ALTER TYPE campaignplacement RENAME TO temp_campaignplacement") # 4. Create a new enum with only the uppercase # values and the original enum name. campaign_placement = postgresql.ENUM( "PLACEMENT_TOP", "PLACEMENT_PRODUCT_PAGE", "PLACEMENT_REST_OF_SEARCH", name="campaignplacement", ) campaign_placement.create(op.get_bind()) # 5. Alter the column type to the new enum. op.execute( "ALTER TABLE campaign_table ALTER COLUMN placement" " TYPE campaignplacement USING placement::text::campaignplacement;" ) # 6. Finally remove the old enum. op.execute("DROP TYPE temp_campaignplacement")

During code-review, we noticed a glaring issue: this migration was non-transactional! A non-transactional migration is bad because it cannot be automatically rolled back if anything fails halfway through. Unfortunately, this approach forced us to execute a commit in the middle of our database migration because a PostgreSQL enum type needs to exist before it can be used in an UPDATE statement for an existing enum column. Realizing this limitation, I changed my approach by relying on a non-enum type conversion to TEXT.

def upgrade(): # 1. Alter the column type to "text" op.execute( "ALTER TABLE campaign_table ALTER COLUMN placement" " TYPE text USING placement::text;" ) # 2. Convert any lowercase value that is # already in the table to uppercase value. op.execute( "UPDATE campaign_table set placement = 'PLACEMENT_TOP' " "WHERE placement = 'placement_top';" "UPDATE campaign_table set placement = 'PLACEMENT_PRODUCT_PAGE' " "WHERE placement = 'placement_product_page';" "UPDATE campaign_table set placement = 'PLACEMENT_REST_OF_SEARCH' " "WHERE placement = 'placement_rest_of_search';" ) # 3. Update the existing enum values to # their uppercase values. op.execute( "ALTER TYPE campaignplacement RENAME " "VALUE 'placement_top' to 'PLACEMENT_TOP';" "ALTER TYPE campaignplacement RENAME " "VALUE 'placement_product_page' to 'PLACEMENT_PRODUCT_PAGE';" "ALTER TYPE campaignplacement RENAME " "VALUE 'placement_rest_of_search' to 'PLACEMENT_REST_OF_SEARCH';" ) # 4. Alter the column type again to the enum. op.execute( "ALTER TABLE campaign_table ALTER COLUMN placement" " TYPE campaignplacement USING placement::text::campaignplacement;" )

And it worked! By relaxing our column type temporarily from enum to text, we were able to make this database migration transactional. It's also (arguably) easier to understand, if I do say so myself.

Conclusion

Like other tasks involving stateful environments, migrating a database can be challenging. If you find yourself trying to migrate a PostgreSQL enum column using Alembic, hopefully you're able to spend a bit less time wrestling with the task than I did. Until next time: may your migrations run error-free, and your highly-queried columns be properly indexed!