Learn Python with Talk Python's 270 hours of courses

Optimizing PostgreSQL DB Queries with pgMustard

Episode #366, published Fri, May 20, 2022, recorded Wed, May 11, 2022

Does your app have a database? Does that database play an important role in how the app operations and users perceive its quality? Most of you probably said yes to the first, and definitely to the second. But what if your DB isn't doing as well as it should? How would you know? And once you know, what do you do about it?

On this episode, we're joined by Michael Christofides, co-creator of pgMustard, to discuss and explore the EXPLAIN command for Postgres and other databases as well as all the recommendations you might dig into as a result of understanding exactly what's happening with you queries.

Watch this episode on YouTube
Play on YouTube
Watch the live stream version

Episode Deep Dive

Guests Introduction and Background

Michael Christofides is the co-creator of pgMustard and has a rich background in database tooling. He started his career at Redgate, working on popular solutions for SQL Server, MySQL, and Oracle, among others. Later, Michael became deeply involved with Postgres, especially its performance and query optimization side. His passion for making databases more approachable led him to build pgMustard, a tool that transforms Postgres query plans into actionable insights for developers and DBAs alike.

What to Know If You're New to Python

Here are a few notes to help newcomers get the most out of this episode:

  • Understanding how to install and interact with Python packages (e.g., pip or Poetry) helps when installing database connectors and ORMs.
  • Basic knowledge of SQL and how Python can talk to databases (for example, using libraries like psycopg2 or SQLAlchemy) will make the discussion about query optimization easier to follow.
  • Familiarity with relational database concepts (tables, indexes, queries) will help you grasp the performance techniques mentioned during the episode.
  • If you’re looking for a solid foundation in Python itself, check out the “Learning Resources” section below.

Key Points and Takeaways

  1. Optimizing Postgres Queries with pgMustard
    pgMustard converts raw Postgres “EXPLAIN” plans into friendly, visual insights. Rather than just color-coding the slow parts, it offers specific advice on tuning, indexing, or adjusting configuration. This guidance can be a significant time-saver for developers who only occasionally dive into DB internals.

  2. Why Database Performance Matters
    A laggy database can bottleneck the entire application experience. Whether you run a massive online platform or a hobby project, small performance gains often translate into a noticeably snappier, more reliable user experience. A well-optimized DB also reduces infrastructure costs by making more efficient use of hardware.

  3. EXPLAIN vs. EXPLAIN ANALYZE
    Postgres’s EXPLAIN command shows how the database plans to run your query, while EXPLAIN ANALYZE actually runs the query and reveals real performance metrics like total time. If you see operations that are unexpectedly expensive, that’s usually your first clue about missing indexes or mismatched row estimates.

  4. GUI Tools vs. Command Line
    For quick one-off checks, the command-line tool psql is handy, but graphical tools can help visualize schema and query plans faster. Tools like dBeaver, DataGrip, Beekeeper, and Archetype give you features such as autocomplete, ER diagrams, and direct plan visualizations, which can be crucial for complex queries.

  5. Generating Realistic Test Data
    It’s crucial to test against a realistic data volume. Tools like Makaroo or Python’s Faker library help generate sufficiently large and varied data sets to match real-world usage. This ensures the query plan chosen in development closely resembles that in production.

  6. Reading Query Plans “Inside Out”
    Postgres reports its plan from the “outer” operation down, but execution generally starts with the innermost operations. Understanding this indentation is vital: look for big row counts and time spent at the most nested steps. Tools like PEV2 can highlight time-consuming parts in color-coded diagrams.

  7. Common Pitfalls: N+1 Queries, Over-Selecting Columns, and Missing Indexes
    The famous N+1 pitfall arises when you make one query to get a list of items, followed by a new query for each item. Also, pulling more columns than needed (like a SELECT *) often forces Postgres to do extra sorting or scanning. Missing indexes show up as large, sequential scans when you only need a fraction of the rows.

  8. Tuning Postgres Configurations (e.g., WorkMem)
    Postgres has conservative defaults for memory and parallel settings, especially work_mem. If your queries sort or hash large data sets, you might see them spill to disk. Tweaking memory-related settings can drastically reduce disk I/O, boosting performance. Always verify changes with EXPLAIN ANALYZE to ensure no negative side effects.

  9. Keeping Statistics Fresh
    The ANALYZE command updates table statistics that guide the query planner. Stale or incomplete stats make Postgres choose suboptimal plans. If you upgrade or import large data sets, run ANALYZE (or rely on the auto-vacuum system) to keep row estimates in sync.

  10. Advanced Extensions and Logging
    Beyond pgMustard, Michael recommended turning on pg_stat_statements for global query insights, plus auto_explain for capturing plans of especially slow queries. These extensions can help identify whether a particular query is globally problematic before you jump into the details.

Interesting Quotes and Stories

  • Slow Websites and Missing Indexes: “Remain calm—somebody probably forgot an index somewhere. It’ll eventually finish.” This quip captures how small mistakes can lead to big slowdowns.
  • Local vs. Production: The story of discovering that a local toy data set is fine but production is crawling is a reminder that scale changes everything. “If you’ve got ten rows, you’ll never see that your query is actually doing thousands of sorts on disk.”

Key Definitions and Terms

  • EXPLAIN: A Postgres command that shows the intended query plan without actually running it.
  • EXPLAIN ANALYZE: Runs the query and measures the actual performance details, including time and row counts.
  • N+1 Problem: A common query issue where you run one main query, then loop with an additional query for each result item.
  • WorkMem: A Postgres setting for how much RAM each operation can use before spilling to disk.
  • pg_stat_statements: An extension that captures statistics about every executed query, useful for spotting slow or frequent statements.
  • auto_explain: An extension to automatically log the execution plans of queries exceeding a specified threshold.

Learning Resources

Overall Takeaway

Database optimization may feel daunting, but with the right tools and strategies, even occasional DB users can quickly improve query performance. Paying attention to realistic testing, indexing strategies, and reading your query plans (especially with a helper like pgMustard or other visualization tools) provides a clear path to speeding up your Postgres-backed apps. By focusing on understanding the underlying operations—rather than treating the database as a black box—you’ll deliver faster, more reliable software to your users.

Links from the show

Michael Christofides: @michristofides
Datagrip: jetbrains.com
pgMustard: pgmustard.com
pgMustard example 1: app.pgmustard.com
pgMustard example 2: app.pgmustard.com
pgMustard example 3: app.pgmustard.com
Arctype: arctype.com
Postico: eggerapps.at/postico
Laetitia Avrot Secrets of 'psql'— Video: youtube.com
Beekeeper Studio: beekeeperstudio.io
DBeaver: dbeaver.io
SQLite Browser: sqlitebrowser.org

Michael's new Up and Running with Git course: talkpython.fm/git
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Talk Python's Mastodon Michael Kennedy's Mastodon