Memray: The endgame Python memory profiler

Episode #425, published Fri, Aug 4, 2023, recorded Tue, Jun 20, 2023

Understanding how your Python application is using memory can be tough. First, Python has its own layer of reused memory (arenas, pools, and blocks) to help it be more efficient. And many important Python packages are built in natively compiled languages like C and Rust, often making that section of your memory opaque. But with Memray, you can gain much deeper insight into your memory usage. We have Pablo Galindo Salgado and Matt Wozniski back on the show to dive into Memray, the sister project to PyStack, which we recently covered.

Episode Deep Dive

Guests Introduction and Background

  • Pablo Galindo Salgado and Matt Wozniski both work on the Python Infrastructure team at Bloomberg.
  • Pablo serves as a CPython core developer and a Python release manager. He’s deeply involved in Python’s evolution, including work on Python 3.10 and 3.11, plus continuous contributions to the language.
  • Matt co-maintains Memray and PyStack alongside Pablo. He also moderates Python Discord. Together, they share a passion for improving Python’s tooling, performance, and observability.

What to Know If You're New to Python

If you're just starting with Python and want to follow along with the episode’s deeper technical details, here are a few essentials to help:

  1. Profiling Basics: Tools like cProfile (part of Python’s standard library) measure how much time your code spends in various functions.
  2. Memory Management: Python manages memory largely via reference counting plus a cyclic garbage collector. An object is freed when its reference count drops to zero; the garbage collector handles the corner case of reference cycles.
  3. Native Extensions: Many Python packages wrap C or Rust code for performance. This sometimes hides what’s really happening with memory or CPU usage.
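The first two essentials can be seen in a few lines of standard-library Python. This is a minimal sketch; the function names here are illustrative:

```python
import cProfile
import io
import pstats
import sys


def slow_sum(n):
    """A deliberately naive function to profile."""
    total = 0
    for i in range(n):
        total += i
    return total


# 1. Profiling basics: measure where the code spends its time.
profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(3)
report = buf.getvalue()  # the profiling report mentions slow_sum

# 2. Memory management: reference counts drive object lifetime.
obj = []
# getrefcount reports one extra reference (its own argument).
refs = sys.getrefcount(obj)
alias = obj          # a second reference to the same list
assert sys.getrefcount(obj) == refs + 1
del alias            # dropping the alias lowers the count again
assert sys.getrefcount(obj) == refs
```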

Key Points and Takeaways

1. Memray: A New Level of Memory Profiling

Memray is a tracing profiler focused on memory usage. Unlike many other memory profilers, it captures all allocations, not just a sampling. This enables deeper analysis into Python and native layers (C/C++, Rust).

2. Tracing vs. Sampling Profilers

Sampling profilers periodically capture a snapshot of your program’s state, whereas tracing profilers record every allocation or call. Tracing can be more accurate but often carries a higher performance overhead. Memray manages to keep that overhead as low as possible while capturing rich data.

3. Profiling Native Extensions (C, C++, Rust)

Many popular Python packages (like NumPy) have compiled components. Memray can see into these layers to help diagnose hidden memory usage that pure Python profilers might miss. This is especially helpful for data science workloads that offload computations to C/C++ or Rust.

4. Flame Graph Visualizations

Memray offers flame graphs that show the memory impact of each call stack over time. You can zoom into a particular function chain or focus on peak memory usage. This is extremely useful for pinpointing which path of execution is hogging memory.

5. Temporary Allocation Analysis

Memray can detect fast allocate-deallocate patterns, helping developers spot inefficient object churn. For instance, continuously appending to a list might lead to frequent resizing and copying, which can hurt performance.

6. Attaching to Running Processes

One of Memray’s advanced features is the ability to attach to a running Python process without prior instrumentation. This requires “dark linker magic,” as Pablo and Matt jokingly describe it, but it offers tremendous debugging advantages for production systems.

7. Inside PyMalloc and System Allocators

Python’s PyMalloc allocates memory in larger blocks (“arenas”) rather than calling the system malloc for every new object. This can mask how memory is truly requested from the OS. Memray can show you whether PyMalloc’s caching is impacting memory usage or leaks.

8. Detecting Memory Leaks (Leaks Mode)

For memory leak detection, you can configure Memray to track each object’s lifecycle. In some scenarios, you must disable PyMalloc (PYTHONMALLOC=malloc) for exact leak-tracking, since PyMalloc reuses memory. Otherwise, Memray might report all usage as “still in use.”

9. Python 3.12 and PEP 669

Pablo highlighted PEP 669 (Low Impact Monitoring) as part of Python 3.12’s improvements to streamline profiling and debugging by adding event-based APIs. These let tools like coverage and debuggers reduce performance overhead by subscribing only to the events they need.

10. Sister Project PyStack

Pablo and Matt also maintain PyStack, a separate tool focused on stack traces and crash debugging. The synergy between PyStack and Memray can give developers deeper insight into both stuck or crashing processes and memory usage, especially for big or long-running services.

Interesting Quotes and Stories

  • “We found bugs in CPython that were there for 15 years!” – The Memray team's thorough tracing approach uncovered memory issues in Python’s core that went undetected for over a decade.
  • “It’s about understanding what actually happened when I create an object, not just the final memory total.” – Pablo emphasizing why capturing every allocation matters.
  • “You have no idea how wild it can get under the hood with linkers.” – Matt describing the complexity of enabling a live attachment mode for Memray.

Key Definitions and Terms

  • Tracing Profiler: Records every function call or memory allocation rather than taking periodic samples.
  • Sampling Profiler: Periodically captures program state, typically less overhead but lower fidelity.
  • PyMalloc: Python’s specialized allocator for small objects, caching memory blocks to speed up repeated allocations.
  • PEP 669: "Low Impact Monitoring for CPython," landed in Python 3.12 as the sys.monitoring API, letting profilers and debuggers subscribe to events with minimal overhead.
  • Flame Graph: A visual representation of hierarchical data such as call stacks, where the X-axis shows the cumulative cost (time or memory).

Learning Resources

Here are some ways to dive deeper into understanding Python memory, performance, and tooling:

Overall Takeaway

Memray is a powerful new entrant in the Python memory-profiling space, granting unprecedented visibility into both Python and native-layer allocations. By capturing every allocation, developers can spot memory leaks, analyze performance bottlenecks, and even optimize fundamental usage patterns. Whether you’re building enterprise-scale apps, deep learning pipelines, or just exploring performance quirks in your side project, Memray’s thorough insights—and synergy with the broader Python ecosystem—can help you write faster, more memory-efficient software.

Links from the show

Pablo Galindo Salgado: @pyblogsal
Matt Wozniski: github.com
pytest-memray: github.com
PEP 669 – Low Impact Monitoring for CPython: peps.python.org
Memray discussions: github.com
Mandelbrot Flamegraph example: bloomberg.github.io
Python allocators: bloomberg.github.io
Profiling in Python: docs.python.org
PEP 693 – Python 3.12 Release Schedule: peps.python.org
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
