Memray: The endgame Python memory profiler
Episode Deep Dive
Guests Introduction and Background
- Pablo Galindo Salgado and Matt Wozniski both work on the Python Infrastructure team at Bloomberg.
- Pablo serves as a CPython core developer and a Python release manager. He’s deeply involved in Python’s evolution, including work on Python 3.10 and 3.11, plus continuous contributions to the language.
- Matt co-maintains Memray and PyStack alongside Pablo. He also moderates Python Discord. Together, they share a passion for improving Python’s tooling, performance, and observability.
What to Know If You're New to Python
If you're just starting with Python and want to follow along with the episode’s deeper technical details, here are a few essentials to help:
- Profiling Basics: Tools like cProfile (part of Python's standard library) measure how much time your code spends in each function.
- Memory Management: Python manages memory largely via reference counting plus a cyclic garbage collector. An object is freed when its reference count drops to zero (with the garbage collector handling reference cycles).
- Native Extensions: Many Python packages wrap C or Rust code for performance. This sometimes hides what’s really happening with memory or CPU usage.
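The reference-counting rule above is easy to observe directly with `sys.getrefcount` (a minimal sketch; note that the function call itself adds one temporary reference):

```python
import sys

x = []                      # one reference: the name `x`
y = x                       # a second reference to the same list
# Reports 3 here: x, y, and the temporary argument reference
print(sys.getrefcount(x))
del y                       # drop a reference; once the count reaches zero,
                            # CPython frees the object immediately
```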
Key Points and Takeaways
1. Memray: A New Level of Memory Profiling
Memray is a tracing profiler focused on memory usage. Unlike many other memory profilers, it captures every allocation rather than a sample, enabling deeper analysis across both the Python and native layers (C/C++, Rust).
2. Tracing vs. Sampling Profilers
Sampling profilers periodically capture a snapshot of your program’s state, whereas tracing profilers record every allocation or call. Tracing can be more accurate but often carries a higher performance overhead. Memray manages to keep that overhead as low as possible while capturing rich data.
- Links and Tools:
- cProfile in Python docs (CPU-time based, built-in)
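For contrast with Memray's memory tracing, here is a minimal run of the built-in `cProfile` (a CPU-time tracing profiler):

```python
import cProfile

def busy():
    # Deliberately CPU-heavy so it shows up in the stats
    return sum(i * i for i in range(50_000))

profiler = cProfile.Profile()
profiler.enable()
result = busy()
profiler.disable()
profiler.print_stats(sort="cumulative")  # per-function call counts and times
```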
3. Profiling Native Extensions (C, C++, Rust)
Many popular Python packages (like NumPy) have compiled components. Memray can see into these layers to help diagnose hidden memory usage that pure Python profilers might miss. This is especially helpful for data science workloads that offload computations to C/C++ or Rust.
4. Flame Graph Visualizations
Memray offers flame graphs that show the memory impact of each call stack over time. You can zoom into a particular function chain or focus on peak memory usage. This is extremely useful for pinpointing which path of execution is hogging memory.
5. Temporary Allocation Analysis
Memray can detect fast allocate-deallocate patterns, helping developers spot inefficient object churn. For instance, continuously appending to a list might lead to frequent resizing and copying, which can hurt performance.
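The list-resizing churn mentioned above can be observed with `sys.getsizeof` (a minimal sketch):

```python
import sys

lst = []
sizes = set()
for i in range(1_000):
    lst.append(i)
    sizes.add(sys.getsizeof(lst))  # each new size marks a reallocation

# CPython over-allocates, so resizes are amortized, but each one still
# allocates a new buffer and copies the old elements into it.
print(f"{len(sizes)} distinct buffer sizes over 1000 appends")

# A comprehension (or preallocation) sizes the buffer in far fewer steps:
pre = [i for i in range(1_000)]
```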
6. Attaching to Running Processes
One of Memray’s advanced features is the ability to attach to a running Python process without prior instrumentation. This requires “dark linker magic,” as Pablo and Matt jokingly describe it, but it offers tremendous debugging advantages for production systems.
7. Inside PyMalloc and System Allocators
Python's PyMalloc allocates memory in larger blocks ("arenas") rather than calling the system malloc for every new object. This can mask how memory is truly requested from the OS. Memray can show you whether PyMalloc's caching is impacting memory usage or hiding leaks.
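A rough way to watch pymalloc from Python itself is `sys.getallocatedblocks()`, which counts blocks currently held by Python's object allocator (a sketch; the arena-level caching described above is only visible to tools like Memray):

```python
import sys

base = sys.getallocatedblocks()
objs = [object() for _ in range(10_000)]
print(sys.getallocatedblocks() - base)  # thousands of new blocks

del objs
# The blocks go back to pymalloc's pools, but the arenas backing them
# may stay cached by the interpreter rather than being returned to the OS.
```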
8. Detecting Memory Leaks (Leaks Mode)
For memory leak detection, you can configure Memray to track each object's lifecycle. In some scenarios you must disable PyMalloc (PYTHONMALLOC=malloc) for exact leak tracking, since PyMalloc reuses memory; otherwise, Memray might report all usage as "still in use."
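As an illustration of the kind of leak this mode surfaces, consider an ever-growing module-level cache (a hypothetical sketch; the commands in the comment follow Memray's documented CLI):

```python
# A classic slow leak: a module-level cache with no eviction policy.
_cache = {}

def handle(request_id, payload):
    _cache[request_id] = payload  # entries accumulate forever
    return len(payload)

for i in range(1_000):
    handle(i, b"x" * 1_024)  # ~1 MB retained and never released

# To confirm with Memray's leaks mode (disabling pymalloc for exact tracking):
#   PYTHONMALLOC=malloc memray run --output out.bin app.py
#   memray flamegraph --leaks out.bin
```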
9. Python 3.12 and PEP 669
Pablo highlighted PEP 669 (Low Impact Monitoring) as part of Python 3.12’s improvements to streamline profiling and debugging by adding event-based APIs. These let tools like coverage and debuggers reduce performance overhead by subscribing only to the events they need.
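A minimal sketch of the resulting `sys.monitoring` API (Python 3.12+), subscribing to just one event type:

```python
import sys

# PEP 669 landed as `sys.monitoring` in Python 3.12.
if sys.version_info >= (3, 12):
    mon = sys.monitoring
    TOOL = mon.PROFILER_ID
    mon.use_tool_id(TOOL, "demo-profiler")

    seen = []

    def on_py_start(code, instruction_offset):
        seen.append(code.co_name)  # record each Python function entry

    mon.register_callback(TOOL, mon.events.PY_START, on_py_start)
    mon.set_events(TOOL, mon.events.PY_START)  # subscribe only to PY_START

    def work():
        return 42

    work()

    mon.set_events(TOOL, mon.events.NO_EVENTS)  # unsubscribe
    mon.free_tool_id(TOOL)
    print("work" in seen)
```

Because tools pay only for the events they register, a coverage tool or debugger no longer slows down every line of every function the way the older `sys.settrace` hook did.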
10. Sister Project PyStack
Pablo and Matt also maintain PyStack, a separate tool focused on stack traces and crash debugging. Together, PyStack and Memray give developers deeper insight into both hangs/deadlocks and memory usage, especially for large or long-running services.
Interesting Quotes and Stories
- “We found bugs in CPython that were there for 15 years!” – The Memray team's thorough tracing approach uncovered memory issues in Python’s core that went undetected for over a decade.
- “It’s about understanding what actually happened when I create an object, not just the final memory total.” – Pablo emphasizing why capturing every allocation matters.
- “You have no idea how wild it can get under the hood with linkers.” – Matt describing the complexity of enabling a live attachment mode for Memray.
Key Definitions and Terms
- Tracing Profiler: Records every function call or memory allocation rather than taking periodic samples.
- Sampling Profiler: Periodically captures a snapshot of program state; typically lower overhead than tracing, but lower fidelity.
- PyMalloc: Python’s specialized allocator for small objects, caching memory blocks to speed up repeated allocations.
- PEP 669: "Low Impact Monitoring for CPython," accepted for Python 3.12; adds event-based monitoring APIs so profilers and debuggers can subscribe only to the events they need, with minimal overhead.
- Flame Graph: A visual representation of hierarchical data such as call stacks, where each frame's width shows its cumulative cost (time or memory) and nesting shows the call hierarchy.
Learning Resources
Here are some ways to dive deeper into understanding Python memory, performance, and tooling:
- Python Memory Management and Tips (Talk Python Training): Explore reference counting, garbage collection, and performance best practices.
- Memray GitHub Repo: Get the code, documentation, and examples for Memray.
- PyStack GitHub Repo: Companion tool for diagnosing lockups and crashes.
Overall Takeaway
Memray is a powerful new entrant in the Python memory-profiling space, granting unprecedented visibility into both Python and native-layer allocations. By capturing every allocation, developers can spot memory leaks, analyze performance bottlenecks, and even optimize fundamental usage patterns. Whether you’re building enterprise-scale apps, deep learning pipelines, or just exploring performance quirks in your side project, Memray’s thorough insights—and synergy with the broader Python ecosystem—can help you write faster, more memory-efficient software.
Links from the show
Matt Wozniski: github.com
pytest-memray: github.com
PEP 669 – Low Impact Monitoring for CPython: peps.python.org
Memray discussions: github.com
Mandelbrot Flamegraph example: bloomberg.github.io
Python allocators: bloomberg.github.io
Profiling in Python: docs.python.org
PEP 693 – Python 3.12 Release Schedule: peps.python.org
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm
--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy