Testing in Radio Astronomy with Python and pytest

Episode #405, published Fri, Mar 3, 2023, recorded Mon, Feb 13, 2023

So you know about dependencies and testing, right? If you're talking to a DB in your app, you have to decide how to approach that with your tests. There are lots of solid options you might pick, and they vary by goals. Do you mock out the DB layer for isolation, or do you use a test DB to make it as real as possible? Do you just punt and use the real DB for expediency? What if your dependency was a huge array of radio telescopes and a rack of hundreds of bespoke servers? That's the challenge on deck today, where we discuss testing radio astronomy with pytest with our guest James Smith. He's a Digital Signal Processing engineer at the South African Radio Astronomy Observatory and has some great stories and tips to share.


Episode Deep Dive

Guests Introduction and Background

James Smith is a Digital Signal Processing (DSP) engineer at the South African Radio Astronomy Observatory (SARAO). He works on real-time signal processing for the MeerKAT radio telescope array, which is part of the Square Kilometre Array (SKA) radio telescope project. James started programming with C in his school days, then moved to Python for its flexibility and readability. His passion lies at the intersection of engineering, Python, and radio astronomy, where he helps design and test massive data pipelines that must run reliably in production on specialized hardware like FPGAs and GPUs.

What to Know If You're New to Python

Here are a few tips to get the most out of this episode, especially if you’re less experienced with Python:

  • Know the basics of Python functions and modules, so you can follow how James uses pytest to test large, multifaceted projects.
  • Some familiarity with pip and installing libraries will help when references are made to packages like PyCUDA or other scientific libraries.
  • Understanding async and await in Python is useful, since James discusses asynchronous data processing and pipelines.
  • A simple sense of how Python interacts with hardware and network I/O will help you appreciate the complexity of testing in real-time, high-throughput systems.
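
If async and await are new to you, the following minimal sketch shows the producer/consumer shape that asynchronous pipelines tend to take. It is only an illustration and not code from the episode: the names receive, process, and run_pipeline are hypothetical, and integers stand in for network packets.

```python
import asyncio


async def receive(queue: asyncio.Queue, num_packets: int) -> None:
    """Hypothetical receiver: puts numbered 'packets' on the queue."""
    for seq in range(num_packets):
        await asyncio.sleep(0)  # yield control, as a real network read would
        await queue.put(seq)
    await queue.put(None)  # sentinel: no more data


async def process(queue: asyncio.Queue) -> list[int]:
    """Hypothetical consumer: records the order in which packets arrive."""
    seen = []
    while (seq := await queue.get()) is not None:
        seen.append(seq)
    return seen


async def run_pipeline(num_packets: int = 5) -> list[int]:
    # A small bounded queue makes the receiver wait when the consumer lags.
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    _, seen = await asyncio.gather(receive(queue, num_packets), process(queue))
    return seen


def test_no_packets_lost_or_reordered():
    # pytest-style check: every packet arrives exactly once, in order.
    assert asyncio.run(run_pipeline(5)) == [0, 1, 2, 3, 4]
```

Neither coroutine blocks the other: while one waits on the queue, the event loop runs the other.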

Key Points and Takeaways

  1. Real-Time Radio Astronomy and the Need for Testing: In radio astronomy, data arrives in massive quantities from physically distributed antenna arrays. Testing is essential to ensure that every subsystem, from receiving signals to generating final data products, stays accurate. If a bug creeps in when working with real-time data at gigabits or even terabits per second, the entire pipeline can break silently, leading to invalid scientific results. Robust tests confirm that signals maintain expected fidelity despite complex transformations.
  2. Correlators and Data Reduction: A correlator is specialized hardware or software that multiplies signals from different antennas to create a combined dataset with higher resolution. This step reduces raw data (on the order of terabits per second) into a smaller but still substantial data stream. Without automated tests to check correlation accuracy, astronomers would have no assurance that the processed signals are scientifically valid.
  3. Moving from FPGAs to GPUs: James described how the original correlator hardware, called SKARAB, relied heavily on FPGAs for fast data processing. With modern GPUs offering higher memory bandwidth and powerful parallel computation libraries (e.g., CUDA), the team can shift to more flexible off-the-shelf platforms. Testing ensures that Python wrappers and memory transfers to the GPU still preserve data integrity.
  4. High-Throughput Data Pipelines in Python: Handling real-time data at 35 Gb/s per antenna means you have to be both memory- and network-efficient in your Python code. Using asynchronous I/O (async/await) helps by orchestrating data flows in a non-blocking manner. This architecture requires thorough tests to confirm that streams of data stay in sync and that no packet loss or data corruption occurs.
  5. Extensive Integration Testing with pytest: Rather than just testing in-memory functions, James’s team spins up actual correlator hardware or simulated FPGA boards to run full end-to-end tests. They use pytest fixtures to provision the environment, feed in deterministic signals, and capture results. This method helps validate both the code and the hardware configurations in one integrated step.
  6. Custom Reporting and Scientific Validation: Traditional test outputs (pass/fail) lack the nuance to verify scientific data pipelines, which often need numerical ranges or noise thresholds. James uses pytest report logs and even LaTeX (via PyLaTeX) to generate PDFs, showing not just whether the pipeline “passed,” but how closely the results match expected signals. This format is extremely valuable for peer review and compliance with scientific specifications.
  7. Simulation versus Real Hardware: Some tests are run via FPGA-based simulators or specialized code that emulates telescope data to confirm system performance before actual telescope time is used. Others use the real hardware but feed in known test signals. Both strategies prevent expensive telescope downtime while ensuring the final system is scientifically accurate.
  8. Adapting Testing Culture to Scientific Projects: Many scientific software efforts historically focus on quick prototypes rather than robust engineering. James highlighted the cultural shift toward adopting formal testing at SARAO, ensuring data products meet rigorous reliability standards. This fosters transparency across astronomy teams, allowing them to trust the pipeline and interpret the results with higher confidence.
  9. Performance Tuning with Confidence: Because real-time systems are so sensitive to timing, code optimization can introduce subtle errors. With comprehensive automated tests in place, engineers can refactor or optimize GPU memory transfers, network buffers, and concurrency without fretting over correctness regressions. If the test suite still passes, they can move ahead with production deployments with far more confidence.
  10. Data Management and Storage: After correlation, the data rate might drop to four or five gigabits per second, which is still huge. SARAO employs a tiered storage approach, archiving final data products in Cape Town for further analysis and sharing. Thorough system tests help confirm that the pipeline reliably generates expected data products before they hit long-term storage.
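
To make points 5 and 6 above a little more concrete, here is a minimal pytest sketch of the "known signal in, numerical tolerance out" pattern. It is not SARAO's test code: make_tone and measure_power are hypothetical helpers, and measure_power simply stands in for whatever system is under test.

```python
import math

import pytest


def make_tone(num_samples: int, cycles: float, amplitude: float = 1.0) -> list[float]:
    """Hypothetical helper: a deterministic sine-wave test signal."""
    return [
        amplitude * math.sin(2 * math.pi * cycles * n / num_samples)
        for n in range(num_samples)
    ]


def measure_power(samples: list[float]) -> float:
    """Hypothetical stand-in for the system under test: mean signal power."""
    return sum(s * s for s in samples) / len(samples)


@pytest.fixture
def tone() -> list[float]:
    # The fixture provisions the same known input for every test that asks for it.
    return make_tone(num_samples=1024, cycles=8, amplitude=0.5)


def test_power_within_tolerance(tone):
    # A sine of amplitude A has mean power A**2 / 2 over whole cycles.
    expected = 0.5 ** 2 / 2
    # Compare against a tolerance rather than demanding exact equality.
    assert measure_power(tone) == pytest.approx(expected, rel=1e-6)
```

In a real integration suite the fixture would do much more, such as starting hardware or a simulator, but the shape of the test stays the same: deterministic input, measured output, explicit tolerance.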

Interesting Quotes and Stories

  • “I have a big telescope at work; I don’t need a small one at home.” James’s humorous take on why citizen science radio telescopes might be exciting but aren’t his personal hobby.
  • “With real-time data, if something breaks, you might not even know until you realize the science results don’t make sense.” Highlights why advanced testing is essential in high-stakes scientific applications.

Key Definitions and Terms

  • Correlator: Hardware or software that multiplies signals from an array of antennas to form a unified signal for higher resolution and data reduction.
  • FPGA: Field Programmable Gate Array, a reconfigurable integrated circuit used for specialized, high-speed computations.
  • PyCUDA: A Python wrapper for NVIDIA’s CUDA libraries, making it easier to write GPU-accelerated code in Python.
  • pytest: A popular Python testing framework that simplifies writing and organizing tests, offering fixtures, parametrized tests, and plugins for advanced use cases.
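
As a toy illustration of the correlator definition above, the following sketch multiplies one antenna's signal by the complex conjugate of another's and averages the result. It is a simplified assumption about the general technique, not the MeerKAT implementation; correlate and tone are hypothetical names, and the signals are synthetic.

```python
import cmath


def correlate(a: list[complex], b: list[complex]) -> complex:
    """Multiply a by the complex conjugate of b, sample by sample, and average."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return sum(x * y.conjugate() for x, y in zip(a, b)) / len(a)


def tone(num_samples: int, cycles: int, phase: float = 0.0) -> list[complex]:
    """Hypothetical helper: a complex tone with a fixed phase offset."""
    return [
        cmath.exp(1j * (2 * cmath.pi * cycles * n / num_samples + phase))
        for n in range(num_samples)
    ]


# Two synthetic "antennas" see the same tone, offset in phase by -pi/2.
ant_a = tone(256, 4)
ant_b = tone(256, 4, phase=-cmath.pi / 2)

# 256 samples per antenna collapse into a single complex number whose
# magnitude and phase describe how the two signals relate.
visibility = correlate(ant_a, ant_b)
```

Here the result has a magnitude close to 1 and a phase close to pi/2, which is the phase difference between the two inputs. The averaging step is where the data reduction comes from: many input samples become one output value.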

Overall Takeaway

Radio astronomy demands extraordinarily reliable software pipelines due to the massive data rates and real-time nature of the science. Using Python for both on-site hardware control and data correlation is highly effective—provided robust testing is in place. From end-to-end integration tests on bespoke FPGA boards and GPUs to generating LaTeX reports that show numerical performance, James’s team demonstrates how to bring software engineering rigor into a cutting-edge scientific environment. Their work stands as an inspiring example of blending open-source tools, testing culture, and scientific research for high-impact results.

Links from the show

GPU-based correlator for MeerKAT: github.com
MeerKAT: sarao.ac.za
SARAO: sarao.ac.za
Skarab server: peralex.com
pycuda: documen.tician.de
Commercial Telescopes: telescope.com
PyLaTeX: github.com
Linearity Test Code: talkpython.fm
Correlator Context: talkpython.fm
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
