
20 Recommended Packages in Review

Episode #346, published Tue, Dec 21, 2021, recorded Wed, Nov 24, 2021

Do you enjoy the "final 2 questions" I always ask at the end of the show? I think it's a great way to track the currents of the Python community. This episode focuses on one of those questions: "What notable PyPI package have you come across recently? Not necessarily the most popular one but something that delighted you and people should know about?"

Our guest, Antonio Andrade, put together a GitHub repository cataloging guests' responses to this question over the past couple of years. So I invited him to come share the packages covered there. We touch on over 40 packages during this episode, so I'm sure you'll learn a few new gems to incorporate into your workflow.


Episode Deep Dive

Guest Introduction and Background

Antonio Andrade is a passionate Python developer deeply involved in the Python community. He enjoys tinkering with data and automation workflows, which led him to create a GitHub repository cataloging every notable PyPI package mentioned by past Talk Python to Me guests. Antonio joins the show to share highlights from these community-sourced recommendations and discuss how he uses Python for everything from data science projects to automating personal workflows. His enthusiasm for discovering and celebrating the small-yet-impactful packages in the Python ecosystem shines throughout this episode.

What to Know If You're New to Python

If you’re newer to Python and want to follow all the package and tooling discussions in this episode, here are a few essentials:

  • Familiarize yourself with the concept of virtual environments (e.g., venv or conda) so that installing these packages doesn’t conflict across projects.
  • Understand the basics of web frameworks (Flask or FastAPI) and how Python interacts with data tools (like pandas or SQLite).
  • Recognize that Python’s packaging ecosystem (e.g., PyPI, pip) is central to discovering and installing these libraries.
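
As a rough illustration of the first point, the sketch below creates a throwaway virtual environment using only the standard library's venv module. The helper name create_env and the temporary directory are illustrative assumptions, not something from the episode; in day-to-day work most people simply run python -m venv .venv from a shell.

```python
import tempfile
import venv
from pathlib import Path


def create_env(target: Path) -> Path:
    """Create a virtual environment at `target` and return its config file path."""
    # with_pip=False keeps the sketch fast and offline; pass True to bootstrap pip.
    venv.EnvBuilder(with_pip=False).create(target)
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    # Hypothetical demo location: a temporary directory that is deleted afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "demo-env")
        print(cfg.read_text())
```

Packages installed into an environment like this stay inside its directory, which is why two projects can depend on different versions of the same library without conflict.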

Key Points and Takeaways

  1. Antonio’s GitHub Repo of Package Recommendations Antonio built a GitHub repository to aggregate and track the “notable PyPI package” answers from Talk Python guests. Over time, these answers create a snapshot of emerging trends and hidden gems in the Python world. The repo makes it easy to revisit previously mentioned tools and even submit new ones from recent episodes.
  2. Tortoise ORM and Beanie (Async Database Libraries) Tortoise ORM simplifies async database interactions, aiming for a clean active-record style API. Beanie brings a similar async workflow to MongoDB as an ODM (object-document mapper) built on Pydantic models. Both show how Python’s async capabilities let database-driven apps keep serving other work while queries are in flight.
  3. UMAP for Dimensionality Reduction UMAP (Uniform Manifold Approximation and Projection) helps data scientists reduce high-dimensional data into more approachable 2D or 3D forms. It’s popular for visualization of clustering or similarity across large datasets, and it’s especially powerful in combination with pandas or scikit-learn.
  4. Plotext: Terminal-Based Plotting Plotext allows you to generate textual plots directly in your terminal, making it handy for quick data inspections or logging. It closely mimics matplotlib syntax but outputs ASCII or character-based charts, so you don’t have to leave the command line to visualize data.
  5. FSSpec and Dynaconf FSSpec standardizes file I/O across multiple backends like local systems, S3, or other remote file stores with a uniform API. Dynaconf streamlines Python project configuration, enabling environment-specific settings and secrets management in a single, flexible system.
  6. AWS CDK (Cloud Development Kit) and Automation The AWS Cloud Development Kit (CDK) lets you define your cloud infrastructure in Python code, helping you script AWS resources rather than configuring them manually. This approach is especially powerful for those doing repeated deployments or advanced DevOps workflows in Python.
  7. Luigi for Workflow Management Luigi, developed at Spotify, orchestrates complex pipelines by modeling tasks and their dependencies in Python. It’s widely used in data engineering to chain together multiple steps, ensuring each step completes successfully before triggering the next.
  8. Pydantic for Data Validation Pydantic leverages Python type hints for fast data parsing and validation. It’s essential when receiving unstructured or user-generated data, automatically converting types or returning detailed error messages if the data doesn’t fit expected schemas.
  9. pipx for Python Command-Line Tools pipx is a tool-focused package manager that installs and runs Python CLI apps in isolated environments while exposing their commands system-wide. It’s perfect for commands like black, glances, or pyjokes that you want to always have at hand without conflicting with other dependencies.
  10. Rich and Black for Code Formatting and Display Rich produces beautiful CLI elements (tables, syntax highlighting, and more), while Black is the “uncompromising” code formatter that removes style debates from your team. Combined, they make for a more pleasant Python development experience, from code consistency (Black) to visually rich terminal output (Rich).
  11. Seaborn for Data Visualization Seaborn builds on matplotlib to produce attractive statistical plots with less boilerplate. It automatically handles many aesthetic decisions, making it especially popular in data science for quick or publication-worthy charts.
  12. Stevedore for Plugin Management Stevedore makes building a plugin system in Python straightforward by allowing dynamic loading and management of separately distributed extensions. This is great for applications that want to let third-party packages “plug in” new behaviors at runtime.
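
To make the workflow idea from the Luigi item more concrete, here is a minimal sketch of dependency-ordered task execution. It deliberately does not use Luigi's API; it uses the standard library's graphlib (Python 3.9+) as a stand-in, and the task names and the run_pipeline helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task name maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}


def run_pipeline(dependencies, actions):
    """Run each task only after all of its dependencies have completed."""
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        # If a task raises, the loop stops and downstream tasks never run.
        actions[name]()
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Placeholder actions; a real pipeline would do actual work in each step.
    actions = {name: (lambda n=name: print(f"running {n}")) for name in DEPENDENCIES}
    print(run_pipeline(DEPENDENCIES, actions))
```

A workflow manager such as Luigi adds much more on top of this ordering, but the core contract is the same: a step runs only once everything it depends on has finished successfully.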

Interesting Quotes and Stories

  • Antonio on the Inspiration behind the Repo: “I think from my point of view, it’s a way to celebrate the people and celebrate those small packages. They deserve a place where everyone can contribute.”
  • On the Value of Python: “What I think is most important is the time you save. If you want to prove value, Python is probably the best way to do it.”

Key Definitions and Terms

  • Async/await: Python syntax for writing coroutines that hand control back to an event loop while they wait on I/O, so one thread can make progress on many operations at once; crucial for scaling I/O-driven apps.
  • ORM (Object Relational Mapper): A library that maps database rows to Python objects, removing much manual SQL writing.
  • CLI (Command-Line Interface): Textual interface to interact with software; many Python packages, like pipx or Rich, enhance CLI workflows.
  • Dimensionality Reduction: Techniques like UMAP that compress large, high-dimensional datasets into fewer dimensions for analysis or visualization.
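
As a small sketch of the async/await definition above, the example below runs three simulated I/O waits concurrently with the standard library's asyncio. The fetch coroutine, its names, and the 0.2 second delay are illustrative assumptions; asyncio.sleep stands in for a real network or database call.

```python
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    """Simulate a non-blocking I/O call that takes `delay` seconds."""
    await asyncio.sleep(delay)  # yields control to the event loop while waiting
    return f"{name} done"


async def main() -> list:
    # gather schedules all three coroutines concurrently and keeps argument order.
    return await asyncio.gather(
        fetch("a", 0.2),
        fetch("b", 0.2),
        fetch("c", 0.2),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(main()))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

Because the waits overlap, the total time should be close to the longest single delay rather than the sum of all three.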

Learning Resources

Here are a few resources to help you delve deeper into Python:

Overall Takeaway

This episode highlights how the Python community continually discovers and elevates smaller yet innovative tools in the ecosystem. From async database libraries and terminal plotting tools to uncompromising code formatters, each package solves real-world challenges in a lightweight and Pythonic way. By curating these recommendations in a single GitHub repository, Antonio provides a snapshot of the best (and often lesser-known) tools for Python developers of all levels. Ultimately, the conversation reminds us that the Python ecosystem thrives on collaboration, open-source innovation, and a willingness to share knowledge to help each other succeed.

Links from the show

Antonio on Twitter: @AntonioAndrade
Notable PyPI Package Repo: github.com/xandrade/talkpython.fm-notable-packages

Antonio's recommended packages from this episode:
Sumy: Extract summary from HTML pages or plain texts: github.com
gTTS (Google Text-to-Speech): github.com

Packages discussed during the episode

1. FastAPI - A-W-E-S-O-M-E web framework for building APIs: fastapi.tiangolo.com

2. Pythonic - Graphical automation tool: github.com

3. umap-learn - Uniform Manifold Approximation and Projection: readthedocs.io

4. Tortoise ORM - Easy async ORM for python, built with relations in mind: tortoise.github.io

5. Beanie - Asynchronous Python ODM for MongoDB: github.com

6. Hathi - SQL host scanner and dictionary attack tool: github.com

7. Plotext - Plots data directly on terminal: github.com

8. Dynaconf - Configuration Management for Python: dynaconf.com

9. Objexplore - Interactive Python Object Explorer: github.com

10. AWS Cloud Development Kit (AWS CDK): docs.aws.amazon.com

11. Luigi - Workflow mgmt + task scheduling + dependency resolution: github.com

12. Seaborn - Statistical Data Visualization: pydata.org

13. CuPy - NumPy & SciPy for GPU: cupy.dev

14. Stevedore - Manage dynamic plugins for Python applications: docs.openstack.org

15. Pydantic - Data validation and settings management: github.com

16. pipx - Install and Run Python Applications in Isolated Environments: pypa.github.io

17. openpyxl - A Python library to read/write Excel 2010 xlsx/xlsm files: readthedocs.io

18. HttpPy - More comfortable requests with python: github.com

19. rich - Render rich text, tables, progress bars, syntax highlighting, markdown and more to the terminal: readthedocs.io

20. PyO3 - Using Python from Rust: pyo3.rs

21. fastai - Making neural nets uncool again: fast.ai

22. Numba - Accelerate Python Functions by compiling Python code using LLVM: numba.pydata.org

23. NetworkML - Device Functional Role ID via Machine Learning and Network Traffic Analysis: github.com

24. Flask-SQLAlchemy - Adds SQLAlchemy support to your Flask application: palletsprojects.com

25. AutoInvent - Libraries for generating GraphQL API and UI from data: autoinvent.dev

26. trio - A friendly Python library for async concurrency and I/O: readthedocs.io

27. Flake8-docstrings - Extension for flake8 which uses pydocstyle to check docstrings: github.com

28. Hotwire-django - Integrate Hotwire in your Django app: github.com

29. Starlette - The little ASGI library that shines: github.com

30. tenacity - Retry code until it succeeds: readthedocs.io

31. pySerial - Python Serial Port Extension: github.com

32. Click - Composable command line interface toolkit: palletsprojects.com

33. Pytest - Simple powerful testing with Python: docs.pytest.org

34. testcontainers-python - Test almost anything that can run in a Docker container: github.com

35. cibuildwheel - Build Python wheels on CI with minimal configuration: readthedocs.io

36. async-rediscache - An easy to use asynchronous Redis cache: github.com

37. seinfeld - Query a Seinfeld quote database: github.com

38. notebook - A web-based notebook environment for interactive computing: readthedocs.io

39. dagster - A data orchestrator for machine learning, analytics, and ETL: dagster.io

40. bleach - An easy safelist-based HTML-sanitizing tool: github.com

41. flynt - string formatting converter: github.com
 
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy
