Monorepos in Python

Episode #399, published Wed, Jan 18, 2023, recorded Fri, Jan 13, 2023

Monorepos are contrary to how many of us have been taught to use source control. To start a project or app, the first thing we do is create a git repo for it. This leads to many focused and small repositories. A quick check of my GitHub account shows there are 179 non-fork repositories. That's a lot but I think many of us work that way.

But it's not like this with monorepos. There you create one (or a couple) repositories for your entire company. This might have 100s or 1,000s of employees working on multiple projects within the single repo. Famously, Google, Meta, Microsoft, and Airbnb all employ very large monorepos with varying strategies of coordination.

On this episode, we have David Vujic here to give us his perspective on monorepos as well as highlight an architectural pattern and set of tools for accomplishing this in Python.

Episode Deep Dive

Guests introduction and background

David Vujic is a seasoned Python developer and contributor to open-source projects, particularly around monorepo tooling. He has worked in various teams and companies spanning design, web, Clojure, and Python back-ends. His expertise in functional programming and architecture led him to explore how monorepos could be effectively managed in Python, in particular by adapting the Polylith architecture from Clojure to the Python ecosystem. David has a natural curiosity for how code can be structured, deployed, and integrated seamlessly across multiple projects.

What to Know If You’re New to Python

If you’re just getting started with Python and want to understand monorepos and advanced architecture patterns, the core language concepts such as modules, packages, and virtual environments will really help. Knowing how Python imports and organizes files on disk, as well as having some experience working with pip or poetry, will make following discussions about managing dependencies and building code in a monorepo setting much clearer.

It’s also helpful to have a basic understanding of source control (especially Git) so you can relate to the conversation about partial clones, shallow clones, and other advanced Git operations.

Key points and takeaways

1. Why Monorepos Matter Monorepos gather all of a team’s or company’s code into a single repository, providing a unified way to manage shared libraries, dependencies, and inter-project changes. This is notably different from housing every service or library in its own isolated repo. The conversation highlighted major tech companies like Google, Meta, Microsoft, and Airbnb using giant monorepos successfully and how they handle massive scale.

Links and tools:
- Meta (Facebook) Engineering
- Google Open Source

2. Microservices vs. Monolith vs. Monorepo Monolithic applications bundle everything into one application deployed at once, whereas microservices split them into standalone services. Monorepos, however, are about unified code storage rather than single or multiple deployments. The discussion clarified that having a monorepo does not mean you are building a monolithic application—small services can still exist but share a single source-control home.

Links and tools:
- GitHub (for hosting many monorepos)

3. The Polylith Architecture for Python Originally popularized in the Clojure world, Polylith organizes code into small “components” and thin entry-point “bases” that are combined into different “projects,” the deployable artifacts. Components are reusable Lego-like bricks (functions or sets of related functions) that can be shared across multiple deployments in one repository. This architecture encourages a clean separation of concerns and drastically reduces code duplication.

Links and tools:
- Polylith Docs
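
As a rough illustration of the brick idea, here is a minimal Python sketch. Every name in it (make_greeting, shout, cli_base, loud_api_base) is hypothetical and invented for this example, and everything lives in one file only for brevity; in a real Polylith workspace each component and base would be its own package.

```python
# Hypothetical bricks, shown in a single file for brevity.
# In a real workspace each component and base would be a separate package.


# component "greeting" (hypothetical): a small, reusable piece of logic
def make_greeting(name: str) -> str:
    """Pure function: builds a greeting without doing any I/O."""
    return f"Hello, {name}!"


# component "shout" (hypothetical): another reusable piece of logic
def shout(text: str) -> str:
    """Pure function: upper-cases a message."""
    return text.upper()


# base for a command-line tool (hypothetical): a thin entry point
def cli_base(args: list[str]) -> str:
    name = args[0] if args else "world"
    return make_greeting(name)


# base for a second service (hypothetical): reuses the same components
def loud_api_base(payload: dict) -> dict:
    message = shout(make_greeting(payload.get("name", "world")))
    return {"message": message}


if __name__ == "__main__":
    print(cli_base(["David"]))
    print(loud_api_base({"name": "Michael"}))
```

Both entry points reuse make_greeting without copying it, which is the point of treating components as shared building blocks.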

4. Poetry Polylith Plugin David built a plugin for Poetry (the Python package and dependency manager) to handle monorepos in a Polylith style. This plugin helps with building artifacts (wheels and source distributions) that assemble only the relevant “components” per project without dragging the entire monorepo along. It also includes CLI commands to display a workspace overview, show dependency usage, and more.

Links and tools:
- Poetry
- David’s Poetry Polylith Plugin (GitHub)
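
To show what "only the relevant components per project" could mean in practice, the sketch below walks a made-up import graph and collects everything one entry point needs. This is not how the plugin is implemented; the brick names, the BRICK_IMPORTS mapping, and the bricks_needed function are all assumptions for illustration.

```python
from collections import deque

# Hypothetical import graph: brick name -> bricks it imports directly.
BRICK_IMPORTS: dict[str, set[str]] = {
    "api_base": {"auth", "catalog"},
    "report_base": {"catalog"},
    "auth": {"database"},
    "catalog": {"database", "log"},
    "database": {"log"},
    "log": set(),
    "email": {"log"},
}


def bricks_needed(entry_point: str, imports: dict[str, set[str]]) -> set[str]:
    """Return the entry point plus every brick it reaches transitively."""
    needed = {entry_point}
    queue = deque([entry_point])
    while queue:
        current = queue.popleft()
        for dependency in imports.get(current, set()):
            if dependency not in needed:
                needed.add(dependency)
                queue.append(dependency)
    return needed


if __name__ == "__main__":
    # The "email" brick is never reached, so it would stay out of this artifact.
    print(sorted(bricks_needed("report_base", BRICK_IMPORTS)))
```

An artifact built for report_base would then contain only those bricks, even though the repository holds many more.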

5. Partial Clones, Shallow Clones, and Sparse Checkouts Managing large repositories often requires advanced Git operations. Partial clones (git clone --filter=blob:none), shallow clones (git clone --depth=1), and sparse checkouts let developers avoid downloading unneeded history or directories. This can speed up CI builds and make local development more efficient.

Links and tools:
- Git sparse checkout documentation
- Up and Running with Git course
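
For anyone scripting these Git operations, for example in a CI job, a small Python sketch might look like the following. The helper names, the placeholder URL, and the directory names are assumptions; the functions only build the argument lists and do not run Git, so the commands can be inspected first and then passed to subprocess.run when ready.

```python
import shlex


def partial_clone(url: str, dest: str) -> list[str]:
    # Partial clone: keeps the commit history but fetches file contents on demand.
    return ["git", "clone", "--filter=blob:none", url, dest]


def shallow_clone(url: str, dest: str, depth: int = 1) -> list[str]:
    # Shallow clone: only the most recent `depth` commits are downloaded.
    return ["git", "clone", f"--depth={depth}", url, dest]


def sparse_checkout(directories: list[str]) -> list[str]:
    # Run inside an existing clone; limits the working tree to these directories.
    return ["git", "sparse-checkout", "set", *directories]


if __name__ == "__main__":
    url = "https://example.com/acme/monorepo.git"  # placeholder URL
    commands = [
        partial_clone(url, "monorepo"),
        shallow_clone(url, "monorepo"),
        sparse_checkout(["components/auth", "bases/api"]),  # hypothetical paths
    ]
    for command in commands:
        # Printing instead of executing; subprocess.run(command, check=True)
        # would actually run each one.
        print(shlex.join(command))
```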

6. Versioning in Monorepos A key advantage of monorepos is the ability to synchronize changes across projects. Rather than tagging multiple separate repos, you maintain a single version history. Changes affecting common libraries are caught in continuous integration, ensuring no project is left behind on incompatible versions. Still, teams can isolate older components or create backward-compatible new ones if the entire codebase can’t be updated immediately.

Links and tools:
- Semantic Versioning (semver)

7. Dependency Management for Multiple Apps With monorepos, you can standardize dependency versions across many services while still selectively installing or building only what each app needs. Tools like Poetry or Pants can define per-project constraints without losing track of shared modules. Properly structuring your code means minimal duplication and robust update paths.

Links and tools:
- Pants Build

8. Editor and Dev Environment Setup Monorepos can overwhelm some editors if they index an entire codebase. Techniques like partial clones and dedicated workspace settings help. David emphasized using advanced editors such as PyCharm or Emacs configured for large Python projects, which automatically detect how files interrelate and provide better refactoring tools.

Links and tools:
- PyCharm IDE
- Emacs
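
The "find usages" feature that editors provide can be approximated in a few lines with Python's standard ast module. The sketch below is only an approximation of what an IDE does: it matches calls by name, without resolving imports, and the file paths and source snippets in SOURCES are invented for the example.

```python
import ast


def find_calls(source: str, function_name: str) -> list[int]:
    """Return the line numbers where `function_name` is called in `source`."""
    line_numbers = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Handles both `name(...)` and `module.name(...)` call styles.
            if isinstance(func, ast.Name):
                called = func.id
            else:
                called = getattr(func, "attr", None)
            if called == function_name:
                line_numbers.append(node.lineno)
    return sorted(line_numbers)


# Hypothetical files from one repository, kept in memory for the example.
SOURCES = {
    "components/auth.py": "def verify(token):\n    return bool(token)\n",
    "bases/api.py": (
        "from components import auth\n"
        "\n"
        "def handle(request):\n"
        "    return auth.verify(request)\n"
    ),
    "bases/cli.py": (
        "from components.auth import verify\n"
        "\n"
        "print(verify('abc'))\n"
    ),
}

if __name__ == "__main__":
    for path, source in SOURCES.items():
        for line in find_calls(source, "verify"):
            print(f"{path}:{line}")
```

Because every caller sits in the same repository, a single pass like this can see all of them, which is much harder when the callers are spread over separate repositories.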

9. Polylith vs. Other Solutions From user feedback, some companies attempt submodules or other microservice patterns to share code, but these can become complicated to maintain. Polylith aims to strike a balance between microservices and monolithic structures by emphasizing composability. While other approaches like Nx or Lerna (in the JavaScript world) exist, Polylith’s method is particularly Python-friendly and encourages pure Python packaging standards.

Links and tools:
- Nx (for JS)
- Lerna (for JS)

10. Functional Programming Influences on Python Architecture David’s background with Clojure and functional programming shaped how he sees code organization. Concepts like stateless functions, minimized shared mutable state, and building up applications from “pure” components work cleanly even in Python. Many monorepo best practices—such as small, composable blocks—mirror functional programming approaches.

Links and tools:
- Clojure Official Site
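
One way to picture the separation of data, calculations, and actions that David describes is the small Python sketch below. The OrderLine type and both functions are hypothetical examples, not code from the episode; prices are kept in whole cents to avoid floating-point rounding.

```python
from dataclasses import dataclass


# Data: an immutable record with no behavior.
@dataclass(frozen=True)
class OrderLine:
    name: str
    unit_price_cents: int
    quantity: int


# Calculation: the same input always gives the same output, with no side effects.
def order_total_cents(lines: list[OrderLine], discount_percent: int = 0) -> int:
    subtotal = sum(line.unit_price_cents * line.quantity for line in lines)
    return subtotal * (100 - discount_percent) // 100


# Action: talks to the outside world (here, printing) and stays at the edge.
def print_receipt(lines: list[OrderLine], discount_percent: int = 0) -> None:
    for line in lines:
        print(f"{line.quantity} x {line.name}")
    total = order_total_cents(lines, discount_percent)
    print(f"Total: {total / 100:.2f}")


if __name__ == "__main__":
    order = [OrderLine("book", 1000, 2), OrderLine("pen", 150, 4)]
    print_receipt(order, discount_percent=10)
```

The calculation can be tested and reused anywhere without mocking I/O, which is what makes small pure functions good candidates for shared components.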

Interesting quotes and stories

“If you're going to build a new service, you just pick the components you already have, combine them in a base, and that’s it. You don’t need to copy-paste code anymore.” — David explaining how Polylith drastically reduces duplication in monorepos.

“People often confuse the idea of a monorepo with a monolith, but they’re not the same at all.” — Emphasizing that a single repository can still release multiple independent services or microservices.

"When you want to change a function signature, you can instantly see every place it's used. That's the beauty of a monorepo." — Showing how easy cross-project refactoring can be in a unified codebase.

Key definitions and terms

Monorepo: A single repository containing multiple projects or services, often with shared libraries.
Monolith vs. Microservices vs. Monorepos: Monoliths bundle all features into one deployment; microservices break them into smaller deployable units; monorepos are about shared source control rather than deployment boundaries.
Polylith: A software architecture approach using reusable “components” and entry-point “bases” that are combined into multiple deployable “projects,” especially effective in monorepos.
Partial Clone: A Git clone that omits certain file content (like large blobs in history) until needed.
Shallow Clone: A Git clone with limited commit history (e.g., depth=1 for just the latest commit).
Sparse Checkout: A Git feature allowing you to check out only specific directories or files from a repository.

Learning resources

If you want to go deeper into Python fundamentals, monorepos, and code organization:

Python for Absolute Beginners – Great if you’re new to Python and want to quickly become confident writing packages and modules.
Up and Running with Git – Master Git concepts such as partial clones, shallow clones, and advanced merges—all crucial for large monorepos.
Poetry Documentation – Official docs for managing Python dependencies and packaging.
Polylith Docs – Detailed explanation of the Polylith architecture approach and tooling.

Overall takeaway

Monorepos offer significant advantages for teams that need to share code, maintain consistent dependencies, and refactor quickly. While many think monorepos imply one big monolithic app, the conversation clarifies that it’s more about unified code storage than deployment style. Tools like Polylith, partial/sparse Git clones, and plugins for Poetry can streamline the developer experience even in large codebases. The episode underscores how small, composable building blocks, plus automated checking and CI, enable a cleaner, more maintainable Python codebase—regardless of how many apps you ultimately deploy.

Links from the show

David on Twitter: @davidvujic
David on Mastodon: @davidvujic@mastodon.nu
Monorepo definition: wikipedia.org
git-sizer tool for large repos: github.com
git partial clones: docs.gitlab.com
git sparse checkout: git-scm.com
Polylith architecture: polylith.gitbook.io
Article: A simple & scalable Python project structure: davidvujic.blogspot.com
The last Python Architecture you will ever need?: davidvujic.blogspot.com
python-polylith plugin for poetry: github.com
Watch this episode on YouTube: youtube.com
Episode #399 deep-dive: talkpython.fm/399
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript

00:00 Monorepos are contrary to how many of us have been taught to use source control.

00:04 To start a project or app, the first thing we do is create a Git repository for it.

00:09 This leads to many focused and small repositories.

00:12 A quick check on my GitHub account shows that I have 179 non-fork repositories.

00:18 That's a lot, but I think many of us work that way.

00:21 It's not like this with monorepos.

00:23 With monorepos, you create one or a couple of repositories for your entire company.

00:28 This might have hundreds or thousands of employees working on multiple projects within a single repository.

00:34 Famously, Google, Meta, Microsoft, and Airbnb, amongst others, all employ very large monorepos with varying strategies for coordination.

00:44 On this episode, we have David Vujic here to give us his perspective on monorepos,

00:49 as well as highlight an architectural pattern and set of tools for accomplishing this in Python.

00:55 This is Talk Python To Me, episode 399, recorded January 13th, 2023.

01:14 Welcome to Talk Python To Me, a weekly podcast on Python.

01:17 This is your host, Michael Kennedy.

01:19 Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython, both on fosstodon.org.

01:26 Be careful with impersonating accounts on other instances.

01:29 There are many.

01:30 Keep up with the show and listen to over seven years of past episodes at talkpython.fm.

01:36 We've started streaming most of our episodes live on YouTube.

01:39 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:46 This episode is brought to you by Microsoft for Startups Founders Hub.

01:51 Get early stage support for your startup without the requirement to be VC-backed or verified at talkpython.fm/foundershub.

01:59 It's also brought to you by Brilliant.org.

02:01 Stay on top of technology and raise your value to employers or just learn something fun in STEM at Brilliant.org.

02:09 Visit talkpython.fm/brilliant to get 20% off an annual premium subscription.

02:14 David, welcome to Talk Python To Me.

02:17 Thank you.

02:18 I'm really excited to be on the podcast.

02:21 I'm really excited to have you on the podcast.

02:23 And we get to talk about a couple of interesting ideas.

02:27 We get to talk about software architecture.

02:30 People may know I'm a big fan of architecture.

02:33 I think putting your software together right makes all the difference.

02:37 We're going to talk about some ideas that are new to me, this Polylith idea that you're an advocate and fan of and how it applies to Python.

02:45 We're also going to focus a good portion of this conversation on monorepos.

02:50 And what the heck are monorepos, right?

02:52 Yeah.

02:53 That'll be a lot of fun.

02:54 And I'm really looking forward to it.

02:56 But let's hear your story first.

02:59 How did you get into programming and Python?

03:01 Yeah.

03:01 How did I get into programming?

03:02 Well, so I guess it was when I was a kid.

03:05 My dad bought me a Commodore 64.

03:08 It was like way back in the 1980s.

03:11 So that's when I started learning the BASIC programming language and how to write things that are code and not pure text.

03:24 But then I went on a different path.

03:28 I was working mostly with design and things like that.

03:34 And the first thing I knew was a job in our business was a web designer.

03:41 So that was what I wanted to be at first.

03:44 So I started to learn JavaScript and copy and pasted some snippets of code and stuff like that.

03:51 Yeah.

03:51 And that's the era.

03:52 I'm just guessing from the starting computer where you were sort of in time when this must have happened.

03:58 And that was probably before all these crazy JavaScript front-end frameworks and all that.

04:02 And it was more of how do you visually design this page with graphic and more art and more focused on that, right?

04:09 Yeah.

04:09 Before like jQuery and before like all these React and all of that good stuff that we have today.

04:17 Sure.

04:18 Yeah.

04:19 And with Python, I think I started about 2015 with Python.

04:23 I started with Python 2.7, and I think in that period I also started to think in more of a functional programming style, more like functional-ish.

04:37 Yeah.

04:37 And that's coming from Python actually.

04:40 And I was very much into Node.js at that time too.

04:44 So I think around 2015, and I've been like jumping back and forth between different languages.

04:50 I'm a really huge fan of Clojure too, which is like 100% functional.

04:55 So I've been like visiting some different kinds of programming languages and styles and things like that.

05:04 And now I'm back to full-time Python on my day job and almost full-time Python during nights because I really love to code on my spare time too.

05:15 As soon as there is any chance to code.

05:19 So I'll take that one.

05:21 Yeah, absolutely.

05:22 I mean, how much of open source is there's programmers and we have this project we've got to work on, but we really want to build this other thing and we're super passionate about it.

05:30 We just end up building it and sharing it and it takes off, you know?

05:33 And I think that's a very common story for sure.

05:35 So before we move on, how has working in Python influenced something like Clojure or Clojure influenced your Python thinking?

05:46 They're fairly different languages, right?

05:48 Yeah.

05:48 Functional being less stateful, which is a really big different way of programming.

05:52 Yeah.

05:53 Well, I think what I learned from Python was like the elegance and the importance of writing elegant and simplistic code.

06:04 I was really impressed by the Zen of Python.

06:08 You know, if you type import this in a shell, you get this nice list of how to write your code.

06:14 I really liked that idea.

06:16 So I guess that's where I started to think about keeping it simple, clean and short.

06:22 Yeah.

06:22 And with Clojure, it was like that's kind of a different, total different syntax, but also digging into a lot of functional aspects and how to think about state, how to separate the calculations from actions and data and things like that.

06:43 And I think I've brought a lot of those ideas back to when I'm back at Python, how to separate different kinds of code that you write.

06:53 Yeah, I can see that.

06:54 I feel like Python is really interesting because you can choose to only focus on little parts of it.

07:00 That's good for beginners because they only have to learn a little part, but it's also good for people who have particular styles that they like to work.

07:07 If you want to write functional Python, you don't have to create any global variables or any classes or any, you can just, you can write it that way, but you could completely go really deep OO patterns and can do that in Python too if you want.

07:21 Right.

07:22 It's completely up to you.

07:23 Yeah.

07:23 And I like that kind of freedom.

07:25 You're not forced to do either this or that. You can learn and experiment.

07:34 And especially when you, if you use libraries, they are like designed in different ways too.

07:41 So you can, you don't have to limit yourself to only use that kind of library or this kind of library.

07:46 So I really like Python, the capabilities of Python when it comes to that, but it's not very strict in any kind of format.

07:55 I like that too.

07:56 And I think other languages are seeing that and adopting that as well.

08:00 You know, you see Swift with their playgrounds and well, Swift in general and .NET with their, you know, maybe we don't need namespaces and classes and static main void for everything to get started.

08:09 And they're adopting those types of things.

08:12 You know, we're talking about this like, well, functional people might want to write this way and more OO oriented people that way, but that also could just be you in different situations.

08:20 You know, right now, this is the right tool to solve it.

08:23 And other times here's a different way to solve a different problem, but you can just stay in the same tools and the same editors and the same ecosystem.

08:30 It's cool.

08:31 Yeah, definitely.

08:32 Yeah.

08:32 All right.

08:33 Well, let's start in on the first half of our main topic here: the monorepo.

08:41 Now, yeah, it's really easy to confuse what a monorepo is with a monolith versus, say, microservices.

08:48 Yeah.

08:49 Those are not really at all the same thing.

08:51 In fact, they might actually be opposites in a sense, a monorepo and a monolith to some degree.

08:56 So maybe kick us off by telling us what is a monorepo here?

09:01 A monorepo is, I don't think it's that complicated, but I actually also, before I started to dig into this thing more,

09:10 I also had almost put like an equal sign between monolith and monorepo because that's the way I was used to writing code.

09:18 I was in the .NET and C# world a lot.

09:23 And you're building your website and you have a data layer and you have a domain layer and everything was in a repo.

09:28 So I guess microservices was like a reaction to that, to separate code into isolated environments.

09:38 And you can have this nice and clean little code base and you have that does one thing and you have this other code base that does a different thing.

09:49 So if instead of just having the user authentication bit completely just woven into the code, we can make a little API that we call over JSON that does the authentication.

09:58 And then here's the one has the catalog.

09:59 Yeah.

10:00 And we could write just a little bit of code and, you know, the, I guess the benefit, right, is that whoever's working on the catalog bit, they theoretically can just stay focused on that little bit of code and not the entire system.

10:11 Right.

10:12 Yeah.

10:12 With a microservice story.

10:14 Yeah.

10:14 A monorepo is, I think, from the way I see it, it's like, it's a Git or any version control repo that has basically all of your code in the same repo, same repository.

10:31 And that doesn't necessarily mean that it's one program or one app that you are going to build or compile into.

10:39 You can have several projects or artifacts in that repo.

10:44 And I guess that's why it's called monorepo because you can have multiple things in it.

10:50 So I guess that's the difference between a monolith where you have, where you actually build one app and deploy it to one place.

10:57 Right, right.

10:59 The monolith is the opposite of the microservice style.

11:04 Yeah.

11:04 Whereas the monorepo is just a way of organizing your code and sharing how do you propagate changes, look at dependencies across either libraries or, there are companies that take this really far.

11:18 Crazy, crazy far.

11:20 Like Google and Facebook, I believe, haven't worked on it, but I hear that they have one repo.

11:26 Yeah.

11:28 Or like all of it.

11:29 What, one?

11:30 Really?

11:31 Just like, what's the checkout story look like on that?

11:34 It's got to be a lot of code.

11:35 Yeah.

11:36 It's got to be a lot of code.

11:37 I believe Google, I'm probably going to misassociate this, but I think Google uses Bazel and there's different tools that allow them that are kind of not just Git, but something that can handle that scale of code.

11:54 So it really, when I think about organizing my code, it's either me or me and a couple of people working on the code and it's pretty contained.

12:03 But when you start to think about hundreds or thousands of people across projects, it starts to get really wild, right?

12:09 Yeah.

12:10 That has to be a completely different story.

12:14 Would be really fun to see how they do, how they work in the teams.

12:19 Yeah, absolutely.

12:20 So maybe we could talk about some of the, you know, why, if you're not doing this, you know, why would you do that?

12:29 Like it seems, you know, you highlighted that there's kind of these two trends that you saw out there in some of your articles.

12:38 And we'll talk about the articles and link to them.

12:40 You talked about seeing a trend of more people trending towards this monorepo and more people or other groups of people trending towards having more small repos.

12:52 Yeah.

12:52 For you take my little microservice example, the user access service might be its own repository separate from the catalog service.

13:01 Whereas others might say, we're going to put all that together and all the utilities and that other data reporting project.

13:08 And all of that goes into one giant repo, even though there's a big team on it, right?

13:11 Yeah.

13:12 Well, I think from my experience, what I've seen joining different teams and different companies that I've seen exactly that some quite recently, I joined a team at a company with several teams.

13:28 And they actually migrated from a monorepo to several repositories.

13:34 And it was part of their microservice journey, as they call it, because they had one repo with all their code.

13:44 But that code base was so difficult to work with.

13:47 So they kind of wanted to extract one app at a time into a separate repository just to be able to deploy that one and work with it in a reasonable way with tooling support and things like that.

14:02 So before that, I was at a different company joining a different team.

14:07 And we went the total opposite way.

14:10 We had a couple of microservices that were quite easy to work with.

14:20 But we identified issues or problems with it because maybe there's one service that has outdated dependencies.

14:29 So the biggest problem was the actual code duplication because we had one service that we had developed one thing for.

14:37 And we had another service that we needed code that was very much like the thing we had in that other service.

14:47 So I guess the solution could be to extract that one into a library, but then you have three repositories.

14:53 And I guess it's difficult to find that good balance between one or the other.

15:00 I agree.

15:00 And that extracting, I mean, that's certainly one of the possibilities.

15:05 As you say, well, we're going to, you know what, we now have a third repository and we have the data access repository and package.

15:13 And, you know, that's probably not the type of thing you publish to PyPI, but it's very likely something you would publish to some kind of internal dependency artifact.

15:23 System that you would depend upon, right?

15:25 But the problem is, if it's used in just these two places, it's, and it sounds like that sort of description, the kind of the team is probably working on both sides of those microservices and they understand the broader system.

15:39 But as it grows and more people depend upon it, it's harder to understand this little standalone project.

15:45 Who is using it, and in what ways are they using it?

15:48 Can we make a change here?

15:50 If we refactor this, who do we talk to about changing even just the signature of a function?

15:56 How do we reach out to the other parts of code or other stakeholders and say, look, we need to change this function, but we got to, you know, we're changing the data model and you're going to have to figure out how to go along.

16:12 On the other hand, if all of those projects were together in a giant monorepo, we have tooling that understands, well, what functions call this function or what thing imports this class or who's using it?

16:26 Is it used at all?

16:27 Actually, maybe you could delete it.

16:29 You thought someone was using it and no one's using it, right?

16:31 There's a lot of understanding of the broader integration if it's all there with you, right?

16:38 Definitely.

16:39 This portion of Talk Python To Me is brought to you by Microsoft for Startups Founders Hub.

16:46 Starting a business is hard.

16:48 By some estimates, over 90% of startups will go out of business in just their first year.

16:54 With that in mind, Microsoft for Startups set out to understand what startups need to be successful and to create a digital platform to help them overcome those challenges.

17:02 Microsoft for Startups Founders Hub was born.

17:05 Founders Hub provides all founders at any stage with free resources to solve their startup challenges.

17:12 The platform provides technology benefits, access to expert guidance and skilled resources, mentorship and networking connections, and much more.

17:21 Unlike others in the industry, Microsoft for Startups Founders Hub doesn't require startups to be investor-backed or third-party validated to participate.

17:30 Founders Hub is truly open to all.

17:33 So what do you get if you join them?

17:34 You speed up your development with free access to GitHub and Microsoft Cloud computing resources and the ability to unlock more credits over time.

17:42 To help your startup innovate, Founders Hub is partnering with innovative companies like OpenAI, a global leader in AI research and development, to provide exclusive benefits and discounts.

17:52 Through Microsoft for Startups Founders Hub, becoming a founder is no longer about who you know.

17:57 You'll have access to their mentorship network, giving you a pool of hundreds of mentors across a range of disciplines and areas like idea validation, fundraising, management and coaching, sales and marketing, as well as specific technical stress points.

18:11 You'll be able to book a one-on-one meeting with the mentors, many of whom are former founders themselves.

18:16 Make your idea a reality today with the critical support you'll get from Founders Hub.

18:21 To join the program, just visit talkpython.fm/foundershub, all one word. The link is in your show notes.

18:27 Thank you to Microsoft for supporting the show.

18:32 Yeah, I totally agree.

18:33 And our editors are so smart and they can find usages and if you have this function signature isn't really correct and stuff like that.

18:46 And that's so much easier when you have your source code in a folder that is like right next to the one using it.

18:55 So that's a huge benefit, like editor-wise to the developer experience, I guess.

19:00 Yeah, absolutely.

19:02 And I think one of, this is both a benefit and a challenge.

19:06 You know, I'll maybe link to the Monorepo Wikipedia page and it says, here are some of the advantages.

19:10 The number one advantage listed is ease of code reuse.

19:15 So it's possible, not necessarily suggested, but possible that you say, well, the data access functions and classes that we need on this side, we need some of them over here.

19:25 But if you have the whole Monorepo, you could just say, well, import them in both projects and deploy, you know, a larger piece of code to your server.

19:33 But who cares?

19:34 The servers have a lot of storage and they'll be fine, right?

19:36 The challenge, I think, is going to be the, you're going to end up with a tightly coupled architecture pretty badly if you just say, well, I see way over there, there's that file and that's the one I want.

19:47 And we're just going to grab that.

19:48 And, you know, it doesn't necessarily encourage good behavior, but it does make reusing code and understanding how it's being used easy, right?

19:56 Yeah.

19:56 And also you're like in the risk zone of actually building a monolith again.

20:02 Yeah, exactly.

20:04 You're just part of the, part of the API points run there and part of the API points run there, but they're, they're effectively just one giant thing, right?

20:11 So I guess what I think is that if you want to have your code in a monorepo, I guess you would need some sort of tooling or ideas about how to separate your code into separate artifacts that don't have the entire code base in their package.

20:33 Only the code that is actually needed for this artifact.

20:38 So, again, I guess that's part of the challenge of having a monorepo.

20:42 I would say so.

20:43 I've been thinking about this a little bit leading up to our conversation today.

20:48 And certainly you can use packages in the, you know, we have this problem in Python or this challenge where packages mean different things, but it has the same word.

21:01 So a package could be just a grouping of modules into a directory that has a dunder init (an __init__.py file), or it could be something on PyPI that you ship and you deploy and you version on its own.

21:11 And I mean just the on-disk, dunder init sort of local grouping, right?

21:18 So you could create these sort of groups within your monorepo and say, we're going to import that, but have a little bit of a formal separation and say, look, we're not necessarily going to deploy it through some versioning story and let other people pull it in.

21:34 Because then we lose track of who's using it and how they're using it.

21:37 Are they on the right version?

21:39 But we'll still maybe think of them as a Python package, in a sense.

21:44 Do you have any experience with doing it one way or the other?

21:47 Any preference?

21:48 Yeah.

21:49 What I was thinking of is the company that I joined that was migrating away from their monorepo. They had made a couple of attempts at this code sharing thing

22:02 with, I think it was Git submodules or similar things.

22:08 But all of that ended up becoming too complicated to understand what was going on.

22:17 And I think even the editor support wasn't really perfect when you had this kind of dynamic linking.

22:26 So I guess that's why they chose to abandon that idea.

22:31 Yeah.

22:33 Yeah.

22:34 It sounds a little bit like with the submodules that it was not a pure monorepo, but kind of a, let's have different sections in our repository, but bring it together.

22:46 Yeah.

22:49 Yeah.

22:49 Like, we're going to sort of put these files here, and this is a submodule, that's a submodule, and they're kind of separate.

22:56 But then once they're all checked out and linked up, then our tool thinks of it as one giant thing.

23:02 Like the monorepo would be, right?

23:03 Yeah.

23:04 So it's kind of an intermediate approach. I thought about this as well.

23:07 Like maybe you could put it together with the Git tools like that.

23:10 I do want to highlight a couple of Git tools, but first I'll take a quick bit of audience feedback.

23:19 But I do think that, you know, when it's five people, 10 people, you just check the thing out and it's going to be fine.

23:25 But as it gets larger and larger, both over time and in lines of code and number of people, it's going to be a thing where it almost becomes unmanageable to just do a git clone URL and see what happens.

23:39 Right.

23:40 You can go grab, grab a coffee.

23:41 And when you get back, it still hasn't downloaded.

23:44 We've probably seen that XKCD where there's people, like, fake sword fighting on a chair.

23:49 It's like, get back to work.

23:49 Like, you know, we're doing a git clone, leave us alone.

23:53 Okay.

23:54 Sure.

23:54 Fine.

23:54 Gotcha.

23:55 It's going to be a while.

23:56 Quick bit of audience feedback says monorepos are okay if you have a dedicated team that manages the advanced tooling required to deal with them.

24:04 Yeah, absolutely.

24:04 And sort of related, Lucas asks: would you use Bazel for your projects, or rather have makefiles or similar for linting and builds?

24:14 So yeah, there's the different tools that, like, Facebook and Google and those folks use.

24:20 There's also Pants.

24:22 Benjy Weinberger has talked a lot about it.


24:25 I've had him on the show before.

24:26 And Pants, the Pants build system, is one of these tools that can kind of help.

24:31 But David, how about you?

24:32 What were you all using in terms of more advanced tooling or was there anything special?

24:38 Back then it was not really more advanced than actually makefiles to make things happen.

24:45 But at the other place, the team that I joined had actually started to use this architecture, which I guess we're going to talk about, called Polylith.


24:57 Yeah.

24:58 And there's also tooling for that, which kind of offers a solution to many of these headaches with having a monorepo.

25:06 Yeah, absolutely.

25:08 And back then it was Clojure, because Polylith originates from Clojure.

25:13 So we were actually writing Clojure code.

25:15 And then for Python, I started to look around for a solution.

25:23 I actually read a little bit about Pants.

25:26 So I think that can solve a lot of problems too.

25:30 It seems like a really great tool with a lot of useful functionality.

25:37 And then there's also Poetry.

25:41 I don't think it's really about monorepos, but I believe that you can use pure Poetry and have your dependencies,

25:52 not the third-party libraries from PyPI, but your own, in a sort of editable mode.

26:02 So as soon as you change something, it will be updated.

26:05 So I guess there are some tools that can help you along the way, but I guess there's still a lot of frustration with getting that smooth and really joyful monorepo experience that you would like to have.

26:19 So that's what led me to start working on this project.

26:24 I do think that Python, with the way it handles dependencies and its understanding of linking files together through directories and things like that, makes it a little bit more challenging than other systems.

26:37 Like if I was doing C++, I could open up Visual Studio Code and create a broader project and say, these three libraries are what I want to see as my project.

26:46 And it doesn't really matter where they come from.

26:48 You build it and they link together and there's sort of a build-the-delta-only type of thing.

26:53 Whereas in Python, you kind of need to bring on a little bit more tooling to say, I know it looks like there's some giant Python thing here, but just these two pieces,

27:01 that's what I want to think of as the thing. You know, what we're going to talk about, with some of the stuff that you've done with Poetry, with Polylith and others, certainly makes that relevant.

27:11 I do want to talk about the Git tools, but it's also interesting.

27:14 This comment from David Poole says, we use submodules for legal licensing reasons.

27:20 That is to have GPL code separate from our proprietary code rather than just dropping it in, which obviously has different implications.

27:29 Oh, that was very interesting to learn about.

27:31 Yeah.

27:32 Yeah.

27:32 I hadn't really thought about that either, but yes, you definitely want to think about it.

27:36 So let's just talk Git for a moment.

27:38 Now, one of the big challenges is if we're going to put this all into one giant GitHub repository, which I hinted at, it could get really large.

27:48 Especially if you put binary files, like some of your build tooling or other assets, you might put it in there.

27:54 And then that makes it extra tricky.

27:56 The less something can diff, the more it kind of piles up quick.

27:59 As I was thinking about this thing, I learned about a couple cool ideas.

28:03 Let's talk about this one first.

28:04 Partial clone.

28:05 This is something that was totally new to me.

28:09 So normally it's git clone and the URL to the Git repository.

28:14 However, you can say things like --filter=blob:none.

28:21 Have you seen this before, David?

28:22 No, this is totally new to me.

28:24 So, but it looks really interesting.

28:27 Yeah, so what happens here is if you, the blob is like a binary file, right?

28:33 And what you're saying when you say filter blob is it'll check out all of the Git history.

28:39 And normally when you do a clone, at least for the branch you're on, you get every version of the file.

28:44 So you git clone, you disconnect from the network and you've got everything, right?

28:48 Which is the beauty of Git.

28:50 But if you've got a really huge repository, it also might be the drawback of Git.

28:55 So you can filter out these blobs in the historical sense.

29:00 And if you say this, what you see in your hard drive for the working directory is identical.

29:06 But the .git folder with the history only has the working version, not all copies of the history of the blob.

29:13 This has like a really huge effect.

29:16 So I did this on Talk Python training, my courses website.

29:19 And if I just say git clone, the repo, it pulled down 118,000 objects.

29:26 It resolved 71,000 deltas and it updated 10,000 files and it was a gig on disk.

29:32 If I just say --filter=blob:none, it goes from 118,000 to 10,000 downloads.

29:39 It's less than half the size.

29:42 And the resulting files on disk were the same, but the intermediate deltas were like 1/70th or 1/50th.

29:52 Really a big difference.
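
The partial clone described here can be sketched as a small Python helper that only assembles the command line and does not run it. The --filter=blob:none option is a real git clone flag; the function name and the example URL are hypothetical and used only for illustration.

```python
from typing import List


def build_partial_clone_command(url: str) -> List[str]:
    """Assemble (but do not run) a partial-clone command for the given URL."""
    # --filter=blob:none downloads commits and trees up front and fetches
    # file contents (blobs) from the remote only when they are needed.
    return ["git", "clone", "--filter=blob:none", url]


# Hypothetical repository URL, used only for illustration.
command = build_partial_clone_command("https://example.com/big-monorepo.git")
print(" ".join(command))
```

Passing the resulting list to subprocess.run would execute the clone; it is left out here so the sketch runs without network access.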

29:54 And this is, you know, it's a pretty old repo.

29:59 It's got a lot of stuff.

30:00 It's nothing compared to what a lot of people have.

30:01 So there's one problem where, like, okay, if I'm going to try to git clone a monorepo,

30:06 there's just no way.

30:08 Right?

30:09 So adding this aspect here, I think actually would be really valuable.

30:12 Yeah, definitely.

30:13 Because I guess the normal use case is that you want to work with the latest

30:20 version of the source code.

30:21 You want to develop something new.

30:23 So I guess that's what you want on disk.

30:27 It saves a lot of time.

30:28 And what happens is if you say, well, actually we need to switch branches or we need to go

30:34 back three months in time.

30:36 It just goes back to the network and clones a little bit more.

30:39 It's like an incremental clone as it needs it.

30:42 So I think this would actually help a lot of people who don't know about it, working with monorepos that turn out to have a lot of files and a lot of history, especially binaries that grow over time.

30:52 Yeah.

30:52 Because those are the ones that are huge, you know, it's not the text files usually that

30:56 are the problem.

30:56 So I have to bookmark this.

30:58 Yeah, yeah, yeah.

31:00 This portion of Talk Python To Me is brought to you by Brilliant.org.

31:06 You are a curious person who loves to learn about technology.

31:09 I know because you're listening to my show.

31:11 That's why you would also be interested in this episode's sponsor, Brilliant.org.

31:15 Brilliant.org is entertaining, engaging, and effective.

31:19 If you're like me and feel that binging yet another sitcom series is kind of missing out

31:23 on life, then how about spending 30 minutes a day getting better at programming or deepening

31:28 your knowledge and foundations of topics you've always wanted to learn better, like chemistry

31:33 or biology over on Brilliant.

31:34 Brilliant has thousands of lessons from foundational and advanced math to data science, algorithms,

31:41 neural networks, and more with new lessons added monthly.

31:44 When you sign up for a free trial, they ask a couple of questions about what you're interested

31:48 in as well as your background knowledge.

31:50 Then you're presented with a cool learning path to get you started right where you should

31:53 be.

31:54 Personally, I'm going back to some science foundations.

31:57 I love chemistry and physics, but haven't touched them for 20 years.

32:01 So I'm looking forward to playing with PV equals NRT, you know, the ideal gas law, and all the

32:07 other foundations of our world.

32:08 With Brilliant, you'll get hands-on on a whole universe of concepts in math, science, computer

32:14 science, and solve fun problems while growing your critical thinking skills.

32:18 Of course, you could just visit brilliant.org directly.

32:20 Its URL is right there in the name, isn't it?

32:22 But please use our link because you'll get something extra.

32:25 20% off an annual premium subscription.

32:28 So sign up today at talkpython.fm/brilliant and start a seven-day free trial.

32:33 That's talkpython.fm/brilliant.

32:36 The link is in your podcast player show notes.

32:38 Thank you to brilliant.org for supporting the show.

32:43 Related to that, a question from the audience: wouldn't a shallow clone be more predictable?

32:47 So this is also interesting.

32:49 So shallow clones are an older way to do this in Git.

32:54 The problem is with shallow clones, you don't get the full history and changelog.

33:01 With these partial clones, you have all of the history, commit history and details.

33:06 You just don't have the files and they're incrementally pulled in.

33:10 So you could do a shallow clone.

33:11 And then there's another one, what was it called?

33:15 A sparse clone.

33:17 So a sparse clone is another tool that you can bring in here for advanced Git usage, where

33:24 you say, I know I've got this huge directory structure, but I just want to get these three

33:29 directories or this subdirectory structure.

33:33 And you clone only part of the files, right?

33:37 So we were talking about how Python understands just like the whole thing as one giant project.

33:41 And maybe even you check it out and try to open it.

33:43 Your editor will just say it's indexing, indexing, and autocomplete won't really work

33:48 very well and it'll go crazy.

33:50 So you can just say, I want these three directories and I want them partially cloned.

33:55 So they only have like the recent history and they're not so insane.

33:59 And you can kind of combine these to get really focused views into a monorepo, which I thought

34:05 was pretty interesting.

34:06 Yeah.

34:07 So anyway, when I think back to the story you told about how you guys were using submodules,

34:14 I kind of feel like these partial clones plus sparse clones might be a better fit than trying

34:20 to, you know, symlink things together.

34:22 Because it really just is the same thing.

34:25 If you want to clone the whole thing, you do.

34:26 But then you can kind of just, as you clone it, filter out.

34:30 And you can also, with those sparse clones, you can retroactively add in.

34:34 You go, oh, I also need that directory.

34:35 You can say, like, git sparse-checkout add.

34:37 Oh, cool.

34:38 Now this piece needs to come in as well.

34:40 And there's some interesting ways to put these together.

34:42 So I think these tools are going to be, for people who are working with monorepos, I think

34:46 those advanced Git features that I called out might be really helpful.

34:51 What do you think, David?

34:52 Yeah, I totally agree.

34:53 Especially when you have a monorepo that is a lot of code.

34:57 So it seems like you wouldn't want to live without it, I guess.

35:03 Because it's probably really helpful.

35:05 Yeah, I think so too.

35:06 So sparse checkout, I believe, is actually the term.

35:10 Sparse checkout.

35:10 Okay.

35:11 I think sparse checkout is the term.

35:13 I'll link to it as well.

35:14 It's partial clone and sparse checkout.
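
Combining the two features might look like the following Python sketch, which again only builds the command lines. The --sparse flag of git clone and the set and add subcommands of git sparse-checkout are real Git features; the function names, URL, and directory names are hypothetical.

```python
from typing import List


def sparse_partial_clone_commands(url: str, directories: List[str]) -> List[List[str]]:
    """Commands for a partial clone that only checks out the given directories."""
    return [
        # Clone without historical blobs and with only top-level files checked out.
        ["git", "clone", "--filter=blob:none", "--sparse", url],
        # Run inside the cloned repository: choose the directories to materialize.
        ["git", "sparse-checkout", "set", *directories],
    ]


def add_directory_command(directory: str) -> List[str]:
    """Command to pull one more directory into an existing sparse checkout."""
    return ["git", "sparse-checkout", "add", directory]


# Hypothetical URL and directory names, used only for illustration.
for cmd in sparse_partial_clone_commands(
    "https://example.com/big-monorepo.git", ["services/api", "libs/logging"]
):
    print(" ".join(cmd))
print(" ".join(add_directory_command("libs/parsing")))
```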

35:16 There we go.

35:16 Nice.

35:18 Okay.

35:18 There's so many good features in Git that I guess most of us don't use.

35:22 Yeah.

35:23 I think so too.

35:24 Like I've been doing Git for a really long time and this sparse checkout is completely

35:29 new to me.

35:30 I only learned about it because I was trying to research a little bit more, like, well, how do you actually manage with these monorepos, as we were preparing for our chat today.

35:38 So yeah, I think there's a lot of tools and flexibility that are not obvious or not apparent

35:45 that people can use to make monorepos work really, really well.

35:49 There's still a lot of interesting ways to structure your code and put it together and

35:53 use it once you get it checked out.

35:55 So maybe let's, what do you want to start?

35:59 You want to start with a fresh take on monorepos?

36:00 This is one of your articles.

36:02 Yeah.

36:02 Why not?

36:03 Yeah.

36:03 So tell us the story here.

36:05 I wrote this article about a year ago, almost a year ago.

36:09 Before that, I was trying to figure out how to work and have this nice developer experience

36:15 in a monorepo.

36:17 And coming from Clojure and having learned new things and having some fresh ideas on how

36:24 you can solve things, I wanted to give it a try in Python too.

36:31 And also at the same time, I was actually doing work with microservices, but in several

36:38 repos.

36:39 And I kind of found myself needing something. It was not a huge thing, but it was like a logger, sort

36:45 of a logger module or a package.

36:48 I knew that I had done it in the other microservice just a couple of weeks ago.

36:54 So, okay, what should I do?

36:56 Should I create a library?

36:57 No, this is way too small to create a library.

37:00 And it wasn't even open source.

37:03 It's like a proprietary system.

37:06 So then you would need private repository servers and things like that.

37:12 So I just ended up copying some code.

37:14 And, you know, people would go, of course, you should never do that.

37:17 But sometimes it's just not complicated enough or important enough or big enough to justify

37:23 all the change management and dependencies.

37:26 And you know what?

37:27 That file, it goes into this project.

37:29 And usually it's fine until they get out of sync or there's some weird, you want to upgrade

37:35 one and then, oh, where else is it, right?

37:37 Or you discover a bug in that part and you forget that you have copied it a couple of times

37:44 into the other repos.

37:45 So then you have a lot of work to do.

37:47 There's probably a whole section of cybersecurity history and like breaches where they thought they fixed a problem in some system.

37:56 Yeah.

37:56 And it turns out someone else found a copy of it that wasn't fixed and broke in.

38:00 And yes, this is not ideal.

38:02 I really wanted to give Polylith in Python a try because I really enjoyed the way things are structured

38:10 and a lot of these, like, headaches are solved there.

38:15 Polylith is really, really new to me.

38:17 Maybe tell people about Polylith before we go on because I suspect a lot of people don't know about this.

38:21 Yeah, maybe we should begin.

38:22 Yeah, let's begin there.

38:23 Then we'll come back to it.

38:24 Well, Polylith, it's an architecture, but it's also a tool, or something with tooling support.

38:31 And it's open source and it's developed by a fellow Swede, Joakim Tengstrand.

38:37 And I was fortunate to actually work in the same team as him.

38:43 So he was the one introducing this.

38:46 We decided to give it a try.

38:49 He was new in our team.

38:51 And I have to confess, I was a little bit skeptical at the beginning, and skeptical of monorepos in general too, based on previous bad experiences.

39:03 So the main idea of Polylith is that when you write code, you aim to write it in small parts.

39:12 And that's what Polylith calls components.

39:15 And a component, Polylith uses the idea of Lego, but for code.

39:21 So a component could be a piece, a Lego brick that you can reuse in several ways.

39:28 And a component can be anything from a small thing, something that you normally would put in a utils folder, like functions that do maybe some parsing or something.

39:41 But it can also be a combination of other components.

39:44 They don't have to be of the same size.

39:46 It's the idea of composability and reusability that is the important thing.

39:52 So the big parts of Polylith are components.

39:57 And then we have something called bases.

40:00 And that is also a component, but a special kind of component.

40:06 If you think about Lego, if you're going to build like a house, you often have a base plate where you put your Lego bricks on it.

40:15 So a base is sort of that part.

40:17 And in code, that could be like, if you have like a FastAPI app, maybe a base plate could be where you define the endpoints.

40:28 Like where you use the API decorators or something like that.

40:33 So that could be a base.

40:35 And then the code that actually does something could be a combination of different components.
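
As a rough illustration of the split between a base and components, here is a minimal Python sketch. It uses a plain dictionary as a stand-in for a web framework's routing, so it is not FastAPI code, and all function names and paths are hypothetical.

```python
from typing import Callable, Dict


# --- components: reusable bricks that hold the actual logic ---
def greeting(name: str) -> str:
    return f"Hello, {name}!"


def shout(text: str) -> str:
    return text.upper()


# --- base: a thin entry point that wires components to the outside world ---
# In a real service this is where the framework's endpoint decorators would go.
ROUTES: Dict[str, Callable[[str], str]] = {
    "/greet": greeting,
    "/shout": lambda name: shout(greeting(name)),
}


def handle(path: str, name: str) -> str:
    """Dispatch a request path to the composed components."""
    return ROUTES[path](name)


print(handle("/shout", "world"))  # prints "HELLO, WORLD!"
```

The base contains no logic of its own here; it only decides which combination of components answers which path.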

40:42 Yeah, that's an interesting way to think of it.

40:44 Whenever you talk about stuff like this, one of the things that's difficult is to understand:

40:47 What is the scale or how are these different?

40:50 So, well, are functions components? Are modules components? Are packages components? Like, how do I identify that, since it's not a formal language runtime term?

41:04 Yeah. That's really, really good. Help me understand how I make these things in Python. Yeah. Yeah. That's really interesting. Because you can see a component, it's not a full-blown feature, like maybe a library that you publish on PyPI would be. It's smaller than that. And I guess it could be a single function. But it's probably one or more functions that kind of relate to each other. Let's say that you...

41:32 I should have prepared an example, but let's say that you want to parse a CSV file or something. Then you would probably separate the different things you want to do with that CSV file into functions already.

41:52 And the component is where you kind of group the functions that kind of relate to each other or that make sense to have in a Python package.

42:02 And by that I mean a namespace with a dunder init.

42:05 So that could be a component.
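
A hedged sketch of what such a CSV component could contain: a couple of related functions that would live together in one package. The function names and the sample data are made up for illustration.

```python
import csv
import io
from typing import Dict, List


def read_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def column(rows: List[Dict[str, str]], name: str) -> List[str]:
    """Pick a single column out of the parsed rows."""
    return [row[name] for row in rows]


# Sample data, invented for this example.
rows = read_rows("service,port\napi,8000\nworker,9000\n")
print(column(rows, "port"))  # prints ['8000', '9000']
```

In a workspace these functions would sit in a package directory with an __init__.py that exposes them, so other bricks import the component rather than its internal modules.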

42:07 Yeah.

42:07 Although it could be modeled in one of these sub packages.

42:11 Those sub packages often have multiple jobs and roles and you're like, let's stay really focused on the one thing that it does, right?

42:18 Yeah.

42:19 And all of this lives in what Polylith calls a workspace.

42:23 And that is basically a repository with a configuration describing what your repository looks like.

42:35 So you have your components in namespace packages, basically.

42:39 And you have your bases, the entry points of your apps.

42:43 Then you have something called projects or a project.

42:47 And that is the artifact or artifacts that you want to build.

42:52 So you can have only one project if you're going to build one thing, maybe a FastAPI service.

42:59 But the benefit comes when you are about to build something new.

43:03 Then you have your project infrastructure, like the project configuration, defined in a folder called projects.

43:14 And then for the code you use, you pick it from the components and bases folders.

43:19 So you will reuse the same source code.

43:23 And then you package it into different artifacts.

43:26 So it sounds a little bit like we've got this monorepo with all of this stuff.

43:30 And Polylith, its job is to say, well, we're going to look into these little parts of this monolith.

43:38 And I need this part and this part and this part.

43:40 And it's some tooling and some concepts to help you manage some artifact.

43:47 We don't usually have exact build artifacts often if you're not shipping out separate packages.

43:54 But maybe these three pieces here make up the FastAPI service that we're going to host over there.

44:00 And maybe these two services make up the data science tools we're going to give to the data scientists for their notebooks.

44:06 And there could be some overlap in those, right?
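
The idea of several projects drawing on the same bricks can be modeled with a small Python sketch. The project names and brick paths below are hypothetical, and this is only a toy model of the concept, not how the Polylith tooling stores its configuration.

```python
from typing import Dict, List, Set

# Hypothetical projects and the bricks (bases and components) each one picks.
PROJECTS: Dict[str, Set[str]] = {
    "api_service": {"bases/api", "components/parsing", "components/logging"},
    "data_tools": {"bases/notebook_helpers", "components/parsing"},
}


def shared_bricks(projects: Dict[str, Set[str]]) -> Set[str]:
    """Bricks that every project reuses."""
    return set.intersection(*projects.values())


def projects_using(brick: str, projects: Dict[str, Set[str]]) -> List[str]:
    """Projects that would be affected by a change to the given brick."""
    return sorted(name for name, bricks in projects.items() if brick in bricks)


print(shared_bricks(PROJECTS))
print(projects_using("components/logging", PROJECTS))
```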

44:09 Yeah, exactly.

44:10 Yeah.

44:10 Okay.

44:11 And another good thing with the workspace is that you don't really do much work in the project folder or something like that.

44:20 Because the main idea is that you have a development environment that includes all your bases, all your components.

44:29 So the good thing with that is that you can experiment and try out code without worrying about if you have imported the correct stuff.

44:39 So you just, you have a top project folder containing all of your dependencies and packages.

44:45 And then you can take it from there.

44:48 Once you're ready to build a project, build something out of it, an app or whatever it is, then you can start constructing that project-specific configuration.

44:58 You can choose where you want to start.

45:00 But I usually start from the development workspace.

45:04 And I really like a way of working called REPL-driven development,

45:08 which I also learned from Clojure, where you try out things in the REPL, basically.

45:12 Yeah.

45:13 That's a really nice developer experience that you get from having the entire source code.

45:19 You can try out things, combine components, and develop new features.

45:23 Yeah.

45:23 I'm doing more and more of that as well, this REPL-driven development.

45:27 Or I'd say not necessarily development, but exploration.

45:30 I kind of want to understand.

45:32 I'm not really sure.

45:33 Is this going to click together right?

45:35 Or is this, rather than putting a lot of structure in place, because I'm not even sure I really want to stick with it, fire up a REPL.

45:42 For those of you who don't know, if you just type python, what you get, the read-eval-print loop, that's the REPL.

45:47 I do it in PyCharm these days, because PyCharm has a Python console, but it gives you autocomplete and tab completion.

45:55 Like, of the things that are in your project when you're playing in the REPL, but, you know, still, same idea.

46:00 Yeah, that's really great.

46:01 Yeah, yeah, absolutely.

46:02 So are you guys using this on your projects right now?

46:06 Or what do you do with it these days?

46:08 Yeah, I'm fairly new to the team that I joined.

46:11 So I've introduced them to the ideas, but they already have, like, code and stuff in place.

46:17 So my hopes are that we will give it, once we have something new to develop or include an existing microservice, maybe we could give this idea a try.

46:28 So that's basically because I'm coming.

46:31 Yeah, that's always the problem is, even if you yourself are not new, the ideas may be new to you, and you've done a bunch of previous work.

46:37 Like, for me, I was showing you that repo before.

46:40 I'm just, I'm thinking, there's a lot of cool stuff I could do about how I restructure this and reuse it and make it available, you know, sort of bring more of the monorepo stuff to some of the things I'm doing.

46:50 I'm like, then I got to update the continuous deployment changes, and I've got to update where the web server, I'm just like, you know, it's just like, there's all this stuff that's there.

46:59 And it's, you know, do you kind of pause what you're doing to try some new big organization of code here?

47:06 That's how it goes, right?

47:07 The team where I learned Polylith,

47:09 that's where we actually used it in production.

47:11 We had several kinds of different services and apps where we had everything in a Polylith monorepo.

47:18 But that was Clojure.

47:19 And Clojure is a compiled language, like C++ or C#.

47:22 Is that right?

47:23 Yeah, it's on top of the JVM.

47:26 So that is compiled through that engine.

47:31 Yeah, I do feel like things are just a little, the deliverable artifacts are slightly more obvious and easy to distinguish when you're talking about something that compiles.

47:43 And like, here's the library that drops into the bin folder, and here's the executable binary that drops.

47:49 You know, there's an output folder that has all the pieces that were selected.

47:53 Whereas Python, you've got to be a little more careful how you put that together.

47:57 So what I came to was: how can this idea be used in Python?

48:03 And then that was actually what led me to Poetry, which I think is a really nice tool.

48:09 Because Poetry, I think, has a lot of nice ways of handling projects and dependencies and structure and stuff like that.

48:18 But there were a couple of things missing to make this idea work.

48:23 Because when you have a project configuration, you actually include components from a relative path.

48:30 So you navigate up and navigate down to the actual component.

48:34 And if you would just build a wheel or a source distribution from that, that wouldn't be a valid package.

48:42 Because then you would need to ship the entire monorepo structure.

48:47 And you don't want to do that.

48:49 So what I did was I developed a plugin for Poetry that actually allows for having relative includes.

48:57 And that will build the code.

49:00 That will build a wheel and a source distribution with the kind of correct path.

49:05 So it takes all the package dependencies and puts them in the same folder, basically, before it does the wheel.

49:13 And then you have a valid distribution that you can use.
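
The copying step described here can be illustrated with a short Python sketch. This is not the plugin's actual implementation; it only shows the general idea of gathering relatively included packages into one folder before packaging. The function name, directory layout, and brick names are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def collect_bricks(project_dir: Path, relative_includes: List[str], build_dir: Path) -> List[str]:
    """Copy each relatively included package into build_dir, side by side."""
    copied = []
    for include in relative_includes:
        # Resolve paths like ../../components/parsing against the project folder.
        source = (project_dir / include).resolve()
        shutil.copytree(source, build_dir / source.name)
        copied.append(source.name)
    return sorted(copied)


# Build a throwaway workspace so the sketch runs anywhere.
workspace = Path(tempfile.mkdtemp())
for brick in ("components/parsing", "bases/api"):
    package = workspace / brick
    package.mkdir(parents=True)
    (package / "__init__.py").write_text("")

project = workspace / "projects" / "api_service"
project.mkdir(parents=True)
build = project / "build"
build.mkdir()

names = collect_bricks(project, ["../../components/parsing", "../../bases/api"], build)
print(names)  # prints ['api', 'parsing']
```

After this step the build folder contains only the packages the project asked for, which is what makes the resulting wheel independent of the monorepo layout.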

49:16 So it does a little bit of copying and stuff like that.

49:21 All right.

49:21 So the actual output here, it's a couple of wheels that we could say pip install into a virtual environment.

49:29 And they work together.

49:30 Is that right?

49:30 Yeah.

49:31 Okay.

49:31 I've tried this idea with services, like FastAPI services.

49:37 Instead of including the source code as a tree, you install it with pip from a wheel or a source distribution.

49:47 Preferably from a wheel, if you don't have any operating-system-specific stuff.

49:53 So I think that works really well.

49:55 And as for the end result,

49:57 if you do it in a Docker container, you have full control of what's in there.

50:03 Yeah.

50:03 So in your Dockerfile that builds the Docker image, you can just say, you know, copy these three wheels over, pip install them into my Python environment I have over there.

50:14 And it's just taking the, what do you call them, workspaces that you need over there.

50:20 What is the terminology that you call the artifacts here?

50:23 I mean, I know they're wheels and packages, but is there a Polylith term that matches over here?

50:28 Oh, I don't think so.

50:30 Maybe it's like a built artifact perhaps.

50:34 Yeah.

50:35 And it's probably the most simplistic scenario is that you have like an app, like an API endpoint or maybe a CLI app or even a library.

50:46 And you probably want to install them in different places, maybe even on AWS Lambda.

50:53 So you can have the control over the deployment in your CI saying that I want to deploy this Lambda here and I want to deploy this FastAPI over there.

51:04 So with Polylith, you can build these wheels differently.

51:08 Nice.

51:08 Yeah.

51:09 I feel like the, if you think of microservices and monolith, the AWS Lambda or any serverless functions as a service story is like the most extreme version of this.

51:21 Yeah.


51:23 Here's a single function that gets deployed, like just one after another, right?

51:27 It's kind of out of control.

51:29 And you have all of them in separate repositories.

51:32 Oh, please.

51:33 No.

51:33 Yes.

51:34 That would be definitely tricky.

51:35 I want to come back and talk more about this Poetry plugin because it's really cool.

51:39 But let's address this question from Lucas here.

51:41 How would you approach versioning in a monorepo, like of these different services and of the different pieces?

51:50 So if I'm going to have that FastAPI thing that builds over there, I'm going to have some other projects that are built

51:56 with some overlap that are shared over to, say, my data science team.

52:00 They're going to analyze data in some other way, but reuse some of the code.

52:04 I've got some thoughts, but what are your thoughts on versioning, say, in the repository or how you deploy them?

52:09 Yeah, that's a really good question.

52:11 If I'm looking at it from a Polylith perspective, I would suggest a very simplistic solution.

52:18 Let's say that this project depends on something with this version and the other project is still on an earlier version.

52:27 And I think that can be solved with the components themselves, because all source code is made up of these components.

52:36 So if you're going to build a new version of something and that version uses a new third-party dependency or something that is incompatible,

52:45 I would suggest you add it as a new separate component.

52:50 So your new projects that will use that one will pick that component instead of the old.

52:57 I think it's a good practice if these components have, as far as possible, the same API.

53:05 So it should be easy to switch from the old to the new one.

53:10 So that would be my solution to versioning, at least when using Polylith.
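The idea of adding a new component next to the old one, with the same API, can be illustrated with a minimal sketch. Everything here is hypothetical: the names `Serializer`, `JsonSerializerV1`, `JsonSerializerV2`, and `build_message` are not from the Polylith tooling, and the standard-library `json` module stands in for the incompatible third-party dependency described above.

```python
import json
from typing import Protocol


class Serializer(Protocol):
    """The shared API that both components expose (hypothetical)."""

    def dumps(self, payload: dict) -> str: ...


class JsonSerializerV1:
    """Old component: compact JSON, keys kept in insertion order."""

    def dumps(self, payload: dict) -> str:
        return json.dumps(payload, separators=(",", ":"))


class JsonSerializerV2:
    """New component: same API, different (incompatible) output ordering."""

    def dumps(self, payload: dict) -> str:
        return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def build_message(serializer: Serializer, name: str) -> str:
    """Project code depends only on the shared API, not on a specific component."""
    return serializer.dumps({"name": name, "greeting": "hello"})


if __name__ == "__main__":
    # An older project keeps the old component; a newer project picks the new one.
    print(build_message(JsonSerializerV1(), "old-project"))
    print(build_message(JsonSerializerV2(), "new-project"))
```

Because both classes satisfy the same protocol, switching a project from the old component to the new one is a one-line change at the point where the component is chosen.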

53:16 If we look at not the monorepo style, but you build an artifact like a wheel from one repo

53:22 and you put it up there and someone else depends upon that, maybe through an internal artifact management, private PyPI,

53:30 you would pin your version in the requirements file for that other one, right?

53:34 Because that repo is changing at a different rate and a different cadence than maybe the library that it depends upon.

53:41 And that's really natural for us as Python people because we already have a great long list of things that are open source

53:49 that we don't build that we depend on, right? FastAPI, Pydantic, and Starlette would be an example from what we've been talking about, right?

53:56 Those things you don't control and you depend on them.

53:59 So you pin the versions and upgrade them as you see fit.

54:01 But one of the advantages of the monorepo to me, as far as I see it at least, is the whole system,

54:08 not just your part of the system, but the whole system is consistent all the time.

54:15 The main branch or the production branch or whatever the shipping branch is.

54:20 So I think you would maybe branch, do some of your work, merge that back in.

54:25 And at that point, you could ship everything if you need to, right?

54:29 Yeah.

54:29 Because you're continuously keeping it together as a whole system, not like, well, that library built and that library built

54:36 because they're separate repos, but you put them together and who knows what's going to happen.

54:40 I think this is an advantage of the monorepo in a sense.

54:43 Yeah, me too.

54:44 Definitely.

54:44 I think it makes it easier to notice when some part of the code

54:51 is using a certain version of a dependency.

54:54 You will learn about it quite quickly, because you would install it and try to run the code.

55:00 It's easier to notice, and hopefully it's also easier to update that code.

55:04 But if you are in a situation where there's so much code that you need to refactor,

55:09 maybe there's a breaking change that has kind of rethought the entire idea,

55:16 maybe you need to do some sort of separation and keep the old one until you have the time to refactor that.

55:23 I guess in most cases, it's pretty straightforward to update just everything in the monorepo.

55:28 Right. Yeah, exactly.

55:29 And if you've got some sort of continuous integration or some kind of automated check,

55:34 you're going to find out pretty quickly this change you made has a consequence over there.

55:39 And I think that's why people who are psyched about monorepos are excited about it.

55:45 It also feels to me like if there was going to be a breaking change, it's going to happen either way.

55:51 It's just, is it going to happen in small little pieces or is it going to happen in one terrible, huge,

55:56 oh, you got the new one?

55:58 Well, let me tell you, the new one's really different.

56:00 It doesn't work anymore.

56:01 Like, oh no.

56:02 You know, just like you want to merge more often or you want to try to integrate things more often

56:08 and not just wait some long period of time and go, now do they go together?

56:12 Why are there a hundred or a thousand, you know, merge conflicts?

56:16 I don't know, right?

56:16 Like the more you do these little continuous checkbacks and integrations, it's just going to be so much easier.

56:25 Oh yeah.

56:25 Totally agree.

56:26 Yeah.

56:27 I mean, the only scenario where you don't have to go back and pay that penalty is where the other service

56:34 that you're versioning against says, we're never going to go to the new version.

56:39 And if this is internal code, it's unlikely that it's never going to go to the new version

56:43 unless it becomes just dead.

56:45 And then who cares?

56:45 You're going to need to integrate them eventually.

56:47 Just keep doing it continuously.

56:50 So yeah, I'm starting to really come around to the idea of these things.

56:52 Yeah.

56:53 Yeah.

56:53 So this Polylith plugin for Poetry, it's super cool.

56:58 So for example, here on your example, you say poetry, space poly, space info on the terminal,

57:04 on the command prompt.

57:05 And it'll say, hey, look, in here we have two projects made of two components and two bases,

57:10 the Lambda project and the FastAPI project.

57:13 And they're made up of these different elements.

57:17 And it really shows what part of your code is depending on the other ones.

57:22 It even gives you a pretty table.

57:24 Is this made with Rich?

57:25 Yeah.

57:26 Yeah, it's Rich.

57:27 I love that tool.

57:28 It's so great.

57:29 Yeah.

57:30 It's been my new favorite tool.

57:32 It's so good.

57:34 Yeah.

57:35 So it's a really nice looking UI that you put together here as well.

57:38 Yeah.

57:38 Thanks.

57:38 I'm really happy to hear that.

57:41 Well, how easy for people to...

57:42 Yeah, go ahead.

57:42 Sorry.

57:43 Don't mean to just talk over you.

57:44 Yeah, I just want to mention that what we see here is the tooling support for the Polylith thing in Python.

57:51 And I decided to make it as a Poetry plugin.

57:53 I have some plans to break it out of Poetry to make it a separate

57:59 CLI too, maybe I'll do that in the future.

58:01 But I thought it was a good fit for Poetry since I'm relying on a lot of Poetry features.

58:07 And the command there, poly info, is showing you an overview of your monorepo.

58:13 And this is my example project with...

58:16 So it's not a lot of code, but the idea is that you will list all your components and bases.

58:23 The common name for these is bricks.

58:25 Then you get a sort of an overview of what's in that monorepo.

58:30 You can sort of get an idea of what's in there.

58:33 What does this thing do?

58:35 And then it's listed per project.

58:38 So you can see which project is actually using which brick.

58:42 Yeah, let me just give a little bit of visual information for people listening.

58:47 So under the brick column, it says we have the logging, we have the messaging, we have the greet API, and we have the messages for Lambda.

58:54 And then you've got, in different columns, the different projects that might be consuming them in little check marks or dashes to say using or not using it.

59:02 It makes it really visually clear how your elements fit together, right?
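To make the described table concrete for readers, here is a small plain-text sketch of a brick-by-project matrix. It is not the plugin's implementation (which, per the conversation, renders its table with Rich). The brick and project identifiers are based on the names mentioned above, but which project uses which brick is an assumption made for illustration, as are the names `PROJECT_BRICKS` and `render_matrix`.

```python
# Hypothetical workspace: which bricks each project includes (assumed mapping).
PROJECT_BRICKS = {
    "lambda_project": {"logging", "messaging", "messages_lambda"},
    "fastapi_project": {"logging", "messaging", "greet_api"},
}


def render_matrix(project_bricks: dict[str, set[str]]) -> str:
    """Render one row per brick and one column per project ("x" = used, "-" = not used)."""
    projects = sorted(project_bricks)
    bricks = sorted(set().union(*project_bricks.values()))
    width = max(len(name) for name in bricks + ["brick"])

    rows = ["brick".ljust(width) + "  " + "  ".join(projects)]
    for brick in bricks:
        cells = [
            ("x" if brick in project_bricks[project] else "-").center(len(project))
            for project in projects
        ]
        rows.append(brick.ljust(width) + "  " + "  ".join(cells))
    return "\n".join(rows)


if __name__ == "__main__":
    print(render_matrix(PROJECT_BRICKS))
```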

59:07 I'm continually adding commands to this tooling.

59:11 You can also use this information, the information about the workspace and the individual projects, in your CI. Let's say that you change the message component, you do something in it.

59:27 And then you would want to have the projects that are affected built for that.

59:34 So the tooling will help your CI decide whether it should build this project or skip building it because nothing has changed.

59:44 So that's part of the tooling as well.

59:46 That's pretty interesting because we've had attempts and they've always been like an awesome 80% solution that never quite works.

59:53 But really good solution, really good tool, ideas, I guess, to say, if you change this code, what actually needs to be tested again or what needs to be analyzed again?

01:00:04 If this other part of your system doesn't depend on it, you don't need to run those tests, right?

01:00:10 If just the file changes, that doesn't tell you anything.

01:00:13 You need to look at things like code coverage.

01:00:16 What part of the system was touched, if this part was affected by this at all, right?

01:00:22 Those are always really tricky.

01:00:24 And how do you keep like a history of code coverage to know what to do?

01:00:27 And all those attempts I've seen just kind of like, we tried, but we don't really do that.

01:00:32 We'll just run the files that change, which is never enough.

01:00:35 But this kind of is a natural way to express dependencies in that tree to say, okay, if we change the greet API, we see that the Lambda thing doesn't work with it.

01:00:48 So we don't need to test anything to do with the Lambda stuff.

01:00:51 We only need to test the FastAPI aspect, right?

01:00:54 Yeah, exactly.
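The reasoning above (a changed brick only requires rebuilding or retesting the projects that include it) can be sketched in a few lines. This is an illustration, not the plugin's diff logic. The project-to-brick mapping, the function names, and the assumed directory layout (`components/<namespace>/<brick>/...` and `bases/<namespace>/<brick>/...`) are all assumptions for the example.

```python
from pathlib import PurePosixPath

# Hypothetical workspace: which bricks each project includes (assumed mapping).
PROJECT_BRICKS = {
    "lambda_project": {"logging", "messaging", "messages_lambda"},
    "fastapi_project": {"logging", "messaging", "greet_api"},
}


def bricks_from_paths(changed_paths: list[str]) -> set[str]:
    """Map changed file paths to brick names, under the assumed directory layout."""
    bricks = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        # Assumed layout: components/<namespace>/<brick>/... or bases/<namespace>/<brick>/...
        if len(parts) >= 4 and parts[0] in {"components", "bases"}:
            bricks.add(parts[2])
    return bricks


def affected_projects(
    changed_bricks: set[str], project_bricks: dict[str, set[str]]
) -> list[str]:
    """Return the projects that include at least one changed brick."""
    return sorted(
        project
        for project, bricks in project_bricks.items()
        if bricks & changed_bricks
    )


if __name__ == "__main__":
    changed = bricks_from_paths(["components/example/greet_api/core.py", "README.md"])
    # Only projects that use the changed brick need to be built and tested.
    print(affected_projects(changed, PROJECT_BRICKS))
```

In a CI job, the list of changed paths could come from a git diff against the last release tag; a project whose name is not in the result can be skipped.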

01:00:54 Is that tooling in place now?

01:00:55 Yeah, that's in place now.

01:00:57 Okay.

01:00:57 The poly info command, I think it was like the first command that I actually added.

01:01:03 Then there's a...

01:01:04 diff command and what did I add?

01:01:07 Yeah, the latest addition to commands is a check command.

01:01:12 Because since you are...

01:01:15 I was talking about the development experience, that you're working in a development project where you have everything.

01:01:22 And then you might not touch the project that much.

01:01:24 And that means that you could potentially forget to include dependencies.

01:01:30 Because as of today, there's no automatic thing yet.

01:01:34 I'm planning to add that later.

01:01:36 But so far, you need to keep track of your dependencies and stuff like that.

01:01:41 So I added a check command that actually performs analysis on the source code.

01:01:48 So if you...

01:01:49 Let's say that one of the components uses the requests library or something like that.

01:01:54 And you don't have it in your dependencies.

01:01:56 Then you would be notified for that particular project.

01:02:00 It's very likely that you would discover it anyway in your development environment.

01:02:05 But this is an extra check to just make sure once you're about to build something,

01:02:11 can I really build this specific project?

01:02:14 So it's a few commands, but there will be more commands that are more and more useful, I guess.

01:02:20 Yeah, it looks great.

01:02:21 And you have some examples over in the Python Polylith examples repository that people can check out.

01:02:29 Yeah.

01:02:29 So is the Poetry plugin ready for people if they wanted to use it?

01:02:34 I think so.

01:02:35 I know I haven't seen any stats yet, but I have a couple of users that have contacted me through the GitHub repo and social media.

01:02:46 Maybe some of them are just experimenting with it, and others,

01:02:50 I think, actually are working with it in their daily work.

01:02:55 And I think it's useful.

01:02:57 I have to remind myself to contact them regularly to just check out how it goes.

01:03:06 And hopefully they will come to this repo and let me know if something doesn't work as intended.

01:03:14 But it's a new tool and it probably needs some more work, of course.

01:03:21 Yeah.

01:03:22 Well, a lot of people will hear about it now.

01:03:24 They can come check it out and play with it.

01:03:27 Oh, that would be great.

01:03:27 Yeah.

01:03:28 And I'm sure you're taking contributions and PRs and you wouldn't mind if people had some additions.

01:03:34 Oh, yeah, definitely.

01:03:34 Yeah.

01:03:35 I would love to have that.

01:03:37 Cool.

01:03:38 So contributions are very welcome.

01:03:40 Yeah.

01:03:41 And you also have really nice examples here.

01:03:43 Like you have two videos that show how it works.

01:03:47 You know, 15 minute YouTube videos.

01:03:48 You've got some pictures and, you know, well done on that.

01:03:51 It makes it really easy for people to come and just see like, okay, is this interesting?

01:03:54 Does it apply to me?

01:03:55 So good work.

01:03:58 I do want to make one really quick follow-up.

01:04:01 Corky out there had mentioned that maybe shallow clones were a better, more predictable choice than the partial clones with the filter equals blob.

01:04:11 In general, the people at GitHub are recommending not to use the shallow clones anymore, but to use instead these partial clones because it keeps the history and it can incrementally go back and pull the stuff in as needed.

01:04:25 However, there is one time where you may really want those shallow clones.

01:04:29 And the reason I'm thinking of this is you talked about builds and using this to make builds run faster and only focusing on the parts that have changed.

01:04:37 If you have a CI, the CI doesn't care about the history of your GitHub project.

01:04:42 It just wants the working files, right?

01:04:45 So you can do a shallow clone and say, just get me only the files on the tip of this branch and then build it.

01:04:52 And that could be dramatically faster than saying, give me all five years of history of every single file and reassociate that.

01:05:00 So if you're thinking about CI, this shallow clone idea that I was dismissing a little bit is actually a good choice, I think, because you don't care about version history if you're trying to see if the current version builds or not, right?

01:05:14 So anyway, just a quick follow-up on that.
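The two clone styles discussed here correspond to real `git clone` options: `--depth 1` for a shallow clone and `--filter=blob:none` for a blobless partial clone. The sketch below only builds the command lines so they can be inspected or passed to `subprocess.run`. The helper name `clone_command`, the default branch name, and the example URL are assumptions, and treating `blob:none` as the filter meant in the conversation is also an assumption.

```python
import shlex


def clone_command(
    url: str, dest: str, *, branch: str = "main", mode: str = "shallow"
) -> list[str]:
    """Build a git clone command for a CI checkout or a developer checkout."""
    cmd = ["git", "clone", "--branch", branch]
    if mode == "shallow":
        # CI: only the tip of the branch, no history.
        cmd += ["--depth", "1", "--single-branch"]
    elif mode == "partial":
        # Developer machine: full commit history, file contents fetched on demand.
        cmd += ["--filter=blob:none"]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return cmd + [url, dest]


if __name__ == "__main__":
    url = "https://example.com/org/monorepo.git"  # placeholder URL
    print(shlex.join(clone_command(url, "monorepo", mode="shallow")))
    print(shlex.join(clone_command(url, "monorepo", mode="partial")))
```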

01:05:16 All right, David, I think we're probably out of time.

01:05:19 I definitely encourage people to go check out your Poetry plugin.

01:05:22 They can check out Polylith at polylith.gitbook.io.

01:05:28 Of course, I'll link to it in the show notes.

01:05:30 Now, before you get out of here, I've got the two final questions to ask you, of course.

01:05:35 If you're going to write some code, if you're going to work on the Poetry Polylith plugin, whatever, what editor are you using these days?

01:05:44 Well, these days I use Emacs.

01:05:45 I really like the two final things.

01:05:48 Before Emacs, I really liked PyCharm too.

01:05:53 But then I decided to learn Emacs and I'm stuck.

01:05:57 Okay.

01:05:59 Every programming language, I'm going to code everything in Emacs.

01:06:03 Nice.

01:06:03 Yeah.

01:06:04 Long, long ago, that was my very first editor for programming.

01:06:08 Oh, cool.

01:06:09 And then, yeah.

01:06:10 It brings me back to working on Silicon Graphics mainframes, computers doing C++.

01:06:15 So, notable PyPI package.

01:06:17 Something we simply talked about.

01:06:20 Something else that you want to just tell people about that you thought was awesome.

01:06:22 Ran across recently.

01:06:24 I have to say Rich because it's such an awesome tool.

01:06:27 And when you're going to develop a CLI and want it to look nice, and yeah, Rich is a really, really good tool.

01:06:37 There are a lot of visualization features and stuff like that.

01:06:41 So, that's a fantastic tool.

01:06:42 Yeah.

01:06:42 Good recommendation.

01:06:43 There's so much momentum behind Rich these days.

01:06:46 And if you're making some CLI developer-oriented tool, just give it a little color.

01:06:53 Give it a little structure.

01:06:54 And something like Rich, even just a little bit of color or a little bit of distinguishing one line of text from the other makes such a big difference in being able to use it really quickly and easily.

01:07:06 And Rich is probably the best way to do that by far, right?

01:07:09 Yeah.

01:07:09 You should check out my latest command, poly check, because it uses a Rich feature that I'm really happy about.

01:07:16 It's silly, but I'm really happy.

01:07:17 It uses an emoji while you're waiting.

01:07:20 Fantastic.

01:07:21 Oh, yeah.

01:07:22 I love emojis.

01:07:23 I love emojis and CLIs as well.

01:07:26 All right.

01:07:27 Final call to action.

01:07:28 People want to get started with monorepos, with Polylith, with some of these ideas we've talked about.

01:07:34 What do you tell them?

01:07:34 Head over to the Polylith git repo or the official Polylith docs and read about it and see.

01:07:42 Also, if you're interested in monorepos in general, check out the other solutions that are out there, because there are a lot of different approaches with different kinds of focus that maybe fit your situation best.

01:07:55 I'm, of course, pro Polylith because I, yeah, develop a tool and really like that.

01:08:02 But there are probably tools that are better for a different situation.

01:08:06 So just explore, I would say.

01:08:09 Right, right.

01:08:10 The tools that maybe Google chooses to manage its code base might be the wrong tools that you choose for yours because the scale is so different.

01:08:18 Right.

01:08:19 Yeah.

01:08:19 You might add so much complexity that it's not relevant.

01:08:22 You know, it makes it really hard, but you don't need that complexity because you've got five projects, not 5,000 projects.

01:08:27 Right.

01:08:28 So, yeah, absolutely.

01:08:29 Look around is good advice.

01:08:31 Okay, David.

01:08:32 Thank you for being here.

01:08:33 It's been a really fun chat.

01:08:34 I learned a bunch.

01:08:35 Yeah.

01:08:35 Thank you.

01:08:36 It was really fun to be on the show.

01:08:39 Thank you.

01:08:40 Yeah.

01:08:40 You bet.

01:08:40 Bye-bye.

01:08:41 Bye.

01:08:41 This has been another episode of Talk Python To Me.

01:08:45 Thank you to our sponsors.

01:08:47 Be sure to check out what they're offering.

01:08:48 It really helps support the show.

01:08:50 Starting a business is hard.

01:08:52 Microsoft for Startups Founders Hub provides all founders at any stage with free resources and connections to solve startup challenges.

01:09:01 Apply for free today at talkpython.fm/foundershub.

01:09:06 Stay on top of technology and raise your value to employers or just learn something fun in STEM at brilliant.org.

01:09:13 Visit talkpython.fm/brilliant to get 20% off an annual premium subscription.

01:09:20 Want to level up your Python?

01:09:22 We have one of the largest catalogs of Python video courses over at Talk Python.

01:09:26 Our content ranges from true beginners to deeply advanced topics like memory and async.

01:09:31 And best of all, there's not a subscription in sight.

01:09:34 Check it out for yourself at training.talkpython.fm.

01:09:37 Be sure to subscribe to the show, open your favorite podcast app, and search for Python.

01:09:41 We should be right at the top.

01:09:42 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:09:52 We're live streaming most of our recordings these days.

01:09:55 If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:10:04 This is your host, Michael Kennedy.

01:10:05 Thanks so much for listening.

01:10:06 I really appreciate it.

01:10:07 Now get out there and write some Python code.
