
#399: Monorepos in Python Transcript

Recorded on Friday, Jan 13, 2023.

00:00 Monorepos are contrary to how many of us have been taught to use source control. To start a project or app, the first thing we do is create a Git repository for it. This leads to many small, focused repositories. A quick check on my GitHub account shows that I have 179 non-fork repositories. That's a lot. And I think many of us work that way. It's not like this with monorepos. With monorepos, you create one or a couple of repositories for your entire company. This might have hundreds or thousands of employees working on multiple projects within a single repository. Famously, Google, Meta, Microsoft, and Airbnb, amongst others, all employ very large monorepos with varying strategies of coordination. On this episode, we have David Vujic here to give us his perspective on monorepos, as well as highlight an architectural pattern and set of tools for accomplishing this in Python. This is Talk Python to Me, episode 399, recorded January 13, 2023.

01:14 Welcome to Talk Python to Me, a weekly podcast on Python. This is your host, Michael Kennedy. Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython, both on fosstodon.org. Be careful with impersonating accounts on other instances, there are many. Keep up with the show and listen to over seven years of past episodes at talkpython.fm. We've started streaming most of our episodes live on YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode. This episode is brought to you by Microsoft for Startups Founders Hub. Get early-stage support for your startup without the requirement to be VC backed or verified at talkpython.fm/foundershub. It's also brought to you by brilliant.org. Stay on top of technology and raise your value to employers, or just learn something fun in STEM, at brilliant.org. Visit talkpython.fm/brilliant to get 20% off an annual premium subscription.

David, welcome to Talk Python to Me. Thank you, I'm really glad to be on the podcast. I'm really excited to have you on the podcast. And we get to talk about a couple of interesting ideas. We get to talk about software architecture. People may know I'm a big fan of architecture, I think putting your software together right makes all the difference. We're going to talk about some ideas that are new to me, this Polylith idea that you're an advocate and fan of, and how it applies to Python. We're also going to focus a good portion of this conversation on monorepos. And what the heck are monorepos, right? Yeah, that'll be a lot of fun, and I'm really looking forward to it. But let's hear your story first. How did you get into programming and Python? Yeah, how did I get into programming? Well, I guess it was when I was a kid. My dad bought me a Commodore 64. It was like way back in the 1980s. 
So that's when I started learning BASIC, the BASIC programming language, and how to think in, how to write things that are code and not just text. But then that kind of,

03:27 went a different path when I was working mostly with design and things like that. And the first job I knew of in our business was web designer. That was what I wanted to be at first. So I started to learn JavaScript, and copied and pasted some snippets of code to make things interact and stuff like that. Yeah. And that's the era, I'm just guessing from the starting computer where you were sort of in time when this must have happened. And that was probably before all these crazy JavaScript front-end frameworks and all that. And it was more of, how do you visually design this page, with graphics and more art, and more focused on that, right? Yeah, before jQuery, and before React and all of that good stuff that we have today.

04:18 Yeah, with Python, I think I started around 2015. I started with Python 2.7, and in that period I also started to think in a more functional programming style, more functional-ish. That's coming from Python, actually. And I was very much into Node.js at that time too. So I think around 2015, and I've been jumping back and forth between different languages. I'm a really huge fan of Clojure too, which is like 100% functional. So I've been visiting some different

05:00 kinds of programming languages and styles, and things like that. And now I'm back to full-time Python on my day job, and almost full-time Python during nights, because I really love to code in my spare time too. As soon as there is any chance to code, I'll take that one. Yeah, absolutely. I mean, how much of open source is, there's programmers and we have this project we've got to work on, but we really want to build this other thing. And we're super passionate about it, we just end up building it and sharing it and it takes off, you know, and I think that's a very common story for sure. So before we move on, how has working in Python influenced something like Clojure, or Clojure influenced your Python thinking? They're fairly different languages, right? Yeah. Functional being less stateful, which is a really big, different way of programming. Yeah. Well, I think what I learned from Python was the elegance, and the importance of writing elegant and simplistic code. I was really impressed by the Zen of Python. You know, if you type import this in a shell, you get this nice list of how to write your code. I really liked that idea. So I guess that's where I started to think about keeping it simple, clean and short. Yeah, and with Clojure, that's kind of a totally different syntax. But also digging into a lot of functional aspects and how to think about state, how to separate the calculations from actions and data and things like that. And I think I brought a lot of those ideas back to Python, how to separate the different kinds of code that you write. I can see that. I feel like Python is really interesting, because you can choose to only focus on little parts of it. That's good for beginners, because you only have to learn a little part. 
But it's also good for people who have particular styles that they like to work in, right? If you want to write functional Python, you don't have to create any global variables or any classes or anything, you can just write it that way. But you could also go really deep into OO patterns, and you can do that in Python too, if you want, right? It's completely up to you. Yeah. And I like that kind of freedom. It's,

07:27 you don't, you're not

07:30 forced to do either this or that. You can learn and experiment. And especially if you use libraries, they are designed in different ways too. So you don't have to limit yourself to only use that kind of library or this kind of library. So I really like the capabilities of Python when it comes to that. It's not very strict in any kind of format. I like that too. And I think other languages are seeing that and adopting that as well. You know, you see Swift with their playgrounds, and, well, Swift in general, and .NET with their, you know, maybe you don't need namespaces and classes and static void main for everything to get started. And they're adopting those types of things. You know, we're talking about this as, well, functional people might want to write this way and more OOP people that way. But that also could just be you in different situations. You know, right now, this is the right tool to solve it. And other times, here's a different way to solve a different problem. But you can just stay in the same tools and the same editors and the same ecosystem. It's cool. Yeah, definitely. Yeah. All right. Well, let's start in on the first half of our main topic here, the monorepo. Now, it's really easy to confuse what a monorepo is with a monolith, versus, say, microservices. Those are not really at all the same thing. In fact, they might actually be opposites, in a sense, a monorepo and a monolith, to some degree. So maybe kick us off by telling us what is a monorepo. Yeah, a monorepo, I don't think it's that complicated. But before I started to dig into this thing more, I also used to put like an equals sign between monoliths and monorepos, because that's the way I was used to writing code. I was in the .NET and C# world a lot. You're building your website, and you have a data layer, and you have a domain layer, and everything was in one repo. 
So I guess microservices was like a reaction to that, to separate the code into isolated environments. And you can have this nice and clean little code base over here that does one thing, and you have this other code base that does the different things. So instead of just having the user authentication bit completely woven into the code, we can make a little API that we call over JSON that does the authentication, and then here's the one that has the catalog.

10:00 Yeah, and we could write a little bit of code. And, you know, I guess the benefit, right, is that whoever's working on the catalog bit, they theoretically can just stay focused on that little bit of code and not the entire system. Right? Yeah. But with a microservice story, yeah. A monorepo is, from the way I see this, it's a Git, or any version control, repo that has basically all of your code in the same repository. And that doesn't necessarily mean that it's one program or one app that you're going to build or compile. You can have several projects or artifacts in that repo. And I guess that's why it's called monorepo, because you can have multiple things in it. So I guess that's the difference from the monolith, where you actually build one app and deploy it to one place.

11:00 The monolith is the opposite of the microservice style. Yeah. Whereas the monorepo is just a way of organizing your code and sharing it: how do you propagate changes, look at dependencies across libraries. And there are companies that take this really far, like crazy, crazy far. Like Google and Facebook, I believe. I haven't worked on it, but I hear that they have one repo. Yes. For, like, all of it? One, really? Just like, what's the checkout story look like on that? That's a lot of code. Yeah, it's got to be a lot of code. I believe Google, I'm probably going to misassociate this, but I think Google uses Bazel. And there are different tools that allow them, that are kind of not just Git but something that can handle, you know, that scale of code. So really, when I think about organizing my code, it's either me, or me and a couple of people working on the code. And it's pretty contained. But when you start to think about hundreds or thousands of people across projects, it starts to get really wild, right? Yeah, that has to be a completely different story. I'd be really interested to see how they work with teams. Yeah, absolutely. So maybe we could talk about some of the why.

12:26 If you're not doing this, why would you do that? You highlighted that there are kind of these two trends that you saw out there in some of your articles, and we'll talk about the articles and link to them. You talked about seeing a trend where more people are moving towards this monorepo, and other groups of people are trending towards having more small repos. Yeah. If you take my little microservice example, the user access service might be its own repository, separate from the catalog service, whereas others might say, we're going to put all that together, and all the utilities and that other data reporting project, and all of that goes into one giant repo, even though there's a big team on it, right? Yeah. Well, I think, from my experience, from joining different teams and different companies, I've seen exactly that. Quite recently, I joined a team at a company with several teams. And they actually migrated from a monorepo to several repositories. It was part of their microservice journey, as they called it, because they had one repo with all their code. But that code base was so difficult to work with. So they wanted to extract one app at a time into a separate repository, just to be able to deploy that one and work with it in a reasonable way, with tooling support and things like that. Before that, at a different company, I joined a different team. And we went the total opposite way. We had a couple of microservices and

things that were quite easy to work with, but we identified issues or problems with it, because maybe there was one service that had outdated dependencies. But the biggest problem was the actual code duplication, because we had one service that we had developed one thing for, and we had another service that needed code that was very much like the thing we had in that

14:46 other service. So I guess the solution could be to extract that one into a library, but then you have three repositories. And yeah, I guess it's difficult to find that good balance between one or the other.

15:00 I agree, and that extracting, I mean, that's certainly one of the possibilities. As you say, well, you know what, we now have a third repository, where you have the shared data access repository and package. And, you know, that's probably not the type of thing you publish to PyPI, but it's very likely something you would publish to some kind of internal dependency artifact system that you would depend upon, right? But the problem is, if it's used in just these two places, it sounds like, from that sort of description, the team is probably working on both sides of those microservices, and they understand the broader system. But as it grows, and more people depend upon it, it's harder to understand this little standalone project. Who is using it? In what ways? Can we make a change here? If we refactor this, who do we talk to about changing even just the signature of a function? How do we reach out to the other parts of the code, other stakeholders, and say, look, we need to change this function. But

16:07 we've got to, you know, we're changing the data model, and you're going to have to figure out how to go along. On the other hand, if all of those projects were together in a giant monorepo, you have tooling that understands, well, what functions call this function? Or what thing imports this class? Or who's using it? Is it used at all? Actually, maybe you could delete it. You thought someone was using it, but no one's using it, right? There's a lot of understanding of the broader integration if it's all there with you, right? Definitely.
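To make the "what thing imports this class, who's using it" question concrete, here is a minimal sketch of the kind of check that becomes possible once all the code sits in one checkout. It uses only Python's standard ast module. The function name find_importers and the module name shared.data_access are hypothetical, chosen for illustration; real editors and build tools do far more than this, and the sketch ignores relative imports.

```python
import ast
from pathlib import Path


def find_importers(repo_root: str, module: str) -> list[str]:
    """Return repo-relative paths of .py files that import `module` or a submodule.

    Hypothetical helper for illustration. Relative imports (from . import x)
    are ignored to keep the sketch short.
    """
    root = Path(repo_root)
    importers = []
    for path in sorted(root.rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be read or parsed
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # import a.b.c  ->  "a.b.c"
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # from a.b import c  ->  "a.b" and "a.b.c"
                names = [node.module]
                names += [f"{node.module}.{alias.name}" for alias in node.names]
            else:
                continue
            if any(n == module or n.startswith(module + ".") for n in names):
                importers.append(path.relative_to(root).as_posix())
                break  # one match per file is enough
    return importers
```

Calling find_importers(".", "shared.data_access") from the root of a checkout would list every file that imports that module. An empty list suggests the module might be safe to delete, although dynamic imports would not show up here.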

16:41 This portion of Talk Python to Me is brought to you by Microsoft for Startups Founders Hub. Starting a business is hard. By some estimates, over 90% of startups will go out of business in just their first year. With that in mind, Microsoft for Startups set out to understand what startups need to be successful and to create a digital platform to help them overcome those challenges. Microsoft for Startups Founders Hub was born. Founders Hub provides all founders at any stage with free resources to solve their startup challenges. The platform provides technology benefits, access to expert guidance and skilled resources, mentorship and networking connections, and much more. Unlike others in the industry, Microsoft for Startups Founders Hub doesn't require startups to be investor backed or third-party validated to participate. Founders Hub is truly open to all. So what do you get if you join them? You speed up your development with free access to GitHub and Microsoft Cloud computing resources, and the ability to unlock more credits over time. To help your startup innovate, Founders Hub is partnering with innovative companies like OpenAI, a global leader in AI research and development, to provide exclusive benefits and discounts. Through Microsoft for Startups Founders Hub, becoming a founder is no longer about who you know. You'll have access to their mentorship network, giving you a pool of hundreds of mentors across a range of disciplines and areas like idea validation, fundraising, management and coaching, sales and marketing, as well as specific technical stress points. You'll be able to book a one-on-one meeting with the mentors, many of whom are former founders themselves. Make your idea a reality today with the critical support you'll get from Founders Hub. To join the program, just visit talkpython.fm/foundershub, all one word. The link's in your show notes. Thank you to Microsoft for supporting the show.

18:32 Yeah, I totally agree. And our editors are so smart. They can find things and suggest, and tell you if this function signature isn't really correct and stuff like that. And that's so much easier when you have your source code in a folder that is right next to the one using it. So that's a huge benefit, editor-wise, to the developer experience, I guess. Yeah, absolutely. And I think this is both a benefit and a challenge. You know, I'll maybe link to the monorepo Wikipedia page, and here are some of the advantages. The number one advantage listed is ease of code reuse. So it's possible, not necessarily suggested, but possible, that you say, well, the data access functions and classes that we need on this side, we need some of them over here. But if you have the whole monorepo, you could just say, well, import them in both projects, and deploy, you know, a larger piece of code to your server, but who cares? The servers have a lot of storage, and they'll be fine, right? Yeah. The challenge, I think, is going to be that you're going to end up with a tightly coupled architecture

19:42 pretty badly if you just say, well, I see way over there, there's that file, and that's the one I want, and we're just going to grab that. And, you know, it doesn't necessarily encourage good behavior, but it does make reusing code, and understanding how it's being used, easy, right? Yeah. And also, you're like in the risk zone

20:00 of actually building a monolith again. Yeah, exactly.

20:04 It's just, part of the API endpoints are in there and part of the API endpoints are in there. But they're effectively just one giant thing, right? So I guess what I think is that if you want to have your code in a monorepo, you would need some sort of tooling or ideas about how to separate your code into separate artifacts that don't have the entire code base, and that package only the code that is actually needed for this artifact. So I guess that's part of the challenge of having a monorepo, I would say. So I've been thinking about this a little bit, leading up to our conversation today. And certainly, you can use packages. You know, we have this problem in Python, or this challenge, where packages mean different things, but it has the same word. So a package could be just a grouping of modules in a directory that has a dunder init. Or it could be something on PyPI that you ship, and you deploy a new version of it on its own. And I mean just the on-disk, dunder init sort of local grouping, right? So you could create these sorts of groups within your monorepo and say, we're going to import that, but have a little bit of a formal separation and say, look, we're not necessarily going to deploy it through some versioning story and let other people pull it in, because then we lose track of who's using it and how they're using it. Are they on the right version? But we'll still maybe think of them as a Python package, in a sense. Do you have any experience with doing it one way or the other? Any preference? Yeah, what I was thinking of, the company that I joined that was migrating from the monorepo, they had done a couple of attempts to do this code sharing thing with, I think it was like Git submodules or symlinks, that sort of thing. But all of that ended up becoming too complicated to understand what was going on. 
And I think even the editor support for it wasn't really perfect when you had this kind of dynamic linking. So I guess that's why they chose to abandon that idea. Yeah, it sounds a little bit like, with the submodules, that it was not a pure monorepo, but kind of, let's have different sections on our

22:44 repository, but bring it together when we see it for development, as if it was a monorepo, right? Like, we're going to sort of put these files, and this is a submodule, that's a submodule, and they're kind of separate. But then, once they're all checked out and linked up, then our tooling thinks it is one giant thing, like the monorepo would be, right? Yeah. So it's kind of an intermediate. I also thought about this as well, like, maybe you could put it together with Git tools like that. I do want to highlight a couple of Git tools. But maybe I'll take a quick bit of audience feedback real quick. I do think that, you know, when it's five people, 10 people, you just check the thing out, and it's going to be fine. But as it gets larger and larger, both over time, and in lines of code and number of people, it's going to be a thing where it almost becomes unmanageable to just do a git clone URL and see what happens, right? You can go grab coffee, and when you get back, it hasn't downloaded. We've probably seen that XKCD where there are people fake sword fighting on chairs. Like, get back to work. Like, you know, we're doing a git clone, leave us alone. Okay, sure, I gotcha. It's going to be a while. Quick audience feedback says monorepos are okay if you have a dedicated team that manages the advanced tooling required to deal with them. Yeah, absolutely. And sort of related, Lucas asks, would you use Bazel for your projects, or rather Makefiles or similar, in case of lints and builds? So yeah, there are the different tools that Facebook and Google and those folks use. There's also Pants. Weinberger has talked a lot about it. I've had him on the show before. And Pants is one of these tools that can kind of help, Pants Build. But, David, how about you? What were you all using in terms of more advanced tooling? Or was there anything special back then? 
No, not really more advanced than actually Makefiles to make things happen. But the place, the team that I joined, actually started to use this,

24:52 I guess we're going to talk about it, it's an architecture called Polylith, and there's also tooling for it that can offer

25:00 a solution to many of these headaches we were having with our monorepo. So yeah, absolutely. And back then, because Polylith originates from Clojure, we were actually writing Clojure code. And for Python, I started to look around for solutions. I actually read a little bit about Pants. I think that can solve a lot of problems too. It seems like a really great tool with a lot of useful functionality. And then there's also Poetry. It's not really about monorepos, but I believe that you can use pure Poetry and have your dependencies, the third-party libraries from PyPI, and then add your own code, not the third-party but your own, in an editable mode. So as soon as you change something, it will be updated. So I guess there are some tools that can help you along the way. But I guess there's still a lot of frustration with having that smooth and really joyful monorepo experience that you would like to have. So that's what led me to start working on this project. I do think that Python, the way that it handles dependencies, and its understanding of linking files together through directories and things like that, makes it a little bit more challenging than other systems. Like if I was doing C++, I could open up Visual Studio Code and create a broader project and say, these three libraries are what I want to see as my project. And it doesn't really matter where they come from. You build it, and they link together, and there's sort of a build-the-delta-only type of thing. Whereas in Python, you kind of need to bring on a little bit more tooling to say, I know it looks like there's some giant Python thing here, but just these two pieces, 
that's what I want to think of as the thing. You know, what we're going to talk about with some of the stuff that you've done with Poetry, with Polylith, and others, certainly makes that relevant. I do want to talk about the Git tools. But it's also interesting, this comment from David Pool. It says, we use submodules for legal licensing reasons, that is, to have GPL code separate from our proprietary code, rather than just dropping it in, which obviously has different implications. Oh, that was very interesting to learn about. Yeah, yeah. I hadn't really thought about that either. But yes, definitely, if we want to think about it. So let's just talk Git for a moment. Now, one of the big challenges is, if we're going to put this all into one giant GitHub repository, which I hinted could get really large, especially if you put binary files, like some of your build tooling, or other assets, you might put them in there. And then that makes it extra tricky. The less something can diff, the more it kind of piles up quick. As I was thinking about this thing, I learned about a couple of cool ideas. Let's talk about this one first, partial clone. This is something that was totally new to me. So normally, it's git clone, then the URL to the Git repository. However, you can say things like filter, dash dash filter equals blob. Have you seen this before, David? No, this is totally new to me. But it looks really interesting. Yeah. So what happens here is, the blob is like a binary file, right? And what you're saying when you say filter blob is, it'll check out all of the Git history. And normally, when you do a clone, you get, at least for the branch you're on, every version of the file. So you git clone, you disconnect from the network, and you've got everything, right? Which is the beauty of Git. But if you've got a really huge repository, it also might be the drawback of Git. So you can filter out these blobs in the historical sense. 
And if you say this, what you see on your hard drive for the working directory is identical. But the .git folder, or the history, only has the working version, not all copies of the history of the blob. This has a really huge effect. So I did this on Talk Python Training, my courses website. And if I just say git clone and the repo, it pulled down 118,000 objects, it resolved 71,000 deltas, it updated 10,000 files, and it was a gig on disk. If I just say --filter=blob:none, it goes from 118,000 to 10,000 downloads. It's less than half the size. And the resulting files on disk, those were the same, but the intermediate deltas were like one 70th or one 50th. Really a big difference. And this is, you know, it's a pretty old repo, it's got a lot of stuff

30:00 but it's nothing compared to what a lot of people have. So there's one problem where, like, okay, if I'm going to try to git clone a monorepo, there's just no way, right? So I think adding this aspect here would actually be really valuable. Yeah, definitely. Because it's,

30:15 I guess the normal use case is that you want to work with the latest version of the source code, you want to develop something new. So I guess that's what you want there most of the time. And what happens is, if you say, well, actually, we need to switch branches, or we need to go back three months in time, it just goes back to the network and clones a little bit more. It's like an incremental clone, as it needs it. So I think this should actually help a lot of people who don't know about it, working with monorepos that turn out to have a lot of files and a lot of history, especially binaries that grow over time. Yeah, so those are the ones that are huge. You know, it's not the text files, usually, that are the problem. So I have to bookmark this.
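For readers who want to try the partial clone discussed above, here is a small sketch that builds the command from Python. The flag --filter=blob:none is Git's blobless partial clone option. The helper names partial_clone_command and run, and the example URL, are hypothetical and only for illustration. One detail worth knowing: in Git a blob is the stored content of any file, not only a binary one, so the filter defers downloading historical versions of all files until they are needed.

```python
import subprocess
from typing import Optional


def partial_clone_command(url: str, directory: Optional[str] = None) -> list[str]:
    """Build a `git clone` command that fetches all commits but defers
    historical file contents (blobs) until a checkout actually needs them."""
    cmd = ["git", "clone", "--filter=blob:none", url]
    if directory:
        cmd.append(directory)  # optional target directory
    return cmd


def run(cmd: list[str]) -> None:
    """Run a command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical URL: replace it with a real repository, then pass the
    # command to run() to perform the clone.
    print(" ".join(partial_clone_command("https://example.com/org/monorepo.git")))
```

Keeping the command construction separate from execution makes it possible to inspect or log the command before anything touches the network. Switching branches or checking out older commits in such a clone needs network access, because the missing blobs are fetched on demand.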

31:03 This portion of Talk Python to Me is brought to you by brilliant.org. You're a curious person who loves to learn about technology. I know because you're listening to my show. That's why you would also be interested in this episode's sponsor, brilliant.org. Brilliant.org is entertaining, engaging and effective. If you're like me and feel that binging yet another sitcom series is kind of missing out on life, then how about spending 30 minutes a day getting better at programming, or deepening your knowledge and foundations of topics you've always wanted to learn better, like chemistry or biology? Overall, Brilliant has thousands of lessons, from foundational and advanced math to data science, algorithms, neural networks, and more, with new lessons added monthly. When you sign up for a free trial, they ask a couple of questions about what you're interested in, as well as your background knowledge. Then you're presented with a cool learning path to get you started right where you should be. Personally, I'm going back to some science foundations. I love chemistry and physics, but haven't touched them for 20 years. So I'm looking forward to playing with PV equals nRT, you know, the ideal gas law, and all the other foundations of our world. With Brilliant, you'll get hands-on with a whole universe of concepts in math, science, and computer science, and solve fun problems while growing your critical thinking skills. Of course, you could just visit brilliant.org directly. Its URL is right there in the name, isn't it? But please use our link, because you'll get something extra: 20% off an annual premium subscription. So sign up today at talkpython.fm/brilliant and start a seven-day free trial. That's talkpython.fm/brilliant. The link is in your podcast player show notes. Thank you to brilliant.org for supporting the show.

32:43 Related to that, a question from the audience asks, wouldn't a shallow clone be more predictable? So this is also interesting. Shallow clones are an older way to do this in Git and GitHub. The problem with shallow clones is you don't get the full history and changelog. With these partial clones, you have all of the commit history and details. You just don't have the files, and they're incrementally pulled in. So you could do a shallow clone. And then there's another one, what was it called, a sparse clone. So a sparse clone is another tool that you can bring in here for advanced Git usage, where you say, I know I've got this huge directory structure, but I just want to get these three directories, or this subdirectory structure. And you clone only part of the files, right? So we were talking about how Python understands the whole thing as just one giant project. And maybe you even check it out and try to open it, and your editor will just sit there indexing, indexing, and, you know, autocomplete won't really work very well and it'll go crazy. So you can just say, I want these three directories, and I want them partially cloned. So they only have, like, the recent history, and they're not so insane. And you can kind of combine these to get really focused views into a monorepo, which I thought was pretty interesting. Yeah. So anyway, when I think back to the story you told about how you guys were using submodules, I kind of feel like these partial clones plus sparse clones might be a better fit than trying to, you know, symlink things together. Because it really just is the same thing. If you want to clone the whole thing, you do, but then you can kind of filter out. And with those sparse clones you can also retroactively add. You go, oh, I also need that directory. Say like git sparse add, oh, I want this piece to come in as well. And there are some interesting ways to put these together. 
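The combination described above, a partial clone plus a limited set of directories, can be sketched as follows. In Git the feature is named sparse-checkout: git clone accepts --sparse, and git sparse-checkout has set and add subcommands. The helper names, the repository URL, and the directory names below are hypothetical, chosen only to illustrate the idea.

```python
def sparse_clone_commands(url: str, directory: str, paths: list[str]) -> list[list[str]]:
    """Build the commands for a blobless, sparse clone limited to `paths`."""
    return [
        # Clone commit history without historical blobs; --sparse starts with
        # only the top-level files in the working directory.
        ["git", "clone", "--filter=blob:none", "--sparse", url, directory],
        # Choose which directories should appear in the working directory.
        ["git", "-C", directory, "sparse-checkout", "set", *paths],
    ]


def sparse_add_command(directory: str, paths: list[str]) -> list[str]:
    """Build the command that adds more directories to an existing sparse checkout."""
    return ["git", "-C", directory, "sparse-checkout", "add", *paths]


if __name__ == "__main__":
    # Hypothetical repository layout, for illustration only.
    commands = sparse_clone_commands(
        "https://example.com/org/monorepo.git",
        "monorepo",
        ["projects/catalog", "libs/shared"],
    )
    commands.append(sparse_add_command("monorepo", ["projects/auth"]))
    for cmd in commands:
        print(" ".join(cmd))
```

Each command list can be passed to subprocess.run(cmd, check=True) to execute it. The add command corresponds to the "oh, I also need that directory" step: it widens the working directory without cloning again.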
So I think these tools are going to be useful for people who are working with mono repos. I think those advanced Git features that I called out might be really helpful. What do you think? Yeah, I totally agree, especially when you have a mono repo that is a lot of code. So it seems like this is something you wouldn't

35:00 want to live without, I guess, because it's probably that helpful. Yeah, I think so too. So sparse checkout, I believe, is actually

35:11 sparse checkout is the term. I'll link to it, as well as partial clone and sparse checkout. There we go. Nice. Okay. There are so many features in Git that I guess most of us don't use. Yeah, I think so too. Like, I've been doing Git for a really long time, and this sparse checkout is completely new to me. I only learned about it as I was trying to research a little bit more about, well, how do you actually manage these mono repos, as we were preparing for our chat today. So yeah, I think there are a lot of tools and flexibility that are not obvious or not apparent that people can use to make mono repos work really, really well. There are also a lot of interesting ways to structure your code and put it together and use it once you get it checked out. So maybe, do you want to start with "a fresh take on mono repos"? This is one of your articles. Yeah, yeah. Why not? Yeah. So tell us the story here. I wrote this article about a year ago, almost a year ago. Before that, I was trying to figure out how to have this nice developer experience in a mono repo. Coming from Clojure, having learned new things and gotten some fresh ideas on how you can solve things, I wanted to give it a try in Python too. And at the same time, I was actually working with microservices, but in several repos, and I kind of found myself writing something, not a huge thing, but it was still a sort of logger module or package. I knew that I had done it in the other microservice just a couple of weeks ago. So okay, what should I do? Should I create a library? No, this is way too small to create a library. And it wasn't even open source. It's a proprietary system, so we would need private repo servers and things like that. So I just ended up copying some code. And you know, well, people would go, of course, you should never do that.
But sometimes it's just not complicated enough, or important enough, or big enough to justify all the change management and dependencies, and like, you know what? That file, it goes into this project. And usually it's fine. Yeah, until they get out of sync, or there's some weird thing where you want to upgrade one, and then, oh, well, where else is it, right? Or you discover a bug in that part, and you forget that you've copied it a couple of times over to the other repos, and then you have a lot of work to do. There's probably a whole section of cybersecurity history and breaches where they thought they fixed a problem in some system, and it turns out someone else found a copy of it that wasn't fixed, and broke in. And yes, this is not ideal. I really wanted to give Polylith in Python a try, because I really enjoyed the way things are structured, and a lot of these headaches are solved there. Polylith is really, really new to me. Maybe tell people about Polylith before we go on, because I suspect a lot of people don't know about this. Yeah, maybe we should begin there, and then we'll come back to it. Well, it's an architecture, but it's also a tool, or something with tooling support. And it's open source, and it's developed by a fellow Swede, Joakim Tengstrand. And I was fortunate to actually work in the same team as him. So he was the one introducing this, and we decided to give it a try. He was new in our team. And I have to confess, I was a little bit skeptical at the beginning, skeptical of mono repos in general too, because of previous bad experiences. So with Polylith, the main idea is that when you write code, you aim to write it in small parts. And that's what Polylith calls components. And for a component, Polylith uses the idea of Lego, but for code.
So a component could be a brick, a Lego brick, that you can reuse in several ways. And a component can be everything from a small thing, something that we normally would put in a utils folder, like functions that do maybe some parsing or something. But it can also be a combination of other components. They don't have to be of the same size. It's the idea of composability and reusability that is the important thing. So the big parts of Polylith are components, and then there is also something called bases.

40:00 And that is also a component, but a kind of special kind of component. If you think about Lego, if you're going to build, like, a house, you often have a base plate where you put your Lego bricks. So a base is sort of that part. And in code, if you have a FastAPI app, maybe a base could be where you define the endpoints, like, okay, I use the API decorator or something like that. So that could be the base. And then the code that actually does something could be a combination of different components. And that's a rational way to think of it. Yeah. Whenever you talk about stuff like this, one of the things that is difficult is to understand: what is the scale? How are these different? So, one way to ask it: are functions components? Are modules components? Are packages components? How do you identify that, since these are not formal language or runtime terms? Yeah. It would be really good to help me understand how I map these things in Python. Yeah, yeah, that was really interesting. Because

41:12 you can see a component as not a full-blown feature, like maybe a library that you publish on PyPI would be. It's smaller than that. And I guess it could be a single function. But it's probably one or more functions that kind of relate. Let's say that you

41:33 what should

41:35 I should have prepared an example. But let's say that you want to parse a CSV file or something. Then I would probably separate the different things you want to do with that CSV file into functions already. And the component is where you kind of group the functions that relate to each other, that make sense to have in a Python package. By that I mean a namespace with a dunder init in it. So yeah, that could be a component. Yeah. Although it could be modeled as one of these sub packages, those sub packages often have multiple jobs and roles, and you're like, let's stay really focused on the one thing that it does, right? Yeah. Okay. And all of this lives in

42:21 what Polylith calls the workspace, and that is basically a repository with a configuration about how your repository looks. So you have your components in namespace packages, basically. And you have your bases, the entry points of all of your apps. And then you have something called projects, or a project. And that is the artifact or artifacts that you want to build. So you can have only one project, if you're going to build one thing, maybe a FastAPI service. But the benefit comes when you are about to build something new. Then you have your project infrastructure, like the project configuration, and that is defined in a folder called projects. And then the code you use, you pick from the components and bases folders, so you reuse the same source code, and then you package it into different artifacts. So it sounds a little bit like we've got this mono repo with all of this stuff. And Polylith's job is to say, well, we're going to look into these little parts of this monolith, and I need this part and this part and this part. And it's some tooling and some concepts to help you manage some artifact. You know, we don't usually have exact build artifacts in Python, unless you're shipping out separate packages. But maybe these three pieces here make up the FastAPI service that we're going to host over there. And maybe these two pieces make up the data science tools we're going to give to the data scientists for their notebooks, and there could be some overlap in those, right? Yeah. Yeah. Okay. Then another good thing with the workspace is that you don't really do much work in the project folder or something like that, because the main idea is that you have a development environment that includes all your bases, all your components.
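
The relationship between projects, bases, and components described above can be modeled in a few lines. This is a sketch of the concept only, not of how the Polylith tooling represents a workspace; every brick and project name below is hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Project:
    """A deployable artifact assembled from bases plus components."""

    name: str
    bases: FrozenSet[str]
    components: FrozenSet[str]

    @property
    def bricks(self) -> FrozenSet[str]:
        # Everything the project picks from the bases and components folders.
        return self.bases | self.components


# Hypothetical names, not taken from a real workspace.
api_service = Project(
    name="greet_service",
    bases=frozenset({"greet_api"}),  # entry point, e.g. where endpoints are defined
    components=frozenset({"greeting", "log"}),
)
data_tools = Project(
    name="data_science_tools",
    bases=frozenset({"notebook_helpers"}),
    components=frozenset({"stats", "log"}),
)

# Source code that both artifacts reuse from the same repository.
shared = api_service.bricks & data_tools.bricks
print(sorted(shared))  # ['log']
```

The overlap is the point: the `log` code exists once in the repository and is packaged into both artifacts.
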
So the good thing with that is that you can experiment and try out code without

44:36 worrying about whether you have imported the correct software. You have a top project folder containing all of your dependencies and packages, and then you can take it from there. Once you're ready to build something out of it, an app or whatever it is, then you can start constructing that project-specific configuration. You can choose where

45:00 you want to start, but I usually start from the development workspace. And I really like a way of working called REPL-driven development, which I also learned from Clojure, where you try out things. And basically, yeah, that's a really nice developer experience that you get from having the entire source code: you can try out things, combine components, and develop new features. I'm doing more and more of that as well, this REPL-driven development, or I'd say not necessarily development, but exploration. I've got to kind of understand: I'm not really sure, is this going to click together, right? Rather than putting a lot of structure in place, because I'm not even sure I really want to stick with it, you know, fire up a REPL. For those of you who don't know, if you just type python, what you get is a read-eval-print loop. That's the REPL. I do it in PyCharm these days, because PyCharm has a Python console, but it gives you autocomplete and tab completion and all of the things that are in your project when you're playing in the REPL. But you know, still the same idea. That's really great. Yeah, absolutely. So are you guys using this on your projects right now? Or what are you doing these days? Yeah, I'm fairly new to the team that I joined. So

46:12 I've introduced them to the ideas, but they already have code and stuff in place. So my hope is that we will give it a try once we have something new to develop, or include an existing microservice. Maybe we could give this idea a try. So that's basically it. Yeah, that's always the problem: even if you yourself are not new, the ideas may be new to you, and you've done a bunch of previous work. Like for me, I was showing you that repo before, and I'm thinking, there's a lot of cool stuff I could do about how to restructure this and reuse it and make it available, you know, sort of bring more of the mono repo stuff to some of the things I'm doing. But then I've got to update the continuous deployment, and I've got to update the web server. You know, there's all this stuff that's there. And do you kind of pause what you're doing to try some new big organization of code here? That's how it goes, right? Where I learned Polylith, that's where we actually used it in production. We had several different kinds of services and apps where we had everything in a Polylith mono repo, but that was Clojure. And Clojure is a compiled language, like C++ or C#, is that right? Yeah, it's on top of the JVM. Okay, so yeah, that is compiled. Yeah. To go on that tangent: I do feel like the deliverable artifacts are slightly more obvious and easy to distinguish when you're talking about something that compiles, and like, here's the library that drops into the bin folder, and here's the executable binary that drops. You know, there's an output folder that has all the pieces that were selected. Whereas with Python, you've got to be a little more careful how you put that together. Yeah.
So what I came to was: how can this idea be used in Python? And that was actually what led me to Poetry, which I think is a really nice tool, because Poetry has a lot of nice ways of handling projects and dependencies and structure and stuff like that. There were a couple of things missing to make this idea work, because when you have a project configuration, you actually include components from a relative path. So you navigate up, then navigate down to the actual component. And if you were to just build the wheel or source distribution from that level, it wouldn't be a valid package, because then you would need to ship the entire mono repo structure. And you don't want to do that. So what I did was develop a plugin for Poetry that allows for relative includes, and that will build the wheel and the source distribution with the correct paths. So it takes all the package dependencies and puts them in the same folder, basically, before it builds the wheel, and then you have a valid distribution that you can use. So it does a little copying and stuff like that. All right, so the actual output here is a couple of wheels that we could, say, pip install into a virtual environment, and they work together. Is that right? Yeah. Okay. I tried this idea with services too, like FastAPI services: instead of including the source code as a tree, installing it with pip from a wheel or a source distribution, preferably from a wheel if you don't have any operating-system-specific stuff. And I think the end result works really well if you do it in a Docker container.
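
The "navigate up, then down, and copy into one folder" step can be sketched as follows. This is not the plugin's actual implementation, which is not shown in the conversation; `stage_includes`, the `acme` namespace, and the folder names are all hypothetical and only illustrate the idea of flattening relative includes before a build.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Tuple


def stage_includes(
    project_dir: Path, includes: List[Tuple[str, str]], staging: Path
) -> List[Path]:
    """Copy relatively-included packages into one flat staging folder.

    Each include is a (from_dir, package) pair, where from_dir is relative to
    the project folder, e.g. ("../../components", "acme/log").
    """
    staged = []
    for from_dir, package in includes:
        source = (project_dir / from_dir / package).resolve()
        target = staging / package
        shutil.copytree(source, target, dirs_exist_ok=True)  # needs Python 3.8+
        staged.append(target)
    return staged


def demo() -> List[str]:
    """Create a throwaway workspace, stage one project, and list the result."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for brick in ("components/acme/log", "bases/acme/greet_api"):
            (root / brick).mkdir(parents=True)
            (root / brick / "__init__.py").write_text("")
        project = root / "projects" / "greet_service"
        project.mkdir(parents=True)

        staging = root / "staging"
        stage_includes(
            project,
            [("../../components", "acme/log"), ("../../bases", "acme/greet_api")],
            staging,
        )
        return sorted(p.relative_to(staging).as_posix() for p in staging.rglob("*.py"))


print(demo())  # ['acme/greet_api/__init__.py', 'acme/log/__init__.py']
```

After staging, the packages sit side by side under one root, so a build from that folder no longer needs the rest of the repository.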

50:00 You can have full control of what's in there. So yeah, in your Dockerfile that builds the Docker image, you can just say, you know, copy these three wheels over, pip install them into my Python environment I have over there, and it's just taking the, what do you call them, workspaces that you need over there. What is the terminology that you call the artifacts here? I mean, I know they're wheels and packages, but is there a Polylith term that matches? Ooh, I don't think so. Maybe it's a built artifact, perhaps. Yeah. And probably the most simplistic scenario is that you have an app, like an API endpoint, or maybe a CLI, or even a library. And you probably want to install them in different places, maybe even on AWS Lambda. So you can have control over the deployment in your CI, saying that I want to deploy this Lambda in here, and I want to deploy this FastAPI over there. And with Polylith, you can build these wheels differently. Nice. Yeah, I feel like, if you think of microservices and monoliths, the AWS Lambda or any serverless functions-as-a-service story is the most extreme version of this. Yeah. Here's a single function to get deployed, here's a single function to get deployed, right, just one after another. It's kind of out of control. And you have all of them in separate repositories. Yep. Oh, please. No. Yes, that would definitely be tricky. I want to come back and talk more about this Poetry plugin, because it's really cool. But let's address this question from Lucas here: how would you approach versioning in a mono repo, like, with these different services and the different pieces? So if I'm going to have that FastAPI thing that builds over there, I'm going to have some other projects that are built with some overlap that are shared over to, say, my data science team. They're going to analyze data in some other way, but reuse some of the code.
I've got some thoughts, but what are your thoughts on versioning, say, in the repository, or how you deploy them? Yeah, that's a really good question. If I'm looking at it from more of a Polylith perspective, I would suggest a very simplistic solution. Let's say that this project depends on something with this version, and the other project is still on an earlier version. I think that can be solved with the components themselves, because all source code is

52:34 made up of all these components. So if you're going to build a new version of something, and that version uses a new third-party dependency, or one that is incompatible, I would suggest you add it as a new, separate component. So your new projects that will use that one will take that component instead of the old. I think it's a good practice if these components have the same API for as long as possible, so it should be easy to switch from the old to the new one. So that would be my solution to versioning, at least when using Polylith. If we look at not the mono repo style, but you build an artifact like a wheel from one repo, and you put it up there and someone else depends upon that, maybe through an internal artifact management system, a private PyPI, you would pin your version in the requirements file for that other one, right? Because that repo is changing at a different rate and a different cadence than maybe the library that it depends upon. And that's really natural for us as Python people, because we already have a great long list of things that are open source that we don't build that we depend on, right? FastAPI, Pydantic, and Starlette would be examples from what we've been talking about, right? Those are things you don't control, and you depend on them. So you pin the versions and upgrade them as you see fit. But one of the advantages of the mono repo to me, as far as I see it at least, is that the whole system, not just your part of the system, but the whole system, is consistent all the time on the main branch or the production branch or whatever the shipping branch is. So I think you would maybe branch, do some of your work, merge that back in, and at that point you could ship everything if you need to. Right, yeah.
Because you're continuously keeping it together as a whole system, not like, well, that library built and that library built because they're separate repos, but you put them together and who knows what's going to happen. I think this is an advantage of the mono repo, in a sense. Yeah, me too. I definitely think it makes it easier to notice if some parts of the code are using a certain version of a dependency. You will learn about it quite quickly, because you would install it and try to run the code.

55:00 So it's easier to notice. And hopefully it will also get easier to update that code. But if you are in a situation where, well, there's so much code that you need to refactor, or maybe there's a breaking change that has kind of rethought the entire idea, if that happens, maybe you need to do some sort of separation, and keep the old one until you have the time to refactor that. But yeah, I guess in most cases it's pretty straightforward to update, just because everything is in the mono repo, right? Yeah, exactly. And if you've got some sort of continuous integration, or some kind of automated check, you're going to find out pretty quickly that this change you made has a consequence over there. And I think that's why people who are psyched about mono repos are excited about it. It also feels to me like, if there was going to be a breaking change, it's going to happen either way. It's just, is it going to happen in small little pieces, or is it going to happen in one terrible, huge one? Oh, you got the new one? Well, let me tell you, the new one's really different. It doesn't work anymore. Like, oh no. It's just like, you want to merge more often, or you want to try to integrate things more often, and not just wait some long period of time and go, now do they go together? Why are there 100 or 1,000 merge conflicts? I don't know, right? The more you do these little continuous check backs and integrations, it's just going to be so much easier. Oh yeah. Totally agree. Yeah. I mean, the only scenario where you don't have to go back and pay that penalty is where the other service that you're versioning against says, we're never going to go to the new version. And if this is internal code, it's unlikely that it's never going to go to the new version, unless it becomes just dead, and then who cares? You're going to need to integrate them eventually. Just keep doing it continuously. So yeah, I'm starting to really come around to the idea of these things. Yeah, yeah.
So this Polylith plugin for Poetry, it's super cool. So for example, here in your example, you say poetry, space, poly, space, info on the terminal, on the command prompt, and it'll say, hey, look, in here we have two projects, made of two components and two bases, the Lambda project and the FastAPI project. And they're made up of these different elements. And it really shows you what part of your code is depending on the other ones. It even gives you a pretty table. Is this made with Rich? Yeah. I love it. Oh, great. Yeah. That's my favorite tool. So good. Yeah. So it's a really nice looking UI that you put together here as well. Thanks, I'm really happy to hear that. That makes it easy for people to... Yeah, go ahead. Sorry, I didn't mean to interrupt. I just want to mention that what we see here is the tooling support for Polylith in Python, and I decided to make it a Poetry plugin. I have some plans to break it out of Poetry to make it a separate CLI too. Maybe I'll do that in the future. But for me, it was a good fit for Poetry, since I'm relying on a lot of Poetry features. And the poly info command is showing you an overview of your mono repo. And this is my example project, so it's not a lot of code. But the idea is that it lists your components and bases. The common name for these is bricks. Then you get a sort of overview of what's in that mono repo. You can get an idea of what's in there, what this thing does. And then it's listed per project, so you can see which project is actually using which brick. Let me just give a little bit of visual information for people listening. So under the brick column, it says we have the logging, we have the messaging, we have the greet API, and we have the messages for Lambda.
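
For readers who cannot see the screen, the kind of overview being described can be approximated in plain text. This sketch does not call the plugin and does not reproduce its real output; the project and brick names are assumptions loosely based on the names mentioned in the conversation.

```python
from typing import Dict, Set


def usage_table(projects: Dict[str, Set[str]]) -> str:
    """Render a brick-by-project matrix: 'x' means the project uses the brick."""
    names = list(projects)
    bricks = sorted(set().union(*projects.values()))
    width = max(len(text) for text in ["brick", *bricks])
    lines = [" ".join(["brick".ljust(width), *names])]
    for brick in bricks:
        marks = [
            ("x" if brick in projects[name] else "-").center(len(name))
            for name in names
        ]
        lines.append(" ".join([brick.ljust(width), *marks]))
    return "\n".join(lines)


# Hypothetical workspace contents, for illustration only.
projects = {
    "fastapi_project": {"greet_api", "log", "message"},
    "lambda_project": {"message_lambda", "log", "message"},
}
print(usage_table(projects))
```

Each row is a brick and each column a project, which is the same shape as the table being described on screen.
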
And then you've got, in different columns, the different projects that might be consuming them, and little check marks or dashes to say using or not using it. It makes it really visually clear how your elements fit together, right? I'm continuing to add commands to this tool, and you can also use this information, the information about the workspace and the individual projects, in your CI. Let's say that you change the message component, you do something in it. Then you would want to have the projects that are affected built for that. So the tooling will help your CI make decisions on whether it should build this project, or whether it should skip building because nothing has changed. So that's part of the tooling too. That's pretty interesting, because we've had attempts, and they've always been like an awesome 80% solution that never quite works. But really good tool ideas, I guess, that say, if you change this code,

01:00:00 what actually needs to be tested again, or what needs to be analyzed again? If this other part of your system doesn't depend on it, you don't need to run those tests either, right? If just the file changes, that doesn't tell you anything. You need to look at things like code coverage: what part of the system was touched, is this part affected by this at all, right? Those are always really tricky, and how to keep like a history of code coverage, you know, what to do. And all those attempts I've seen just kind of go, like, we tried, but we don't really do that, we'll just run the tests for the files that changed, which is never enough. But this kind of is a natural way to express dependencies in that tree, to say, okay, if we change the greet API, we see that the Lambda thing doesn't work with it. So we don't need to test anything to do with the Lambda stuff. We only need to test the FastAPI aspect, right? Yeah. Is that tooling in place now? Yeah, that's in place now. Okay. The poly info command, I think that was the first command that I actually added. Then there's a diff command. And what did I do? How did...
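
The CI decision described above, which projects to build or test after a change, comes down to a set intersection once each project's bricks are known. The sketch below is an illustration under stated assumptions: the `<components|bases>/<namespace>/<brick>/...` path layout, the `acme` namespace, and all names are hypothetical, and this is not the plugin's diff command.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Set


def bricks_from_paths(paths: Iterable[str]) -> Set[str]:
    """Map changed file paths to brick names.

    Assumes a layout of <components|bases>/<namespace>/<brick>/..., which is
    an assumption made for this sketch.
    """
    bricks = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        if len(parts) >= 4 and parts[0] in ("components", "bases"):
            bricks.add(parts[2])
    return bricks


def affected_projects(projects: Dict[str, Set[str]], changed: Set[str]) -> List[str]:
    """Return the projects that use at least one changed brick."""
    return sorted(name for name, bricks in projects.items() if bricks & changed)


# Hypothetical workspace contents, for illustration only.
projects = {
    "fastapi_project": {"greet_api", "log", "message"},
    "lambda_project": {"message_lambda", "log", "message"},
}
changed = bricks_from_paths(["bases/acme/greet_api/core.py", "README.md"])
print(affected_projects(projects, changed))  # ['fastapi_project']
```

A change to a shared brick such as `message` would return both projects, while a change outside any brick returns none.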

01:01:09 Yeah, the latest addition to the commands is a check command. Because, as I was talking about the development experience, you're working in a development project where you have everything, and then you might not touch the projects that much. And that means that you could potentially forget to include dependencies, because as of today there's no automatic thing here yet. I'm planning to add that later. But so far, you need to keep track of your dependencies and stuff like that. So I added a check command that actually performs analysis on the source code. So, let's say that one of the components uses the requests library or something like that, and you don't have it in your dependencies. Then you would be notified for that particular project. It's very likely that you will discover it anyway in your development environment. But this is an extra check to just make sure, once you're about to build something: can I really build this specific project? So it's a few commands, but there will be more commands that are more and more useful, I guess. Yeah, it looks great. And you have some examples over in the Python Polylith examples repository that people can check out. Yeah, yeah. So is the Poetry plugin ready for people if they wanted to use it? I think so. I haven't seen any stats yet, but I have a couple of users that contacted me through the GitHub repo and social media that

01:02:46 are... Some of them have just experimented with it, and others, I think, are actually working with it in their daily work. And I think it's useful. I have to remind myself to contact them

01:03:03 regularly to just check up on how it goes. And hopefully they will come to this repo and let me know if something doesn't work as intended. But it's a new tool, and it probably needs some more work, of course. So yeah. Well, a lot of people will hear about it now. They can come check it out and play with it.
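
The idea behind the check command described a little earlier, comparing what the source code imports with what a project declares, can be sketched with the standard library's `ast` module. This is an assumption-laden illustration, not how the plugin's check command is implemented: the function names are hypothetical, and import names do not always match distribution names (for example `yaml` versus `PyYAML`), which a real tool would have to handle.

```python
import ast
import sys
from typing import Iterable, Set


def top_level_imports(source: str) -> Set[str]:
    """Collect the top-level module names imported by a piece of source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as `from . import helpers`.
            names.add(node.module.split(".")[0])
    return names


def missing_dependencies(
    source: str, declared: Iterable[str], internal: Iterable[str] = ()
) -> Set[str]:
    """Return imported third-party names that are not in the declared list."""
    # sys.stdlib_module_names exists on Python 3.10+; older versions get an empty set.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return top_level_imports(source) - set(declared) - set(internal) - set(stdlib)


# Hypothetical component source and dependency list.
source = """
import requests
from pydantic import BaseModel
from . import helpers
"""
print(sorted(missing_dependencies(source, declared={"pydantic"})))  # ['requests']
```

Here `requests` is reported because the code imports it but the project's declared dependencies only list `pydantic`.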

01:03:29 And I'm sure you're taking contributions and PRs, and you wouldn't mind... Oh yeah, in addition, I would love to have that. So contributions are very welcome. Yeah. And you also have really nice examples here. Like, you have two videos that show how it works, you know, 15-minute YouTube videos, you've got some pictures, and, you know, well done on that. It makes it really easy for people to come and just see, like, okay, is this interesting, does it apply to me? So, good work. I do want to make one really quick follow-up. Corky out there had mentioned that maybe shallow clones were a better, more predictable choice than the partial clones with a filter equals blob. In general, the people at GitHub are recommending not to use the shallow clones anymore, but to use these partial clones instead, because it keeps the history and it can incrementally go back and pull the stuff in as needed. However, there is one time where you may really want those shallow clones. And the reason I'm thinking of this is you talked about builds, and using this to make builds run faster and only focusing on the parts that have changed. If you have a CI, the CI doesn't care about the history of your GitHub project. It just wants the working files, right? So you can do a shallow clone and say, just give me only the files on the tip of this branch, and then build it. And that can be dramatically faster than saying, give me all five years of history of every single file and reassociate them.

01:05:00 So if you're thinking about CI, the shallow clone idea that I was dismissing a little bit is exactly a good choice, I think, because you don't care about version history if you're trying to see if the current version builds or not. Right. So anyway, just a quick follow-up on that. All right, David, I think we're probably out of time. I definitely encourage people to go check out your Poetry plugin. They can check out Polylith at polylith.gitbook.io. Of course, I'll link to it in the show notes. Now, before we get out of here, the two final questions to ask you. Of course, if you're going to write some code, if you're going to work on the Poetry Polylith plugin, whatever, what editor are you using these days? Well, these days I use Emacs. I really like it. Before Emacs, I used an IDE. I really liked PyCharm too, but then I decided to learn Emacs and I stuck with it. Okay, every programming language, I'm going to code everything in Emacs. Nice. Yeah. Long, long ago, that was my very first editor for programming. Oh, cool. And then, yeah, it brings me back to working on Silicon Graphics mainframe computers doing C++. So, a notable PyPI package, something recent that we talked about, or something else that you want to just tell people about that you thought was awesome, that you ran across recently? I have to say Rich, because it's such an awesome tool when you're going to develop a CLI and want it to look nice. And yeah, Rich is a really, really good tool, with a lot of visualization features and stuff like that. So that's a fantastic... yeah, good recommendation. And there's so much momentum behind Rich these days. And if you're making some CLI, developer-oriented tool, just give it a little color, give it a little structure with something like Rich. You know, even just a little bit of color, or a little bit of distinguishing one line of text from another, makes such a big difference in being able to use it really quickly and easily.
And Rich is probably the best way to do that, by far, right? Yeah, you should check out my newest command, poly check, because it uses a Rich feature. It's silly, but I'm really happy about it. It uses an emoji too, while you're waiting. Fantastic. Oh yeah, I love emojis. I love emojis in CLIs as well. All right, final call to action. People want to get started with mono repos, with Polylith, with some of these ideas we've talked about. What do you tell them? Head over to the Polylith Git repo or the official Polylith docs and read about it and see. Also, if you're interested in mono repos in general, check out the other solutions that are out there, because there are a lot of different approaches with different kinds of focus that may fit your situation best. Of course, Polylith, because I, yeah, developed the tool and really like it, but there are probably tools that are better for a different situation. So just explore, I would say. Right, right. The tools that maybe Google chooses to manage its codebase might be the wrong tools for you to choose for yours, because the scale is so different. You might add so much complexity that it's not relevant. You know, it makes it really hard, but you don't need that complexity because you've got five projects, not 5,000 projects, right? So yeah, absolutely. Look around is good advice. Okay, David, thank you for being here. It's been a really fun chat. I learned a bunch. Yeah, thank you. It was really fun to be on the show. Thank you. Yeah, you bet. Bye bye.

01:08:42 This has been another episode of Talk Python to Me. Thank you to our sponsors. Be sure to check out what they're offering. It really helps support the show. Starting a business is hard. Microsoft for Startups Founders Hub provides all founders at any stage with free resources and connections to solve startup challenges. Apply for free today at talkpython.fm/foundershub. Stay on top of technology and raise your value to employers, or just learn something fun in STEM, at Brilliant.org. Visit talkpython.fm/brilliant to get 20% off an annual premium subscription.

01:09:21 Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show: open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe

01:10:00 to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
