#355: EdgeDB - Building a database in Python Transcript

Recorded on Wednesday, Feb 16, 2022.

00:00 What database are you using for your apps these days? If you're like most Python people,

00:04 it's probably PostgreSQL.

00:06 If you roll with NoSQL like me, you're probably using MongoDB. Maybe you're even using a graph database focused more on relationships. But there's a new Python database in town, and as you'll learn during this episode, many critical Python libraries have come into existence because of it. This database is called EdgeDB. EdgeDB is built upon Postgres and implemented mostly in Python. It's something of a marriage between traditional relational databases and an ORM. Python's async and await keywords, uvloop (the high-performance asyncio event loop), and asyncpg all have ties back to the creation of EdgeDB. Yury Selivanov, the co-founder and CEO of EdgeDB, PSF fellow, and Python core developer, is here to tell us all about EdgeDB, along with the history of many of these impactful language features and packages. This is Talk Python to Me, episode 355.

00:59 Recorded February 16, 2022.

01:14 Welcome to Talk Python to Me,

01:16 a weekly podcast on Python. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy, and keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.

01:29 We've started streaming most of our episodes live on YouTube.

01:32 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:40 This episode is brought to you by Sentry and SignalWire. Use Sentry to find out about and fix errors when they happen, and build real-time, next-generation video meeting rooms with SignalWire's API. Transcripts for this and all of our episodes are brought to you by AssemblyAI. Do you need a great automatic speech-to-text API? Get human-level accuracy in just a few lines of code. Visit talkpython.fm/assemblyai. Yury, welcome to Talk Python to Me. Yeah, it's great to have you here. We just met recently at PyBay down there. So in honor of that, I wore my PyBay shirt today.

02:14 Oh my God, I forgot about that episode. I probably should have worn my T shirt.

02:18 Yeah, where's your PyBay shirt?

02:22 What a cool conference, huh?

02:23 It is. I love small conferences.

02:25 I like small conferences. And in the time of COVID and all of this madness, having a winter conference outside in California at a beautiful food court area where it's warm, there were just so many things to like about that. I've got to tell you, it was great.

02:40 It was amazing. It was the best day of the year for me. Essentially, just being able to talk to people finally and see many friends was amazing.

02:49 We both gave talks there. I talked about Flask and HTMX, and you spoke about building a database engine, a whole database, with Python. And that was interesting. So then I watched a little more and I just thought, wow, there are a lot of interesting pieces of technology in and around this thing you built called EdgeDB. So I'm super excited to dive into that with you. But before we do, let's just hear your story real quick. How did you get into programming and Python?

03:14 So Michael, Elvis and I met many years ago, probably 14 years ago or something like that, working in a small Canadian company building big enterprise software for companies like Walmart. Back then, the system that we were working on was written in PHP and, I mean, we pushed PHP to the limits, but we always knew that when we started our own thing, we would be looking for something new and fresh for us to tinker with.

03:41 And we looked around and we just fell in love with Python. Great syntax.

03:47 Fantastic. Was that Django? I mean, that's right around the time of the Django growth or was it something else that brought you in?

03:52 We started with Django, we played with it a little, but actually we just started building our own thing pretty much immediately without looking like too deeply at existing frameworks or anything. Yes.

04:06 I get the sense that you and your co founder are framework builders.

04:09 Yes, we are. Somebody asked me, maybe it was Guido, I don't remember anymore: what was the first thing that you wrote in Python? And I said, a function decorator.

04:22 Exactly. Awesome. So how about now? What are you doing these days?

04:25 Working on EdgeDB full time.

04:27 EdgeDB, full time, exclusively. Yeah. We're building a great company here, so it requires 100% of my attention.

04:36 Yeah, I bet it does.

04:38 You can build a business on the side, but it's a hard time. And you have this great article that talks about how you're going to build your favorite new database in a month, but that it actually takes ten years, pretty much.

04:52 It was a long, sometimes painful journey, and we didn't realize right off the bat that we would be building a database. Right. We were building a Python framework and, essentially, an ORM.

05:06 A better way to talk to databases in Python was your idea, right?

05:10 Yeah, exactly.

05:11 But I guess you didn't have in mind that you would also build the database.

05:15 No idea.

05:16 Yeah, very cool. Well, I think what you built is pretty interesting and people are going to enjoy checking it out. But more so, I think what is pretty interesting is that there are a lot of things in the Python space that we enjoy and appreciate, especially what I would consider to be the advantages of modern Python. I don't know how you feel about it, I know you've been deep in this world, but to me it seems like just two or three years ago, people building frameworks, think FastAPI or Pydantic or stuff like that, have really embraced it.

05:50 They take full advantage of Python 3, right? They say, oh, look, we have typing, we have async and await, we have all these things that we can bring together. And it really feels like that stuff is all starting to come together in a big way over the last couple of years. What do you think?

06:02 Yes. I also have this feeling that the ecosystem becomes more and more robust, that people build amazing systems with Python. I think that asynchronous I/O played a part in it for sure. But I think that the other big thing that is happening to Python right now is strict typing: mypy and other similar tools. This is what actually allows you to manage your code base at scale. And this is just incredibly important. So, yeah, those two things, I would say.

06:33 Absolutely. And we're going to get into it when we get into the architecture and stuff. But you talk about using Cython for making parts of your Python code faster. And of course, that relies heavily on typing because you want to say, here's an int64, don't turn it into a PyObject pointer. We just want an int64 that works on the stack really quickly, right?

06:52 Yeah, absolutely. I mean, that's an open question: will Python ever enjoy strict typing that the Python interpreter actually takes advantage of to make things run faster or not? But for Cython, it's absolutely critical. And actually, sometimes I had this feeling that writing code in Cython is easier than in Python because, hey, I have a compiler. If something mismatches, I know at compile time, not at run time.

07:17 Yeah. I suspect mypy is a little like that as well, right? Exactly.

07:21 So when mypy started happening, because I was experimenting heavily with Cython before mypy became popular, when mypy finally became this common thing to use, yeah, it was almost a revelation that we finally have this beautiful workflow with Python.

07:34 Well, I want to talk about some of the technologies that are sort of surrounding this larger project that you've been working on. So over on GitHub, github.com/MagicStack.

07:47 This is your company, the one that EdgeDB and all that is coming out of. But there's a lot of interesting things happening here that I think people who see modern Python doing its thing are going to appreciate. We talked about the async stuff and so on.

08:06 I wanted to kind of dive into some of those first that are sort of orbiting your projects that you all have created here. So let's start with MagicPython. Tell us what this MagicPython is about.

08:16 So MagicPython is a syntax highlighter. It's actually used in VS Code by default. If you use VS Code and you edit Python in VS Code, this is what VS Code uses to highlight the code. It was used by GitHub for years to highlight all Python code. And recently I think GitHub switched to the other Python highlighter. But, yeah, MagicPython, I guess, is incredibly popular. It was born out of frustration, actually, because we were big fans of metaprogramming. We abused Python a lot in interesting ways. And one of the ways to abuse it was to push some meta information into function annotations. It was before mypy and before typing. So, yeah, we were just adding stuff to those annotations. So we quickly discovered that the built-in syntax highlighters in TextMate (back then, I was using TextMate heavily) just couldn't highlight them. So my goal was basically: hey, can we create our own syntax highlighter for Python that would take care of annotations and, by the way, highlight all of the newer stuff that is available in Python 3? Because back then Python 2 was still the king and Python 3 was kind of barely supported.

09:28 Interesting. So a lot of the highlighters and editors and stuff really would highlight kind of based on Python 2 syntax.

09:36 Exactly. But I was like, no, it's clear to me that Python 3 is the future. But yeah, the industry was still kind of moving slowly towards it. But the key innovation of MagicPython, and I think this is why it's a high-quality thing, is unit tests. So I'm a big fan of writing tests and having this test-driven development. And after highlighting Hello World in TextMate, the first thing for me was to figure out: can I actually build a unit test engine? Because if you think of those syntax highlighters, it's essentially a huge regex. It's just a mind-boggling, huge regex.

10:22 I was thinking about that.

10:28 Well, I was thinking down the road, you have a really interesting query syntax that's pretty rich and powerful for EdgeDB. Yeah. Did your experience writing MagicPython give you the ability to go like, oh yeah, we can write this thing that parses this insanely complex language? How much did this play into your ability to go beyond SQL?

10:50 I wouldn't say much. I mean, we have syntax highlighters for our schema files and EdgeQL. They're pretty basic. Right now we just highlight keywords and literals. We have some interesting plans about that and we can talk about that later, I guess, when we'll be talking about EdgeQL, like implementing the Language Server Protocol for EdgeQL. But the highlighter itself is pretty simple. But I used this unit testing framework in those highlighters and this is what gives me peace of mind. I know that it's just working. When I'm adding a new operator or a new keyword, I don't have to test it manually on some big file.

11:27 Yeah, absolutely. Sort of speaking to that thing I talked about, a lot of interesting stuff coming out of your work, Adrian out there says: didn't know you also made httptools as well. Indeed. Yeah. There's a lot of cool stuff that you've done. So final thing on MagicPython: can I use it for other purposes than just VS Code, Sublime, and Atom? Like, if I wanted to build my own thing that printed out terminal stuff or even some other kind of UI app, could I use this more generally than the editors?

11:54 I haven't tried it myself, but given that GitHub was using it to highlight the code, I believe that there must be some libraries and packages that can consume this TextMate-inspired syntax and just highlight stuff and print it to the terminal.

12:10 I see. So it comes out as a TextMate grammar, and then it just happens that these three editors,

12:14 Yeah.

12:14 I think through sort of a common heritage, understand that.

12:17 I think TextMate started the revolution originally, then Sublime Text just inherited the format, and then VS Code just decided, hey, we should just use it.

12:25 Yeah.

12:25 Cool.

12:27 Very cool. All right, so when you spoke about your journey towards creating this product and this business, you talked about how central asynchronous I/O and server work was going to be. And of course, that is true, right? Not all databases, but most databases are able to be a point of extreme concurrency, to the point that they can, like, handle the processing. Right. So if you've got a web app, you can scale your web app out and, if it's got two connections or 200 connections to the database, generally, that's fine. The database is meant to sort of scale that vertically, I guess. So you really talked about, well, if you're going to do this in Python, that probably means leveraging asyncio pretty strongly, right?

13:10 Yeah, it was pretty clear that we need asyncio. As you said, databases have to handle lots of connections. And also it's important to understand that in most databases, like Postgres, for example, the cost of establishing a new connection is pretty high. So you want dedicated tools to mitigate that, like PgBouncer, for example. It's like middleware you put in front of PostgreSQL to make connections cheaper. And we just didn't want to have any such tools as requirements for EdgeDB. We just wanted it to work natively, out of the box, without any configuration. So, yeah, we had to have cheap connections in terms of how fast you can connect. And also, I mean, if your connection is just hanging out there, we wanted to allow that, essentially. So we had to have a way to handle thousands, maybe hundreds of thousands, of concurrent connections that maybe are not super active but just open. And asynchronous I/O is the only way you would be able to do this.

14:08 Even if Python didn't have the GIL, for example, we would still use asyncio to tackle this problem.

14:16 This portion of Talk Python to Me is brought to you by Sentry. How would you like to remove a little stress from your life? Do you worry that users may be encountering errors, slowdowns or crashes with your app right now? Would you even know it until they sent you that support email? How much better would it be to have the error or performance details immediately sent to you, including the call stack and values of local variables and the active user recorded in the report? With Sentry, this is not only possible, it's simple. In fact, we use Sentry on all the Talk Python web properties. We've actually fixed a bug triggered by a user and had the upgrade ready to roll out as we got the support email. That was a great email to write back: hey, we already saw your error and have already rolled out the fix. Imagine their surprise. Surprise and delight your users. Create your Sentry account at talkpython.fm/sentry, and if you sign up with the code talkpython, all one word, it's good for two free months of Sentry's business plan, which will give you up to 20 times as many monthly events as well as other features. Create better software, delight your users and support the podcast. Visit talkpython.fm/sentry and use the coupon code talkpython.

15:32 There's an overhead for threads and the context switching between the OS, trying to figure out if that thread still needs to do stuff. You can't have hundreds of thousands of threads and be in a good place.

15:42 Yeah, but my concern wasn't even that. Maybe we would be smart and implement some sort of M:N scheduling or something like that. I don't know. It's just that I don't believe that humans are good at writing threaded code. async/await gives you this luxury of essentially seeing where you can actually give up control of the current code, where it can await things and potentially switch the context. Right. So you can be smart about locking access to shared resources and things like that. With threads, it's way harder. Maybe with Rust it's easier because there is some compiler magic that can help you. But with pretty much every other language, thread-based programming is very hard.

16:25 It is hard. Well, I suspect many people, but not everyone, out there listening know that when you use asyncio tasks and so on, at least by default, they run on a single thread. There's no actual threading happening. You can use threads or multiprocessing to get that true concurrency, but this is different. It's not really threads.

16:46 Yeah, it's different. Basically, the idea for async/await is to use it for I/O-bound code. So if your code is doing lots of I/O, pushing data from multiple connections here and there, this is an ideal thing. But if you're computing something, like, I don't know, doing some scientific computation, or just using blocking I/O or disk I/O, it's best to offload that computation into a separate process.

17:12 But yeah, if you just want to handle a lot of I/O in Python concurrently, asyncio is the way.

17:18 Yeah, the way that I like to think of asyncio and async and await is that what you're scaling is the waiting. If you're waiting on anything, if I'm waiting on a database, or in the database's case waiting for the client to talk to me or not to talk to me, then you can basically take that period where you'd be waiting and turn it into productive computational time.
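The "scaling the waiting" idea can be sketched in a few lines. This is only an illustration, not code from the episode: `fake_query` is a made-up name, and `asyncio.sleep` stands in for waiting on a database or network reply. Three 0.2-second waits run on one thread and finish in roughly 0.2 seconds rather than 0.6.

```python
import asyncio
import time


async def fake_query(name: str, delay: float) -> str:
    # Hypothetical stand-in: asyncio.sleep plays the role of waiting on I/O.
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_concurrently():
    started = time.perf_counter()
    # All three coroutines wait at the same time on a single thread.
    results = await asyncio.gather(
        fake_query("a", 0.2),
        fake_query("b", 0.2),
        fake_query("c", 0.2),
    )
    return results, time.perf_counter() - started


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_concurrently())
    print(results, f"{elapsed:.2f}s")
```

If the coroutines did CPU work instead of waiting, there would be no idle time to reclaim and the total would not shrink.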

17:38 I love it. I think we should put this straight into the docs.

17:42 I never thought about this. Because people tell me, I'll see these benchmarks and stuff that say, oh, well, I did this thing where I overwhelmed the database and then it didn't go very fast when I did asyncio. It's like, well, because there's no period in which you're waiting. You're, like, constraining the resource beyond what it can take. But if there was some sort of, oh, I'm waiting for this thing to get back to me, well, then all of a sudden there's your performance. Okay, so when I saw this come out, I was super excited. I think this was 3.4, if my history serves me correctly, when this came out. Do you remember?

18:15 I think it was around 3.4, because the most important prerequisite...

18:19 I think so, because 3.5... Go ahead. Sorry.

18:22 Yeah. The most important prerequisite for asyncio to happen was actually the yield from syntax. Probably not a lot of people remember it, but back then, asyncio required this @coroutine decorator, and you would use yield from instead of await in your code. So basically Python 3.3, I think, was under a moratorium on modifying the Python language. So we had to wait for Python 3.4 to add yield from. And then asyncio happened, right?

18:47 That enabled it. But I remember when it came out, I was super excited about this. And I was like, oh, this is a harsh programming model. This is really like direct juggling, those sorts of things.

18:59 And I had experience with C#, which had async and await keywords as well. Oh, my gosh, I wish this language had async and await. And then, I didn't know you then, but I thank you, because you authored PEP 492, Coroutines with async and await syntax. Basically, we have async and await in Python because of you, right? Well, somewhat, yeah. I don't want to give you too much credit. You created the PEP that said, let's stop using yield from and continue and all these other things that you do.

19:28 I can tell you the interesting backstory.

19:31 Sure. Yeah. So basically we were trying to figure out the future API for EdgeDB, back when EdgeDB wasn't even a thing, and we knew that we wanted to support asyncio in our future client. But how do you actually have a transaction block? Like, you would have to say try, except, finally, commit. It's a lot of code, and we have context managers in Python.

19:56 Right.

19:56 So with the context manager, you would just say with transaction and it would do all this magic behind the scenes. But we didn't have an asynchronous version of with. We had yield from, but how to kind of glue together yield from and with wasn't clear. So I thought, hey, we have async. We could have async with. And in that case, we should just replace yield from with await, because I was also familiar with C#, and I also liked the short and neat syntax of async/await. And then the next thought was: hey, what if you have a cursor to the database and just want to iterate over the rows and have it prefetch those rows? And this is how async for was born. And then in about a couple of weeks, the Language Summit happened. I think it was in Montreal, that was PyCon US in Montreal. And I met with Guido. I showed him a rough sketch and he said, yeah, let's do it. I think I implemented the first prototype of this thing in the interpreter over a couple of nights. Like, I just coded straight for 8 hours. I wanted to impress Guido.

21:01 I just had this rough implementation. And then just over the course of, like, a month and a half, I was refining it and writing this PEP. And this is how it happened. I think it all happened because of Guido, because first of all, he saw clearly that this is an improvement over yield from. Like, a big improvement.

21:16 It's a huge improvement. It makes it incredibly approachable. It's like you do what you normally do, but sometimes you might have to put the word await there.

21:22 Exactly.

21:23 But your mental model isn't about callbacks and weird stuff like that. It's just like you write the regular code, but you sometimes need to await a thing. And it's beautiful.

21:32 Yeah, exactly.

21:33 I'm grateful to Guido because first of all, he recognized this thing and encouraged me. And second of all, he actually inspired lots and lots of refinements in this proposal. And I was working with him essentially all this time. Discussions happened on python-dev, and sometimes he and I exchanged emails, and he proposed some ideas and I would just tweak the PEP. So Guido was actually also behind this proposal to a big extent.

21:57 There's some mind-blowing stuff here, like async with, for example, as you point out, right? These are wild ideas. Instead of just saying here that a function can be async, you have async for, async with. There are really neat things in here, actually.
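To make the two constructs concrete, here is a minimal sketch of a transaction block and a row cursor written against the `async with` and `async for` protocols from PEP 492. The `Transaction` and `Cursor` classes are hypothetical and only record what happens; this is not EdgeDB's client API, and `asyncio.sleep(0)` merely stands in for a network round trip.

```python
import asyncio


class Transaction:
    """Hypothetical transaction object that only logs its lifecycle."""

    def __init__(self, log):
        self.log = log

    async def __aenter__(self):
        await asyncio.sleep(0)  # stands in for sending BEGIN
        self.log.append("begin")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0)  # stands in for the COMMIT or ROLLBACK round trip
        self.log.append("rollback" if exc_type else "commit")
        return False  # never swallow the exception


class Cursor:
    """Hypothetical cursor that awaits between rows."""

    def __init__(self, rows):
        self._rows = iter(rows)

    def __aiter__(self):
        return self

    async def __anext__(self):
        await asyncio.sleep(0)  # stands in for fetching the next batch of rows
        try:
            return next(self._rows)
        except StopIteration:
            raise StopAsyncIteration from None


async def main():
    log = []
    async with Transaction(log):
        async for row in Cursor([1, 2, 3]):
            log.append(row)
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The point of both forms is that the interpreter can suspend the coroutine on entering, on leaving, and between iterations, which a plain `with` or `for` cannot do.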

22:11 Yeah. I think that async with is still pretty unique. JavaScript, for example, is lucky because they have this nice syntax for declaring anonymous functions. Right?

22:22 So you can just say await transaction and pass a function. And it's like a multi-line function. You can do whatever you need. You don't actually have to have something like async with in JavaScript, but in many other languages you would need something like this. And pretty much, I think we pioneered this idea in Python. I think I saw a proposal to make using async in C#, but I'm not actually engaged with the C# community. Maybe it was implemented, maybe not. I don't know.

22:52 That would be the parallel, but I'm also not tracking it.

22:55 Yeah.

22:55 Okay. This is really cool. So awesome work on this Pep and getting this.

23:00 Thank you.

23:02 So let's talk about two more async things real quick here before we get to EdgeDB. Actually, three; one jumped onto the list just yesterday.

23:10 Absolutely.

23:11 Okay, so when you're doing asyncio, there's this background event loop that looks at all the things that could be done and says, are any of them waiting? Can we take that while it's waiting and put it aside and go do something else? Right. The scale-the-waiting story. And there's an implementation for that in CPython. But you all decided, you and Elvis, your co-founder, decided it would be nice if there was a faster, more optimized version of that part that does the checking and execution. So you created this thing called uvloop, an ultra-fast asyncio event loop. It's incredibly easy to use, right? Like, to install it is two lines: you import it and then you run install and you're good to go, which is fantastic. Tell people about uvloop.

23:58 Should this just be standard stuff we do in all of our code that uses async and await?

24:04 Let's jump in. So uvloop wasn't the first thing that I created. The first thing was actually httptools. Someone asked you about it a few minutes ago. So I just wanted to experiment with Cython. I discovered Cython. I thought, hey, this might actually be a useful tool and allow us to speed up Python a lot for some things, like parsing HTTP, for example.

24:26 Right. Loops.

24:27 Yeah, exactly, exactly. So I looked at Node.js and they used http-parser. I think that parser was actually extracted from NGINX.

24:37 And yeah, I just wrapped it in httptools. It's literally like 100 lines of code, maybe even less, maybe 50, just a small wrapper over the C library. And it worked, and it worked great. Then I got this superpower. I can quickly...

24:54 What other things can I grab?

24:56 Because you could do the same just using the Python C API. But you would end up writing like 3x, maybe 5x the amount of code, and using Cython, it just feels like magic. So yeah, it worked. And then I was like, interesting. So there is this libuv library that actually powers Node.js, and it's cross-platform and it's super fast, and Node.js is fast. Maybe, just maybe, I can do the same. I can just wrap it in Cython and make a drop-in replacement for the asyncio event loop. So I prototyped something relatively quickly, maybe in a few days. Basically, I just implemented a loop object and a couple of the most basic things. And it worked. It worked just fine. I was able to implement call_later, and then I was able to run a coroutine like await sleep(1), print hello world, and it worked. And then over the course of the next several months, I think three, maybe four, maybe five months, I was gradually implementing the asyncio API, swearing a lot because I discovered that this API surface is just huge.

26:02 It's an enormous API actually. And yeah, then we posted Benchmarks and I think it went somewhat viral. It was on HN. I think it was like post number one on HN for a long time.

26:17 Yes. I think Brian and I covered it over on the Python Bytes podcast when it came out, because it was big news. Yeah.

26:23 People were excited specifically because we basically showed that you can write some Python code, like a simple protocol parser, and it would be almost as fast as Go and sometimes even faster than Node.js, which was surprising. So yeah, I think a lot of people were excited about it.

26:41 Yeah, that's fantastic. So the quick takeaway here is uvloop makes asyncio two to four times faster. You've got some benchmarks for different situations and amounts of data and so on with regard to sockets. So let's wrap this one up with: is that a universal statement? Would you recommend uvloop.install() everywhere?

26:59 It depends. I think for production it makes a lot of sense to use uvloop, but you should try it first, because there are still some minor incompatibilities in uvloop that are really hard to track. Maybe there's some behavior difference, or maybe there's a bug in something that not a lot of people are using in uvloop; that's still a possibility. So yeah, use it with care in production. In local development, I don't think you need it; vanilla asyncio should be plenty. There is one more interesting thing about uvloop: it's a package, a package on PyPI. So if we find a bug, we fix it and we publish a package. You don't have to wait until Python 3.11.7 to get your bugs fixed. Yeah, or improvements made. Exactly. So this kind of suggests that it's a great idea to use. But on the other hand, we really haven't had any emergency releases or anything in a long time. We basically release almost every year just to catch up with the latest Python version. I would say that uvloop is pretty stable at this point.

28:00 Yeah, very cool. Yeah, it definitely seems so. I think also it's probably a question of context, of when it makes sense, right? If you're running three tasks and that's your whole program, who cares how fast the event loop is? All right, three tasks. But if you have many, many fine-grained, tons of little tasks, then how complex and how many tasks? Basically, how complex is the task coordination job of asyncio? The more complicated it is, probably the more benefit you'll get from uvloop. What do you think?

28:28 If you go deep into the details, I would say it's not so much about juggling tasks around. It is more about performing I/O in the most optimal way. uvloop, just because it's so low level, uses lots and lots of tricks under the hood to do I/O faster. And the entire loop of calling callbacks is a tight loop in C, essentially. So it's much faster than a loop in Python. So, actually, those two points. But yeah, the performance improvement is noticeable, usually very noticeable, with uvloop. Cool.

29:03 The benefit is that it's literally import uvloop, uvloop.install(), run your benchmarks, comment that line out, run your benchmarks again.
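A rough way to run that comparison is sketched below, assuming uvloop is installed (it is a third-party package and is not available on Windows); without it the script simply times the default loop. `echo_roundtrips` and `timed_run` are hypothetical helpers, and a toy echo server like this is nowhere near a rigorous benchmark. Newer uvloop releases also provide `uvloop.run()` as an alternative to `uvloop.install()`.

```python
import asyncio
import time

try:
    import uvloop  # third-party: pip install uvloop
except ImportError:
    uvloop = None


async def echo_roundtrips(n: int) -> int:
    """Send n small messages through a local TCP echo server."""

    async def handle(reader, writer):
        # Echo every line back until the client disconnects.
        while data := await reader.readline():
            writer.write(data)
            await writer.drain()
        writer.close()

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    count = 0
    for _ in range(n):
        writer.write(b"ping\n")
        await writer.drain()
        if await reader.readline() == b"ping\n":
            count += 1
    writer.close()
    await writer.wait_closed()
    server.close()
    return count


def timed_run(n: int = 2000) -> float:
    started = time.perf_counter()
    completed = asyncio.run(echo_roundtrips(n))
    assert completed == n
    return time.perf_counter() - started


if __name__ == "__main__":
    print(f"default loop: {timed_run():.3f}s")
    if uvloop is not None:
        uvloop.install()  # comment this line out to compare
        print(f"uvloop:       {timed_run():.3f}s")
```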

29:11 Exactly.

29:12 It's so easy, you don't have to commit to it. It's not like, I'm going to swap ORMs and try it again.

29:17 Exactly. But I just love packages in Python that do this magic. Like, if you remember, there was this package called Psyco created by Armin Rigo, creator of PyPy. You just import psyco, psyco install or something like that. And boom, you have like an alternative CPU. Your program just magically becomes 5 to 10 times faster. Just magic. So yeah, it's great when we can do something like this.

29:41 Yeah, that's fantastic. Adrian has an interesting question. I know this came up around requests a couple of years ago.

29:47 Yes. Could you give your thoughts on having things as part of the standard library, basically having uvloop in this case be the replacement for the asyncio loop, rather than having an external package updated independently?

30:01 Yeah, it's an interesting question. And I'm not super involved in conversations like this. I know that Python core developers considered actually separating the standard library and shipping it on the side so that it can have its own release schedule. I think it's sort of mitigated with Łukasz Langa actually speeding up the release cycle for Python. It's being released like every year, which is amazing. And I think the pressure to separate the standard library is lower now. As far as including uvloop as part of the standard library, I'm not sure it's a good idea. First of all, it's written entirely in Cython. It's like 50,000 lines of Cython or something like that. We would have to either adopt Cython as, like, an official standard library tool or rewrite it in C. And if you rewrite it in C, it's going to be 100,000 lines.

30:44 That would be huge. So it's probably not going to happen anytime soon. Maybe things like mypyc would make it happen eventually. But that's still pretty early, right?

30:56 Okay. Yeah. The conversation was had around that with regard to requests as well. Maybe you were a part of it since you're a core developer, but they decided not to make requests the new HTTP library of CPython because it would hobble requests.

31:14 It would mean requests could only be changed once every twelve months or something like that. Right?

31:19 Yeah, I think one of the concerns with requests specifically, and I wasn't actively involved in those conversations at all, but I think the concern that I heard was that HTTP is pretty wild and you often need to fix security issues and bugs, and you need to act quickly. And if something as huge as requests and as fundamental as requests was part of the standard library, we would just have to be way more flexible about making bugfix releases for CPython.

31:50 CPython, it's such a huge thing, right? Like, operating systems bundle it, multiple different workflows are centered around it.

32:00 It runs on helicopters and Mars.

32:03 There are a lot of edge cases people are not thinking about.

32:06 Exactly. Just upgrading a separate library is so much easier than upgrading the entire Python thing. So, yeah, I think this is why packages like requests for sure will stay out of the standard library.

32:17 Yeah. All right, final question before we move on from uvloop, because it's not even our main topic, but it is very interesting. Teddy asks, are there any trade-offs of using uvloop as opposed to the native built-in one?

32:30 I think this is time for me to make a shout-out, because we still haven't implemented a couple of APIs that are in asyncio, like the Happy Eyeballs protocol. Maybe there is something else. I just haven't had time to do it myself. I'm very busy with EdgeDB, so if anyone wants to join the project and help, that would be great. And that basically answers the question. The fundamentals are all already there. uvloop is almost 100% compatible with vanilla asyncio. No trade-offs, except there are a couple of relatively new APIs, I think from Python 3.9 and Python 3.10, that are still missing from uvloop.

33:10 We still should implement them to be a true replacement, right?

33:13 Yeah, I think it's sendfile and Happy Eyeballs and maybe something else.

33:17 Okay. uvloop is running inside EdgeDB.

33:21 Yeah. It powers the I/O server. Basically, we use a multiprocessing architecture in EdgeDB. We have a pool of compiler processes, because compiling your query is a computation-heavy, CPU-heavy thing. And then there is a core I/O process that just runs uvloop and quickly goes through your connections, pushing data between clients, et cetera.
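The split between an I/O process and a pool of compiler workers can be sketched with the standard library alone. This is an illustration of the general pattern, not EdgeDB's actual code: `compile_query` is a hypothetical stand-in for the CPU-heavy compile step, and the helper names are made up for this example.

```python
import asyncio
from concurrent.futures import Executor, ProcessPoolExecutor


def compile_query(query: str) -> str:
    """Hypothetical stand-in for a CPU-heavy compile step (just normalizes text)."""
    return " ".join(query.split()).upper()


def make_compiler_pool(workers: int = 2) -> ProcessPoolExecutor:
    """Create the pool of compiler processes (workers start on first use)."""
    return ProcessPoolExecutor(max_workers=workers)


async def handle_request(pool: Executor, query: str) -> str:
    """Runs in the I/O process: hand compilation to the pool, keep the loop free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, compile_query, query)


async def serve(pool: Executor, queries: list[str]) -> list[str]:
    # Many client requests are in flight at once; each awaits its own result.
    return await asyncio.gather(*(handle_request(pool, q) for q in queries))
```

A caller would pass the result of `make_compiler_pool()` to `serve`; because the handlers only depend on the `Executor` interface, any executor works, which keeps the I/O side ignorant of how compilation is scheduled.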

33:48 This portion of Talk Python To Me is brought to you by SignalWire. Let's kick this off with a question. Do you need to add multiparty video calls to your website or app? I'm talking about live video conference rooms that host 500 active participants, run in the browser, and work within your existing stack, and even support 1080p without devouring the bandwidth and CPU on your users' devices. SignalWire offers the APIs, the SDKs, and edge networks around the world for building the realest of real-time voice and video communication apps with less than 50 milliseconds of latency. Their core products use WebSockets to deliver 300% lower latency than APIs built on REST, making them ideal for apps where every millisecond of responsiveness makes a difference. Now you may wonder how they get 500 active participants in a browser-based app. Most current approaches use a limited but more economical approach called SFU, or Selective Forwarding Units, which leaves the work of mixing and decoding all those video and audio streams of every participant to each user's device. Browser-based apps built on SFU struggle to support more than 20 interactive participants, so SignalWire mixes all the video and audio feeds on the server and distributes a single, unified stream back to every participant. So you can build things like live streaming fitness studios where instructors demonstrate every move from multiple angles, or even live shopping apps that highlight the charisma of the presenter and the charisma of the products they're pitching at the same time. SignalWire comes from the team behind FreeSWITCH, the open source telecom infrastructure toolkit used by Amazon, Zoom, and tens of thousands more to build mass-scale telecom products. So sign up for your free account at talkpython.fm/signalwire and be sure to mention Talk Python To Me to receive an extra 5,000 video minutes. That's talkpython.fm/signalwire, and mention Talk Python To Me for all those credits.

35:38 Another thing that came out just today, and I don't want to spend too much time on it, but there's a big new feature for tasks and asyncio in Python 3.11 coming very soon. And you just gave a shout-out on Twitter yesterday saying that task groups are coming to asyncio. This is the way, because right now, if you start two tasks, there's no way to say, well, if this one fails, don't even bother running that one. Right. They're fully independent. This is the way to sort of create a dependency and control them as a set. Right. Tell us real quick about this.

36:13 Yeah. We have an API for spawning tasks concurrently. It's called asyncio.gather, but it's just a suboptimal API in many ways, and this API is way superior. We have to credit Nathaniel J. Smith for his work on Trio and Trio nurseries specifically. And Trio, we could run an entirely different podcast that is all just about it, actually.

36:37 Yeah. We talked about Trio on the show quite a while ago when it was fairly new.

36:41 It's an amazing thing and there are lots and lots of great ideas in Trio. One of them is having this thing called a nursery in Trio, and task groups essentially just replicate this nursery idea. They port it from Trio to asyncio. The bigger points about how this API works are all similar to Trio. There are some details about how cancellation works, et cetera, but most people probably won't really care about that.

37:10 Okay, very cool. It's great to see more innovation in asyncio.

37:15 Yeah. But task groups, I'll just talk for a couple more minutes about task groups. Task groups were a much-requested thing. A lot of people wanted task groups in asyncio, and I was getting DMs, sometimes on a daily basis: "Yury, you promised us task groups. When can we have our task groups?" So the big elephant in the room with task groups is how do we handle exceptions? Because multiple things can fail at the same time, and the exceptions will propagate out of this async with block.

37:42 With a task group, you'll end up with a hierarchical tree of exceptions representing the state of failure, which is not how we typically think of exceptions.

37:51 Exactly. And we just had to figure it out, and we had to figure it out in the core, because if it was just some, I don't know, some exception class defined in asyncio, then what would happen when your asyncio program crashed? Right? You wouldn't have a correct traceback in your terminal. You wouldn't be able to understand what actually happened. So we had to integrate this into tracebacks. We needed to make sure that it's a standard thing that tools like Sentry, for example, can take advantage of and provide you great visibility into what happens in your application. So we had to work on this exception group thing, and there's this amazing new core developer, Irit Katriel, and she spearheaded this effort of just implementing this and drafting a proposal and just doing it to completion, essentially. And it's because of her work, actually, that task groups are finally a thing, because task groups themselves are like 100 lines of code with comments. There is not much to them. The huge thing is getting exception groups in. And I believe Python is the first language that has this feature right in the syntax, right in its runtime model. And this is also huge because I actually believe that Python now can be one of the best languages to do concurrent programming in. I don't know, maybe when we have a JIT or something like that, it might actually match Go in performance somewhat. That would be the ideal thing.

39:18 So I'm looking at this syntax here, which I'll try to quickly and simply communicate to people listening on audio. It's an async with block. So you say async with asyncio.TaskGroup, and you create this task group, and then you can create tasks within there that are all grouped together, and you can also do things like await stuff while you're in there.

39:40 It looks to me like one of the things that I often didn't see as possible in Python's async previously is the ability to just fire off a task and have it run in the background to completion, so you don't have to do, in asyncio, like a run-all or gather or any of those types of things. Basically, we won't exit the with block until all the tasks are finished or until one fails. One of those two, right?

40:08 Yeah.

40:08 That's a nice feature all on its own: to be able to say I don't need to store up all the tasks and then make sure I'm waiting on them. I can just kick them off, and if they happen to start in this place, then they're going to finish when this with block finishes. I'm even more excited about this than I was before.

40:23 Right.

40:25 It's a nice API to compose things in asyncio, and I believe it's one of the bigger deals for asyncio in recent years, so I'm super excited about this.

40:35 3.11 is out soon. Exactly.

40:37 Sure. The release date...

40:38 I know it's an Alpha stuff right now, so it's getting real near.

40:41 Yes. It should be close.

40:43 Yeah, for sure. Awesome. All right, last thing before we get to EdgeDB proper. I would say that Postgres is the most popular database for Python people doing database things, possibly with the exception of SQLite. But that really counts for just, oh, I'm doing testing, or, oh, I use this for this incredibly small thing. But production-level stuff has got to be Postgres, right?

41:04 Yeah.

41:06 Maybe throw in some MySQL, and then a little bit down, maybe some MongoDB, something like that. But clearly it seems like Postgres has a lot of interest for folks. If you want to talk to it through async and await, which is exactly how you want to scale your database stuff, a pretty popular library is this one called asyncpg, right?

41:26 Yeah.

41:29 You and Elvis created that, huh?

41:30 Yes. So it was an interesting experience. Basically, we knew that EdgeDB would be based on Postgres. That was clear from day one. And we also knew that we had to have this very high performance bridge, essentially, between Python and Postgres.

41:49 It had to be asynchronous, and there was no good asynchronous client for Postgres at the time. And we couldn't just use psycopg, the most popular Postgres driver, because it uses text encoding for data. Maybe not anymore, but it used to have to use text encoding, and we actually had to use binary for something. So we just knew that, okay, we have to jump in and explore the Postgres protocol, and we decided, okay, let's write a driver.

42:18 This is how asyncpg was born, I think. What makes asyncpg different, besides the fact that it implements the binary protocol and is asynchronous, is its API, because we were not basing it on the common Python DB-API. We basically designed an API to be as low level as possible, as close to Postgres semantics as possible. So in DB-API, there is this thing called a cursor, which has nothing to do with the actual database cursor, so we didn't want to replicate that. So, yeah, we just built what we thought were proper primitives for working with Postgres as efficiently as possible. We used the binary protocol plus asyncio, and we of course used Cython to speed up all the bottlenecks in it. It's pretty much entirely in Cython, actually. And yeah, the result is just amazing. To this day, asyncpg is one of the fastest Postgres clients on the planet, across all languages.
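As a rough illustration of the "close to Postgres semantics" API described above, here is a minimal sketch using asyncpg's `connect`, `fetch`, and `close` calls. The `users` table, its columns, and the DSN passed to `main` are hypothetical; the import is deferred into `main` only so the query helper can be read and exercised without the driver installed.

```python
import asyncio


async def list_users(conn, min_age: int) -> list[dict]:
    # "$1" is Postgres's native positional parameter syntax (not DB-API's "%s");
    # the value is sent separately from the query text.
    rows = await conn.fetch(
        "SELECT id, name FROM users WHERE age >= $1 ORDER BY name", min_age
    )
    return [dict(row) for row in rows]  # each row converts cleanly to a dict


async def main(dsn: str) -> None:
    import asyncpg  # third-party driver; imported here on purpose (see above)

    conn = await asyncpg.connect(dsn)  # no cursor object: query the connection
    try:
        print(await list_users(conn, 18))
    finally:
        await conn.close()
```

Running it would look like `asyncio.run(main("postgresql://user:password@localhost/mydb"))` against a database that actually has such a table.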

43:13 That's fascinating. You can see that it beats NodeJS and go pretty handily there.

43:18 Yeah, we should probably update this chart, actually. I'm pretty sure that they updated the pg library for Node.js, so this is outdated. I think it's closer in performance to asyncpg now, but I think asyncpg is still the fastest.

43:30 Yeah, cool. Awesome. Well, very nice work there. So taking all these together, uvloop, async and await in the language, asyncpg, all these are building up your skills to sort of almost build a database. And so then you went on and actually did build a database, right?

43:46 Yeah, pretty much. So we had this framework, which was almost an ORM in Python, for many years, and we built multiple different production applications with it. We shipped applications that were deployed at GE, Cisco, companies like that. And we knew it was something interesting, but we also knew that it had to be bigger than just a Python ORM. It had to be a database. It's a surprisingly long road to go down this path, essentially, because you have to define a query language, you have to define a type system, you have to define a standard library, you have to define protocols, how it works, how migrations work, all the different syntaxes for schema modeling. It's a huge thing. And yeah, with all the right primitives in Python itself, we knew that we could start morphing our code base into a separate service, essentially. And yeah, that was the necessary and required groundwork to make EdgeDB happen. Without it, we would probably not have succeeded.

44:45 So EdgeDB is really written in Python?

44:47 It is, mostly. Mostly, yeah. The entire I/O server is essentially a Cython thing, so it's C. And this is why, if you look at benchmarks of EdgeDB, it's actually pretty close to vanilla Postgres. The overhead of EdgeDB is super low. That is only possible because of Cython and all the low-level tips and tricks that we learned when we were working on uvloop and asyncpg. So we really optimized it a lot. The compiler part, the thing that actually takes an EdgeQL query and compiles it to SQL, that thing is pure Python and runs in a separate process. But we also do some tricks to make it fast. Like, we cache things aggressively.

45:30 In most applications, you don't have thousands of queries, you only have like 50 or 100.

45:35 Yeah.

45:35 So they get cached pretty quickly and then you don't even run Python anymore. From that point on.

45:41 It's just C. Oh, interesting. Yeah. You don't need to incredibly optimize the understanding of the query because, like you said, it's not ad hoc stuff that happens at scale.

45:52 Exactly.

45:53 I mean, it's great when your compiler is exceptionally fast, but for a database it matters less, especially if it's smart about extracting constants. Let's say you send select 1, and then your next query is select 2. It's essentially the same query, just different constants. So if you extract the constant and you cache the compiled query as if this wasn't a constant but an argument to the query, then you don't need to compile it a second time.

46:18 Yeah, I don't know Postgres super well, but I know some databases, at their level, when they see a query, they're like, oh, I've seen this query before. They can cache the query plan and those types of things. So that's another level of performance and speedup as well. Right?

46:34 We do that as well. And I mean, we did it even in asyncpg. For example, asyncpg automatically prepares statements for you so you get this optimization, so that Postgres doesn't have to re-parse the query and can just execute the cached plan. We do the same in EdgeDB, and many other things. This is why EdgeDB is based on Postgres, but it fully envelops Postgres, because we want to be in full control of the underlying Postgres instance.

47:00 Right. So in some sense, this is a brand new database that's got some really cool features that I'm going to ask you about very soon.

47:06 Absolutely.

47:06 But in the other sense, it's got a lot of stability because it's kind of a database level API rethinking of a well known core that people already trust.

47:16 This is an interesting thing, actually. A lot of people are not 100% satisfied with relational databases for a variety of reasons.

47:25 Some are not satisfied with scaling, some are not satisfied with SQL, and some are not satisfied with migrations and how rigid the schema is and how inconvenient it is to deal with a relational database. So it's a huge problem. You have one part of it, which is just language design, standard library, type system, how that part works. The second is workflows around your database. The third is the engine of your database, how it actually works. EdgeDB wants to challenge everything, but we're also not dumb enough to challenge everything at the same time. We understand that just writing this whole thing from scratch isn't possible. No company in the world would be able to pull it off, or maybe some companies would be able to, but they're definitely not a startup.

48:11 But they have many employees.

48:16 Exactly. So for us, the only viable strategy was to pick a database that is already trusted, that is already fast and universally loved, which is Postgres, and it's also incredibly capable, and just build on top of it. And it's not actually a new approach. Lots of databases are actually built on primitive key-value databases like LevelDB or something like that. It's a popular approach. We're just taking it further. We are saying that, hey, using a key-value storage won't buy us much. Our high-level query language requires a lot of code to be written for it to be executed properly and in good time. But SQL looks like a nice compile target. So this is why we use Postgres.

49:03 Yeah, very cool. Kind of the TypeScript-to-JavaScript equivalent of the database query language, in a sense.

49:07 Yes, pretty much. I mean, sometimes I explain EdgeDB as LLVM. Imagine LLVM: it compiles your high-level code to low-level code, et cetera. And the same goes for EdgeDB. We compile your high-level schema to a proper normalized table layout. We compile EdgeQL, our high-level query language, down to SQL. And that SQL can actually be JIT-compiled by Postgres, essentially. Ultimately, your EdgeQL might be executing at native code speed. Not now, but in the future.

49:38 So what's the elevator pitch for people who are out there? They're slightly not super thrilled about the database they're necessarily using, whatever that is. And they're kind of exploring. I've picked up a few things that I think make it unique, but I want to ask you, it's your baby.

49:54 All right, I guess I'll give two pitches. One is super high level and one is slightly more low level. The super high level pitch is this: imagine you have a tool, and when it's a great tool, it becomes an extension of your hand. Essentially, you don't notice it, you just do things, right? Current databases are not like that. They require lots and lots of mental overhead to work with them. Like, what ORM library do you use in this language?

50:22 Is there lazy loading and N+1 stuff I've got to consider or not? And all those kinds of things.

50:26 Exactly. And then you have to learn their API, and then you have to learn SQL and understand how those things interact with each other. And then you have to care about deployment and migrations. It's just so much headache. This alone explains why MongoDB was so popular and is so popular, because a lot of people just decided, okay, to hell with that. I don't want to deal with this.

50:46 I'm leaving the relational space altogether.

50:49 Yeah, exactly. Just abandoning the strain.

50:53 Yes. And we want to fix all of that in EdgeDB. We want to give you a tool that you just don't notice. We want to give you a data model that just feels native to Python or TypeScript or Go or any other language. You don't have to think in tables anymore. We want to give you a query language that is super easy to use and learn and compose and build query builders around. And essentially we want to kill the entire concept of an ORM. You don't need it anymore. We are almost sorry that ORMs have to exist, in a way.

51:28 I was going to ask you about the difficult problem.

51:31 The problem is called the object-relational impedance mismatch. ORMs map tables to objects. It's a super hard problem, and I feel sorry that they have to go through this. But we just looked at this problem and decided, hey, can we actually just solve this impedance mismatch problem in a different way? Can we just avoid solving it entirely? Can we just give you a database with a proper high-level data model that doesn't have this problem at all? Then suddenly you don't need ORMs.

51:56 Let's talk real quick about the actual way to define what would be the equivalent of, I guess, a DDL table create script, or, somewhat related to that, maybe closer, an ORM class.

52:08 Okay, can I start a little from afar?

52:11 Yeah, let's start back.

52:12 Okay. Now it's going to be the second pitch, which is slightly more detailed. So we say that EdgeDB is a new kind of database. It's not just relational; we call it a graph-relational database. Essentially we are saying that we created an extension to the relational model. So what actually constitutes the graph-relational model is, first of all, that all of your rows, all of your tuples in relational algebra, have a globally unique key. This is a requirement. It's data independent; it's just a UUID, essentially, and every row in your database will have it. This is the first requirement, the first modification. The second extension is links: the idea that links between data are a first-class citizen of the model. You don't need joins, you don't need foreign keys; you just know that if this type links to another type, it's going to be a relationship between the unique IDs. This is what unique IDs give you. And the third thing is that everything is a set.

53:19 If you have an object that is connected to multiple other objects, this is a set of objects. If you have an object that has a bunch of properties, that's a set of properties. Even a single thing is a set as well. And this later enables EdgeQL to be super composable. But these are just three simple axioms that are at the core of the model. So if we talk about this schema snippet, where we have an object type BlogPost with a required property content, which is text, and a required link author, which points to another type called User, it's going to be compiled to a table in SQL with a column called content and a column id, which is going to be a unique UUID. BlogPost will have it automatically; it's immutable, it's read-only, and you don't have to create IDs manually. User will also be a table, and it will also have IDs. And then BlogPost will have a separate column called author, which will hold the IDs of users. So ultimately, deep beneath what you see in EdgeDB as this high-level schema, it's all compiled properly to the relational model. It's all normalized there. We are still relational, we still exhibit the same characteristics. It's just hiding a lot of the low-level things that you had to bother with behind this high-level model, abstracting away the low-level stuff.
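The three axioms (every object has a unique id, links are references to ids, and everything is a set) can be pictured with plain Python dataclasses. This is only an analogy for the model described above, not how EdgeDB stores or compiles anything; the `author_of` helper is hypothetical.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass(frozen=True)
class User:
    name: str
    # Axiom 1: a data-independent unique id, generated automatically, read-only.
    id: UUID = field(default_factory=uuid4)


@dataclass(frozen=True)
class BlogPost:
    content: str
    author: UUID  # Axiom 2: the link is stored as the target object's id
    id: UUID = field(default_factory=uuid4)


def author_of(post: BlogPost, users: dict[UUID, User]) -> set[User]:
    """Axiom 3: following a link yields a set, even when it holds one object."""
    return {users[post.author]}
```

The shape mirrors the table layout from the conversation: a BlogPost row with `id`, `content`, and an `author` column holding a User id.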

54:44 Is there a way to directly connect to that relational view?

54:48 You mean?

54:49 Otherwise.

54:52 I'm not sure if that's necessarily a good idea. But in SQLAlchemy, there's a way to go, like, I just need to get out of here and send raw SQL for a moment. Right. That feels like kind of the same thing. I just need to go to the guts for a minute.

55:05 Yeah. So with EdgeDB, the goal is for you to never actually need that. There's just one exception to this. Just one exception.

55:13 Ideally, yeah. Okay.

55:14 But basically our goal with EdgeQL was, first of all, to make it more high level. And second of all, we knew that, hey, in order for a relational database to be successful, it just has to have a query language. Right. Because our data model is different, we had to come up with our own. This is how EdgeQL was born, and we spent years designing EdgeQL. And the reason why is that we wanted it to be actually more powerful than SQL in many ways. Basically, if you have something that is expressible in SQL but isn't expressible in EdgeQL, we treat it as a bug immediately. If something is easier to do in SQL, it's a bug. And this is why we spent so many years refining this thing, to make EdgeQL a capable thing. So basically, you never need to use SQL. You don't need to know SQL or know about its existence. And this is a powerful thing, because when you use an ORM library, you have to know SQL. With EdgeDB, no, you just learn one language and you're good to go for the rest of your life. Essentially, there's just one use case when you might need SQL. It's when, let's say, you're a big company and you're using some BI tools like Tableau or something like that, with analysts that already know SQL.

56:28 And we're going to do something about that. We're going to open, like, a SQL adapter. Exactly. It will allow you to just run SQL against the database in read-only mode.

56:38 So that makes a lot of sense because there are these big bi tools and you're like, if your data is here, do you really want to have some job to move it to another Postgres just to run an analysis on it?

56:48 Yeah, exactly. I mean, just like with us not attacking this problem all at once and implementing the engine and the language and everything else, here we also understand that we are not going to replace all the business intelligence infrastructure overnight, and we have to make it possible. It's not there yet. It'll be part of a future release eventually.

57:09 You'll have a nice roadmap, which we'll cover in a minute. But I really love how you saw all data. Don't do it.

57:16 I can just say it out loud.

57:18 The idea is, for people who want to see it, if they go there, it's just visual, the way that you've laid out where you are and where you're going. So many libraries and products should model this, because so often you'll reach out to the companies: hey, it'd be great if you could do this. Oh, yeah, it's on our roadmap. Like, oh, yeah, what is that? Where do you even have that? Anyway, I think your roadmap is great, but give us the update.

57:41 It is beautiful, and I encourage everybody to go and check out the roadmap. It is slightly outdated. Well, lots of things that are shown as in progress are already done.

57:50 Yeah. This Formula car here, this is a 2021 series. They just redid the Formula One car for 2022.

57:56 That's probably not what you're talking about.

57:58 Yeah.

57:59 All right.

57:59 So tell us what's coming for this.

58:01 What's coming? It took us years to build EdgeDB 1.0, and during this time we were almost encouraging people not to use EdgeDB, because it's a relational database. If you build a business on an alpha version of a relational database and it goes down, your business will go down with it, most likely.

58:16 And people should know you just released 1.0. That's a huge thing.

58:21 A week ago. It was on Hacker News, number one, for 13, maybe 14 hours. It was a pretty interesting event. We also had a live stream of launching it, talking about the architecture of EdgeDB and the query language, comparing it to SQL. It's a great event and I encourage you to check it out if you have time and if you're interested. It's on YouTube; look for EdgeDB and you'll find it there. But yeah, it took us years to do 1.0 just right, to make sure that EdgeQL is right, that its design is sound, and that the schema is right, and that the workflows and the CLI and the client APIs, everything is just right, and that we are confident that, hey, we're not going to be changing it, we're not going to be retroactively fixing things. It took us many years, but now it's out. And now we don't want to spend many years on EdgeDB 2.0. We actually want to make it way quicker. We have a solid foundation. We can iterate much faster now, and this is what we're going to do. So our current internal target is to release 2.0 sometime in May 2022, so relatively soon. 2.0 will have a few features. One is almost implemented: it's the group by statement. As I said, the idea of EdgeQL is to actually surpass SQL's capabilities. And right now EdgeQL is already incredibly powerful. You can fetch deep data hierarchies, you can compute things, you can use aggregate functions, you have subqueries, you have JSON. It's an incredibly powerful language right now, but a proper group by statement will give it a proper analytical flavor. Now you will be able to actually create reports. And we have a great group by design. By the way, we try to make the EdgeQL design process as open as possible. We have RFCs; it's on GitHub, edgedb/rfcs. So if you're interested in how our group by is different from SQL's GROUP BY and why it's better than SQL's GROUP BY, you can just go ahead and read the RFC for group by. So group by is going to be one thing. The second thing is going to be a proper explain for your queries.
Like, why is my query slow? We have some ideas on how to make it less cryptic than the default explain output that you get from most databases. Then there is an exciting thing, and I hope that we'll have enough time to implement it, which is access control. So EdgeDB is this vertically integrated thing. You define your schema, and in your schema you can define aliases, which are basically views in your relational database. You can define fields or object types that are computed dynamically with EdgeQL. So the schema depends on EdgeQL and EdgeQL depends on the schema; in EdgeDB they are interdependent. So we have this idea, not that it's super new, but in EdgeDB it's going to be super powerful, that you'll be able to specify different policies on your schema types, like allow reading something, or allow mutating something, or disallow it, et cetera. Right. And we don't want to hard-code that. Essentially, we are introducing this concept of context in a database. You'll be able to define something like global variables, context variables, in your schema, say a user ID that is an int64, and something else.

01:01:34 And then when you get your connection in your Python code, you say "with context" and pass the user ID, and that is automatically passed to the database. In your schema, you can implement arbitrary access logic on your schema types, and this logic will be automatically enforced in all your queries. So when you're fetching data for the home page, it's filtered. When you're fetching data for a report, it only includes the data that your business logic allows to be there. So basically with EdgeDB we'll have a schema, and that schema will define not only the data layout of your application, but also the access patterns and many other things in the future.

01:02:13 Yes, I really want to ask you about the query syntax because I find it super interesting, especially how it relates to ORMs and so on. But Michael out in the audience has a pretty neat question that sort of follows on from the roadmap first. So since EdgeDB is fundamentally Python, it would be great to have a way to run user-defined functions in Python, like stored procedures, but in Python, not SQL.

01:02:41 Yeah, it's an interesting question. I mean, user-defined functions. Well, first of all, there are a couple of different planes, I would say, of user-defined functions in the context of EdgeDB, because EdgeDB has this notion of extensions. The API is not public yet, but EdgeDB, for example, supports GraphQL natively. You can just run GraphQL queries. We want you to be able to define, eventually, user-defined API handlers there, so that with EdgeDB you would not need a backend at all if your business logic is relatively simple and you don't need a full-blown application.

01:03:25 Interesting. So if I've got something on Netlify where it's pure static code, I just write a little JavaScript, some Vue or whatever. And it could theoretically do read-only stuff, maybe, against an EdgeDB instance or something like that.

01:03:38 Or even write. Yeah, absolutely. We just want to push this idea of backendless development as far as we can. And because EdgeDB has this incredibly powerful schema and will soon have access control, that already allows you to eliminate a lot of code. Right. If only you could define some simple server-side, database-side functions.

01:03:58 A little bit of Python in there.

01:04:00 Something to write a little bit of Python or JavaScript or maybe Rust in, so that you can just make that request to the Stripe API, do something, and glue things together, and maybe you don't need a backend at all. So this is our vision, eventually, to allow things like this. The second plane is user-defined functions within the database. Because we're using Postgres, the functions are going to be running inside Postgres. You will be able to call them from the query language, like, hey, use NumPy to crunch this data for me, right in EdgeQL. This is also possible. There are extensions for Postgres that allow you to do that; there are multiple different extensions for user-defined functions. So yeah, it's an interesting thing for us to think about and we are thinking about it, but probably not for 2.0.

01:04:48 Yeah.

01:04:48 Okay, very cool.

01:04:50 Let's talk about this statement here for a minute.

01:04:53 This query syntax highlights a lot of probably what makes EdgeDB unique and some of your motives here. So if you wanted to go and get a movie which has a relationship to an actors table and you want to do some sort of filter type thing, you would say select Movie, curly brace, and look at that: title. That's the select projection. So movie title, basically. And then actors, curly brace, name and email. So is this part right here, the sub-actors, is that traversing the relationship? That graph relationship?

01:05:28 Exactly. You're basically traversing the graph inside the selection.

01:05:32 Say order by name. You have this cool convention of dot, which if you are in one of these scopes like actors, then you can say dot and it means dot name applies back to actors. Is that right? Yes.

01:05:45 And basically this is just syntax sugar. Nothing prevents you from spelling it out completely. Like, you can say order by Movie.actors.name, but you're already inside actors. Essentially, we're just giving you this.

01:05:57 Yeah, fantastic. Then another thing that stands out for the query syntax is you can define inline variables using the Walrus operator by the way.

01:06:06 So you can say average review equals math mean of the reviews of the movie, then dot rating.

01:06:13 Is this also traversing?

01:06:15 Exactly, yeah. So basically a Movie type has a multi link reviews. So multiple reviews can be attached to movies, and every review has, let's say, a five star rating, an integer one to five. And this is how you quickly can say, hey, just calculate the mean number of all linked reviews and all their ratings. Somebody on Hacker News years ago aptly called EdgeQL a child of SQL and GraphQL. And I mean, it's funny, but there is truth to it, because GraphQL made it extremely obvious to people that working with object hierarchies this way, when you can just have a query that just selects something deep, right, is extremely important. People suddenly realized this is cool. Some companies even try to make GraphQL work for relational databases, such as Hasura, and they have an amazing product. The only problem is that GraphQL wasn't actually designed for querying databases. It's an API language, which is a REST replacement. So while it works for some things, good luck computing something in GraphQL. You can fetch things but you cannot compute; your average review is not possible to do in GraphQL. SQL, on the other hand, is very stubborn when you have to select anything nested. It thinks in tables, so you have to think in tables. You either select super wide tables and then you have to write some Python code to kind of combine it back to your shape, or use an ORM, or use a massive database. You can use things like array_agg, but SQL doesn't shine at problems like this. So with EdgeQL we are kind of marrying both worlds. You have this deep fetch syntax and you have an ability to drop computation into any point of your query.
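
To make the contrast concrete, here is a rough sketch of the "super wide table plus Python glue" approach described above, next to the EdgeQL query as it was read out loud. The EdgeQL text is reconstructed from the conversation and is not executed here. The SQLite schema, the sample rows, and the simplification that each actor belongs to exactly one movie are assumptions made only for this illustration.

```python
import sqlite3
from statistics import fmean

# The query as described in the conversation (reconstructed, not executed here).
EDGEQL = """
select Movie {
  title,
  actors: { name, email } order by .name,
  average_review := math::mean(.reviews.rating)
}
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE actor (id INTEGER PRIMARY KEY, movie_id INTEGER, name TEXT, email TEXT);
CREATE TABLE review (id INTEGER PRIMARY KEY, movie_id INTEGER, rating INTEGER);
INSERT INTO movie VALUES (1, 'Dune');
INSERT INTO actor VALUES (1, 1, 'Zendaya', 'z@example.com'),
                         (2, 1, 'Oscar Isaac', 'o@example.com');
INSERT INTO review VALUES (1, 1, 5), (2, 1, 4);
""")

# Plain SQL returns a flat, wide table: every actor is repeated per review.
wide_rows = conn.execute("""
    SELECT m.id, m.title, a.id, a.name, a.email, r.id, r.rating
    FROM movie m
    LEFT JOIN actor a ON a.movie_id = m.id
    LEFT JOIN review r ON r.movie_id = m.id
""").fetchall()

# Python glue: de-duplicate by primary key and rebuild the nested shape.
movies = {}
for m_id, title, a_id, name, email, r_id, rating in wide_rows:
    movie = movies.setdefault(m_id, {"title": title, "actors": {}, "ratings": {}})
    if a_id is not None:
        movie["actors"][a_id] = {"name": name, "email": email}
    if r_id is not None:
        movie["ratings"][r_id] = rating

result = [
    {
        "title": m["title"],
        "actors": sorted(m["actors"].values(), key=lambda a: a["name"]),
        "average_review": fmean(list(m["ratings"].values())) if m["ratings"] else None,
    }
    for m in movies.values()
]
print(len(wide_rows), result)
```

One movie with two actors and two reviews already comes back as four rows, which is the duplication that the nested select shape avoids having to undo by hand.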

01:08:02 Now, a couple of other super important things about EdgeQL. If you want, I can go into them.

01:08:08 Yeah, we're getting short on time, but yeah, go ahead.

01:08:10 Sure. As I said before, in some ways EdgeDB is like a compiler. When we compile an EdgeQL query to SQL, we have one important thing: every EdgeQL query, no matter how complex it is, is always compiled to just one SQL query. And this is very important in the context of relational databases, because when you have just one single query, it's atomic. So you don't need an explicit transaction.

01:08:35 You always work with the same snapshot of data, essentially.

01:08:38 Interesting. So you're not in this case like going doing a query for the movies and then doing a query for the actors and then doing a query for the reviews as three steps.

01:08:48 It's basically a three way join and then you're getting the data back out. Something like that.

01:08:52 Slightly more complicated than a three way join.

01:08:57 Yeah, that's the idea. For one, it's always one SQL query. It's very important. We use lots of interesting tricks to make it happen. And if you are interested in those tricks, look up EdgeDB on YouTube; at our live event we explained this all, actually. But it's an important thing. And then EdgeQL is actually very composable. So you can pack multiple different queries into one query. You can have a query that reads data, inserts data, mutates data, and introspects the schema, all in one huge thing. And it will execute quickly for you and return your data in a proper way, ready for you to consume. So EdgeQL is extremely powerful in that regard. This is what separates it from ORMs, because your ORM, be it SQLAlchemy or Prisma or something like that, might have a high level API for some operations, but they also don't really restrict themselves on how many queries it will take to implement the API. Sometimes.

01:09:51 N plus one. Yeah, right.

01:09:52 And if you benchmark it on localhost, for example, where the database is on your laptop and your code executes on your laptop, it appears to be fast. So you have three queries instead of one. So what?

01:10:02 Like there is zero latency between your database and your code, and probably not full production levels of data?

01:10:08 Sure. But when you move it to the data center, you will have latency between your code and database. And even if you have like one millisecond latency between your queries, suddenly you just start losing a lot of performance. Because with your Python or JavaScript that uses an ORM, one operation can actually fire like ten queries. This is easy. Like, ten queries is fine. And imagine, it's just 10 milliseconds spent just on that, just latency, nothing else. You're just losing performance. So with EdgeDB, it's not a thing.
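
The N plus one effect and its latency cost can be shown with a small sketch. It uses an in-memory SQLite database as a stand-in, counts each query as one client-to-database round trip, and assumes a 1 ms network latency per round trip. The `CountingConnection` helper, the tables, and the nine sample movies are hypothetical and chosen so that the N plus one pattern issues exactly ten queries.

```python
import sqlite3


class CountingConnection:
    """Hypothetical helper: counts every query as one network round trip."""

    def __init__(self, conn):
        self.conn = conn
        self.round_trips = 0

    def run(self, sql, params=()):
        self.round_trips += 1
        return self.conn.execute(sql, params).fetchall()


raw = sqlite3.connect(":memory:")
raw.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE review (id INTEGER PRIMARY KEY, movie_id INTEGER, rating INTEGER);
""")
raw.executemany("INSERT INTO movie VALUES (?, ?)",
                [(i, f"Movie {i}") for i in range(1, 10)])
raw.executemany("INSERT INTO review (movie_id, rating) VALUES (?, ?)",
                [(i, 1 + i % 5) for i in range(1, 10)])

# N plus one: one query for the list, then one more query per movie.
db = CountingConnection(raw)
n_plus_one = {}
for movie_id, title in db.run("SELECT id, title FROM movie ORDER BY id"):
    rows = db.run("SELECT avg(rating) FROM review WHERE movie_id = ?", (movie_id,))
    n_plus_one[title] = rows[0][0]
n_plus_one_trips = db.round_trips

# Single query: the database does the join and the aggregation in one go.
db = CountingConnection(raw)
single = dict(db.run("""
    SELECT m.title, avg(r.rating)
    FROM movie m LEFT JOIN review r ON r.movie_id = m.id
    GROUP BY m.id ORDER BY m.id
"""))
single_trips = db.round_trips

LATENCY_MS = 1.0  # assumed network latency per round trip
print("N+1:   ", n_plus_one_trips, "round trips,", n_plus_one_trips * LATENCY_MS, "ms of pure latency")
print("single:", single_trips, "round trip, ", single_trips * LATENCY_MS, "ms of pure latency")
```

Both approaches return the same data; only the number of round trips differs, and on localhost that difference is invisible because the per-trip latency is close to zero.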

01:10:39 All right, so final question here. When I run this, what do I get back in Python? Obviously there's a nice Async and synchronous Python API to talk to this. But when I run this query in Python, what do I get?

01:10:54 It depends on how you run it.

01:10:56 We offer you two modes, essentially two output modes. Any EdgeQL query can be compiled to return JSON. In our Python client, you just say query_json and it will return your JSON data, like ready to be pumped to your front end. Or you can just say query, and when you say query it will return you rich Python objects. You have a movie Python object with the title string attribute, with an actors list which will have actor objects within it, etc. But it's also very compact, like on the I/O level. So we are not sending super fat tables or anything. The data is neatly serialized, so no need for anything. It's just like you have your native object data model in the database, you query it and you get objects out. So you never have to think about like any tables or anything. It's always sellable nice.
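
A minimal sketch of the two output modes follows. It is written against a small protocol rather than the real driver so that it runs without a database. The assumption, worth checking against the client documentation, is that the `edgedb` Python package gives you a client via `edgedb.create_client()` with `query()` and `query_json()` methods shaped like the ones below. `FakeClient`, its canned data, and the helper function names are hypothetical.

```python
import json
from types import SimpleNamespace
from typing import Any, List, Protocol


class QueryClient(Protocol):
    """The two calls discussed above; assumed to mirror the real client."""

    def query(self, query: str, *args: Any, **kwargs: Any) -> Any: ...

    def query_json(self, query: str, *args: Any, **kwargs: Any) -> str: ...


MOVIES = "select Movie { title, actors: { name } }"


def movie_titles(client: QueryClient) -> List[str]:
    # Object mode: results come back as objects with attribute access.
    return [movie.title for movie in client.query(MOVIES)]


def movies_payload(client: QueryClient) -> str:
    # JSON mode: the result is already a JSON string for the front end.
    return client.query_json(MOVIES)


class FakeClient:
    """Hypothetical stand-in so the sketch runs without a database."""

    def query(self, query, *args, **kwargs):
        return [SimpleNamespace(title="Dune",
                                actors=[SimpleNamespace(name="Zendaya")])]

    def query_json(self, query, *args, **kwargs):
        return json.dumps([{"title": "Dune", "actors": [{"name": "Zendaya"}]}])


client = FakeClient()
print(movie_titles(client))
print(movies_payload(client))
```

With a real server, the only change should be passing an actual client object instead of `FakeClient()`; the two helper functions would stay the same.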

01:11:47 All right, final question then. We really do have to wrap it up. One of the things that's really nice about ORMs is I can say my thing dot and I get a list in my editor of what I should be getting back from the database.

01:12:00 Can I do that with this? I know the movie is basically defined in the GraphQL schema definition. Is there a way to do like a typeshed type thing? Yeah, EdgeDB. Sorry, the EdgeDB schema language. But is there a way to do like a typeshed thing to say, well, that thing you get back looks like this?

01:12:19 Yes. Unfortunately not in Python yet. In TypeScript, we just released our query builder, and it's insane, because the API of the query builder super closely replicates the layout of the EdgeQL query. It's basically one to one correspondence. It's like almost the same thing. We just focused on TypeScript first, then Python is next. But for TypeScript, yes, you reflect your schema with just one command line command. And in VS Code you now have full autocomplete. You can express your queries in TypeScript no matter how nested they are, no matter what kind of computation you do. It's still the same idea. Whatever query you build in TypeScript is going to be just a single SQL query. It's going to be fast and you have full autocompletion and more. You actually have full return type inference. You don't have to type anything. You have a query, and your VS Code and TypeScript will know the type of the data that's going to be returned. It feels like magic. We're going to see if we can replicate this experience with Python and mypy. This is going to be our goal, to make something like this happen. Right now we just have this low level, relatively low level client API for Python. You can run any EdgeQL query. You can get data from it, async or sync, entirely up to you. But the typing integration specifically isn't there. And the second part of this question is that we are looking at, in the future, implementing a language server protocol for EdgeDB. So you install EdgeDB locally, and then it just connects, and then you would have your autocomplete for EdgeQL queries and for schema files. This is going to be great, but I'm just not sure what kind of ETA we can put on that.

01:14:02 Yes, looking forward to it. Very neat work on EdgeDB and obviously all the building blocks that we talked about at the beginning. Congratulations.

01:14:09 Thank you, Michael.

01:14:09 Yeah, you bet.

01:14:10 All right.

01:14:10 Very quickly, lightning round. Like, quick: favorite editor?

01:14:15 VS Code, although I enjoy Vim as well.

01:14:17 Right on. And then, notable PyPI package.

01:14:20 I'll list mine: uvloop.

01:14:22 All right, I'll list mypy as well. mypy is a great thing. Use mypy.

01:14:26 Cool. Right on. All right, final call to action. People are interested in your projects, probably primarily EdgeDB. What do you say? How to get started?

01:14:34 Yeah, absolutely. It's ready for you. It's 1.0. It's stable. Follow us on Twitter. It's Twitter, edgedatabase, without any underscores, or just @edgedatabase. You will find the Discord link right in the Twitter description. So join our Discord. We try to grow the community and, yeah, do something amazing. EdgeDB, I can say it with full confidence, is the most amazing thing that ever happened to relational databases. So take a look at it. This is the beginning of hopefully a big movement.

01:15:07 Yeah. Fantastic. Let me put in one final post script question. Sorry. I really wanted to ask you this, and I think it matters for people considering adopting it, but keep it super quick. What's the business model when you guys release this thing? How do people get it? Will there be a free version? What's the story?

01:15:23 So EdgeDB is fully open source. It's Apache licensed. It's extremely permissive, no strings attached. We'll make money by running EdgeDB for you. Essentially, we will have a hosted version of EdgeDB, EdgeDB as a service. Yeah, absolutely. And this is how most database companies make money these days. It's not so much about an enterprise version of your database anymore. It is about, hey, can you run this database for us in a private cloud? This is what businesses want.

01:15:47 Back it up, scale it, give it all that kind of back.

01:15:50 Okay, we're possibly working on that. Although you can run EdgeDB right now on top of Aurora Postgres, RDS Postgres, and Google Cloud; we have guides for that. So if you need to deploy your EdgeDB application, we have your back. But we'll have this, like, native, proper cloud version of EdgeDB, with which you will be able to, like, with one terminal command, bootstrap a cloud database for yourself. It's going to be amazing.

01:16:14 All right, fantastic. Thanks, Yury.

01:16:15 Thank you.

01:16:16 Yeah. Bye.

01:16:18 This has been another episode of Talk Python To Me. Thank you to our sponsors. Be sure to check out what they're offering. It really helps support the show. Take some stress out of your life. Get notified immediately about errors and performance issues in your web or mobile applications with Sentry. Just visit talkpython.fm/sentry and get started for free. And be sure to use the promo code talkpython, all one word. Add high performance multiparty video calls to any app or website with SignalWire. Visit talkpython.fm/signalwire and mention that you came from Talk Python To Me to get started and grab those free credits. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show, open your favorite podcast app, and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
