EdgeDB - Building a database in Python

Episode #355, published Sun, Mar 6, 2022, recorded Wed, Feb 16, 2022


What database are you using in your apps these days? If you're like most Python people, it's probably PostgreSQL. If you roll with NoSQL like me, you're probably using MongoDB. Maybe you're even using a graph database focused more on relationships.

But there's a new Python database in town, and as you'll learn during this episode, many critical Python libraries have come into existence because of it. This database is called EdgeDB. EdgeDB is built upon Postgres, implemented mostly in Python, and is something of a marriage of a traditional relational database and an ORM.

Python's async and await keywords, uvloop - the high performance asyncio event loop, and asyncpg all have ties back to the creation of EdgeDB.

Yury Selivanov, the co-founder & CEO of EdgeDB, PSF fellow, and Python core developer, is here to tell us about EdgeDB along with the history of many of these impactful language features and packages.

Episode Deep Dive

Guests introduction and background

Yury Selivanov is the co-founder and CEO of EdgeDB. He is a Python core developer and a major contributor behind many well-known Python technologies, especially in the async and concurrency space. Examples of Yury’s work include the async and await syntax in Python (PEP 492), uvloop, a faster event loop for asyncio, and asyncpg, an advanced asynchronous driver for PostgreSQL. His deep involvement in the Python community and passionate drive to simplify database technology ultimately led to the creation of EdgeDB, which combines aspects of relational and graph databases on top of PostgreSQL.

What to Know If You're New to Python

Here are a few tips and resources to help you better follow the concepts discussed in this episode:

Start with basic Python syntax (functions, loops, classes) to understand the examples of concurrency and database code.
Familiarize yourself with how Python's async and await keywords let you build scalable applications that handle many connections or tasks at once.
Recognize that databases often require specialized libraries in Python (e.g., asyncpg or ORMs). This episode explores how EdgeDB can eliminate some typical complexity.

Key points and takeaways

EdgeDB as a “Next-Gen” Database EdgeDB is a new project built on top of PostgreSQL but offers a higher-level schema and a more intuitive query language, EdgeQL. Rather than forcing developers to think in terms of tables and rows, EdgeDB presents a graph-like model with strong relational underpinnings. This reduces or even removes the need for an ORM while still benefiting from Postgres's reliability and ecosystem.
- Links and tools:
  - edgedb.com
  - EdgeQL Language Reference
Built Primarily in Python Much of EdgeDB’s implementation is done in Python (with speed-critical parts in Cython) to leverage the language’s readability and extensive ecosystem. The team uses multi-processing architectures to handle concurrent IO in the core server process. By carefully caching queries and splitting out tasks, EdgeDB remains high performance while staying developer-friendly.
- Links and tools:
  - Cython
Postgres Under the Hood While EdgeDB is perceived as a new database, under the covers it relies on a specialized Postgres instance. This provides a stable, well-tested foundation and allows EdgeDB to focus on the higher-level features. Developers thus enjoy Postgres-level consistency, reliability, and tooling, with a modern interface layered on top.
- Links and tools:
  - postgresql.org
Async and Await in Python Yury Selivanov was instrumental in adding async and await syntax (PEP 492) to Python, significantly simplifying asynchronous code. This design allows Python developers to write high-throughput, low-latency networking code in a style similar to synchronous code. The addition of async context managers (async with) and iteration (async for) enhances code readability and safety.
- Links and tools:
  - Python PEP 492
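As a rough illustration of the three constructs named above (`await`, `async with`, `async for`), here is a minimal sketch that uses only the standard library. The `Resource` class and the `countdown` generator are hypothetical names invented for this example; they are not part of EdgeDB or any other library.

```python
import asyncio


class Resource:
    """Hypothetical resource with asynchronous setup and teardown."""

    def __init__(self):
        self.events = []

    async def __aenter__(self):
        await asyncio.sleep(0)  # stand-in for real async setup, e.g. opening a connection
        self.events.append("open")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0)  # stand-in for real async teardown
        self.events.append("close")
        return False  # do not swallow exceptions


async def countdown(n):
    """Async generator: yields n, n-1, ..., 1, giving up control before each item."""
    while n > 0:
        await asyncio.sleep(0)
        yield n
        n -= 1


async def demo():
    seen = []
    async with Resource() as res:         # async context manager
        async for value in countdown(3):  # async iteration
            seen.append(value)
    return seen, res.events


result = asyncio.run(demo())
print(result)
```

Every place where the code can pause is marked by `await`, `async with`, or `async for`, which is what keeps the control flow readable.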
uvloop: A High-Performance Event Loop uvloop is a drop-in replacement for the built-in asyncio event loop that uses libuv (the event loop library behind Node.js) under the hood. Installing uvloop can significantly boost your application’s performance for IO-heavy tasks. You simply import and install it in your Python code to get better concurrency without major refactoring.
- Links and tools:
  - uvloop on PyPI
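The "import and install" step could look roughly like the sketch below. It assumes uvloop may or may not be installed (it is a third-party package and is not available on Windows), so it falls back to the standard asyncio loop. The `main` coroutine is a placeholder for real application code.

```python
import asyncio

try:
    import uvloop  # third-party: pip install uvloop
except ImportError:
    uvloop = None  # fall back to the built-in asyncio event loop


async def main():
    await asyncio.sleep(0)
    # Report which package implements the running loop.
    return type(asyncio.get_running_loop()).__module__


if uvloop is not None:
    # Makes asyncio create uvloop-based event loops from here on.
    # Recent uvloop releases mark this call as deprecated in favor of other entry points.
    uvloop.install()

loop_module = asyncio.run(main())
print(loop_module)
```

The rest of the application stays unchanged, which is the sense in which uvloop is a drop-in replacement.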
asyncpg: PostgreSQL at Top Speed To power EdgeDB’s communication with Postgres, Yury’s team developed asyncpg, an asynchronous Postgres driver. It’s one of the fastest available drivers across many languages, largely due to its lean architecture, use of binary protocols, and speed-critical portions built in Cython. Even if you’re not using EdgeDB, asyncpg can accelerate your Python + PostgreSQL projects.
- Links and tools:
  - asyncpg on GitHub
EdgeQL vs. SQL In EdgeQL, you navigate relationships in a more natural, nested manner rather than performing multiple JOINs. By design, even deeply nested queries compile to a single SQL statement to reduce round trips and ensure atomicity. Compared to SQL, EdgeQL can drastically improve code clarity, especially for complex, real-world data models.
- Links and tools:
  - EdgeQL Docs
Caching and Compiling for Performance EdgeDB’s compiler caches queries, reducing overhead for repeated operations. It also automatically detects when only constants change in a query so it doesn’t recompile the entire statement. This technique, combined with Postgres’s own query-planning caches, yields performance close to raw SQL in many scenarios.
- Links and tools:
  - EdgeDB architecture overview
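To make the idea of "only constants change, so nothing is recompiled" concrete, here is a toy sketch. It is not EdgeDB's implementation: the `normalize` function, the `CompileCache` class, and the regular expression are all assumptions made up for illustration, and `str.upper` merely stands in for a real compiler.

```python
import re

# Matches single-quoted string literals or standalone integer literals.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+\b")


def normalize(query):
    """Replace literals with numbered placeholders; return (shape, constants)."""
    constants = []

    def _sub(match):
        constants.append(match.group(0))
        return f"${len(constants)}"

    return _LITERAL.sub(_sub, query), constants


class CompileCache:
    """Caches compiled output keyed by the query shape, not the raw text."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._cache = {}
        self.compilations = 0

    def get(self, query):
        shape, constants = normalize(query)
        if shape not in self._cache:
            self.compilations += 1
            self._cache[shape] = self._compile(shape)
        return self._cache[shape], constants


cache = CompileCache(compile_fn=str.upper)  # str.upper stands in for a real compiler
a = cache.get("select User filter .age > 30")
b = cache.get("select User filter .age > 41")
print(a, b, cache.compilations)
```

Both queries share one cache entry because they differ only in the constant, so the (pretend) compiler runs once.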
Future Enhancements for EdgeDB The conversation covered potential features such as advanced access control, inline Python UDFs (user-defined functions), and extended GraphQL compatibility. The team is actively working on group-by queries in EdgeQL, expanded analytics, and other ways to avoid dropping to raw SQL. This keeps developer velocity high while bridging the gap between a relational database and a “no boilerplate” data interface.
- Links and tools:
  - EdgeDB Roadmap
Open Source + Cloud Service EdgeDB is open-sourced under the Apache 2.0 license, ensuring it remains free to use, self-host, and modify. On the business side, the EdgeDB team plans to offer a managed cloud version of EdgeDB, similar to hosting providers for Postgres or MongoDB, so developers can focus on building their apps without managing infrastructure.
- Links and tools:
  - GitHub - EdgeDB Repo

Interesting quotes and stories

On the impetus for async and await: Yury shared that he was exploring a better Pythonic way to do asynchronous transactions and realized a dedicated language-level keyword was necessary.
Why build on Postgres?: “We knew we needed the reliability and core speed of Postgres,” explained Yury, “but we wanted a more natural data model for developers.”

Key definitions and terms

AsyncIO: Python’s built-in framework for asynchronous programming, allowing single-threaded concurrency by switching tasks whenever one is idle.
Cython: A superset of Python that compiles to C for speed gains, especially important for tight loops and data processing.
ORM (Object-Relational Mapping): A library or framework that maps between objects in code and relational database tables, often leading to complexity that EdgeDB aims to eliminate.
EdgeQL: The high-level query language used by EdgeDB, designed to combine relational rigor with graph-like access patterns.

Learning resources

Python for Absolute Beginners: A thorough introduction to Python’s core concepts if you want to solidify your foundation.
Async Techniques and Examples in Python: Dive deeper into concurrency, threads, and async / await in Python.
Modern APIs with FastAPI and Python: Discover another modern Python web framework fully embracing async principles.

Overall takeaway

EdgeDB represents a significant rethinking of relational databases, blending the proven backbone of PostgreSQL with a new, Python-friendly query language and schema model. Thanks to powerful Python-based tools such as asyncpg and uvloop, EdgeDB eliminates many ORM pain points, letting developers focus on data rather than glue code. Coupled with the open-source model and upcoming hosted services, EdgeDB has the potential to reshape the Python database landscape for both small teams and large enterprises alike.

Links from the show

Yury Selivanov: @1st1

MagicPython: github.com/MagicStack/MagicPython
uvloop: github.com/MagicStack/uvloop
asyncpg: github.com/MagicStack/asyncpg
TaskGroups and ExceptionGroups: twitter.com
EdgeDB: edgedb.com
Schema modeling: edgedb.com/showcase/data-modeling
Easy EdgeDB book: edgedb.com/easy-edgedb
Roadmap: edgedb.com/roadmap
pgMustard: pgmustard.com
PyBay: Building a Database with Python Talk: youtube.com

Michael's course on async and await + Cython + uvloop: talkpython.fm/async
Michael's PyBay talk: Flask + HTMX: youtube.com
Watch this episode on YouTube: youtube.com
Episode #355 deep-dive: talkpython.fm/355
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript

00:00 What database are you using for your apps these days?

00:02 If you're like most Python people, it's probably PostgreSQL.

00:05 If you roll with NoSQL like me, you're probably using MongoDB.

00:09 Maybe you're even using a graph database focused more on relationships.

00:13 But there's a new Python database in town, and as you'll learn during this episode,

00:18 many critical Python libraries have come into existence because of it.

00:22 This database is called EdgeDB.

00:24 EdgeDB is built upon Postgres and implemented mostly in Python.

00:29 It's something of a marriage between traditional relational databases and an ORM.

00:33 Python's async and await keywords, uvloop, the high-performance asyncio event loop,

00:39 and asyncpg all have ties back to the creation of EdgeDB.

00:43 Yury Selivanov, the co-founder and CEO of EdgeDB, PSF fellow, and Python core developer, is

00:49 here to tell us all about EdgeDB along with the history of many of these impactful

00:53 language features and packages.

00:55 This is Talk Python To Me, episode 355, recorded February 16th, 2022.

01:14 Welcome to Talk Python To Me, a weekly podcast on Python.

01:17 This is your host, Michael Kennedy.

01:19 Follow me on Twitter where I'm @mkennedy, and keep up with the show and listen to past

01:23 episodes at talkpython.fm.

01:25 And follow the show on Twitter via @talkpython.

01:28 We've started streaming most of our episodes live on YouTube.

01:32 Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming

01:38 shows and be part of that episode.

01:40 This episode is brought to you by Sentry and SignalWire.

01:43 Use Sentry to find out about and fix errors when they happen and build real-time next-generation

01:49 video meeting rooms with SignalWire's API.

01:52 Transcripts for this and all of our episodes are brought to you by Assembly AI.

01:56 Do you need a great automatic speech-to-text API?

01:59 Get human-level accuracy in just a few lines of code.

02:02 Visit talkpython.fm/assemblyai.

02:04 Yury, welcome to Talk Python To Me.

02:07 Yeah.

02:07 It's great to have you here.

02:08 We just met recently at PyBay down there.

02:11 So in honor of that, I wore my PyBay shirt today.

02:14 Oh my God, I forgot about that episode.

02:17 I probably should have worn my t-shirt.

02:18 Yeah, where's your PyBay shirt?

02:20 Yeah, yeah, yeah, yeah.

02:21 What a cool conference, huh?

02:23 It is.

02:24 I love small conferences.

02:25 I like small conferences.

02:26 And in the time of COVID and all of this madness, having a winter conference outside in California

02:34 at a beautiful food cart area where it's warm.

02:36 Oh, there were just so many things to like about that.

02:39 I got to tell you, it was great.

02:40 It was amazing.

02:40 It was the best day of the year for me, essentially.

02:44 Just being able to talk to people finally and see many friends was amazing.

02:49 We both gave talks there.

02:50 I talked about Flask and HTMX.

02:52 And you spoke about building a database engine, a whole database with Python.

02:57 And that was interesting.

02:59 So then I watched a little more and I just thought, wow, there are a lot of interesting pieces of

03:04 technology in and around this thing you built called EdgeDB.

03:08 So I'm super excited to dive into that with you.

03:10 But before we do, let's just hear your story real quick.

03:12 How do you get into programming in Python?

03:14 So my co-founder, Elvis, and I met many years ago, probably 14 years ago or something like that,

03:19 working in a small Canadian company, building big enterprise software for companies like Walmart.

03:25 Back then, we were actually like the system that we were working on was written in PHP.

03:31 And I mean, we pushed PHP to the limits, but we always knew that, hey, when we start our own thing,

03:36 we will be looking for something new and fresh for us to tinker with.

03:40 And we looked around and we just liked Python.

03:44 Liked it a lot.

03:46 Great syntax.

03:46 Fantastic.

03:47 Was that Django?

03:48 I mean, that's right around the time of the Django growth or was it something else that brought you in?

03:52 We started with Django.

03:54 We played with it a little, but actually like we just started building our own thing pretty much

04:01 immediately without looking like too deeply at existing frameworks or anything.

04:06 Yeah.

04:06 I get the sense that you and your co-founder are framework builders.

04:09 Yes.

04:10 Yes, we are.

04:11 Somebody asked me, maybe it was Guido, I don't remember anymore.

04:13 What was your first thing that you wrote on Python?

04:17 And I said, the function decorator.

04:18 I could just...

04:20 John Trayden.

04:21 Exactly.

04:22 Awesome.

04:23 So how about now?

04:24 What are you doing these days?

04:25 Working on EdgeDB full-time?

04:27 EdgeDB.

04:28 EdgeDB full-time exclusively.

04:31 Yeah.

04:31 We're building a great company here.

04:33 So that requires 100% of my attention.

04:36 Yeah.

04:36 I bet it does.

04:37 It's...

04:38 You can build a business on the side, but it's a hard time.

04:42 And you have this great article that talks about how you're going to build your favorite new database in a month,

04:48 but that it actually takes 10 years.

04:49 Do you like that, right?

04:51 Yeah, pretty much.

04:52 Yeah.

04:52 It was a long, sometimes painful journey.

04:55 And we didn't realize, like, right off the bat that we will be building a database, right?

05:01 We were building a Python framework.

05:02 And the Python ORM, essentially.

05:04 And granted, that ORM was...

05:06 Right.

05:06 A better way to talk to databases in Python was your idea, right?

05:09 Yeah.

05:10 Yeah, exactly.

05:11 But I guess you didn't have in mind that you would also build the database.

05:14 No idea.

05:16 Yeah.

05:16 Very cool.

05:17 Well, I think what you built is pretty interesting, and people are going to enjoy checking it out.

05:22 But more so, I think what is pretty interesting is there's a lot of things in the Python space that we enjoy and we appreciate,

05:30 especially what I would consider to be the advantages of modern Python.

05:35 I don't know how you feel about it.

05:37 I know you've been deep in this world.

05:38 But to me, it seems like just two or three years ago, people building frameworks, you know, think FastAPI or Pydantic or stuff like that,

05:47 have really embraced the...

05:50 They've taken full advantage of Python 3, right?

05:52 They said, oh, look, we have typing, we have async and await.

05:55 We have all these things that we can bring together.

05:56 And it really feels like that stuff is all starting to come together in a big way.

06:00 Is that over the last couple of years?

06:02 What do you think?

06:02 Yeah.

06:03 I also have this feeling that the ecosystem becomes more and more robust, that people build amazing systems with Python.

06:09 I think that asynchronous IO played a part in it for sure.

06:14 But I think that the other big thing that is happening to Python right now is strict typing.

06:19 mypy and the other similar tools.

06:23 This is what actually allows you to manage your code base at scale.

06:27 And this is just incredibly important.

06:30 So yeah, those two things I would say.

06:33 Absolutely.

06:33 And you talk, we're going to get into it when we get into the architecture and stuff, but you talk about using Cython for making parts of your Python code faster.

06:40 And of course, that relies heavily on typing.

06:43 Because you want to say, here's an int64.

06:44 Don't turn it into a, you know, PyLongObject pointer.

06:49 We just want an int64 that works on the stack really quickly, right?

06:52 Yeah, yeah, absolutely.

06:53 I mean, it's an open question.

06:55 Will Python ever enjoy strict typing that the Python interpreter actually takes care of to make things run faster or not?

07:04 But for Cython, it's absolutely critical.

07:06 And actually, I had this, sometimes I had this feeling that writing code in Cython is easier than in Python because, hey, I have a compiler.

07:13 Something mismatches.

07:15 I know it at the compile time, not the run time.

07:17 Yeah.

07:17 I suspect mypy is a little like that as well, right?

07:20 Exactly, exactly.

07:21 So when mypy started happening, because I was experimenting heavily with Cython before mypy became popular, when mypy finally became like this common thing to use.

07:30 Yeah, it was almost a revelation that we finally have this beautiful workflow with Python.

07:34 Well, I want to talk about some of the technologies that are sort of surrounding this larger project that you've been working on.

07:41 So over on GitHub, github.com/MagicStack.

07:45 This is for, this is your company.

07:48 And one of the, you know, where sort of EdgeDB and all that is coming out of.

07:53 But there's a lot of interesting things happening here that I think people who see modern Python doing its thing are going to appreciate.

08:02 We talked about the async stuff and so on.

08:05 And so I wanted to kind of dive into some of those first that are sort of orbiting your projects that you all have created here.

08:12 So let's start with MagicPython.

08:14 Tell us what this MagicPython is about.

08:16 So MagicPython is a syntax highlighter.

08:18 It's actually used in VS Code by default.

08:21 So if you use VS Code and you edit Python in VS Code, this is the stuff that VS Code uses under the hood.

08:27 It was used by GitHub for years to highlight all Python code.

08:32 And recently I think GitHub switched to Tree-sitter, another Python highlighter.

08:37 But yeah, MagicPython was, and I guess is, incredibly popular.

08:42 It was born out of frustration actually because we were big fans of metaprogramming.

08:47 We abused Python a lot in interesting ways.

08:50 And one of the ways to abuse it was to push some meta information to function annotations.

08:55 It was before mypy and before typing.

08:58 So yeah, we just were like adding stuff to those annotations.

09:02 So we quickly discovered that built-in syntax highlighters in TextMate back then.

09:06 Back then I was using TextMate heavily.

09:08 They just couldn't highlight annotations.

09:10 So my goal was to basically, hey, can we create our own syntax highlighter for Python that would just take care of annotations?

09:17 And by the way, highlight all of the newer stuff that is available in Python 3.

09:22 Because back then Python 2 was still the king.

09:25 And Python 3 was kind of barely supported.

09:28 Interesting.

09:28 So a lot of the highlighters and editors and stuff really would highlight kind of based on Python 2 type syntax.

09:35 Exactly.

09:36 Exactly.

09:36 But I guess.

09:37 No, it's 20, whatever this was, in 2015 or something.

09:41 2015.

09:42 Yeah, yeah.

09:42 It was clear to me that Python 3 is the future.

09:44 But yeah, the industry was still kind of moving slowly towards it.

09:48 But the key innovation of MagicPython, and I think this is why I think it's a high quality thing, is unit tests.

09:56 So I'm a big fan of writing tests and having this test-driven development.

10:02 And the first thing after highlighting Hello World in TextMate, first thing for me was to figure out, can I actually build a unit test engine?

10:12 Because if you think of those syntax highlighters, it's essentially, it's a huge regexp.

10:17 It's just mind-bogglingly, like huge, huge, huge regexp.

10:21 And writing regexps is hard.

10:23 Yeah.

10:25 But modifying them is much harder.

10:28 I was thinking about that.

10:28 Well, I was thinking down the road, you have a really interesting query syntax that's pretty rich and powerful for EdgeDB.

10:36 Yeah.

10:37 Did your experience writing MagicPython give you the ability to go like, oh, yeah, we can write this thing that parses this insane sort of complex language?

10:46 How much did this play into your ability to go beyond SQL?

10:50 I wouldn't say much.

10:51 I mean, we have syntax highlighters for our schema files and the EdgeQL.

10:55 They're pretty basic right now.

10:56 We just highlight keywords and literals.

10:59 We have some interesting plans about that.

11:01 And we can talk about that later.

11:02 I guess it will be something like implementing the Language Server Protocol for EdgeQL.

11:07 But the highlighter itself is pretty simple.

11:10 But I used this unit testing framework in those highlighters.

11:15 And this is what gives me peace of mind.

11:17 I know that the EdgeQL highlighter is just working when I'm adding like a new operator or a new keyword.

11:22 I don't have to just test it manually on some big file.

11:27 Yeah, absolutely.

11:28 Sort of speaking to that thing that I talked about, a lot of interesting stuff coming out of your work.

11:31 Adrian out there says, didn't know you also made httptools as well.

11:35 Indeed, yeah, there's a lot of cool stuff that you've done.

11:38 So final thing on MagicPython.

11:40 Can I use it for other purposes than just VS Code, Sublime, and Atom?

11:44 Like if I wanted to build my own thing that, you know, printed out like terminal stuff or like even some other kind of UI app.

11:52 Could I use this more generally than the editors?

11:54 I haven't tried it myself.

11:56 But given that GitHub was using it to highlight the code, I believe that there must be some libraries and packages that just can consume this TextMate-inspired syntax and just highlight, I don't know, stuff you print into terminal.

12:10 I see.

12:10 So it comes out as TextMate and then it just happens to these three editors with their...

12:14 Yeah, I think...

12:15 Sort of common heritage, understand that.

12:16 Yeah, yeah.

12:17 I think TextMate started the revolution originally, then Sublime Text just inherited the format and then VS Code just decided, hey, we should just use it.

12:24 Yeah.

12:25 Cool.

12:26 Yeah, very cool.

12:27 All right.

12:28 So when you spoke about your journey towards creating this product in this business, you talked about how central having asynchronous IO and server work is going to be.

12:41 And of course that is true, right?

12:43 Not all databases, but most databases are able to be a point of extreme concurrency to the point that they can handle the processing, right?

12:52 So if you've got a web app, you can scale your web app out.

12:55 And if it's got two connections or 200 connections to the database, generally, that's fine.

13:00 The database is meant to sort of scale that vertically, I guess.

13:04 So you really talked about, well, if you're going to do this in Python, that probably means leveraging asyncio pretty strongly, right?

13:10 Yeah, yeah.

13:10 It was pretty clear that we need asynchronous IO.

13:13 As you said, databases have to handle lots of connections.

13:16 And also it's important to understand that most databases like Postgres, for example, the cost of establishing a new connection is pretty high.

13:23 So we wanted Edge.

13:26 And I mean, there are tools to mitigate that.

13:28 Like PgBouncer, for example.

13:29 It's like middleware you put in front of PostgreSQL to make connections cheaper.

13:33 And we just didn't want to have any of such tools as a requirement for Edge.

13:38 We just wanted it to work like natively out of the box without any configuration.

13:42 So yeah, we had to have cheap connections in terms of like how fast you can connect.

13:47 And also, I mean, if your connection is just hanging out there, we wanted to allow that essentially.

13:52 So we had to have a way to handle thousands, maybe hundreds of thousands, just concurrent connections that maybe are not super active, but just, I mean, open.

14:02 And asynchronous core is the only way how you would be able to do this.

14:07 Like not even like even if Python didn't have GIL, for example, you would still use asynchronous IO to tackle this problem.

14:14 This portion of Talk Python To Me is brought to you by Sentry.

14:19 How would you like to remove a little stress from your life?

14:22 Do you worry that users may be encountering errors, slowdowns or crashes with your app right now?

14:28 Would you even know it until they sent you that support email?

14:31 How much better would it be to have the error or performance details immediately sent to you, including the call stack and values of local variables and the active user recorded in the report?

14:42 With Sentry, this is not only possible, it's simple.

14:45 In fact, we use Sentry on all the Talk Python web properties.

14:49 We've actually fixed a bug triggered by a user and had the upgrade ready to roll out as we got the support email.

14:55 That was a great email to write back.

14:57 Hey, we already saw your error and have already rolled out the fix.

15:01 Imagine their surprise.

15:02 Surprise and delight your users.

15:04 Create your Sentry account at talkpython.fm/sentry.

15:09 And if you sign up with the code talkpython, all one word, it's good for two free months of Sentry's business plan, which will give you up to 20 times as many monthly events as well as other features.

15:20 Create better software.

15:22 Delight your users and support the podcast.

15:24 Visit talkpython.fm/sentry and use the coupon code talkpython.

15:32 There's an overhead for threads and the context switching between the OS trying to figure out if that thread still needs to do stuff.

15:38 Yeah, you can't have hundreds of thousands of threads and be in a good place.

15:41 Yeah, but my concern wasn't even that.

15:44 Maybe we would be smart and implement some sort of M:N scheduling or something like that.

15:49 I don't know.

15:50 It's just I don't believe that humans are good at writing threaded code.

15:54 Async await gives you this luxury of essentially seeing where you can actually give up control

16:02 of the current code when it can await things and potentially switch the context, right?

16:06 So you can be smart about locking access to shared resources and things like that.

16:10 With threads, it's way, way harder.

16:13 Maybe with Rust, it's easier because, I mean, there is some compile time magic that can help you.

16:19 But with pretty much every other language, thread-based programming is very hard.
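The point about seeing where control can be given up can be sketched with a small, hypothetical example (the `Account` class and both deposit functions are invented here for illustration). A task can only be interrupted at an `await`, so a read-modify-write sequence is at risk only when an `await` sits in the middle of it, and that spot is visible in the source and can be guarded with `asyncio.Lock`.

```python
import asyncio


class Account:
    def __init__(self):
        self.balance = 0


async def unsafe_deposit(account, amount):
    current = account.balance
    await asyncio.sleep(0)              # explicit switch point: other tasks may run here
    account.balance = current + amount  # may overwrite another task's update


async def safe_deposit(account, amount, lock):
    async with lock:                    # protects the read-await-write sequence
        current = account.balance
        await asyncio.sleep(0)
        account.balance = current + amount


async def run_deposits():
    racy, guarded = Account(), Account()
    await asyncio.gather(*(unsafe_deposit(racy, 1) for _ in range(100)))
    lock = asyncio.Lock()
    await asyncio.gather(*(safe_deposit(guarded, 1, lock) for _ in range(100)))
    return racy.balance, guarded.balance


racy_total, guarded_total = asyncio.run(run_deposits())
print(racy_total, guarded_total)
```

The unguarded version loses updates, while the locked version ends with all 100 deposits applied. With preemptive threads the same hazard exists at every line, not only at the marked `await`.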

16:25 It is hard.

16:26 Well, I suspect many people, but not everyone out there listening knows that when you use the asyncio tasks and so on, at least by default, they run on a single thread.

16:38 There's not actual threading happening.

16:40 When you use threads or multiprocessing, you can get that true concurrency.

16:43 But this is different.

16:45 It's not really threads.

16:46 Yeah, it's different.
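A quick way to check the single-thread claim is to record the thread identifier inside several tasks. This is a minimal sketch with made-up names (`worker`, `thread_ids`), using only the standard library.

```python
import asyncio
import threading


async def worker(name, thread_ids):
    thread_ids.add(threading.get_ident())  # which OS thread runs this task?
    await asyncio.sleep(0.01)              # yield to the event loop
    thread_ids.add(threading.get_ident())  # still the same thread after resuming
    return name


async def main():
    thread_ids = set()
    names = await asyncio.gather(*(worker(f"task-{i}", thread_ids) for i in range(5)))
    return names, thread_ids


names, thread_ids = asyncio.run(main())
print(names, len(thread_ids))
```

All five tasks interleave, yet only one thread identifier is ever recorded.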

16:47 Basically, the idea for async await is to use it for IO bound code.

16:53 So if your code is doing like lots of IO, pushing data from multiple connections here and there, this is an ideal thing.

16:59 But if you're computing something like, I don't know, doing something, scientific computation or just use blocking IO or disk IO, it's best to offload that computation into a separate process.

17:12 But yeah, if you just want to handle a lot of IO in Python concurrently, asyncio is the way.
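One way to keep blocking work away from the event loop is `loop.run_in_executor`. The sketch below is an illustration with invented names (`blocking_work`, `heartbeat`). It uses a thread pool so that it runs anywhere; that choice is an assumption of the example, not something stated in the conversation.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_work(seconds):
    """Stand-in for blocking disk IO or a long computation."""
    time.sleep(seconds)
    return seconds


async def heartbeat(ticks, interval):
    """Counts how often the event loop gets a chance to run this task."""
    count = 0
    for _ in range(ticks):
        await asyncio.sleep(interval)
        count += 1
    return count


async def main(executor):
    loop = asyncio.get_running_loop()
    # run_in_executor returns an awaitable; the loop keeps serving other tasks meanwhile.
    offloaded = loop.run_in_executor(executor, blocking_work, 0.2)
    beats, result = await asyncio.gather(heartbeat(5, 0.02), offloaded)
    return beats, result


with ThreadPoolExecutor(max_workers=1) as pool:
    beats, result = asyncio.run(main(pool))
print(beats, result)
```

For CPU-heavy work, passing a `concurrent.futures.ProcessPoolExecutor` instead (created under an `if __name__ == "__main__":` guard) moves the computation into a separate process, which matches the advice above more closely because threads still share the GIL.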

17:18 Yeah, the way that I like to think of asyncio and async and await is what you're scaling is you're scaling the waiting.

17:25 If you're waiting on anything, if I'm waiting on a database or waiting in the database version for the client to talk to me or not to talk to me, then you can basically take that period where you'd be waiting and turn that into productive computational time.

17:38 I love it.

17:39 I think we should put this like straight into the docs.

17:41 I've never thought about it this way.

17:44 Because people tell me, I'll see these benchmarks and stuff said, oh, well, I did this thing where I overwhelmed the database and then it didn't go very fast when I did asyncio.

17:53 It's like, well, because there's no period in which you're waiting.

17:56 Like you're like constraining the resource beyond what it can take.

18:00 But if there was some sort of, oh, I'm waiting for this thing to get back to me.

18:03 Well, then all of a sudden there's your performance.
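The "scaling the waiting" idea can be measured with a small sketch. Here `fake_query` is a hypothetical stand-in that only sleeps, imitating time spent waiting on a database reply. Ten such waits run one after another take roughly ten times the delay, while ten run concurrently take roughly one delay.

```python
import asyncio
import time


async def fake_query(delay):
    await asyncio.sleep(delay)  # stand-in for waiting on a database or network reply
    return delay


async def run_sequential(n, delay):
    return [await fake_query(delay) for _ in range(n)]


async def run_concurrent(n, delay):
    return await asyncio.gather(*(fake_query(delay) for _ in range(n)))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


_, sequential_seconds = timed(run_sequential(10, 0.05))
_, concurrent_seconds = timed(run_concurrent(10, 0.05))
print(f"sequential: {sequential_seconds:.2f}s, concurrent: {concurrent_seconds:.2f}s")
```

If the backend were already saturated, so that each `fake_query` represented real work rather than idle waiting, the concurrent version would have nothing to overlap and the gap would disappear.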

18:05 Okay.

18:06 So when I saw this come out, I was super excited.

18:09 I think this was 3.4.

18:10 If my history reminds me correctly, that's when this came out.

18:14 Is that, do you remember?

18:15 I think it was around 3.4 because the most important prerequisite.

18:19 I think so.

18:20 Because 3.5.

18:21 Yeah, go ahead.

18:21 Sorry.

18:22 Yeah, the most important prerequisite for asyncio to happen was actually the yield from syntax.

18:26 Probably not a lot of people remember about it.

18:28 But back then, asyncio required this like @coroutine decorator and you would use yield from instead of await in your code.

18:35 So that PEP, so basically Python 3.3, I think, was a moratorium on modifying the Python language.

18:42 So we had to wait for Python 3.4 to add yield from and then asyncio happened.

18:46 We made it happen.

18:47 Right.

18:47 That enabled it.

18:48 But when I remember when it came out, I was like super excited about this.

18:51 And I said like, oh, this is a harsh programming model.

18:54 This is really like direct and juggling those sorts of things.

18:59 And I had experience with C#, which had async and await keywords as well.

19:05 I'm like, gosh, I wish this language had async and await.

19:07 And then I didn't know you then, but I thank you because you authored PEP 492, coroutines with async and await.

19:15 Basically, we have async and await in Python because of you, right?

19:18 Well, somewhat.

19:19 Yes.

19:19 I don't want to give you too much credit.

19:21 I did.

19:21 You created the PEP that said like, let's stop using yield from and continue and all these other things that you do.

19:28 I can tell you the entire backstory.

19:30 It's relatively short.

19:31 Yeah.

19:32 So basically we were trying to figure out like the future API for the EdgeDB Python client back then when EdgeDB wasn't even a thing.

19:41 And we knew that we want to support asyncio in our future client.

19:47 But how do you actually have like a migration block?

19:49 Like you would have to say like try, except, finally, rollback, commit.

19:53 It's like a lot of code.

19:54 And we have context managers in Python, right?

19:56 So with the context manager, you would just say with transaction and just do all this magic behind the scenes.

20:02 But we didn't have an asynchronous version of with.

20:05 We had yield from.

20:07 But how do you kind of mush together yield from and with wasn't clear.

20:12 So I thought, hey, if we have like async keyword, we could have async with.

20:16 Then it was a natural step.

20:18 We should just replace yield from with await, because it was also familiar from C#.

20:22 And I also liked the short and neat syntax of async await.

20:26 And then the next thought was that, hey, what if you have a cursor to the database and just want to iterate over the rows and make it like prefetch those rows.

20:34 And this is how async for was born.
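A minimal sketch of the two constructs described above, async with for a transaction and async for over a cursor. The Transaction class and fetch_rows function are hypothetical stand-ins written for this illustration, not the API of any real database client; they only record what happens so the control flow is visible.

```python
import asyncio


class Transaction:
    """Hypothetical transaction object (illustration only, not a real client API)."""

    def __init__(self, log):
        self.log = log

    async def __aenter__(self):
        # A real client would send BEGIN to the database here.
        self.log.append("BEGIN")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Commit on success, roll back if the block raised.
        self.log.append("COMMIT" if exc_type is None else "ROLLBACK")
        return False  # never swallow the exception


async def fetch_rows(n):
    """Hypothetical cursor: an async generator that yields rows one at a time."""
    for i in range(n):
        await asyncio.sleep(0)  # stands in for waiting on the network
        yield {"id": i}


async def main():
    log = []
    async with Transaction(log):
        async for row in fetch_rows(3):
            log.append(row["id"])
    return log


result = asyncio.run(main())
```

The try, except, finally, rollback, commit bookkeeping lives in __aenter__ and __aexit__, so the calling code stays a plain indented block.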

20:36 And then in about a couple of weeks, language summit happened.

20:39 I think it was in Montreal.

20:40 That was back at PyCon US in Montreal.

20:43 And I met with Guido.

20:45 I showed him like rough sketch and he said, yeah, let's do it.

20:48 I think I implemented the first prototype of this thing in the interpreter over a couple of nights.

20:54 Like I just coded it for like 48 hours straight.

20:57 I wanted to impress Guido.

20:59 And yeah, I just had this like rough implementation.

21:03 And then just over the course of like a month and a half, I was refining it and writing this PEP.

21:08 And this is how it happened.

21:09 I think it all happened because of Guido.

21:11 Because first of all, he saw like clearly like this is an improvement to yield from, like a big improvement.

21:16 It's a huge improvement.

21:17 It makes it incredibly approachable.

21:19 It's like you do what you normally do, but sometimes you might have to put the word await there.

21:22 Exactly.

21:23 But your mental model isn't about callbacks and weird stuff like that.

21:27 It's just like you write the regular code, but you sometimes need to await a thing.

21:31 And it's beautiful.

21:32 Yeah, exactly.

21:33 Exactly.

21:33 And yeah, I'm grateful to Guido because first of all, he recognized this thing and encouraged me.

21:38 And second of all, he actually like inspired lots and lots of refinements in this proposal.

21:43 And I was like working with him essentially all this time.

21:46 Like the discussion happened on python-dev.

21:49 And sometimes he and I exchanged emails and he proposed some ideas and I would just tweak the PEP.

21:53 Yeah, Guido was actually also behind this proposal to a big extent.

21:57 There's some mind-blowing stuff here, like the async with, for example, as you point out, right?

22:02 These are wild ideas, right?

22:04 Instead of just calling, just saying here, there can be a function.

22:06 You have async for, async with.

22:08 There's really neat things in here.

22:10 Actually, yeah, I think that still async with is pretty unique.

22:15 Like JavaScript, for example, is lucky because they have this nice syntax for declaring anonymous functions, right?

22:21 And so you can just say await transaction and pass a function.

22:26 And it's like a multi-line function.

22:28 You can do whatever you need.

22:29 You don't actually have to have something like async with in TypeScript or JavaScript.

22:34 But in many other languages, you would need something like this.

22:38 And pretty much, I think, we pioneered this idea in Python.

22:43 I think I saw a proposal to make an async version of using in C#, but I'm not actively engaged with the C# community.

22:49 So I'm not.

22:50 Maybe it was implemented.

22:51 Maybe not.

22:51 I don't know.

22:52 That would be the parallel, but I'm also not tracking it.

22:55 Yeah.

22:55 Okay.

22:56 This is really cool.

22:57 So awesome, awesome work on this PEP and getting this into the language.

23:00 Thank you.

23:01 Thank you.

23:01 Yeah.

23:02 So let's talk about two more async things real quick here before we get to EdgeDB.

23:05 Or actually, three.

23:08 One jumped the list just yesterday.

23:10 Absolutely.

23:11 Okay.

23:11 So when you're doing asyncio, there's this background event loop that looks at all the things that could be done and says, are any of them waiting?

23:20 Can we take that while it's waiting and put it aside and go do something else, right?

23:25 That way I scale the waiting story.

23:27 And there's an implementation for that in CPython.

23:29 But you all decided, you and Elvis, your co-founder, decided it would be nice if there was a faster, more optimized version of that part that does the checking and execution.

23:41 So you created this thing called uvloop, an ultra fast asyncio event loop.

23:45 It's incredibly easy to use, right?

23:47 Like to install it.

23:47 Two lines.

23:48 It's two lines, right?

23:50 You import and then you run install and you're good to go, which is fantastic.

23:55 Tell people about uvloop and how broadly, should this just be standard stuff we do in all of our code that uses async and await?

24:03 That's an interesting question.

24:04 Okay, let's jump in.

24:05 So uvloop wasn't the first thing that I created.

24:08 The first thing was actually httptools.

24:11 Someone asked you about it like a few minutes ago.

24:13 So I just wanted to experiment with Cython.

24:16 I discovered Cython.

24:18 I thought, hey, this might actually be a useful tool and allow us to speed up Python a lot for some things like parsing HTTP, for example.

24:26 Right.

24:26 The tight loops.

24:27 Yeah.

24:28 Exactly.

24:28 So I looked at Node.js and they used a C HTTP parser.

24:33 I think that parser was actually extracted from Nginx.

24:36 And yeah, I just wrapped it in httptools.

24:40 Just literally like 100 lines of code, maybe even less, maybe 50.

24:43 Just like a small wrapper over the C library.

24:47 And it worked.

24:48 And it worked great.

24:49 Then I thought, oh my God, I now have this superpower.

24:52 I can quickly.

24:54 What other things can I grab and wrap up and put into Python?

24:56 Exactly.

24:56 Because, I mean, you could do the same just using the Python C API, but you would end up writing like 3x, maybe 5x the amount of code.

25:04 And using Cython just feels like magic.

25:06 So, yeah, it worked.

25:08 And then I was like, hmm.

25:10 Interesting.

25:11 Interesting.

25:11 So there is this libuv library that actually powers Node.js and it's cross-platform.

25:16 And it's super fast and Node.js is fast.

25:19 Maybe, just maybe, I can do the same.

25:23 I can just wrap it into Python and make a drop-in replacement for it eventually.

25:27 So I prototyped something relatively quickly.

25:29 Maybe in a few days.

25:31 Basically, I just implemented like a loop object and call_soon.

25:34 Like, basically the staple of asyncio.

25:37 The most basic thing.

25:39 And it worked.

25:40 It worked just fine.

25:41 I was able to implement call_later.

25:42 And then I was able to run a coroutine like, await sleep(1), print hello world.

25:47 And it worked.
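To make those first milestones concrete, here is a small sketch using the standard asyncio loop: one callback scheduled with call_soon, one with call_later, and a coroutine that sleeps and then records hello world. The delays are shortened from the one second mentioned above so it finishes quickly, and the events list is only there to show ordering.

```python
import asyncio

events = []


async def hello():
    # Shortened stand-in for "await sleep(1); print hello world".
    await asyncio.sleep(0.05)
    events.append("hello world")


def main():
    loop = asyncio.new_event_loop()
    try:
        # Run a callback on the next loop iteration.
        loop.call_soon(events.append, "call_soon")
        # Run a callback after a delay, in seconds.
        loop.call_later(0.01, events.append, "call_later")
        # Drive a coroutine to completion.
        loop.run_until_complete(hello())
    finally:
        loop.close()
    return events


result = main()
```

An alternative event loop has to provide exactly these methods (and many more) with the same behavior to act as a drop-in replacement.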

25:48 And then I just, over the course of next several months, I think three, maybe four, maybe five months.

25:54 I was gradually implementing the asyncio API.

25:57 I was swearing a lot because I discovered that this API surface is just huge.

26:01 The asyncio event loop is just, it's an enormous API actually.

26:04 And yeah, then we, we posted benchmarks and I think it went somewhat viral.

26:11 It was on HN.

26:13 I think it was like post number one on HN for a long time.

26:17 Yeah.

26:17 I think Brian and I covered it over on the Python Bytes podcast when it came out because it was, it was big news.

26:22 Yeah.

26:23 Yeah.

26:23 People were excited specifically because basically we showed that you can write some Python code, like a simple protocol parser, and it would be almost as fast as Go.

26:33 And sometimes it's even faster than Node.js, which was surprising.

26:38 So yeah, I think a lot of people were excited about it.

26:41 Yeah, that's fantastic.

26:41 So the quick takeaway here is uvloop makes asyncio two to four times faster.

26:47 You've got some benchmarks for different situations and amount of data and so on with regard to sockets.

26:52 So let's wrap this one up with, is that a universal statement that you would recommend there?

26:57 uvloop.install()?

26:59 It depends.

27:00 I think for production, it makes a lot of sense to use uvloop or you should try it because I mean, there are still some minor incompatibilities in uvloop.

27:07 That are really hard to track.

27:09 Maybe there is some behavior difference, or maybe there is a bug in something that not a lot of people are using with uvloop.

27:15 And it's still a possibility.

27:18 So yeah, use it, use it with care in production.

27:21 In local development, I don't think you need it.

27:23 Like vanilla asyncio should be, should be plenty.

27:25 There is one more interesting thing about uvloop.

27:28 It's a package.

27:29 It's a package on PyPI.

27:31 So if we find a bug, we fix it and we publish a package.

27:35 You don't have to wait until Python 3.11.7 to get your bugs fixed.

27:40 Or improvements made.

27:41 Yeah.

27:41 Or improvements made.

27:42 Exactly.

27:43 So this kind of suggests that it's a great idea to use uvloop.

27:47 But on the other hand, we really haven't had like any emergency releases or anything in a long time.

27:52 We basically release almost like every year just to catch up with the latest Python version.

27:57 I would say that uvloop is pretty stable at this point.

28:00 Yeah.

28:01 Very cool.

28:01 Yeah.

28:02 It definitely seems neat.

28:03 I think also it's probably a context of when does it make sense, right?

28:06 If you're running three tasks and that's your whole program, who cares how fast the event loop is?

28:12 All right.

28:12 It's three tasks.

28:13 But if you have many, many fine grain, tons of little tasks and there's lots of, like how complex and how many tasks,

28:20 like basically how complex is the task coordination job of asyncio, right?

28:24 The more complicated it is, probably the better benefit you'll get from uvloop.

28:28 What do you think?

28:28 If you go deep in the details, I would say it's not so much about juggling tasks around.

28:34 It is more about performing IO in the most optimal way.

28:38 Okay.

28:38 And with libuv, it's just because it's so low level.

28:41 It just uses lots and lots of tricks under the hood to just do IO faster.

28:46 And the entire loop of like calling callbacks in the loop is just, it's a tight loop in C essentially.

28:52 So it's much faster than a loop in Python.

28:55 So that actually, yeah, those two points.

28:57 But yeah, the performance improvement is noticeable usually.

29:01 Very noticeable with uvloop.

29:03 Cool.

29:03 The benefit is it's literally import uvloop, uvloop.install().

29:07 Yeah.

29:07 Run your benchmarks.

29:09 Comment that line out.

29:10 Run your benchmarks again.

29:11 Exactly.

29:12 Exactly.
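The try-it-and-measure workflow described above might look like the sketch below. uvloop is a third-party package (pip install uvloop), so this example falls back to the stock asyncio loop if it is not installed; that fallback is an addition for illustration. Newer uvloop releases also offer uvloop.run() as an alternative to install().

```python
import asyncio

try:
    import uvloop  # third-party package; optional in this sketch
except ImportError:
    uvloop = None

if uvloop is not None:
    # The "two lines": import uvloop, then install it as the event loop policy.
    # Comment this call out to benchmark against the stock asyncio loop.
    uvloop.install()


async def main():
    await asyncio.sleep(0)
    # Report which implementation is actually running the coroutine.
    return type(asyncio.get_running_loop()).__module__


loop_module = asyncio.run(main())
```

Because the rest of the program only talks to the asyncio API, nothing else has to change between the two benchmark runs.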

29:12 It's so easy to, you don't have to commit to it.

29:14 It's not like, oh, we're going to swap ORMs and try it again.

29:17 Exactly.

29:17 Yeah.

29:17 But I just love packages in Python that do this magic.

29:21 Like if you remember, there was this package called Psyco created by Armin Rigo, creator

29:26 of PyPy.

29:27 You just import psyco, psyco install or something like that.

29:30 And boom, you have like an alternative CPython eval loop.

29:34 Your program just magically becomes five, ten times faster.

29:37 It's just magic.

29:38 So yeah, it's great when we can do something like this.

29:41 Yeah, that's fantastic.

29:42 Adrian has an interesting question.

29:44 I know this came up around requests a couple years ago.

29:47 He asks, could you give your thoughts on having things as part of the standard library?

29:51 Basically having uvloop in this case be part, you know, be the replacement for asyncio loop

29:58 rather than having an external package updated independently.

30:01 Yeah, it's an interesting question.

30:02 And I'm not super involved in conversations like this.

30:04 I know that Python core developers considered actually separating the standard library

30:08 and shipping it on the side so that it can have like its own release schedule.

30:12 I think it's sort of mitigated with Łukasz Langa actually speeding up the release cycle for

30:17 Python now.

30:18 It's being released like every year, which is amazing.

30:21 And I think the pressure is lower now to separate the standard library.

30:24 As far as including uvloop as part of the standard library, I'm not sure it's a good idea.

30:29 First of all, it's entirely in Cython.

30:32 It's like 50,000 lines of Cython or something like that.

30:34 We will have to either adopt Cython as like an official standard library tool or rewrite it in C.

30:41 And if you rewrite it in C, it's going to be 100,000 lines in C or something like that.

30:45 It will be huge.

30:46 So probably not going to happen anytime soon.

30:49 Maybe with things like mypyc, we can make it happen eventually.

30:54 Yeah, that's interesting.

30:55 But mypyc is still pretty early.

30:56 Right.

30:56 Okay.

30:57 Yeah, the conversation was had around that with regard to requests as well.

31:02 Maybe you're even part of it since you're a core developer.

31:05 But they decided not to make requests the new HTTP library of CPython because it would hobble requests.

31:13 Like it would mean requests could only be changed, you know, once every 12 months or something like that.

31:19 Right?

31:19 Yeah, I think one of the concerns with requests specifically, and I wasn't actively involved in those conversations at all.

31:25 But I think the concern that I heard was that HTTP is pretty wild and you often need to fix some security issues and bugs and you need to act quickly.

31:36 And if something as huge as requests and so fundamental as requests was part of standard library, we would just have to be like way more flexible about making bug releases for CPython.

31:49 But Python is just, it's such a huge thing, right?

31:53 Like operating systems bundle it, like multiple different workflows are centered around it.

31:59 It's just...

32:00 It runs on helicopters on Mars.

32:02 I mean, come on.

32:03 Exactly.

32:03 There's a lot of edge cases.

32:05 People are not thinking about it.

32:06 Exactly.

32:06 Just upgrading a separate library is so much easier than upgrading the entire Python thing.

32:11 So, yeah, I think this is why packages like requests for sure will stay out of standard library.

32:17 Yeah.

32:17 All right.

32:18 Final question before we move on from uvloop, because it's not even our main topic, but it is very interesting.

32:23 Teddy asks, are there any trade-offs of using uvloop as opposed to the native built-in one?

32:30 I think this is time for me to make a shout out because we still haven't implemented a couple of APIs that are in asyncio, like the Happy Eyeballs protocol.

32:41 Maybe there is something else.

32:43 I just haven't got time to do it myself.

32:45 We are busy with EdgeDB.

32:47 So if anyone wants to join the project and help, that would be great.

32:50 And that basically answers the question.

32:52 The fundamental APIs are already all there.

32:55 It's almost 100% compatible with uvloop.

32:58 No...

32:58 With vanilla asyncio.

33:00 No trade-offs except there are a couple of relatively new APIs, I think Python 3.9 and Python 3.10, that are still missing from uvloop and that we still should implement.

33:11 Yeah.

33:11 To be a true replacement, right?

33:13 Yeah.

33:13 I think it's sendfile and Happy Eyeballs and maybe something else.

33:17 Okay.

33:17 uvloop is running inside EdgeDB?

33:20 Yeah.

33:21 It powers the IO server.

33:24 Basically we use multi-processing architecture in EdgeDB.

33:28 We have a pool of compiler processes because this is like computation CPU heavy thing to compile your query.

33:36 And then there is a core IO process that just runs uvloop and quickly, quickly, quickly goes through your connections, pushing data between clients, Postgres, etc.

33:48 This portion of Talk Python To Me is brought to you by SignalWire.

33:51 Let's kick this off with a question.

33:52 Do you need to add multi-party video calls to your website or app?

33:56 I'm talking about live video conference rooms that host 500 active participants, run in the browser and work within your existing stack, and even support 1080p without devouring the bandwidth and CPU on your users' devices.

34:09 SignalWire offers the APIs, the SDKs, and Edge networks around the world for building the realest of real-time voice and video communication apps,

34:17 apps with less than 50 milliseconds of latency.

34:20 Their core products use WebSockets to deliver 300% lower latency than APIs built on REST, making them ideal for apps where every millisecond of responsiveness makes a difference.

34:30 Now, you may wonder how they get 500 active participants in a browser-based app.

34:34 Most current approaches use a limited but more economical approach called SFU, or Selective Forwarding Units,

34:40 which leaves the work of mixing and decoding all those video and audio streams of every participant to each user's device.

34:46 Browser-based apps built on SFU struggle to support more than 20 interactive participants.

34:52 So SignalWire mixes all the video and audio feeds on the server and distributes a single unified stream back to every participant.

34:59 So you can build things like live streaming fitness studios where instructors demonstrate every move from multiple angles,

35:05 or even live shopping apps that highlight the charisma of the presenter and the charisma of the products they're pitching at the same time.

35:12 SignalWire comes from the team behind FreeSwitch, the open-source telecom infrastructure toolkit used by Amazon, Zoom, and tens of thousands of more to build mass-scale telecom products.

35:22 So sign up for your free account at talkpython.fm/signalwire, and be sure to mention Talk Python To Me to receive an extra 5,000 video minutes.

35:31 That's talkpython.fm/signalwire and mention Talk Python To Me for all those credits.

35:38 So another thing that came out just today, I know this is, I don't want to spend too much time on it, but there's a big new feature for tasks and asyncio in Python 3.11 coming very soon.

35:51 And you just gave a shout out on Twitter yesterday saying that task groups is coming to asyncio.

36:00 This is a way, because right now if you start two tasks, there's no way to say, well, if this one fails, don't even bother running that one, right?

36:07 They're fully independent.

36:08 This is a way to sort of create a dependency and control them as a set, right?

36:12 Tell us real quick about this.

36:14 We have an API for spawning tasks concurrently.

36:17 It's called asyncio.gather, but it's just a suboptimal API in many ways.

36:22 And this API is like way superior.

36:25 We have to credit Nathaniel J. Smith for his work on Trio and Trio nurseries specifically.

36:31 And Trio is, I mean, we can run an entirely different podcast episode just about Trio and async.

36:36 I actually had Nathaniel on, yeah, we talked about Trio on the show quite a while ago when it was fairly new.

36:41 It's an amazing thing and there are lots and lots of great ideas in Trio.

36:44 One of them is having this thing, it's called a nursery in Trio, and the task groups,

36:53 asyncio task groups, essentially, they just replicate this nursery idea.

36:56 They port it from Trio to asyncio.

36:59 Like the bigger points about how this API works are all similar to Trio.

37:04 There are some details about how cancellation works, et cetera, but most people probably won't really care about that one.

37:10 Yeah.

37:11 Okay.

37:12 Very cool.

37:13 It's great to see more innovation.

37:14 Yeah.

37:15 In asyncio.

37:16 Yeah.

37:17 But task groups, I'll just talk for a couple more minutes about task groups.

37:19 So task groups was like a most requested thing.

37:22 A lot of people wanted task groups in asyncio.

37:25 And I was DMed like, sometimes I was DMed like on a daily basis entirely.

37:29 Like, you promised us task groups.

37:31 When can we have our task groups?

37:32 So the big like elephant in the room with task groups is how do we handle exceptions?

37:37 Because multiple things can fail at the same time and the exceptions will propagate out of this async with task group.

37:43 You'll end up with a hierarchical tree of exceptions.

37:47 Exactly.

37:48 And we just had to figure it out.

37:49 And we had to figure it out in the core because if it was just some, I don't know, some exception class defined in asyncio, then what would happen when you, when your asyncio program crashed?

38:02 Right.

38:03 You wouldn't have like a correct stack trace, a traceback in your terminal.

38:06 You wouldn't be able to understand what actually happened.

38:09 So we had to integrate this into tracebacks and like debugging.

38:13 We needed to make sure that it's like a standard thing that tools like Sentry, for example, can take advantage of it and provide you like great visibility into what happens in your async application.

38:23 So we had to work in this exception group thing.

38:26 And there's this amazing new core developer, Irit Katriel, and she spearheaded this effort of just implementing this and drafting a proposal and just doing it to completion essentially.

38:42 And it's because of her work, actually, task groups are finally a thing because task groups themselves, it's like 100 lines of code.

38:49 Even with comments, there is not much to them.

38:52 The huge thing is getting exception groups in.

38:55 And I believe Python is the first language that has this feature, like right in the syntax, right in the runtime model.

39:03 And this is also huge because I actually believe that Python now can be like one of the best languages to do concurrent programming.

39:10 And I don't know, maybe when we have a JIT or something like that, it might actually match Go in performance somewhat.

39:16 Yeah.

39:17 Would be ideal thing.

39:18 So while I'm looking at this syntax here, which I'll try to quickly simply communicate to people listening on audio.

39:24 It's an async with block.

39:25 So what you do is you'd say async with asyncio.TaskGroup, and you create this task group and then you can create tasks within there that are all grouped together.

39:35 And then you also can do things like await stuff while you're in there.

39:39 It looks to me like one of the things that often I don't see possible in Python's async.

39:46 Previously, it's the ability to just fire off a task and have it sort of just run in the background to completion.

39:55 So you don't have to do like asyncio run all or gather or any of those types of things.

40:00 Basically, the with block, we won't exit the with block until all the tasks are finished or until it fails.

40:07 One of those two, right?

40:08 Yeah.

40:09 That's a cool feature of it alone to just kind of say like, I don't need to kind of store up all the tasks and then make sure I'm waiting on them forever.

40:15 Like I can just kick them off and then like if they happen to start in this place, then they're going to finish when this with block finishes.

40:22 I'm even more excited about this than I was before.

40:24 Right, right, right.

40:25 It's a nice API to compose things in asyncio.

40:28 And I believe it's one of the bigger deals in the recent years.

40:34 So I'm super excited about this.

40:35 3.11 is out soon.

40:37 Not exactly sure of the release date.

40:38 I know it's in alpha stuff right now.

40:40 So it's getting real near.

40:41 Yeah, yeah, it should be should be close.

40:43 Yeah, for sure.

40:44 Awesome.

40:45 All right.

40:46 Last thing before we get to EdgeDB proper people.

40:49 I would say that Postgres is the most popular database for Python people doing database things, possibly with the exception of SQLite.

40:56 But that really counts for just like, oh, I'm doing testing or oh, I use this for this incredibly small, but like production level stuff.

41:03 Got to be Postgres, right?

41:04 Yeah, yeah.

41:05 It's fair to say.

41:06 Yeah, maybe throw in some MySQL and then like a little bit down, maybe some MongoDB, something like that.

41:11 But like, clearly, it seems like Postgres has a lot of interest for folks.

41:16 If you want to talk to it through async and await, which is exactly how you want to scale your database stuff.

41:24 A pretty popular library is this one called asyncpg, right?

41:27 Yep.

41:28 Yep.

41:29 Yeah, you and Elvis created that, huh?

41:30 Yeah.

41:31 So yeah, it was an interesting experience.

41:33 Basically, we knew that the EdgeDB will be based on Postgres.

41:37 That was clear.

41:38 From day one.

41:39 And we also knew that we have to have like this very high performance bridge, essentially, between Python and Postgres.

41:48 And it had to be asynchronous.

41:50 So there was no good asynchronous client for Postgres back at the time.

41:56 And we couldn't just use psycopg, the most popular Postgres driver, because it uses text encoding for data.

42:03 Maybe not anymore, but it used to have text encoding.

42:08 And we actually had to use binary for something.

42:11 So we just knew that, okay, we have to just jump in and explore Postgres protocol.

42:15 And we decided, okay, let's write the driver.

42:18 Yeah, this is how asyncpg was born.

42:20 And I think what makes asyncpg different, just besides that it implements the binary protocol and it's asynchronous, is its API.

42:27 Because we were not basing it on the common Python DB-API.

42:32 We basically designed an API to be as low level as possible, as close to Postgres semantics as possible.

42:39 So in DB-API, there is this thing called cursor, which has nothing to do with the actual database cursors.

42:44 So we didn't want to replicate that.

42:46 So yeah, we just built like what we thought were proper primitives, working with Postgres as efficiently as possible.

42:53 We used the binary protocol plus asyncio.

42:57 And we of course use Cython to speed up like all the bottlenecks in it.

43:01 It's pretty much entirely in Cython actually.

43:03 And yeah, the result is just amazing to this day.

43:07 asyncpg is like one of the fastest Postgres clients on the planet across all languages.

43:13 That's fascinating.

43:14 You can see that it beats Node.js and Go pretty handily there.

43:18 Yeah, we should probably update this chart actually.

43:20 I'm pretty sure that they updated the pg library for Node.js.

43:23 So this is outdated.

43:24 I think it's closer in performance to asyncpg, but I think asyncpg is still the fastest.

43:30 Yeah.

43:31 Cool.

43:32 Awesome.

43:33 So taking all these together, uvloop, async and await in the language, asyncpg, all of these are building up your skills to sort of almost build a database.

43:43 And so then you went on and actually did build a database, right?

43:46 Yeah, pretty much.

43:47 So we had this framework, which was like almost an ORM in Python for many years.

43:51 And we built multiple different production applications with that.

43:54 We shipped applications that were deployed to GE, Cisco, companies like that.

43:58 And we knew it's something interesting, but we also knew that it has to be bigger than just a Python ORM, like it has to be a database.

44:07 It's a surprisingly long road to go down this path essentially, because you have to define a query language.

44:15 You have to define type system, you have to define standard library, you have to define protocols, how it works, how migrations work, all the different syntaxes for schema modeling.

44:24 It's a huge thing.

44:25 And yeah, with like all the right primitives in Python itself, we knew that we can start like morphing our code base into like this separate service essentially.

44:36 And yeah, that was the necessary and required groundwork to make EdgeDB happen.

44:41 Without it, we would probably not succeed.

44:44 Cool.

44:45 So EdgeDB is really written in Python?

44:47 It is.

44:48 Mostly.

44:49 Yeah.

44:50 Mostly.

44:51 The entire like IO server is essentially a Cython thing.

44:54 So it's in C and this is why if you look at benchmarks of EdgeDB, it's actually pretty close to Postgres, to vanilla Postgres.

45:02 Like the overhead of EdgeDB is super low.

45:04 That is only possible because of Cython and like all the low level tips and tricks that we learned when we were working on uvloop and asyncpg.

45:13 So we really optimize it a lot.

45:15 Yeah.

45:16 The compiler part, the thing that actually takes an EdgeQL query and compiles it to SQL, that thing is pure Python and that runs in a separate process.

45:24 But we also do some tricks to make it fast.

45:27 Like we cache things aggressively.

45:29 I mean, in most applications, you don't have thousands of queries.

45:32 You only have like 10, 50, 100.

45:35 Yeah.

45:36 So they get cached pretty quickly and then you don't even run Python anymore.

45:40 From that point on, it's just C.

45:41 Oh, interesting.

45:42 Yeah.

45:42 You don't need to incredibly optimize the understanding of the query because like you said, it's not ad hoc stuff happening that happens at scale.

45:52 Exactly.

45:53 Exactly.

45:54 I mean, it's great when your compiler is exceptionally fast, but for a database and especially if it's smart around extracting constants, let's say you send a select one and then your next query is select two.

46:06 Essentially it's the same query.

46:07 Substitute the same query, just different constants.

46:09 So if you extract it and you cache the compiled query as if this wasn't a constant, but an argument to the query, then yeah, you don't need to compile it a second time.

46:17 So yeah.
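The constant-extraction idea can be shown with a toy sketch. This is not EdgeDB's compiler: the QueryCache class, the regex, and the fake compile step are all assumptions made for illustration, and only integer literals are handled.

```python
import re

# Toy rule: treat every standalone integer literal as a query argument.
_INT_LITERAL = re.compile(r"\b\d+\b")


class QueryCache:
    """Hypothetical cache keyed on the query text with constants stripped out."""

    def __init__(self):
        self._compiled = {}
        self.compile_count = 0

    def _compile(self, normalized):
        # Stand-in for the expensive compile step (EdgeQL to SQL in the real system).
        self.compile_count += 1
        return f"<compiled: {normalized}>"

    def prepare(self, query):
        # Pull the constants out and replace them with $1, $2, ... placeholders.
        args = [int(m) for m in _INT_LITERAL.findall(query)]
        counter = iter(range(1, len(args) + 1))
        normalized = _INT_LITERAL.sub(lambda _m: f"${next(counter)}", query)
        # Compile only the first time this query shape is seen.
        if normalized not in self._compiled:
            self._compiled[normalized] = self._compile(normalized)
        return self._compiled[normalized], args


cache = QueryCache()
plan1, args1 = cache.prepare("select 1")
plan2, args2 = cache.prepare("select 2")
```

Both calls normalize to the same query shape, so the second one reuses the cached result and only the argument list differs.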

46:18 Yeah.

46:19 I don't know Postgres super, super well, but I know some databases, they at their level, when they see a query, they're like, oh, I've seen this query before.

46:28 They can cache the query plan and those types of, so that's like another level of performance and speed up as well.

46:34 Right.

46:35 We do that as well.

46:36 I mean, we did it even in asyncpg, for example, asyncpg automatically prepares statements for you to enjoy this optimization.

46:43 So that Postgres doesn't have to reparse your SQL query, it can just execute the cached plan.

46:48 We do the same in EdgeDB and many other things.

46:51 This is why EdgeDB is kind of, it's based on Postgres, but it fully envelops Postgres because we want to be in like full control of the underlying Postgres instance.

47:00 Right.

47:01 So in some sense, this is a brand new database that's got some really cool features that I'm going to ask you about.

47:05 very soon.

47:06 Absolutely.

47:06 But in the other sense, it's got a lot of stability because it's kind of a database level API rethinking of a well known core that people already trust.

47:15 This is an interesting thing.

47:16 Actually, a lot of people are not 100% satisfied with relational databases for a variety of reasons.

47:22 Somebody.

47:23 Yeah.

47:24 Somebody is not satisfied with scaling.

47:28 Some are not satisfied with SQL and some are not satisfied with migrations and how rigid the schema is and how inconvenient it is to deal with the relational database.

47:38 And so it's a huge problem.

47:40 You have a part of it, which is just language design and by standard library and type system, how that part works.

47:46 The second is workflows around your database.

47:48 The third is the engine of your database, like how it actually works.

47:52 EdgeDB wants to challenge everything, but we're also not dumb enough to challenge everything at once.

47:59 At the same time, we understand that just writing this whole thing from scratch is impossible.

48:05 No company in the world would be able to pull it off.

48:07 Well, maybe some companies would be able to, but they're definitely not a startup.

48:11 But they have many, many employees.

48:13 Exactly.

48:14 And they're probably public.

48:15 Giant tech companies.

48:16 Exactly.

48:17 So for us, the only viable strategy was to pick a database that is already trusted, that is already fast and universally loved, which is Postgres.

48:28 And it's also incredibly capable and just build on top of it.

48:32 And it's not actually a new approach in databases.

48:34 Like lots of databases actually are built on primitive key-value databases, like LevelDB or something like that.

48:41 It's a popular approach.

48:43 We're just taking it further.

48:44 We are saying that, hey, using a key value storage won't buy us much.

48:49 We are like a high level programming language, which requires a lot of code to be written for it to be executed properly and in good time.

48:57 But SQL looks like this nice compile target.

49:01 So this is why we use Postgres.

49:02 Yeah.

49:03 Very cool.

49:04 Kind of the TypeScript to JavaScript equivalent of the database query language in a sense.

49:08 Yeah, pretty much.

49:09 I mean, sometimes I explain EdgeDB as LLVM.

49:11 Like imagine LLVM, it compiles your high level code to low level code and then it, et cetera.

49:17 And the same about EdgeDB.

49:19 We compile your high level schema to a proper normalized table layout.

49:24 We compile EdgeQL, our high level query language, down to SQL.

49:28 And that SQL can actually be JIT-compiled by Postgres.

49:31 So essentially, ultimately your EdgeQL might be executing at native code speed, not now, but in the future.

49:38 So what's the elevator pitch for people who are out there?

49:42 They're slightly, you know, not super thrilled about the database they're necessarily using, whatever that is.

49:47 And they're kind of exploring.

49:49 I picked up a few things that I think make it unique, but I want to ask you, it's your baby.

49:54 All right.

49:55 I guess I'll give two pitches.

49:56 One is super high level and one is slightly more low level.

50:01 A super high level pitch is that imagine you have a tool and when it's a great tool, it becomes an extension of your hand.

50:08 Essentially, you just don't notice it.

50:10 You just do things, right?

50:11 Current databases are not like that.

50:13 They require lots and lots of mental overhead to work with them.

50:18 Like, which ORM library do you use in this language?

50:21 Right.

50:22 Is there lazy loading and N+1 stuff I got to consider or is it not and all those kinds of things?

50:26 Exactly.

50:27 And then you have to learn their API and then you have to learn SQL and understand how those things interact with each other.

50:33 And then you have to care about deployment and migrations.

50:36 It's just so much headache.

50:38 This alone explains why MongoDB was so popular and is so popular because a lot of people just decided, okay, to hell with that.

50:45 I don't want to deal with this.

50:46 I'm leaving the relational space altogether.

50:49 Yeah, exactly.

50:50 Just abandoning this train.

50:53 Yeah, and we want to fix all of that in EdgeDB.

50:57 We want to give you a tool that you just don't notice.

50:59 We want to give you a data model that just feels native to Python or TypeScript or Go or any other language.

51:06 You don't have to think in tables anymore.

51:08 I want to give you a query language that is super easy to use and learn and compose and build query builders around.

51:16 And essentially we want to kill the entire concept of ORM.

51:21 We don't need it anymore.

51:23 We are almost sorry that ORMs have to exist in a way.

51:28 I was going to ask you about that.

51:29 There are so many incredibly difficult problems.

51:31 This problem is called the object-relational impedance mismatch.

51:33 Yeah.

51:34 And tables to like objects.

51:35 It's a super hard problem.

51:37 I feel sorry that they have to go through this.

51:39 But we just looked at this problem and decided, hey, can we actually just solve this object impedance problem in a different way?

51:47 Can we just avoid solving it entirely?

51:49 Can we just give you a database with the proper high level data model that doesn't have this problem at all?

51:54 Sure.

51:55 And then suddenly you don't need ORMs.

51:56 Let's talk real quick about the actual way you define what would be the equivalent, I guess, of a DDL table create script or somewhat related to that maybe closer is like an ORM class.

52:08 Like it's kind of the...

52:09 Okay.

52:10 Can I start a little from afar?

52:11 Yeah, yeah.

52:12 Let's start back.

52:13 Okay.

52:14 Now it's going to be the second pitch, which is slightly more detailed.

52:17 So we say that EdgeDB is a new kind of database.

52:20 It's not just relational.

52:22 We call it a graph relational database.

52:25 Essentially, we are saying that we created an extension to the relational model.

52:30 So what actually constitutes the graph relational model?

52:33 It's first of all, in all of your like rows, all of your tuples in your relational algebra, they essentially have a globally unique key.

52:43 Now, this is a requirement.

52:44 So it's data independent; it's just a UUID essentially.

52:47 Every row in your database will have it.

52:50 This is the first requirement, first modification.

52:52 The second extension is links.

52:55 The idea that links between data is like a first class citizen of the model.

53:00 You don't need join, you don't need foreign key.

53:02 You just know that, hey, if this type links to another type, it's just going to be like a relationship between the unique IDs.

53:09 This is what unique IDs gives you.

53:11 And the third thing is that everything is a set.

53:15 Like, if you have an object that is connected to multiple other objects, this is a set of objects.

53:25 If you have an object that has a bunch of properties, then that's a set of properties.

53:29 Even a single thing is a set as well.

53:31 And this later enables EdgeQL to be super composable.

53:35 But these are just like three simple kind of axioms that are in the core of the model.

53:40 So if we talk about this schema snippet where we have an object type, a blog post, with a required property content, which is text, and a required link author, which is another type called user.

53:54 It's going to be compiled to a table in SQL with a column called content and a column ID, which is going to be a unique UUID; every blog post will have it automatically.

54:05 It's immutable.

54:06 It's read only.

54:07 You don't have to create them manually.

54:08 And a user will also be a table and also have IDs.

54:13 And then we'll have a separate column, which is going to be called author, which will have the IDs of users.

54:19 So ultimately, deep beneath, what you see in EdgeDB as this high level schema,

54:25 it's all compiled properly for the relational model.

54:28 It's all normalized there.

54:30 We are still relational.

54:31 We still exhibit the same characteristics; we're just hiding a lot of these low level things

54:37 that you had to bother with behind this high level model, just abstracting away the low level stuff.
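
The table layout described above can be approximated with a small sketch. It uses SQLite from the Python standard library rather than Postgres so that it runs anywhere, and the table names, column names, and helper functions (`create_layout`, `insert_user`, `insert_post`) are assumptions for illustration; the SQL that EdgeDB actually generates is not shown in the episode and will differ.

```python
import sqlite3
import uuid


def create_layout(conn):
    """Create a relational layout resembling the one described for BlogPost and User."""
    conn.executescript(
        """
        CREATE TABLE "User" (
            id TEXT PRIMARY KEY              -- UUID assigned automatically
        );
        CREATE TABLE "BlogPost" (
            id TEXT PRIMARY KEY,             -- UUID assigned automatically
            content TEXT NOT NULL,           -- required property
            author TEXT NOT NULL REFERENCES "User"(id)  -- required link: holds a User id
        );
        """
    )


def insert_user(conn):
    """Insert a user; the caller never picks the id, it is generated here."""
    user_id = str(uuid.uuid4())
    conn.execute('INSERT INTO "User" (id) VALUES (?)', (user_id,))
    return user_id


def insert_post(conn, content, author_id):
    """Insert a blog post whose author column stores the linked user's id."""
    post_id = str(uuid.uuid4())
    conn.execute(
        'INSERT INTO "BlogPost" (id, content, author) VALUES (?, ?, ?)',
        (post_id, content, author_id),
    )
    return post_id
```

The point of the sketch is only that a link ends up as an ordinary column of ids, while the ids themselves are generated for you rather than supplied by application code.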

54:44 Is there a way to directly connect to that relational view?

54:48 You mean Postgres, the underlying Postgres instance?

54:50 Yeah.

54:51 Yeah.

54:52 Like the underlying.

54:53 I'm not sure even that's necessarily a good idea, but you know, like in SQLAlchemy, there's a way to go like, I just need to get out of here and send raw SQL for a moment.

55:01 Right?

55:02 Like that feels like that's kind of the same.

55:04 I just need to go to the guts for a minute.

55:05 Yeah.

55:06 Yeah.

55:07 Yeah.

55:07 So with EdgeDB, the goal is for you to never actually need that.

55:10 There is just one exception to this.

55:12 Just one exception.

55:13 Ideally.

55:13 Yeah.

55:14 Okay.

55:14 But basically our goal with EdgeQL, like we knew that first of all, we have to elevate the data model and make it more high level.

55:21 And second of all, we knew that, hey, in order for a relational database to be successful, it just has to have a query language.

55:28 Right.

55:29 And because our data model was different,

55:31 we had to come up with our own.

55:32 This is how EdgeQL was born.

55:34 Yeah.

55:34 And we spent years designing EdgeQL.

55:37 And the reason why is because we wanted it to be actually more powerful than SQL in many ways.

55:44 Basically, if you have something that is expressible in SQL, but isn't expressible in EdgeQL, we treat it as a bug immediately.

55:51 If something is easier to do in SQL, it's a bug.

55:53 And this is why we spent so many years refining this thing, to make it capable of all these things.

55:59 So basically you never need to use SQL.

56:02 You don't need to know about SQL or know about its existence.

56:05 And this is a powerful thing because when you use an ORM library, you have to know about SQL.

56:10 With EdgeDB, no, you just learn one language.

56:12 You're good to go for the rest of your life.

56:14 Essentially.

56:15 There's just one use case when you might need SQL.

56:18 It's when, let's say you're a big company and you're using some BI tools like Tableau or something like that.

56:26 Or you have analysts that already know SQL.

56:29 And we're going to do something about it.

56:30 We're going to open like a special adapter.

56:34 Adapter.

56:35 Exactly.

56:36 We'll allow you to just run SQL against the database in read-only mode.

56:39 That makes a lot of sense.

56:40 Because there are these tools, these big BI tools.

56:42 And you're like, if your data is here, do you really want to like have some job to move it to another Postgres just to run an analysis on it?

56:49 Yeah.

56:50 Exactly.

56:51 I mean, just like with us not attacking this problem all at once and implementing the engine and the language and everything

56:56 else here, we also understand that we are not going to replace all the business intelligence infrastructure overnight.

57:03 Yeah.

57:04 And yeah, we have to make it be compatible.

57:05 It's not there yet.

57:06 It'll be a part of a future release.

57:09 Eventually.

57:10 You have a nice roadmap, which we'll cover in a minute.

57:12 But like, I really love that.

57:13 Oh my God.

57:14 Don't do it.

57:15 Don't do it.

57:16 Don't do it.

57:17 I can just say it out loud.

57:18 Like the ideas that we have.

57:19 But let me, like just for people who want to see if they go there, just visually, the way that you've laid this out of like where you are and where you're going.

57:26 Like so many libraries and products should model this because so often, you know, you'll reach out to the companies.

57:33 Hey, it'd be great if you could do this.

57:34 Oh yeah.

57:35 It's on our roadmap.

57:35 Like, oh yeah.

57:36 Well, what is that?

57:37 Like some, where do you even have this?

57:38 Anyway, I think your roadmap is great, but give us the update.

57:41 It is, it is beautiful and I encourage everybody to go and check it out.

57:44 It's edgedb.com/roadmap.

57:46 It's, it is slightly outdated.

57:47 Well, lots of things that are in progress were already done.

57:50 Yeah.

57:51 This formula car here, this is a 2021 series.

57:53 They just redid the Formula One cars for 2022.

57:56 So yeah, that's probably not what you're talking about.

57:58 Yeah.

57:59 All right.

58:00 So tell us what's coming for this.

58:01 What's coming.

58:02 It took us years to build EdgeDB 1.0.

58:04 And during this time, we were almost encouraging people not to use EdgeDB because it's a relational database.

58:10 If you build a business on an alpha version of a relational database and it goes down, your business will go down with it most likely.

58:16 And people should know you just released 1.0, right?

58:18 Yes.

58:19 That's a huge, huge thing.

58:20 Congratulations.

58:21 We launched 1.0 a week ago.

58:22 It was on Hacker News, number one for like 13, maybe 14 hours.

58:27 Wow.

58:28 It was a pretty, pretty interesting event.

58:29 We also had a live stream, us launching it, talking about the architecture of EdgeDB, of the query language, comparing it to SQL.

58:36 It was a great event and I encourage you to check out the live stream.

58:39 If you're interested, it's YouTube/EdgeDB.

58:42 Check it out.

58:43 You'll find it there.

58:44 But yeah.

58:45 So it took us years to do 1.0 just right.

58:47 To make sure that EdgeQL is right, that its design is sound and that the schema is right and the workflows and CLI and the cloud APIs.

58:56 Everything is just right and that we are confident that, hey, we're not going to be changing it.

59:00 We're not going to be retroactively fixing things.

59:02 Took us a long time, many years, but now it's out.

59:05 And now we don't want to spend many years on EdgeDB 2.0.

59:09 We actually want to make it way quicker.

59:11 We have the solid foundation.

59:12 We can iterate much faster now.

59:14 And this is what we're going to do.

59:16 So our current target, internal target, is to release 2.0 sometime in May 2022.

59:22 So relatively soon.

59:24 2.0 will have a few features.

59:27 One is almost implemented.

59:29 It's a group by statement.

59:31 As I said, the idea of EdgeQL is to actually surpass SQL in capabilities.

59:37 And right now with EdgeQL, it's already incredibly powerful.

59:41 You can fetch deep data hierarchies.

59:43 You can compute things.

59:44 You can use aggregate functions.

59:46 You have subqueries.

59:47 You have JSON.

59:48 Like it's an incredibly powerful language right now, but a proper group by statement will give it a proper analytical flavor.

59:56 Now you will be able to actually create reports and we have a great group by design.

01:00:00 By the way, we try to make the EdgeQL design process as open as possible.

01:00:04 We have RFCs.

01:00:05 It's GitHub slash EdgeDB slash RFCs.

01:00:08 So if you're interested to look at how our group by is different from SQL group by and why it's better than SQL group by, you can just go ahead and read an RFC about our group by.

01:00:18 So group by is going to be one thing.

01:00:20 The second thing is going to be a proper explain for your queries.

01:00:23 Like why is my query slow?

01:00:25 We have some ideas on how to make it less cryptic than the default explain output that you get in most databases.

01:00:31 Then there is an exciting thing.

01:00:33 And I hope that we'll have enough time to implement it, which is access control.

01:00:38 So EdgeDB is this vertically integrated thing.

01:00:41 So you define your schema and in your schema you can define aliases, which is basically a view in your relational database.

01:00:49 You can define fields or object types that are computed dynamically with EdgeQL.

01:00:54 So schema depends on EdgeQL and EdgeQL depends on schema in EdgeDB.

01:00:58 They are intertwined.

01:01:00 So we have this idea.

01:01:01 It's not that it's super new, but in EdgeDB it's going to be super powerful.

01:01:06 Is that you'll be able to specify different policies on your schema type.

01:01:11 Like allow reading something or allow mutating something or disallow, etc.

01:01:16 Right.

01:01:17 And we don't want to hard code that.

01:01:19 So essentially we are introducing this concept of context in a database.

01:01:23 You'll be able to define sort of like global variables, like context variables in your schema, say a user ID as an int64 and something else.

01:01:33 And then when you just get your connection in your Python code, you say "with context", pass the user ID, and that is automatically passed to the database.

01:01:43 In your schema, you can implement arbitrary access logic on your schema type.

01:01:49 And this logic will be automatically enforced in all of your queries.

01:01:52 So fantastic.

01:01:53 Yeah, that's really cool.

01:01:54 You are fetching data for the home page and it is filtered.

01:01:55 You are fetching data for a report and it only includes the data that your business logic allows to be there.

01:02:02 So basically with EdgeDB, you will have a schema, and that schema will define not only the data layout of your application, but also the access patterns and many other things in the future.
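
The planned access-control idea, where context variables travel with the connection and schema-level policies are enforced on every query, might be pictured with a toy in-memory model like the one below. Nothing here is EdgeDB's API: `TinyStore`, `BoundStore`, `with_context`, and the policy signature are all hypothetical names chosen for this sketch, and the feature itself was still on the roadmap at the time of recording.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
# A policy receives a row and the current context and decides whether it is visible.
Policy = Callable[[Row, Dict[str, object]], bool]


@dataclass
class TinyStore:
    """Holds rows plus the policies that are declared once, next to the data."""

    rows: List[Row] = field(default_factory=list)
    policies: List[Policy] = field(default_factory=list)

    def with_context(self, **context) -> "BoundStore":
        # Bind context variables (for example a user id) to a handle on the store.
        return BoundStore(self, dict(context))


@dataclass
class BoundStore:
    store: TinyStore
    context: Dict[str, object]

    def select(self, predicate=lambda row: True) -> List[Row]:
        # Policies are applied to every query, before the caller's own filter.
        return [
            row
            for row in self.store.rows
            if all(policy(row, self.context) for policy in self.store.policies)
            and predicate(row)
        ]
```

Because the policy lives with the store rather than with each call site, a home page query and a report query are both filtered the same way without repeating the rule.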

01:02:13 Yeah.

01:02:14 I really want to ask you about the query syntax.

01:02:17 Yeah.

01:02:18 I find it super interesting, especially also how it relates to like ORMs and so on.

01:02:24 But Michael out in the audience has a pretty neat question that sort of follows on to the roadmap first.

01:02:29 So since EdgeDB is fundamentally Python, it'd be great to have a way to run user defined functions in Python, I guess, sort of like stored procedures, but Python.

01:02:40 Yeah.

01:02:40 Not SQL.

01:02:41 Yeah.

01:02:42 Yeah.

01:02:43 It's an interesting question.

01:02:44 I mean, user defined functions.

01:02:45 Well, first of all, there are like a couple of different planes, I would say, of user defined functions in the context of EdgeDB because EdgeDB has this notion of extensions.

01:02:54 The API isn't public yet, but EdgeDB, for example, supports GraphQL natively.

01:02:59 You can just run EdgeDB, let's say on port 555.

01:03:02 Then it's localhost:555/db/mydb/graphql.

01:03:07 We want you to be able to also define, potentially, eventually, user defined API handlers there so that with EdgeDB, you would not need a backend at all if your business logic is relatively simple and you don't need a full blown application.

01:03:24 Oh, interesting.

01:03:25 So if I've got like something on Netlify where it's pure static code, I just write a little JavaScript, some Vue or whatever, and it could theoretically do read only stuff maybe to an EdgeDB.

01:03:36 To an EdgeDB instance or something like that.

01:03:38 Or even write only.

01:03:39 Yeah, absolutely.

01:03:40 We just want to kind of push this idea of backendless development as far as we can.

01:03:46 And because EdgeDB has this incredibly powerful schema and will soon have access control, that already allows you to eliminate a lot of code, right?

01:03:53 If only you could define some simple server side, database side functions.

01:03:58 A little bit of Python in there.

01:04:00 I'm starting to come around.

01:04:01 Yeah.

01:04:02 A little bit of Python or JavaScript or maybe Rust or something

01:04:05 that you can just use to make that request to the Stripe API, do something, and then glue things together.

01:04:10 Yeah.

01:04:10 Then maybe you don't need the backend at all.

01:04:12 So this is our vision eventually to allow things like this.

01:04:16 And second plane is user defined functions within the database.

01:04:20 And because this is Postgres, those functions are going to be running inside Postgres.

01:04:23 You will be able to call them from the query language.

01:04:25 Like, hey, use NumPy to crunch this data for me.

01:04:29 Like, right in EdgeQL.

01:04:30 This is also possible.

01:04:31 There are extensions for Postgres that allow you to do that.

01:04:34 It's possible to define user defined functions in Postgres.

01:04:38 Multiple different extensions for that are there.

01:04:41 So yeah, it's an interesting thing for us to think about.

01:04:45 And we are thinking about it, but probably not for 2.0.

01:04:48 Yeah.

01:04:49 Okay.

01:04:50 Very cool.

01:04:51 So let's take this statement here for a minute.

01:04:52 Yeah.

01:04:53 This query syntax highlights a lot of probably what makes EdgeDB unique and some of your motives

01:05:00 here.

01:05:01 So this, if you wanted to go and get say a movie, which has a relationship to an actor's

01:05:07 table and you want to do some sort of filter type thing, you would say select movie curly

01:05:13 brace, look at that title.

01:05:15 That's the select projection.

01:05:16 So a movie.title basically.

01:05:18 And then actors, curly brace, name and email.

01:05:22 So is that, is this part right here, this, the sub actors, is that traversing the relationship,

01:05:27 that graph relationship?

01:05:28 Exactly.

01:05:29 You're basically traversing the graph.

01:05:30 And then inside the select statement, you say order by .name and you have this cool convention

01:05:34 of dot, which if you're in one of these scopes, like curly bracket actors, then you can say

01:05:41 dot, and it means .name applies back to actors, right?

01:05:44 Actors.

01:05:45 Yes.

01:05:46 And basically this is just syntax sugar.

01:05:47 Nothing prevents you from spelling it out completely.

01:05:49 Like you say, you can say order by movie.actors.name.

01:05:52 Yeah.

01:05:53 But because you're already inside the actors, essentially, we're just giving you this.

01:05:57 Yeah.

01:05:58 Fantastic.

01:05:59 Then another thing that stands out for the query syntax is you can define inline variables

01:06:03 using the walrus operator, by the way.

01:06:05 Yeah.

01:06:06 So you can say average review equals math mean dot reviews of the movie, then dot rating.

01:06:13 And is this also traversing?

01:06:15 Exactly.

01:06:16 What is this?

01:06:17 Yeah.

01:06:18 A movie type has a multi link, reviews.

01:06:23 So multiple reviews can be attached to movies and every review has, let's say five star rating,

01:06:28 an integer one to five.

01:06:30 And this is how you quickly can say, hey, just calculate the mean number of all linked reviews

01:06:36 and all their ratings.
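
To make the computation concrete, here is a plain-Python picture of what "the mean of all linked reviews' ratings" amounts to. The data and the function name `average_review` are made up for illustration; in EdgeQL the expression discussed above would read roughly like `average_review := math::mean(.reviews.rating)`, evaluated by the database rather than in application code.

```python
from statistics import mean

# Made-up data: one movie with three linked reviews, each rated 1 to 5.
movie = {
    "title": "Example Movie",
    "reviews": [{"rating": 5}, {"rating": 3}, {"rating": 4}],
}


def average_review(movie):
    """Follow the 'reviews' link, collect each rating, and take the mean."""
    return mean(review["rating"] for review in movie["reviews"])
```

The traversal (movie to reviews to rating) and the aggregate (mean) are the two steps that the single EdgeQL expression folds together.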

01:06:38 So somebody on Hacker News years ago aptly called EdgeQL a child of SQL and GraphQL.

01:06:43 And I mean, it's funny, but there is truth to it because GraphQL made it extremely obvious

01:06:50 to people that working with object hierarchies this way, when you can just have a query that

01:06:55 just select something deep, right, is extremely important.

01:06:58 People suddenly realize this is cool.

01:07:00 Some companies have been trying to make GraphQL work for relational databases such as Hasura.

01:07:05 And they have an amazing product.

01:07:06 The only problem is that GraphQL wasn't actually designed for querying databases.

01:07:11 It's an API language, it's a REST replacement.

01:07:14 So while it works for some things, good luck computing something in GraphQL.

01:07:18 You just can't. You can fetch things, but you cannot compute; something like your average review

01:07:22 is not possible to do in GraphQL.

01:07:24 SQL on the other hand is, is very stubborn when you have to select anything nested, like

01:07:31 things in tables, you have to think in tables, you either like select super wide tables,

01:07:35 and then you have to write some Python code to kind of combine it back to your shape or use

01:07:40 an ORM, or if you use an advanced database, you can use things like

01:07:45 array_agg, but SQL doesn't shine for things like this.

01:07:49 So with EdgeQL, we're kind of marrying both of those worlds.

01:07:53 You have this, deep fetch syntax and, you have an ability to drop computation in any,

01:08:00 at any point of your query.

01:08:02 Now a couple of other super important things about EdgeQL.

01:08:06 If you want, I can go into them.

01:08:08 Yeah.

01:08:09 We're getting short on time, but yeah, go ahead.

01:08:10 Sure.

01:08:11 As I said before, sometimes I pitch EdgeDB as this LLVM thing, like a compiler.

01:08:15 When we compile an EdgeQL query to SQL, we have one important thing.

01:08:19 Every EdgeQL query, no matter how complex it is, is always compiled to just one SQL query.

01:08:25 And this is very important in the context of relational databases, because when you have

01:08:29 just one single query, it's atomic.

01:08:31 So you don't need like an explicit transaction.

01:08:33 You're already like working.

01:08:35 You always work with the same snapshot of the data, essentially.

01:08:37 Interesting.

01:08:38 So you're not in this case, like going, doing a query for the movies and then doing a query

01:08:42 for the actors and then doing a query for the reviews as three steps.

01:08:48 You're just, it's basically a three way join.

01:08:50 And then you're getting the data back out.

01:08:51 Something like that.

01:08:52 It's slightly more complicated than three way join.

01:08:55 Yeah, sure it is.

01:08:56 But yeah, basically, basically, yeah, that's, that's the idea.

01:08:59 For one EdgeQL query, it's always one SQL query.
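
As a rough picture of "one query instead of three", the sketch below fetches movies, their actors, and their average rating in a single SQL statement. It is hand-written SQL against SQLite from the standard library, chosen so the example runs anywhere; it is not the SQL EdgeDB generates, and the table names and the `fetch_movies` helper are assumptions for this illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE movie_actor (movie_id INTEGER, actor_id INTEGER);
CREATE TABLE review (id INTEGER PRIMARY KEY, movie_id INTEGER, rating INTEGER);
"""

# One statement: nested data comes from correlated subqueries, not extra round trips.
ONE_QUERY = """
SELECT
    m.title,
    (SELECT group_concat(a.name, ', ')
       FROM movie_actor ma JOIN actor a ON a.id = ma.actor_id
      WHERE ma.movie_id = m.id) AS actors,
    (SELECT avg(r.rating) FROM review r WHERE r.movie_id = m.id) AS avg_rating
FROM movie m
ORDER BY m.title
"""


def fetch_movies(conn):
    """Run the single query and reshape each row into a nested dictionary."""
    rows = conn.execute(ONE_QUERY).fetchall()
    return [
        {
            "title": title,
            # group_concat gives no ordering guarantee, so sort in Python.
            "actors": sorted(actors.split(", ")) if actors else [],
            "avg_rating": avg_rating,
        }
        for title, actors, avg_rating in rows
    ]
```

Since everything is read in one statement, the movie, its actors, and its reviews all come from the same snapshot of the data, which is the atomicity point made above.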

01:09:01 It's very important.

01:09:02 We use lots of interesting tricks to make it happen.

01:09:04 And if you're interested in those tricks, go to YouTube slash EdgeDB and watch our live event,

01:09:08 we explain this all actually.

01:09:10 But it's an important thing.

01:09:11 And then EdgeQL is actually very composable.

01:09:15 So you can pack multiple different queries into one query.

01:09:18 So you can have a query that reads data, inserts data, mutates data, and introspects the schema.

01:09:24 All in one huge thing.

01:09:26 And it will execute quickly for you and return your data like in proper way, ready for you to be consumed.

01:09:32 So EdgeQL is extremely powerful in that regard.

01:09:34 This is what separates it from ORMs, because your ORM, be it SQLAlchemy

01:09:39 or Prisma or something like that.

01:09:41 They might have a high level API for some operations, but they also don't really restrict themselves

01:09:47 on how many queries it will take to implement that API.

01:09:50 Right.

01:09:51 And sometimes N plus one.

01:09:52 Yeah.

01:09:53 Right.

01:09:54 And if you benchmark it on localhost, for example, the database is on your laptop and your code

01:09:57 executes on laptop, it appears to be fast.

01:10:00 So you have three queries instead of one.

01:10:02 So what?

01:10:03 Like there is zero latency between your database and your code.

01:10:05 And probably not full production levels of data.

01:10:08 Sure.

01:10:09 But when you move it to the data center, you will have latency between your code and

01:10:13 the database.

01:10:14 And even if you have like one millisecond latency between your queries, suddenly you just start

01:10:19 losing performance a lot because your Python or JavaScript that uses an ORM operation

01:10:26 can actually fire like 10 queries.

01:10:28 This is easy.

01:10:29 Like 10 queries is fine.

01:10:30 And imagine it just spends 10 milliseconds on just doing that.

01:10:34 Yeah.

01:10:35 Just latency.

01:10:36 Nothing else.

01:10:37 Yeah.

01:10:38 Just losing performance.

01:10:39 So with EdgeDB, it's not a thing.
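
The arithmetic behind this point is simple enough to write down. The sketch below assumes the queries run one after another and counts only network round trips, using the figures mentioned in the conversation (10 queries, 1 millisecond each); the function name `latency_overhead_ms` is just for illustration.

```python
def latency_overhead_ms(num_queries, round_trip_ms):
    """Time spent purely waiting on the network for sequential queries."""
    return num_queries * round_trip_ms


orm_style = latency_overhead_ms(10, 1.0)    # ten sequential queries
single_query = latency_overhead_ms(1, 1.0)  # one compiled query
```

With these numbers the ORM-style operation spends 10 ms on latency alone versus 1 ms for the single query, before any actual database work is counted.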

01:10:40 All right.

01:10:41 So final question here.

01:10:42 When I run this, what do I get back in Python?

01:10:45 Obviously there's a nice async and synchronous Python API to talk to this.

01:10:50 Yeah.

01:10:51 But when I run this query in Python, what do I get?

01:10:54 It depends on how you run it.

01:10:56 Yeah.

01:10:57 We offer you two modes, essentially two output modes.

01:11:00 Any EdgeQL query can be compiled as JSON.

01:11:02 In our Python client, you just say query_json.

01:11:05 And it will return your JSON data, like ready to be pumped to your front end.

01:11:09 Or you can just say query.

01:11:10 And when you say query, it will return you rich Python objects.

01:11:14 So you'll have a movie Python object with a title string attribute and with an actors list,

01:11:19 which will have actor objects within it, et cetera.
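
The difference between the two output modes can be mimicked without a database. The sketch below does not call EdgeDB at all: the data is made up, `as_json` and `as_objects` are hypothetical helpers, and `SimpleNamespace` is only a stand-in for the client's own object types. It just shows the two shapes being described, a JSON string ready for a front end versus nested objects with attribute access.

```python
import json
from types import SimpleNamespace

# Made-up result data standing in for what a nested query might return.
raw = [
    {
        "title": "Example Movie",
        "actors": [{"name": "Ann", "email": "ann@example.com"}],
    }
]


def as_json(data):
    """JSON mode: a single string that can be sent to a front end as is."""
    return json.dumps(data)


def as_objects(data):
    """Object mode: nested objects whose fields are read as attributes."""
    return json.loads(json.dumps(data), object_hook=lambda d: SimpleNamespace(**d))
```

For example, `as_objects(raw)[0].actors[0].name` reads a nested field with attribute access, while `as_json(raw)` yields text that needs no further serialization.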

01:11:22 It's also very compact, like on the IO level.

01:11:25 So we are not sending like super fat tables or anything.

01:11:29 The data is neatly serialized.

01:11:31 So no need for any duplication.

01:11:33 Yeah, that matters.

01:11:34 Anything.

01:11:35 It's just like you have your native object data model in the database.

01:11:39 You query it and you get objects out of it.

01:11:43 So you never have to think about like any tables or anything.

01:11:46 It's always high level.

01:11:47 Nice.

01:11:48 All right.

01:11:48 Final question.

01:11:49 Then we really do have to wrap it up.

01:11:50 One of the things that's really nice about ORMs is I can say my thing dot, and I get a list

01:11:56 in my editor of what I should be getting back from the database.

01:11:59 Can I do that with this?

01:12:02 I know like the movie is basically defined in the GraphQL schema definition.

01:12:08 Is there a way to do like a typeshed type thing?

01:12:10 Yeah.

01:12:11 EdgeDB.

01:12:12 Sorry.

01:12:13 I don't know.

01:12:14 Maybe it's not a DB schema language, but is there a way to do like a typeshed thing to say,

01:12:17 well, that thing you get back looks like this.

01:12:19 Yes.

01:12:20 Unfortunately not in Python yet.

01:12:22 In TypeScript, we just released our query builder and it's insane because the API of the

01:12:27 query builder super closely replicates the layout of the EdgeQL query.

01:12:34 It's basically like one to one correspondence.

01:12:36 It's like almost like same thing.

01:12:38 And in TypeScript, we just focused on TypeScript first, then Python is next.

01:12:42 But for TypeScript, yes, you reflect your schema with just one command line command.

01:12:47 And in VS Code, you now have full autocomplete.

01:12:50 You can express your queries in TypeScript no matter how nested they are, no matter what

01:12:53 kind of computation you do.

01:12:55 It's still the same idea.

01:12:56 Whatever query you build in your TypeScript is going to be just single SQL query, just single

01:13:00 SQL query.

01:13:01 It's going to be fast.

01:13:02 And you have full autocompletion and more.

01:13:05 You actually have full return type inference.

01:13:08 So you don't have to type anything.

01:13:09 You have a query, your VS Code and TypeScript, they will know the type of the data that's going

01:13:15 to be returned.

01:13:16 Interesting.

01:13:17 Okay.

01:13:18 It works like magic.

01:13:19 We're going to see if we can replicate this experience with Python and mypy.

01:13:24 This is going to be our goal to make something like this happen.

01:13:27 Right now, we just have this low level, well, relatively low level client API for Python.

01:13:31 You can run any EdgeQL query.

01:13:32 You can get data for it.

01:13:34 You can do it in async or sync.

01:13:36 Entirely up to you.

01:13:37 But the typing integration specifically isn't there.

01:13:40 And second part of this question is that we are looking in future implementing a language

01:13:47 server protocol for EdgeDB.

01:13:48 So you install it locally and then VS Code would just connect to it.

01:13:52 And then you would have your autocomplete for EdgeQL queries, for schema files.

01:13:56 This is going to be great, but I'm just not sure like what kind of ETA we can put in it.

01:14:00 Probably not for 2.0.

01:14:01 Right.

01:14:02 Okay.

01:14:03 Yeah.

01:14:03 Looking forward to it.

01:14:04 Very neat work on EdgeDB and obviously all the building blocks that we talked about

01:14:07 at the beginning.

01:14:08 Congratulations.

01:14:09 Thank you, Michael.

01:14:10 Yeah.

01:14:11 You bet.

01:14:12 All right.

01:14:13 I'll list mypy as well.

01:14:14 mypy is a great thing.

01:14:14 Use mypy.

01:14:14 Cool.

01:14:15 Right on.

01:14:16 All right.

01:14:17 Final call to action.

01:14:18 People are interested in any of your projects.

01:14:19 Probably primarily EdgeDB.

01:14:19 What do you say?

01:14:19 How to get started?

01:14:20 Yeah, absolutely.

01:14:21 It's ready for you.

01:14:22 It's a 1.0.

01:14:23 It's stable.

01:14:24 Follow us on Twitter.

01:14:25 It's @edgedatabase on Twitter, without any underscores or dashes.

01:14:25 Just edgedatabase.

01:14:46 You will find the Discord link right in the Twitter description.

01:14:49 So join our Discord.

01:14:51 We try to grow community and yeah, build something amazing.

01:14:55 EdgeDB.

01:14:56 I can say it like with full confidence.

01:14:58 EdgeDB is the most amazing thing that ever happened to relational databases.

01:15:03 So take a look at it.

01:15:04 This is the beginning of hopefully a big movement.

01:15:07 Yeah, fantastic.

01:15:08 Let me put in one final postscript question.

01:15:11 Sorry.

01:15:12 I really wanted to ask you this and I think it matters for people considering adopting it.

01:15:15 But do keep it super quick.

01:15:17 What's the business model?

01:15:18 Like when you guys released this thing, how do people get it?

01:15:21 Will there be a free version?

01:15:23 What's the story?

01:15:24 So EdgeDB is fully open source.

01:15:26 It's Apache 2.0 licensed.

01:15:27 It's extremely permissive.

01:15:28 No strings attached.

01:15:29 We'll make money by running EdgeDB for you.

01:15:31 Essentially, we will have a hosted version of EdgeDB.

01:15:34 EdgeDB as a service.

01:15:35 Yeah, absolutely.

01:15:36 And this is how most of these other companies make money these days.

01:15:38 It's not anymore about enterprise version of your database so much.

01:15:42 It is about, hey, can you run this

01:15:44 database for us in the private cloud?

01:15:46 Right.

01:15:47 This is what businesses want.

01:15:48 Back it up, scale it.

01:15:49 Give us all that kind of back.

01:15:50 Exactly.

01:15:51 Okay.

01:15:52 We're actively working on that.

01:15:53 Although you can run EdgeDB right now on top of Aurora Postgres, RDS Postgres, Google Cloud.

01:15:58 We have guides for that.

01:15:59 So if you need to deploy your EdgeDB application, we have your back.

01:16:02 But we will have this native, proper cloud version of EdgeDB, with which you will be able to bootstrap a cloud database for yourself with just one terminal command.

01:16:13 It's going to be amazing.

01:16:14 All right.

01:16:15 Fantastic.

01:16:15 Thanks, Yury.

01:16:16 Thank you.

01:16:17 Yeah.

01:16:18 Bye.

01:16:19 Bye.

01:16:24 to check out what they're offering. It really helps support the show. Take some stress out of

01:16:28 your life. Get notified immediately about errors and performance issues in your web or mobile

01:16:33 applications with Sentry. Just visit talkpython.fm/sentry and get started for free. And be sure

01:16:40 to use the promo code talkpython, all one word. Add high performance, multi-party video calls to

01:16:45 any app or website with SignalWire. Visit talkpython.fm/SignalWire and mention

01:16:51 that you came from Talk Python To Me to get started and grab those free credits.

01:16:54 Want to level up your Python? We have one of the largest catalogs of Python video courses over at

01:17:00 Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async.

01:17:05 And best of all, there's not a subscription in sight. Check it out for yourself at

01:17:09 training.talkpython.fm. Be sure to subscribe to the show, open your favorite podcast app,

01:17:14 and search for Python. We should be right at the top. You can also find the iTunes feed at

01:17:19 /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:17:26 We're live streaming most of our recordings these days. If you want to be part of the show and have

01:17:31 your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:17:36 This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it.

01:17:42 Now get out there and write some Python code.
