New course: Agentic AI for Python Devs

SQLAlchemy and data access in Python

Episode #5, published Tue, Apr 28, 2015, recorded Thu, Apr 9, 2015
In this episode we speak with Mike Bayer. Mike created SQLAlchemy in 2005 and over the past 10 years has been building and refining this amazing RDBMS ORM and data access layer.

You'll learn a lot about the history of the project and how it has evolved over time. You'll also here where Mike got some of his inspiration for the design patterns used in the library.

So what are you waiting for? Maybe it's time to

pip install sqlalchemy

Links from the show:

SQLAlchemy Site:
http://www.sqlalchemy.org/

Asynchronous Python and Databases:
http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/

Hybrids and Value Agnostic Types:
http://techspot.zzzeek.org/2011/10/21/hybrids-and-value-agnostic-types/

Book: Essential SQLAlchemy:
amzn.to/1Dw3YHY

Book: SQLAlchemy: Database Access Using Python:
amzn.to/1Dw4m9a

Episode Deep Dive

Guest Introduction and Background

Mike Bayer is the creator of SQLAlchemy, a popular database toolkit and Object Relational Mapper (ORM) for Python. He started developing SQLAlchemy in 2005 and has worked on several other prominent Python libraries, including Mako Templates and the Alembic migrations tool. With deep roots in the Python and open-source communities, Mike has refined SQLAlchemy for over a decade, focusing on making database access both intuitive and flexible.

What to Know If You're New to Python

If you're new to Python and want to follow the discussion in this episode more easily, here are a few pointers:

  • Have a basic grasp of object-oriented programming and how Python classes work.
  • Understand the difference between working with raw SQL queries and using an ORM.
  • Know how to install Python packages with pip (e.g., pip install sqlalchemy).
  • Be aware that Python uses significant whitespace for code blocks and this is normal (the “indentation thing”).

Key Points and Takeaways

  1. What Makes SQLAlchemy Unique SQLAlchemy is both a comprehensive database toolkit and an ORM. Rather than hiding SQL, it embraces it, providing the “Core” (for direct SQL and schema operations) and the “ORM” (for higher-level abstractions). Mike Bayer designed SQLAlchemy to handle all aspects of database interaction—from simple queries to complex migrations—while keeping Python developers closely tied to SQL best practices.
  2. The Two Layers: Core vs. ORM SQLAlchemy consists of two primary layers: the Core (engine, connections, SQL expressions) and the ORM (object-relational mappings, sessions, and higher-level abstractions). Many large-scale users (e.g., Reddit, Dropbox) stick to just the Core for maximum control, while others leverage the ORM for quick model mappings and Pythonic APIs.
  3. Early Challenges and Evolution Initially, SQLAlchemy went through several redesigns, from 0.x versions to the more stable series of releases. Over time, the project introduced a declarative syntax, support for migrations, and performance improvements. Mike emphasized that frequent feedback from real-world usage (including his own jobs) shaped SQLAlchemy’s most robust features.
  4. Declarative Mapping Declarative mapping simplifies creating Python classes mapped to database tables. It uses a special “Base” class, and you declare columns as attributes directly within your Python classes. This pattern makes it straightforward to define relationships and constraints in a single place, matching the approach of many modern frameworks (e.g., Django).
  5. Lazy Loading vs. Eager Loading A classic ORM challenge is deciding how to load related data. SQLAlchemy supports lazy loading (only fetching related objects when needed) and eager loading (explicitly fetching related objects in one go). Lazy loading can lead to the “N+1 query problem,” while eager loading can streamline performance if used thoughtfully.
    • Concepts:
      • N+1 problem: Loading each related record individually, causing a flurry of queries.
      • Eager loading: Loading everything in fewer queries by specifying .options(joinedload(...)) or similar.
  6. Performance Tuning SQLAlchemy performance has improved dramatically since its early versions. Knowing Python’s internals—like avoiding too many function calls or repeated dictionary key checks—helped the library become more efficient. Users can switch between Core-only data fetching and ORM-based approaches to strike the right balance of speed and abstraction.
    • References:
  7. Database Migrations and Alembic Modifying a live database schema—such as adding columns—is typically handled via migration tools. Mike created Alembic (a separate library) for fine-grained database schema changes, including adding or dropping columns, renaming fields, and more. While metadata.create_all() can generate new tables, Alembic or similar tools manage schema evolution.
  8. Versioning and Stability From early 0.x releases to 1.0 and beyond, SQLAlchemy’s API has steadily stabilized. Mike implemented deprecation warnings and clear migration guides to ease upgrades for production projects. Today, major changes rarely break existing code, due to thorough backwards-compatibility layers.
    • Tools / Concepts:
  9. Async Python vs. Traditional Threading While some developers explore asynchronous database calls (e.g., via asyncio), Mike generally recommends using thread pools for SQLAlchemy. Many database interactions—especially in transactions—are more straightforward in a synchronous model. The asynchronous approach can be bolted on at a higher level, often with minimal code changes.
  10. Widespread Adoption and Community Organizations like Reddit, Dropbox, Yelp, and OpenStack use SQLAlchemy for high-traffic environments. Whether they stick to the Core or embrace the ORM, these success stories underline how versatile SQLAlchemy can be. Mike attributes its popularity to a balance between helpful abstractions and direct SQL control.

Interesting Quotes and Stories

"Python really looks like it, but it's got that stupid white space thing. And I just didn't want to get into it. And then one day ... I was like, oh, we do that whitespace anyway when the code is clean." -- Mike Bayer

"When I saw it do that commit the first time and I saw all the rows go, I was like, wow, people are going to love this thing." -- Mike Bayer on seeing SQLAlchemy's “Unit of Work” in action

"The best way to write open source code is to be, you know, what we say, eating your own dog food on the job." -- Mike Bayer

Key Definitions and Terms

  • Core: Low-level SQLAlchemy engine, connections, and SQL expression language.
  • ORM (Object Relational Mapper): Maps Python classes and objects to relational database rows.
  • Unit of Work: A design pattern that tracks changes to objects and batches inserts, updates, and deletes in a single transaction commit.
  • Foreign Key: A database constraint linking one table’s column(s) to another table’s primary key.
  • Lazy Loading: Loading related data only when accessed. Can cause extra queries (N+1 problem).
  • Eager Loading: Pre-fetching related data with the main query to minimize round trips to the database.
  • Migrations: Controlled schema changes (add/drop columns, rename fields) in a database over time.

Learning Resources

Overall Takeaway

SQLAlchemy stands out as a mature and powerful toolkit for Python database work, straddling the line between exposing SQL’s full power and automating away repetitive tasks. By blending the Core layer and the ORM, developers can choose exactly how much abstraction they need. With a robust community, a thoughtful design influenced by real-world use cases, and widespread adoption in some of the biggest Python shops, SQLAlchemy remains a cornerstone of Python data access.

Episode Transcript

Collapse transcript

00:00 Talk Python To Me, episode number five, with guest Mike Baer, recorded Thursday, April 9th, 2015.

00:07 Hello and welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

00:40 This is your host, Michael Kennedy.

00:41 Follow me on Twitter, where I'm @mkennedy, and keep up with the show and listen to past episodes at talkpythontome.com.

00:49 This episode, we'll be talking with Mike Baer about SQLAlchemy.

00:53 Let me introduce Mike.

00:54 Mike Baer is the creator of several prominent Python libraries, including SQLAlchemy,

01:00 Mako Templates for Python, Alembic Migrations, and Dogpile Caching.

01:04 He's been working with open source software and databases since the mid-1990s.

01:09 Today, he's active in the Python community, working to spread good software practices to an ever-wider audience.

01:14 Mike is a semi-regular presenter at PyCon US, and has also spoken at many smaller events and conferences in the United States and Europe.

01:22 Follow Mike on Twitter, where he's Zeke.

01:24 That's at Z-Z-Z-E-E-K.

01:27 Welcome to the show.

01:28 Thanks. Thanks for having me.

01:30 Yeah, I've been a longtime fan of SQLAlchemy.

01:33 And, you know, when I started this podcast, I was thinking, who do I have to have on the show?

01:38 And you were definitely on the list.

01:39 That's very flattering.

01:41 Yeah, you built some great stuff.

01:44 So, before we get into the details of SQLAlchemy and how we should use it and why it's so awesome,

01:49 maybe let's take a trip down memory lane, step back a little bit, and just, you know, how and when did you get into Python?

01:55 I got into Python after kind of hesitating to get into it, meaning I knew about it,

02:00 and it seemed like something that was appealing to me.

02:03 But I kind of resisted mostly because of the white space thing.

02:06 This is back in probably 2003.

02:09 I was working at Major League Baseball.

02:11 Most of my programming career before Python was doing a lot of Java and a lot of Perl.

02:18 And I really liked the idea of object-oriented languages, and I liked the idea of scripting languages.

02:25 So Java was very object-oriented.

02:28 Perl was not too good at objects.

02:30 But it was a scripting language.

02:32 So I really wanted there to be some scripting language where I could write much cleaner code than you get in Perl.

02:37 Yeah, so you kind of had half of what you wanted in one language.

02:40 Yeah.

02:41 But you wanted to bring it all together, right?

02:43 And Python really looked like it.

02:45 Python really looks like it, but it's got that stupid white space thing.

02:48 And I just didn't want to get into it.

02:50 And then one day when I was working at baseball, we were rolling out this client application called WinCVS, I think it was.

02:58 It was a Windows CVS client.

03:00 And we had to roll it out to people that were not programmers.

03:02 And we needed to add hooks to it so that people could tag things and send things to different servers by tagging.

03:10 And WinCVS's scripting language was Python.

03:13 So I had to learn, you know, 10 lines of Python to do this script.

03:18 And basically I spent like two hours with the white space thing.

03:21 And I was like, oh, we do that white space anyway when the code is clean.

03:26 And I'm actually totally fine with it.

03:27 So let me just learn Python now that I've spent 10 minutes getting used to white space.

03:32 That's really funny that it was like a real barrier to you, right?

03:36 And then it just came down.

03:37 It really bugged me because I was a really sloppy coder back then.

03:40 Python really made me a much cleaner programmer.

03:43 Yeah, that's a good point.

03:45 I hadn't really thought about that.

03:46 But it definitely does make you think about that.

03:49 And, you know, a lot of the modern editors make it almost transparent to you.

03:52 But I think there's the mental concept of it.

03:55 And then there's the reality of it.

03:56 And the reality is that it's actually not a big deal at all.

03:58 It's super smooth.

03:59 But when you're coming from, say, Java or C# or C++ or something, then it seems like a huge deal.

04:05 Yeah, you get over it really quickly.

04:08 Excellent.

04:09 Oh, you bring up WinCVS.

04:10 Boy, I used to use that thing back in the late 1990s or something.

04:15 That brings back some memories.

04:16 Yeah, that's kind of what it was, yeah.

04:17 That's awesome.

04:18 Let's talk about something more modern than WinCVS.

04:21 Let's talk about SQLAlchemy.

04:23 Sure.

04:23 So can you just tell, I suspect that, you know, 90% of the listeners are fans of and familiar with SQLAlchemy.

04:31 But there's going to be some folks who don't know or maybe they know it as a buzzword.

04:34 Can you just tell everyone what SQLAlchemy is?

04:37 So SQLAlchemy is this library that is, it's all about Python.

04:41 And it's all about SQL and databases and interacting with databases.

04:47 So it's basically when you need to work with a relational database, say you're going to work with like Postgres or MySQL, there's all kinds of things you need to do with the database.

04:56 You need to write, you know, programs that talk to it.

04:58 You need to get results back.

04:59 You need to create schemas.

05:01 You might need to be given some existing database and look at it.

05:05 There's scripting and migrations of schemas you want to do.

05:08 You probably are writing applications that have higher level business concepts that you want to map to that database.

05:15 So SQLAlchemy started out with a really ambitious goal of having a place, a way to do that, a way to do all those things in Python.

05:24 Not just being able to map like an object to a table, but all the features of managing schemas and creating schemas and reflecting them and working with SQL scripts and everything else.

05:36 Working with data type problems, making all kinds of different database backends look as similar as is feasible.

05:44 You know, like data types in Oracle are very weird compared to those in MySQL and SQLite's a whole different story.

05:51 Those databases are always going to be different, but you can at least try to get some semblance of sanity across all of them.

05:56 So that's why if you go to SQLAlchemy's website, the first thing it says is like the database toolkit for Python.

06:01 It doesn't say it's an ORM.

06:02 That's just one component.

06:04 It's a toolkit that has kind of helpers for whatever you have to do with a database.

06:11 SQLAlchemy can definitely have a role.

06:13 That's excellent.

06:13 You have at least two really nice sort of comprehensive walkthroughs, one for the core and one for the ORM component.

06:20 That's SQLAlchemy.org, right?

06:23 Yes, SQLAlchemy.org.

06:25 So when did you get started with, when did you create SQLAlchemy and what inspired you to create it in the first place?

06:31 So I was kind of writing the first iteration of SQLAlchemy in late 2005, and our first release was early 2006.

06:39 And really when I was writing SQLAlchemy, it was not the first time I was creating a tool like that.

06:45 When you work, or at least in the old days, when you used to work in the 90s in all these various internet shops, we had really little tools to work with.

06:55 I mean, there were a lot of tools, but we were, you know, using Perl and libraries to talk databases were pretty crude.

07:01 And I was using Java from day one when it was, you know, version 1.0, and there was literally no libraries before Hibernate.

07:08 So throughout the 90s, you know, you get into the habit of writing little database access layers yourself.

07:15 And every job you have, you're going to go and, now I'm going to write a really better database.

07:21 You know, the database access layer I wrote last time sucked.

07:23 Now I'm going to write a really good one.

07:25 And you say that statement to yourself like five times, right?

07:29 Yeah, every job you have, it's like, okay, this time we're going to get it right.

07:32 We're going to write the ultimate, you know, set of tools, you know.

07:36 And then as you're doing this, you know, as the 2000s rolled along, ORMs, you know, Hibernate suddenly was around.

07:42 And I was doing a lot of Perl at MLB, and there were some ORM-ish things with Perl that looked terrible to me.

07:48 They looked really simplistic.

07:51 I had worked, by the time I was at MLB, I was working, that was, they were a big Oracle shop.

07:55 I had already had a lot of gigs where I had worked a lot with Oracle.

07:58 I had worked a lot with Postgres.

08:00 I had worked a lot with Sybase, Microsoft SQL Server.

08:03 I had worked a lot with not just MySQL, but its earlier MSQL incarnations in the early 90s.

08:10 So I had, and I had written database code in Perl, Java, and C, C++, you know, because back in the early 90s, you know, we didn't, Perl wasn't always available for some cases.

08:23 So I really had iterated a lot on this whole back and forth with the database, sending a SQL string, getting a result back, getting data back from the result.

08:30 I knew a lot about that.

08:33 So, and I always wanted to have a system that would be the last time I have to do this, you know.

08:40 Yeah, I had that feeling.

08:41 A long time, it was like, I was writing things in Java.

08:43 I was going to write like a big toolkit in Java, like the big, you know, web framework, everything you need Java thing.

08:49 And it never really happened.

08:50 But eventually when I got into Python, that's when I really began doing it.

08:54 I, you know, Python had kind of not a lot of libraries around.

08:58 The only library that was feasible for database access on Python was SQL Object, which is a great library and was a huge influence on SQLAlchemy.

09:08 And I started doing it and it was kind of based on the latest techniques I iterated at baseball.

09:14 Like the idea is that if you want to talk to a database table, you make a data structure that represents what the table looks like.

09:20 And we call that table metadata.

09:22 But one critical thing I did with SQLAlchemy was I actually decided to read a book beforehand.

09:28 So I had this book called Patterns of Enterprise Architecture by Martin Fowler.

09:32 Yeah, that's a great book.

09:33 And I read that book and half of the book was like, oh, that, oh, this, oh, this.

09:38 All these things that I've done.

09:39 And he kind of put names to them.

09:40 And then as he did all that, there were some other patterns that I was not familiar with, like the unit of work pattern.

09:46 I was like, wow, look at that.

09:46 I've never heard of that.

09:47 That's pretty cool.

09:48 And does that appear as the session in SQLAlchemy now?

09:51 Yeah, the session.

09:52 Yeah.

09:52 And actually session is from Hibernate.

09:54 Hibernate's got the same thing.

09:55 So I, you know, really made a go at it.

09:58 And I said, I'm actually going to read a book beforehand and really try to get all these patterns right.

10:02 And that's kind of how it started.

10:05 And it took a very long time for SQLAlchemy to be any good.

10:09 I mean, I think when I first released it, it had a lot of the special things going on that made people like it.

10:15 But it was not executed very well.

10:17 But, you know, over the years, over really 10 years, it's gotten very refined.

10:22 But it's taken a long time.

10:23 Yeah, it's quite a mature product.

10:24 And that's cool.

10:25 Yeah.

10:26 Did you use it internally before you actually released it?

10:29 Or was it kind of, I'm building it out in public?

10:31 Yeah, no, it was funny.

10:33 When I was starting in Python, I was writing open source.

10:35 The first thing I wrote was this template language called Mighty, which was basically almost a line for line port of a template thing in Perl called HTML Mason.

10:42 That was the first open source thing I published.

10:45 And then Mighty was kind of an embarrassment eventually.

10:48 And I wrote Mako to replace it.

10:50 But I was writing these libraries at first without having any job.

10:55 The gigs I was doing were still Java jobs.

10:57 I was still doing, after I left baseball, I was still doing Java work.

11:00 Python was certainly used in the mid-2000s, but it was not as dominant as it is today.

11:06 I didn't actually get to use SQLAlchemy for a real gig until version 0.5.

11:11 So maybe four years into it, I finally was on a gig where like, hey, we can do this thing in Python and we can do this.

11:17 And everyone was like, okay, use the Python thing.

11:20 Yeah, use that toy language.

11:22 Yeah, yeah, yeah.

11:22 It was kind of like that, yeah.

11:24 And it was actually the website for a TV show called Charlie Rose, which is on PBS.

11:28 Oh, yeah, yeah.

11:28 Great.

11:29 Yeah, we wrote, I don't know if it's still up in that incarnation anymore, but at the time, this is maybe 2007, it was written using early version of pylons.

11:38 And that was the first time I used SQLAlchemy on the job myself.

11:40 And SQLAlchemy improved by leaps and bounds on that gig because I had like, oh, this sucks.

11:45 Oh, look at this thing.

11:46 This thing is terrible.

11:47 How could people have been using this for three years?

11:49 Yeah, that's excellent.

11:50 Yeah.

11:51 So it's weird.

11:52 Yeah.

11:53 It was written kind of in a vacuum.

11:55 Like I only knew what my users told me for several years.

11:58 Yeah, very cool.

11:59 The previous show that's not out yet, but was recorded just before this is a guy named Mahmoud Hashemi from eBay PayPal.

12:06 And the whole show is about sort of enterprise Python and sort of using it as a real proper language.

12:12 And I think, you know, it's probably that time frame and that era.

12:15 There's a lot of similar ideas about, hey, we could actually build real websites and, you know, web scale professional apps with this language.

12:23 It's great.

12:24 Yeah.

12:24 Yeah.

12:25 I mean, Python as enterprise was something that I think, you know, it existed in the early 2000s.

12:33 Like I think a lot of people were using, I mean, people were using like Zope or Plone, I think was the early, you know, incarnation of enterprise Python.

12:40 But I think, you know, really you got to hand it to like, I mean, first Ruby on Rails got people more comfortable with scripting languages for enterprise.

12:48 And then Django really helped a lot too.

12:51 Django really brought a lot of people to Python.

12:54 And I think the critical mass kind of started happening for Python kind of in 2005 on forward.

13:02 Absolutely.

13:03 And I think, you know, to shoot your horn and everybody else's a little bit, the stuff that's in PyPy, all those packages out there make it really hard to not consider Python.

13:14 The fact that you can just pip install magic, you know, and build stuff so quickly.

13:19 Like, why would I start from scratch, you know?

13:21 Right.

13:21 Absolutely.

13:22 We now know what SQLAlchemy is, if we didn't.

13:24 Can you kind of give us like a, I know it's hard to talk about code, so don't get too detailed.

13:28 But can you give me a little walkthrough of like what is involved in getting started?

13:32 Like what do I do in code to maybe make a basic query or connect to the database?

13:37 So there's different ways that you might want to use the system.

13:40 I mean, there's the level, the layer of I want to just make a query.

13:43 And that's pretty easy.

13:45 I mean, SQLAlchemy has a lot of different levels that it can be used at.

13:49 Like if you want to just connect to a database and make a query, that's like a really two line.

13:52 You make an object called an engine, which you give it a URL for a database.

13:57 And then the engine can just accept queries as strings directly.

14:01 And right there, you've already saved maybe 10 lines of code versus what it would be with the raw Python database.

14:08 Yeah, that's fantastic.

14:09 Maybe we should take a step back and talk about the layers.

14:11 And then we could talk about, you know, well, this layer, that layer, it looks like such and such.

14:15 So you have the core and you have the ORM.

14:17 Can you talk about that a little?

14:19 Yeah, so I actually wrote SQLAlchemy in this inside-out way where I didn't even worry about the –

14:25 I mean, I kind of knew what I wanted to do for the ORM, but I didn't even worry about that for a few months.

14:28 I wrote first just the engine, just like the concept of here's how to connect to a database using the Python DB API.

14:34 And here's a way that we can execute a query and get results back in a way that's a little bit nicer than what the raw Python database API gives you.

14:44 It's a little bit nicer and a little more consistent.

14:47 Then the next thing I did, which, you know, was soon after, was I wrote the table metadata system, which I had already done many times at different jobs.

14:54 You know, kind of looked in Fowler's book.

14:56 So you wrote – you create this concept of a Python data structure that mirrors what structures you have in your database.

15:06 So when you have a relational database, it kind of – without using it at all, it has a fixed schema.

15:12 You have, like, tables that are kind of – they're not – we might call them physical tables, which they're really not physical.

15:17 They're just on a disk.

15:18 But they're tables, and the tables have columns, and they're kind of like the things that are going to store your data.

15:24 So you model a mirror of all that stuff in Python so that you can write Python code that refers to this object structure that can then relate directly to how the database is built.

15:36 So that's called database metadata.

15:38 So it means it's information about the structure of your database.

15:42 Then from that, the table metadata objects in SQLAlchemy have all these methods on them, like select and update and, you know, table.select.where.

15:52 And when I say, you know, x.y.z, the dot in Python means that we're calling a method on an object.

15:58 And if you have this pattern where you say an object.call this, and then you get something back, and then you say .call that,

16:05 that's a pattern called method chaining, which means that you keep calling methods on this object that keeps returning a new copy of itself.

16:13 So that's a common pattern used in database query libraries because the method chaining is kind of like you're building up this data structure

16:20 that looks like basically a syntax tree.

16:23 A syntax tree basically represents like some code, but it represents the code as a set of objects connected together in a tree.

16:31 Yeah, that's fantastic.

16:32 I really like the fluent API that you put together there where basically you start with a query, and whatever you do to it,

16:37 what comes out of the query and just lets you build up.

16:40 Here's a where clause.

16:41 Here's an order by.

16:42 Here's a distinct or whatever.

16:44 Yeah.

16:44 At this point, I mean, that pattern is pretty commonplace these days.

16:48 I mean, when I first did it, it was not quite as commonplace.

16:50 I mean, basically Java Hibernate kind of has that pattern, but not in a nice way because Java is just not as slick of a language as Python.

16:58 And some of the, you know, it was a pattern that was around.

17:01 I think SQL object had it as well, you know.

17:03 It builds on that, but then we also did this thing.

17:07 I took this thing right from Ian Bicking's SQL object, which was that you overload Python operators like the double equals operator or the greater than operator,

17:16 so that when you have these little objects that refer to columns and values in a database, you can compare them with the Python comparison operators,

17:25 and they kind of auto-magically create another object like a comparison.

17:30 And if you go through the ORM or the core tutorial on SQL object, the core tutorial would really show how this works at the expression level.

17:41 The ORM tutorial kind of refers to it, but it's more about how to get the ORM going.

17:46 But what I'm talking about here is how the core query system works.

17:48 So basically these Python objects can kind of expressly be combined to create SQL queries.

17:56 And that's the core, really.

17:57 That's really what you get, as well as the fact that it returns objects, it returns Python objects.

18:02 Like if you query for dates, you'll get a Python date object back.

18:06 If you query for intervals, you can get a Python time delta back.

18:09 And there's all systems for building your own custom types if you want to get JSON back, things like that.

18:15 And then the ORM builds on top of that.

18:17 So if you're starting out, I mean, if you really want to learn the library from the ground up, I would start with the core and then move to the ORM.

18:22 If you want to get some quick results for a program you're writing, you might want to start with the ORM first and then do a deeper dive later.

18:29 I've kind of switched those directions around over the years, and I think there's just two ways to learn.

18:34 You can learn from the outside in or learn from the inside out.

18:37 And it's kind of your choice as to do you want immediate gratification or do you want more fundamental knowledge first?

18:45 Right.

18:45 Well, and how much are you going to be depending on the library?

18:48 Right.

18:50 If it's your core business app and it's really important that this thing keeps working, maybe you should deeply understand it.

18:55 If it's just a little blog you're throwing together, do it quick and then come back and learn it more deeply.

18:59 Right.

18:59 Start with the ORM and then go to the core to understand that.

19:02 Yeah.

19:02 Well, everyone I've seen, everyone's got a different way to do it.

19:05 So I just try to present both.

19:07 And also for a really long time, the core didn't have that name core, and that was a problem.

19:11 It was just called SQLAlchemy, and then there was SQLAlchemy ORM, and people didn't really, you know, obviously understand.

19:17 There's this whole thing with SQLAlchemy that is not the ORM.

19:20 So I introduced the word core probably about, you know, five years ago to say, hey, there's this whole other thing.

19:27 And I divided the docs into like two big left and right sections so that you can see there's this whole other thing.

19:33 There's only the core, and it's like an entire set, the right half of the docs.

19:37 Yeah.

19:38 I think you have them almost as columns next to each other.

19:40 I do.

19:40 I kind of imitated.

19:41 I think Django's documentation at the time had that thing going on with the two columns.

19:45 So I did that to say like, look, this is whole core thing.

19:48 If you don't like the ORM, because I knew the ORM was going to be, the ORM is more opinionated.

19:52 I knew it would be controversial.

19:54 I knew that I personally didn't like ORMs very much.

19:57 And this was going to be the ORM that I was actually going to like.

19:59 But I knew all the other commercials like me were going to hate it.

20:02 And those people still around today who hate ORMs.

20:04 So I wanted to say like, look, I get it.

20:06 Don't use the ORM.

20:07 Use the core part.

20:08 And then when you kind of use it for a long time, you realize, hey, this redundant thing

20:12 I'm doing over and over again could just kind of be automated by the ORM.

20:14 Then you can use the ORM when you see there's a need for it.

20:17 So that's kind of how I saw it being used.

20:19 But other people go the other way.

20:21 Sure.

20:21 So you've sort of talked about the learning perspective of when I might use the core versus

20:26 the ORM.

20:26 But are there like performance or other patterns?

20:29 Like what other considerations do I choose between the core and the ORM model?

20:33 There's a lot of performance.

20:35 There's a lot of, yeah.

20:36 So kind of the unspoken drama of SQLogamy for all the years it's been out has been the performance

20:42 drama.

20:42 Because coming from a Java background, you know, in the 90s, early 2000s, at least back then,

20:50 when we did Java programming, we really didn't pay too much attention to the way our code

20:55 was written might be slower or faster.

20:57 I mean, obviously, you know, if you're doing like order of N, if you want to do something

21:00 once rather than N times, things like that.

21:02 But just if you do two nested loops or you do this special library call instead of the

21:07 loop, is that faster or slower?

21:09 You don't see that much in Java because all the standard libraries in Java are written in

21:13 Java as well.

21:13 But with Python, it's mostly CPython.

21:19 There's this big gap where if you do something in a for loop, it might be very slow.

21:24 And you do the exact same thing with like the zip call.

21:28 It's a million times faster because the zip call is written in C.

21:31 So there's this whole thing with Python where you have to kind of use the standard library

21:37 as much as you possibly can because you want as much data processing to happen in native C

21:42 functions rather than in Python.

21:45 So I didn't know that in the first few years of writing SQLAlchemy.

21:48 And I didn't really know much at all about what was fast and what was slow.

21:50 And that's pretty common when someone starts with Python.

21:54 So up and through to version three, the thing performed horribly.

21:58 And then there was a lot of blog posts and other competitors that came out to try to challenge

22:03 the performance of SQLAlchemy.

22:04 And I went on kind of a multi-year mission to improve the performance of the system.

22:09 So these days, and the most recent incarnation of that has been more performance improvements

22:14 in version 1.0, but also a lot of new documentation that I've added in version 1.0 that refers specifically

22:20 to the notion of performance.

22:22 It's actually in our FAQ now.

22:23 There's a whole section of my performance.

22:24 There's entire sections of how to profile an application.

22:28 It's, and there's a lot of sample tests, example suites in version 1.0 where you can compare

22:35 different methods of doing the same thing for their performance slash ease of use tradeoff.

22:42 So I have examples of, you know, if you want to do this query, how fast is this query using

22:47 the core by itself, how fast is this query using the ORM, but loading individual columns,

22:52 how fast is this query if we use the ORM full blown, and what are the tradeoffs, and how

22:57 can we switch between those two systems?

22:59 So that's really interesting.

23:01 I have a couple of performance questions I'd like to ask about.

23:04 First, you said you went on this multi-year mission to speed it up, which is, that's a lot

23:09 of dedication, and that's awesome.

23:10 What kind of performance gains did you get?

23:13 Did you like double the speed, or what happened?

23:15 Over the course of, since version 3, I'm sure we've quadrupled the speed of it, because

23:20 it was really, really slow in version 2 and 3.

23:23 I mean, there's things, there's all kinds of Python idioms that change performance so

23:27 dramatically, and it's kind of a way of life for me now, but if you don't know, if you come

23:32 from like C#, it's probably going to be really painful to learn.

23:36 Like, if you do things like, if you want to check if a key is in a dictionary and add

23:40 a value, if you do that check with using, if you basically try to get the key, and then

23:46 it raises a key error, and you catch the key error, and you say, okay, let's put the value

23:49 in now, that's going to be way, way, way, like five times slower than doing it by just doing

23:54 a if key in dict first.

23:56 Right, absolutely.

23:57 Assuming the dictionary is usually not going to have the value.

24:02 Yeah, because it's actually the throwing the exception that is the raise, yeah, it's super

24:06 expensive.

24:06 Yeah, crazy.

24:07 So that's one.

24:08 Then there were things like, if you have an object, and you want, there's this pattern called

24:13 the proxy pattern, it's a programming pattern, where if you have an object, it has a bunch

24:16 of attributes, and as you call those attributes, it's actually sending the message to some other

24:20 object that's kind of embedded into it, it's called a proxy, and that's a very common pattern.

24:24 If you'd use the, there's a really easy way to do the proxy pattern in Python, which is

24:28 use this magic method called get at her, it's double under get at her.

24:31 And you get at her is past the name of the attribute trying to get, and then you say,

24:35 oh, let's call that attribute from our little nested object.

24:38 Crazy slow if you do it that way, compared to if you actually put explicit descriptors on

24:44 the proxy class that go directly to the inner proxy class.

24:48 So very early SQL alchemies, I use a lot of get at her, and people said, hey, this get

24:53 at her is crazy slow.

24:53 This thing is using get at her everywhere, and it's super slow, because they're all function

24:57 calls, and function calls are very expensive in Python.

24:59 So, got rid of all the get at her.

25:01 You do this procedure in programming, you do a lot with Python, which is called inlining.

25:07 Inlining means if you have some operations where method A calls method B, calls method C, calls

25:13 method D, you inline them.

25:15 You take all the code and you unwrap it into one big function, so that there's no function

25:20 calls within it.

25:22 Inlining is something that you kind of get someone for free if you use a newer interpreter

25:28 like the PyPy interpreter, which has a just-in-time compiler that does a lot of inlining for you.

25:33 That's really interesting.

25:33 I was going to ask you how SQLAlchemy works with PyPy, and have you tried Pysten from the

25:38 Dropbox project?

25:41 I haven't used Pysten yet.

25:42 Pysten looks like, I mean, I haven't looked at it much.

25:46 It looks like kind of the next incarnation of a thing called Unladen Swallow, which, if you

25:51 remember, that was, I think, a Google project where they were trying to use the LLVM compiler,

25:56 and I think that's what they're doing with the Pysten thing.

25:59 Oh, interesting.

26:00 The same one that Apple uses, yeah.

26:02 Yeah, I don't know what I'm talking about, but there's a compiler just-in-time platform

26:08 called the LLVM, and I don't know much about it.

26:09 Yeah, that was Chris Lattner's project, and he's the guy who created Swift, and I think

26:14 they're using it there for Swift as well, so interesting.

26:17 Yeah, I haven't used Pysten.

26:19 Yeah, I mean, I see some news about it.

26:22 I know Guido works at Dropbox now, so I guess we'll see.

26:25 Yeah, that should give Pysten a little more credibility than maybe just a random project,

26:31 right?

26:31 Yeah.

26:32 Michael here.

26:52 Thank you so much for listening to and spreading the word about Talk Python TV.

26:57 The response to the podcast continues to be wonderful and humbling.

27:01 I have a quick comment about supporting and sponsoring the show.

27:03 I'm still looking to line up stable corporate sponsorships, but I wanted to tell you about

27:07 a community-based campaign I'm launching to allow listeners to directly support the show.

27:11 We are running a Patreon campaign.

27:14 You might not have heard about Patreon, but it's kind of like Kickstarter for things like

27:17 podcasts, which release frequent small deliverables rather than one-off large engineering projects.

27:23 Visit patreon.com/mkennedy.com/mkennedy and watch the video to see how you can donate as little

27:31 as $1 per episode to support Talk Python To Me.

27:34 This is your chance to ensure that the Python community continues to have a strong public voice.

27:41 Consider supporting us today at patreon.com/mkennedy and thanks for listening.

27:45 It works with pypy though pretty well, Sequel Alchemy.

27:55 Yeah, it took a long time for pypy because the pypy itself had a lot of quirks years ago.

28:12 And yeah, pypy is on our continuous integration system.

28:18 pypy doesn't use the same kind of garbage collection that CPython does.

28:22 And garbage collection is kind of a big deal when you write a database library because it affects

28:30 when you have reference cycles getting garbage collected.

28:34 A lot of our tests would fail because they were relying upon things being garbage collected automatically.

28:39 And that doesn't really necessarily happen as much with pypy.

28:42 So basically, if you have a test that doesn't tear itself down correctly and you still have a connection open

28:47 that access a table and you run Postgres, the teardown of your test will refuse to drop a table

28:54 because Postgres still has a lock on it.

28:55 So there's a lot of issues like that that pypy keeps us very honest on.

28:59 So that's good.

29:00 pypy seems to use more memory, but it runs definitely much faster.

29:06 And I think it's a great interpreter and I'm really hopeful for pypy.

29:09 Yeah, that's really interesting.

29:10 So we're into it.

29:11 Yeah, for sure.

29:12 Cool.

29:13 So one thing that I think is nice about the ORM version or the ORM layer is I can get these classes back from my database

29:23 and make changes to them and just basically call commit on the session.

29:27 So can you talk about the change tracking?

29:29 So that's the unit of work.

29:31 So this is the thing.

29:32 When I first read SQLogamy, that was the part of this whole thing that I was not even familiar with myself

29:36 because I read Fowler's book and he's like, you know, enterprise software uses this unit of work thing

29:41 where you just kind of accumulate the changes and then you just push them all in one shot.

29:44 And I was like, wow, that's crazy.

29:47 And, you know, I could see how it worked.

29:49 I'm like, that's really, you know, pretty easy.

29:50 You just track it so that all your inserts and your deletes and your updates get kind of put into a big bucket and you sort them out.

29:58 And I wrote a very simplistic version of that for SQLogamy 0.1.

30:02 It was really bad because I really didn't quite know how to write it.

30:06 I was like, I mean, I knew how to write it, but I did this kind of coding, which I did more back then probably that I maybe would term seat of your pants coding,

30:15 where you kind of know that this part connected that part and this part connected that part.

30:19 But my brain couldn't see the whole thing at once.

30:23 And it produced, if you look, you can get the code, go look at old 0.2 of the unit of work and you will be like, what is that?

30:30 It's crazy.

30:31 But when I first saw it work, I was like, that's amazing.

30:37 Yeah, it's super nice.

30:38 It's really easy.

30:39 Because I was logging to SQL.

30:40 Yeah.

30:40 And I was like, that's, I think I said, this, that's when it got really, got me really jazzed.

30:44 Because like, this thing's going to be big.

30:46 When I saw it do that commit the first time and I saw it, all the bros go, I was like, wow, people are going to love this thing.

30:53 Yeah, that's really cool.

30:54 You know, I actually don't know the answer to this question, but do you have some sort of like optimistic concurrency tracking?

30:59 Like if two people pull it back, one person makes a change and saves it, the other one takes that object, tries to save it.

31:05 What's the story there?

31:06 There's a little bit of pattern that we have in there that I lifted entirely from what Hibernate does.

31:13 There's a pattern using a version counter that I don't know that it's that widely used because it kind of interacts with the isolation level you're using in your transaction.

31:23 I mean, really, the optimistic concurrency is something you can get in your database just if you set your isolation levels the way you want, if you set them higher.

31:34 The version counter idea is that you load the row from the database and the row will have a special column that's the version and say the version is five.

31:41 Do I have to indicate that?

31:43 Sorry, do I have to indicate that specially on my class?

31:46 Yeah, there's a whole, yeah, it's in the mapping option.

31:49 So you have a column that you declare as part of the versioning and then the mapper will actually load that in that number or it could be a date stamp if you want.

31:58 And it will add, when you go to update the row later, it will actually add that column to the criteria for the update statement.

32:06 So it'll be like update row, set whatever, where primary key is blah and version equals five.

32:14 And then if the update does not match any rows, that means version five is gone.

32:18 It means someone else changed version five underneath you.

32:21 And then they'll throw an error.

32:22 And that's basically one simple way of doing optimistic concurrency control.

32:26 It only applies to updates and deletes.

32:29 It doesn't apply to inserts.

32:32 Inserts is a whole different ballgame.

32:33 People want to be able to insert some value.

32:36 And if another thread tries to insert the same value, they want to use the one that's existing.

32:39 That's a hard pattern.

32:40 There's ways to do that in SQL, but none of them are automatic.

32:44 You have to kind of roll it the way you want to roll it.

32:45 Yeah, that's interesting.

32:46 But I think even that level of updates is really a nice feature to have.

32:51 Yeah.

32:52 So you talked about the mapping a little bit.

32:54 When I create these classes, basically what I do is I create a class that derives from some kind of declarative base instance I've created.

33:00 And then I declare the columns.

33:03 And how do I do that?

33:05 So the declarative system is, again, that was a system that I added to SQLAlchemy back when I did version 5 and I worked for the Charlie Rose website.

33:13 We didn't have any declarative system built.

33:15 And we had a cheap one that was built into the library.

33:18 And then we had a third-party thing called Elixir, which didn't quite do what I needed.

33:22 So I wrote a new version, kind of the third version of the declarative system, which is basically you have this base class that you can inherit from.

33:29 The original idea of SQLAlchemy was that you write – you would declare table metadata using just core.

33:35 You have a table object.

33:36 The table object has these column objects.

33:39 The name of the table would match the table in your database.

33:42 The names of the columns would match the names of the columns.

33:43 They have little data types like a string or a date time or an integer.

33:47 And then you would take your class, which would just be kind of a blank Python class, and say map.

33:52 You'd map the class to the table.

33:56 So the declarative idea is how most ORMs that we deal with today look, where you don't see those two separate things.

34:02 You see kind of this class that has, like, class, my class.

34:05 And then it has, like, id equals integer, and then first name equals string, and last name equals string.

34:11 It looks like Django.

34:12 It looks like every other ORM you see.

34:14 So the declarative thing was added to do that.

34:16 So you basically use the concept of a Python class.

34:19 You say class, and then you my class, and then you put a colon.

34:23 And then you indent, because Python, we're doing white space.

34:26 And then you basically declare variables that refer to the columns that will be in your database table.

34:34 And then as far as how that creates the database or the database exists already, that's kind of a separate concern.

34:39 But you're basically defining what your schema will look like in line with your class definition.

34:46 And it's a pretty common pattern that is familiar to everyone at this point.

34:50 Yeah, I think it is a pretty familiar pattern.

34:52 It's really nice.

34:53 And it also ties into, when you're talking about doing the queries, those columns, basically, you can do operations on them, right?

35:01 So I could, say, create, like, a query of book.

35:03 And I'd say order by, you know, book.name.desc or descending or something like that, right?

35:10 So they kind of play multiple roles, letting you do the queries as well as defining the schema and the constraints and all that, right?

35:16 Yeah.

35:17 So a funny story also is that it wasn't like that in the beginning.

35:21 It used to be that you'd have your mapper and your class, but the class itself wouldn't have these attributes directly on it.

35:28 It used to be you had to still talk to the table object.

35:30 So if you had, you know, class user and it mapped to a table object called users, you would still have to query in terms of users.c.firstname.

35:39 It wasn't actually on the class object because I didn't feel comfortable doing that for a while.

35:43 And even though SQL object did do that, I kind of was hesitant to do that.

35:47 But then once, I think maybe in version 4, I said, hey, you know, actually I can put the attributes on the class and have them be the actual column objects that we – they're actually not column.

35:58 They're actually a wrapper around the column, but it looks like the column object.

36:01 So, yeah.

36:01 And then that led to this whole new thing called the hybrid property, which is like the fact that Python has this amazing symmetry in that everything is ultimately an object.

36:12 If you have a class, it's still an object.

36:14 So if you say myclass.foo versus an instance of myclass.foo, they look the same in code.

36:20 And that kind of blew my mind.

36:23 That was kind of a big discovery I made about Python like maybe five years into it that Python has this – I mean, I knew I loved Python's object model all the way through.

36:31 But really, the hybrid thing really blew my mind.

36:35 And there's a – I think I wrote a blog post a long time ago that I could probably point you to where I show – look at this hybrid thing where you can have a class with a method that does something at the class level and something else at the instance level.

36:46 And it just was amazing.

36:47 And I use it a lot on some subsequent jobs, and we did really amazing stuff with it.

36:51 It's one thing to sort of realize that that feature is there.

36:54 It's the other to say, let's use that to make a data access layer really work super well and make it – you define one thing, and it has this dual purpose, which is really cool.

37:02 Yeah.

37:03 There was a guy – I think his name was Ant Asma, and he was working on SQL for me a long time ago, and he first showed me a pattern that had that hybrid thing.

37:11 And I was like, wow, and I really dove into it in the coming years, and I really realized it's a very handy pattern.

37:16 Excellent.

37:17 So you're talking about creating these columns and stuff.

37:20 What if I've got a field that's – or a column that's not nullable or it needs a default value?

37:26 How do I do that?

37:26 Yeah.

37:27 So the table metadata part – so say we're back in core.

37:29 We're not doing over them again.

37:30 So we're back in core – one of the goals of the core was that it would be this comprehensive system of building a whole schema from scratch.

37:38 So meaning if you have a blank database, you can build up all of your objects in Python and then say, you know, metadata.createAll.

37:45 And that will emit what we call data definition language, DDL, which we know as createTableStatements and createSequence and createIndex.

37:54 It will emit all those things for you.

37:56 So in order to make that feasible, we have to also have support for attributes like if a column is nullable or not, if a column is a primary key, if a column is indexed, if there's a unique constraint on a set of columns.

38:12 So SQL Alchemist schema system has objects and flags that represent all of these concepts.

38:17 And it's pretty comprehensive.

38:20 And we have full support for all kinds of data types and numerics and floats.

38:25 And we have server defaults.

38:28 And we have every common thing except for triggers.

38:31 We don't have a system for triggers.

38:34 You can define your own system for triggers.

38:36 And we don't have store procedures.

38:37 So all the main things that you normally, your day-to-day database schema stuff like tables, columns, constraints, foreign keys, indexes, nullability, and defaults are all first class.

38:49 Other things you can define like sequences we have.

38:52 Other things like triggers and store procedures, you can define your own Pythonic little objects that will do that for you but more in a database-specific way.

39:02 So anyway, when you do the ORM and you're doing declarative mapping, you're still using core column objects in your declarative mapping.

39:10 So when you make a column object, you can still specify all the table attributes as well as you do your declarative mapping.

39:16 So that's kind of the answers.

39:17 They kind of meld together.

39:19 And it was not that way in the beginning, but it evolved that way over the years to work that way.

39:22 Yeah, it's really smooth.

39:23 So I have these classes, and I can map them to existing databases.

39:28 And you also said it emits DDL, which I can create my tables from, right?

39:32 Yeah.

39:33 That's just a function call, isn't it?

39:35 Go to my declarative base.

39:36 Was it create all, something like that?

39:37 Yeah.

39:37 Yeah.

39:38 So the metadata object is kind of this collection of tables and other things, mostly tables.

39:44 And you could say metadata.createAll, and you give it an engine to point at, like a database connection.

39:51 And it will go on that database and see which tables don't exist and create all the tables that don't exist yet.

39:56 That'll create them from scratch.

39:57 What if I say add a column?

39:59 Right.

40:00 So the whole thing with adding a column and doing that, we call that migration.

40:03 So, you know, as we go through this, it's funny.

40:06 Nothing in SQLAlchemy was there in day one.

40:07 Everything was like I had to figure it out and learn it.

40:10 Like I actually did a talk for SQLAlchemy version 4, and they asked me in the audience, what do you do about database migrations?

40:15 I'm like, well, I don't really know much about migrations, Automata.

40:18 I used to do them.

40:19 I do them by hand.

40:20 So I don't really have a migration tool.

40:21 So some people wrote a tool called SQLAlchemy Migrate.

40:25 And that went pretty well for a while and is still in use today.

40:28 And then I eventually wrote a newer version called Alembic.

40:32 So that's kind of a separate package that you can download.

40:35 And Alembic builds on top of SQLAlchemy as does SQLAlchemy migrate, which is still in production.

40:41 And that defines a whole bunch of new operations on top of SQLAlchemy that are specific to being able to add columns, to alter the types of columns, and alter the nullability, you know, to remove columns, rename them, things like that.

40:54 And my database migrations is a whole big issue unto itself.

40:58 So it took me several years to really get my head around that, to, again, use that in production on the job because the best way to write open source code is to be, you know, what we say, eating your own dog food on the job.

41:09 So I've used Alembic a lot.

41:10 Alembic is now really widely used.

41:12 It's very prominent within OpenStack, which I work for.

41:14 And that's kind of the path to doing migration is kind of separate from SQLAlchemy core itself.

41:19 It's kind of built on top.

41:20 Yeah, that's cool.

41:22 How about relationships?

41:24 Like if I've got a bookstore and I've got categories and there's books in the categories and reviews about the books and so on.

41:29 So that aspect of object relational mapping is really where object relational mapping starts to look like this concept that is really going to change.

41:40 The way a relational database looks compared to the way an object jointed data model looks.

41:46 So in relational schemas, if we want to have a table that refers like a row in this table, it's like say it's an account row or something, and it needs to refer to like some other related concept like an address.

42:01 In relational database, we use something called a foreign key constraint, which means that we're going to – you don't have to actually use the constraint, but you would say that this particular column or set of columns in our account table actually refers to the primary key of a different table somewhere else.

42:18 And that's called a foreign key.

42:20 And the constraint in the database makes sure that the value you put in those columns actually exists in the remote table.

42:26 So in ORM, ORMs have this concept of either making a one-to-many or a many-to-many or a many-to-one relationship, which models in object-oriented terms what that looks like in terms of collections.

42:41 So if I have – in all of our tutorials, we have this concept of a user object, and the user has one or more email addresses, and it's a really old example, but I keep going back to it because I just know it really well.

42:53 So your address table in your database will have a foreign key called here's an email address, and what user is this – what user is the owner of this address?

43:04 It'll have a column called user ID or something, which will refer to the primary key of the user table.

43:09 So the user table kind of refers to – I mean, the address table refers to the user table.

43:14 In your model, you want to have the user object has a one-to-many to the address class.

43:23 So it's kind of – if you define a one-to-many, it's kind of defining it in the other direction, like user points to address.

43:29 But in your relational database, it looks more like address points to user.

43:34 But it turns out that every one-to-many relationship in object-oriented land is the same as the many-to-one in the other way.

43:40 So SQLAlchemy basically lets you make the one-to-many and the many-to-one at the same time using this concept called a backref.

43:49 This is a little bit rambly, but if you look at our tutorial, it's laid out in a much more organized fashion.

43:56 So, yeah, there's basically an object – there's a function in SQLAlchemy or called relationship where you associate it with a parent class and then you give it a target class.

44:05 And relationship will make sure that the foreign key setup that is expected exists between those two tables.

44:13 And it will kind of mediate that pattern on behalf of the object model to the database schema.

44:20 Yeah, that makes it really easy to almost not even worry about that.

44:23 So is that like a lazily evaluated thing?

44:26 So if I pull back a book, it's not going to go pull back all of its reviews instantly unless I touch that.

44:31 Right. So that was one of the patterns in Fowler's books called lazy loading, which is the concept that if you have an object and you have some kind of method or attribute that represents data from your database, you might have the object and it didn't actually load that other data.

44:45 It's only when you touch that attribute or method that it suddenly goes, oh, I have to load this stuff and I have to go out to the database and load more information.

44:51 That's an efficiency pattern called lazy loading.

44:53 It means you can load a lot of objects and not have to go through all the time of loading all the related things.

44:59 All ORMs do lazy loading, just every ORM I've ever seen.

45:03 But lazy loading is also at the core of this somewhat controversial concept in ORMs called the N plus one problem.

45:12 Right. That's probably the most common performance problem people run into is they don't realize they're doing this N plus one query.

45:19 So I get one book or I get one book.

45:23 So that's the one.

45:24 And then I try to touch the other items or a set of books.

45:27 I loop through them every time I touch that property.

45:29 I'm doing one more query.

45:30 So for N books, I do N more queries against the database, right?

45:34 Yep.

45:35 That's the N plus one problem.

45:36 And every article ever written that is against ORMs will have that entire example as the cornerstone of its premise.

45:45 So the other thing that SQL Alchemist had from day one, besides the unit of work, the other cool thing it had was this thing called eager loading, which means that you can, if you kind of know you're loading these users and you want the addresses also, you can say to the query, hey, load the user.

46:03 But then also, can you also load the addresses at the same time instead of me doing the N plus one problem?

46:08 And in that way, you will only do, if you do what's called joined eager loading, it will actually do a join, an outer join.

46:14 It will load your user row.

46:16 And while it does the select for the user row, it will do an outer join to the address table so that it gets zero or more address rows at the same time.

46:25 And then the query object knows how to organize the product that we get back.

46:31 Because if you do a join in SQL from one table to a many, you'll get the one row repeated.

46:37 So the ORM knows how to deduplicate those rows on the left side of the join.

46:42 And that's called eager loading.

46:44 And one of the tradeoffs of eager loading is that when we write a query, we say, hey, Mr. Object Database thing we don't know anything about.

46:53 We want users.

46:54 And then later I want addresses.

46:55 If I have to say, hey, I want users, but I'm also going to want in the future these addresses, that is what you would call a leaky abstraction.

47:05 Because it means that the way the system is going to do its load, you need to tell it up front how to do its job to some degree.

47:12 You have a little bit of awareness that this is not just like some weird LDAP database.

47:17 This is actually a relational database.

47:19 And these two things are in two different tables.

47:21 One of the things SQLAlgmy embraced very early on, and one way that is very different from most other ORMs, is that it did not try to get around that issue by pretending the database doesn't exist.

47:34 It's like, you know what?

47:35 This thing is to automate our work.

47:38 It's to make our lives easier and write less code.

47:40 It's not there to hide that there's a database.

47:43 So we need to be aware of things like if we want to load the 1 and also the N and not have N plus 1, we need to give the query a hint up front to do that.

47:53 So that's ego loading in a nutshell.

47:56 Yeah, I really like that philosophy that you guys take looking, you know, saying, look, we're not going to hide this from you.

48:01 We're going to make your life easier and less error prone, but we're not going to treat you like an idiot or hide this stuff.

48:08 Or, you know, you can still get to what you need to.

48:10 That's great.

48:10 Right.

48:11 How do I do that in code?

48:12 So what do I say on my query to say I want to eager load books and reviews?

48:17 So modern SQLAlchemy, I mean, this is, again, APIs have changed dramatically.

48:22 Modern SQLAlchemy, you would say query for a user class, and then there is a call on the query, which is a method chaining system.

48:30 You say query.options, and then you add to the options function a bunch of constructs called join load or subquery load or immediate load.

48:40 There's all these little directives we have that you can say, hey, query for user.options, join load user.addresses, join load address.related something else.

48:50 And you can kind of give these directives in the options of the query as to what other things it should be loading onto the primary thing that you're loading from.

48:59 Oh, very cool.

49:00 You've been talking about the different versions of the API.

49:04 How have you dealt with versioning that API over time?

49:07 Well, I mean, it's been 10 years.

49:09 And what I kind of did in the early days, you know, we were first 0.1, and then we were 0.2, and 0.3.

49:19 And back around 0.3, someone wrote an O'Reilly book for us, and we're like, you should be 1.0.

49:25 And I'm like, no way.

49:26 We are so early.

49:27 Like, this thing is totally not at all how it's going to be.

49:30 So with 0.1, first of all, I didn't even publicize the library at all in those early days.

49:36 I just maybe announced it on one mailing list.

49:38 And people just came, kind of just came and started using it.

49:42 They started using it in production.

49:43 And 0.1, 0.2, I made enormous, enormous changes that just totally ripped the rug from everybody.

49:50 But we were 0.1.

49:51 Sure.

49:51 Did it make you nervous to hear that they were, like, so heavily dependent on this new thing you created?

49:56 People got way too dependent too fast.

49:58 And in early days, I think Reddit started using SQLAlchemy, maybe version 0.4 or 0.5, you know.

50:06 And there were some messages from them years ago.

50:09 I don't know where they were, so whatever.

50:11 Where they were kind of complaining about SQLAlchemy changing its API too much.

50:15 And really, the API, I made the biggest changes before version 0.5.

50:21 0.5, we really started settling down to this new kind of thing with the query, with the attributes linked to the class.

50:29 And then we had to keep the API changes all the time.

50:33 And from 0.5 on forward, even from 0.3 and 4, I always made sure that there was a backwards compatibility path.

50:40 I made sure that everything that changed was always announced.

50:42 And if something happened that was unexpected, that was a regression, and we would address it.

50:48 And it was almost never that I said to someone, you're stuck.

50:52 You'll have to just change your code now.

50:54 Like, I always made sure that there was a backwards compatibility path.

50:57 But it meant you had to kind of, especially in the early days, it meant you had to really kind of watch when we did the next major number, which is really the minor number.

51:04 But we call it the major number.

51:06 You needed to watch what was changing.

51:08 You needed to test your stuff ahead of time.

51:10 Things are better now.

51:12 Now that we have pip is much better.

51:14 And we can put up beta releases that won't automatically install for people.

51:18 And I always put very, very, very comprehensive migration notes for every version.

51:25 You can see them going back on the site, probably back to version 0.4 or 0.3, where I say, here's the things that are going to change.

51:31 Here's the things to look out for.

51:32 Here's the things to test.

51:33 And after version 5 and version 6, version 7, it changes less and less each time.

51:38 People went from version 7 to 8 with very few hiccups.

51:42 You know, 8 to 9, 9 to 1.0.

51:44 There's really nothing you have to change in your code anymore.

51:47 We like people to test because it might be surprises that we didn't expect.

51:51 And we fixed those.

51:54 But APIs very rarely break in any kind of backwards incompatible way.

51:59 The only time it's a problem is that if there's something that was broken in SQLAlchemy 8 or 9 and people were relying upon the broken behavior.

52:07 That's the hardest one is when people rely upon broken.

52:09 And they didn't know it was broken, so you can't blame them.

52:11 When we fix things that were broken, we'll usually make it raise a warning in the current version they're using.

52:18 Like 0.9 has something that's broken.

52:20 We'll have it admit a warning, so they'll update, upgrade to 0.9.9.

52:24 And all of a sudden they see these warnings.

52:25 It'll say, this will be an exception to 1.0.

52:27 So they'll kind of know to change it there.

52:29 But you do need to be watching the versions and the changes.

52:32 I've worked really hard to make all the change logs very easy to find on the site.

52:37 I mean, other projects I can never find what changes.

52:40 It's very hard to find.

52:41 I think our site is very transparent, and you can see everything happening in many ways.

52:47 And it's very highly linked as far as the development log.

52:51 Yeah, that's excellent.

52:52 It is super comprehensive.

52:53 So one thing I think is interesting is maybe talking about some of the notable users of SQLAlchemy.

52:59 I know you've talked a little bit about a couple of them.

53:02 But you've got a whole list at sqalchemy.org slash organizations dot html, right?

53:07 Yeah.

53:08 So, yeah, I think I first saw MongoDB was doing that.

53:12 They had this enormous list of people using their stuff.

53:14 And I'm like, I want that, like a big wall of honor of all my – when some big company uses your product, it's like, wow, I'm important.

53:23 I did this important thing.

53:24 So up there, there's a lot of big – Reddit was the huge one.

53:29 Reddit only uses the core.

53:31 They don't use VRN.

53:32 Their code is very unusual in the way they do things.

53:37 And I'm glad they don't use the ORM because they would really be killing it.

53:40 Yelp uses a lot of SQLAlchemy.

53:42 I've been – I visited Yelp.

53:43 They use the ORM a lot.

53:45 Uber, you know, uses SQLAlchemy.

53:48 Hulu.

53:50 Dropbox is there, right?

53:52 Yeah, Dropbox.

53:53 Yeah, Dropbox is another one that uses the core.

53:55 They're another – you know, they actually wrote some blogs that were a little bit critical of the ORM, which I was kind of annoyed about.

54:01 But they use the core still.

54:02 They use it.

54:03 Yeah, a lot of animation studios use it.

54:08 There's a lot of companies that use it who I know use it, but I don't really exactly have permission.

54:14 You know, one of them is, you know, kind of a Mickey Mouse operation anyway.

54:19 A lot of companies that are up there in organizations, you know, Fedora uses it.

54:23 And, of course, now I work for Red Hat and I work for OpenStack.

54:26 And OpenStack is this cloud computing platform that's open sourced.

54:30 And a lot of companies like PayPal and IBM and AT&T use it.

54:34 And that thing uses SQLAlchemy all throughout.

54:37 It's super heavy SQLAlchemy throughout OpenStack code.

54:40 So it's a very widely used library.

54:42 Yeah, that's really excellent.

54:43 You must be proud of it.

54:44 Yeah.

54:46 Cool.

54:47 And you should be.

54:48 So what about the whole async movement with, like, Twisted and AsyncIO?

54:54 How does that play with SQLAlchemy?

54:56 So the async thing is something, another Python story where it's something that I wasn't really that interested in.

55:03 But it's a big deal in Python, the async.

55:05 Because when I first came to Python, the two, you know, 800-pound gorilla things that existed were Zope and Twisted.

55:12 You know, the old school, you know, the old guard of Python is Zope and Twisted.

55:17 So Twisted is great.

55:18 It's still out there.

55:21 Async, the problem with what we call explicit async is that it's not really compatible with this idea of lazy loading that we see Martin Fowler first described.

55:31 The concept that when you access something on a class, it's going to automatically, behind the scenes, go and fetch data from a database.

55:39 It means that it's going to do I.O.

55:42 And when you do things with the async model, the async model is all about that input-output operations should never be implicit.

55:51 They should always be something that you code into it and say, hey, this is about to do I.O., so please stop what we're doing, do the I.O., and then let the whole interpreter go and do something else.

56:02 And we do that in the newest version of Python's async I.O., which is in Python 3, you do a yield.

56:09 And it's very slick, but it still is a statement that is an interruption in your code.

56:14 Right.

56:15 You have to explicitly say, I'm waiting here, I'm waiting here, I'm waiting here.

56:18 And at any of these points, you can take that thread and go do something else with it until I'm done talking to the database or something, right?

56:23 Right.

56:24 So people have wanted SQLAlchemy to work a twist for a long time.

56:28 I've looked into it.

56:29 I actually had some clients pay me to look into it.

56:32 And every time I look into it, it's like, eh, it's really not worth it because we have to completely rewrite the whole core engine part of this thing to be this kind of inside-out callback routine.

56:44 This is with classic Twisted.

56:46 How about the async I.O.?

57:16 Right.

57:17 So with async I.O., there's a big blog post I just wrote in the past couple of months that's about – it's called Asynchronous Python and Databases.

57:24 And it's gotten a lot of views.

57:27 And it kind of sums up the whole situation.

57:29 I went into async I.O.

57:31 I looked at it.

57:31 I think it's really great.

57:32 And it's definitely a huge improvement.

57:34 Async I.O., I actually could write – a lot of SQLAlchemy could be made to work very seamlessly with async I.O.

57:41 In that it wouldn't require that much code to be replaced.

57:47 What's nice about async I.O. versus the older callback style is that there's kind of some ways you can have the async thing be explicit on the beginning of where a method does something.

57:57 And then up the call chain at the end.

57:59 And you can kind of get away with the calls in the middle not knowing that they're doing async somewhat.

58:05 There's a little bit of Python syntax helping here.

58:08 So I looked into how I could make the core do async I.O. for real.

58:13 And actually, it can be done.

58:14 I started writing parts of the pool.

58:15 And it would be a lot of work, but it could be done.

58:18 But in this blog post, I wrote, like, do we really need to do it?

58:21 Because when you use async packages, even when you use Twisted, and Twisted has been doing this from day one for databases,

58:28 when you have code that really works better with threads and traditional, you know, preemptive, not preemptive, but traditional threading,

58:34 you throw that code into what's called a thread pool.

58:38 And then the asynchronous part of your at library can actually call into routines that are invoked synchronously within a thread pool,

58:46 but they are given kind of an asynchronous facade.

58:48 And I think that is the best way to do it.

58:51 If you're writing a lot of database code, you really don't need to worry.

58:55 You really shouldn't have to be worrying about deferring to I.O. within the database,

58:58 because you really, in the database code, you're in a transaction.

59:00 You definitely need to do A, B, C, D, and E.

59:03 It's not really a mystery when the I.O. has to happen.

59:06 It's very kind of boilerplate-y.

59:08 And one of the entire reasons that SQL Alchemin exists is to reduce the need to type repetitive boilerplate.

59:14 That's the whole point of it.

59:15 You don't need SQL Alchemin to do anything.

59:16 You can do it all by hand, but the point is to reduce boilerplate.

59:20 So to that extent, I think you're better off putting your database code in a thread pool,

59:25 or using a system like eventletg event that is an implicit system.

59:29 And the blog post really goes into this in great detail as to show the benchmarks, why I think it's better.

59:34 And I think a lot of people agree with it.

59:36 Cool.

59:37 Okay.

59:37 You'll have to send me the link to that blog post.

59:39 I'll put it in the show notes.

59:40 Excellent.

59:41 So I think we've been talking for a while.

59:43 It's probably a good place to wrap up the show.

59:45 And a question I ask the guests on my show on the way out the door is, you know, there's a ton of packages on PyPy and out there in GitHub and stuff.

59:53 What ones are notable to you?

59:54 What ones are your favorites?

59:55 Seek walk me, of course.

59:57 Yeah.

59:58 On PyPy?

01:00:00 What's your favorite PyPy package?

01:00:02 I use pytest quite a lot.

01:00:04 I recommend pytest.

01:00:06 I recommend the mock library, which is actually part of the Python standard library now.

01:00:10 Mock and PyPy are the ones I see myself downloading most of the time.

01:00:17 You know, what else?

01:00:19 There's probably others, but they're not coming to mind at the moment.

01:00:22 I'd have to look and be like, oh, yeah, I use that thing a lot.

01:00:24 You know, I mean, obviously, NumPy is the big one.

01:00:27 Yeah, NumPy.

01:00:28 Yeah, NumPy is good.

01:00:30 Like I said, I think Sequel Alchemy is certainly one of the top ones out there.

01:00:34 Yeah, yeah, it is.

01:00:36 So before we call it a show, is there anybody or a thing you want to give a shout out to and call attention to?

01:00:42 Oh, well, my cats for helping me with the show today.

01:00:46 And, you know, just I'm really appreciative of the people that come to see my talks and the people that talk to me on Twitter and tell me how great I'm doing.

01:00:56 And even the people that, you know, naysay me as well.

01:00:58 I get good feedback from them.

01:00:59 So I'm really glad that everyone is kind of following along with what I'm doing.

01:01:03 Excellent.

01:01:03 Can people come and contribute to Sequel Alchemy?

01:01:05 It's on Bitbucket, right?

01:01:07 Yeah, I mean, it's a big code base and people do contribute all the time.

01:01:11 I get pull requests all the time.

01:01:12 through both Bitbucket and GitHub and you can send a pull request.

01:01:15 We haven't done any sprints in a long time.

01:01:18 I mean, that's something that we could do again if I had people to help me with that.

01:01:21 You know, and people in the OpenStack community help out.

01:01:25 So it's a very open project.

01:01:27 If you go to the website, there is a toolbar that has community links and it has a link called Develop.

01:01:33 And it'll show you all the ways that you can get involved with helping develop Sequel Alchemy.

01:01:36 Awesome.

01:01:37 Mike, thanks so much for being on the show.

01:01:39 It was a really interesting conversation.

01:01:41 Yeah, thanks for having me.

01:01:42 You bet.

01:01:44 This has been another episode of Talk Python To Me.

01:01:48 This is your host, Michael Kennedy.

01:01:49 I want to say thank you for listening.

01:01:51 And if you want to support the show, be sure to check out our Patreon campaign at patreon.com slash mkennedy or talkpythontome.com slash sponsor.

01:02:02 Remember to check out the show notes at talkpythontome.com where you can find all the links from the topics we discussed in the show.

01:02:09 And before you go, take a moment to become a friend of the show.

01:02:13 Visit talkpython.com, click on friends of the show, give us your email address, and you'll be in the group.

01:02:19 Smix, take us out of here.

01:02:20 Stay tuned.

01:02:40 Bye.

01:02:41 .

Talk Python's Mastodon Michael Kennedy's Mastodon