« Return to show page
Transcript for Episode #5:
SQLAlchemy and data access in Python
:34 Hello and welcome to "Talk Python To Me" a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy, follow me on Twitter where I am @mkennedy, and keep up with the show0:46and listen to past episodes at talkpythontome.com. This episode, we'll be talking with Mike Bayer, about SQLAlchemy. Let me introduce Mike.
:55 Mike Bayer is the creator of several prominent Python libraries including SQLAlchemy, Mako Templates for Python, Alembic Migrations and Dogpile caching. He's been working with open source software and databases since the mid 1990s.1:09Today, he's active in the Python community, working to spread good software practices to an ever wider audience. Mike is a semi-regular presenter at Pycon US and has also spoken at many smaller events and conferences in the US and Europe.1:22Follow Mike on Twitter at @zzzeek.
1:28 Welcome to the show.
1:29 Thanks. Thanks for having me.
1:31 Yeah, I've been a long time fan of SQLAlchemy and you know, when I started his podcast I was thinking of who do I have to have on the show and you were definitely on the list.
1:40 That's very flattering.
1:43 You built some great stuff. So, before we get into the details of SQLAlchemy and how we should use it and why it's so awesome, maybe let's take a trip down the memory lane, step back a little bit and just, you know- how and when did you get into Python?
1:56 I got into Python after kind of hesitating to get into it, meaning I knew about it and it seemed like something that was appealing to me, but I kind of resisted mostly because of the white space thing, this was back in the probably 2003, I was working at major like Baseball;2:12Most of my programming career before Python was doing a lot of Java and a lot of Perl,and I really liked the idea I've of object oriented languages, now I liked the idea of scripting languages;2:25So Java was very object oriented, Pearl was not too good at objects, but it was a scripting language so I really wanted it to be some scripting language where I could write much cleaner code than you've got in Perl...
2:37 Yeah, so you kind of had half of what you wanted in one language, but you wanted to bring it all together, right?
2:43 Python really looked- Python really looks like it has but it's got that stupid white space thing. And I just didn't want to get into it and then one day when I was working at Baseball we were rolling out this client application called WinCVS I think it was, it was you know2:59Windows CVS client and we had to roll it out to the people who were not programmersand we needed to add hooks to it so that it could- people could tag things and send things to different servers by tagging, and when CVS's scripting language was Python.3:13So I had to learn ten lines of Python to do this script and basically I spent like two hours with the white space thing and I was like "Oh, we do that white space anyway when the code is clean and I'm actually totally fine with it, so let me just learn Python now that I've spent ten minutes getting used to white space"
3:32 That's really funny, that it was like a real barrier to you, right? That just came down.
3:38 That really bugged me, because I was a really sloppy coder back then, Python really made me a much cleaner programmer.
3:44 Yeah, that's a good point, I hadn't really thought about that, but it definitely does make you- if you think about that. Any of a lot of the modern editors make it almost transparent to you but I think there's the mental concept of it and then there's the reality of it. The reality is that it's actually not a big deal at all,it's super smooth but when you are coming from say Java or C Sharp or C++, it seems like a huge deal.
4:05 Yeah, you get over it really quickly.
4:08 Excellent. You bring up WinCvs- boy, I used to use that thing back, in late 1990's or something, that brings back the memories. That's awesome. Let's talk about something more modern than WinCvs, let's talk about SQLAlchemy.
4:24 So, can you just tell- I suspect that, you know, 90% of the listeners are fan or are familiar with SQLAlchemy but there is going to be some folks who don't know, or maybe they know it as a buzz word, can you just tell everyone what SQLAlchemy is?
4:37 So SQLAlchemy is this library that is- it's all about Python and it's all about SQL and databases and interacting with databases.So it's basically when you need towork with the relational databaseyou are going to work like Postgre or MySQL there is all kinds of things you need to do with the database- you need to write, you know, programsthat talk to it,4:58you need toget results back, you need to create schemes you might need to be given some existing database and look at it, there is scripting and migrations of schemes you want to do, you probably are writing applications that have higher level of business concepts that you want to map to that database,5:15so SQLAlchemy started out with the really ambitious goal of having a place, a way to do that, a way to do all of those things in Python.5:23Not just being able to map like an object to a table, but all the features of managing schemas and creating schemas and reflecting them and working with SQL scripts and everything else, working with data type problems, making all kinds of different database backendslook as similar as is feasible,5:44you know like data types in Oracle are very weird compared to those in MySQL and SQL lights a whole different story. Those databases are always going to be different but you can at least try to get some samples of sanity across all them,5:57so that's why if you go to SQLAlchemy website the first thing it says is like a database toolkit for Python. It doesn't say it's an ORM,that's just one component. It's a toolkit that has kind of helpers for whatever you have to do with a database SQLAlchemy you can definitely have a roll.
6:13 That's excellent. You have at least two really nice sort of comprehensive walkthroughs, one for the Core and one for the ORM component. That's the SQLAlchemy.org, right?
6:23 Yes, SQLAlchemy.org.
6:25So, when did you get started with, when did you create SQLAlchemy and what inspired you to create it in the first place?
6:30So, I was kind of writing a first iteration SQLAlchemy in late 2005 and our first release was early 2006 and really when I was writing SQLAlchemy was not the first time I was creating a tool like that, when you work or at least in the old days when used to work in the 1990s and all these various internet shops we had really little tools to work with them.6:55I mean there were tools, but we were, you know, using Perl and libraries to talk databases were pretty crude and I was using Java from day 1, when it was you know version 1.0 and there was literally no libraries just a little before hibernate.7:08So throughout the 1990s, you know, you get the habit of writing little database access layers yourself and every job you have you are going to go and write a really better database. "The database's layer last time sucked, now I'm going to write a really good one"
7:25 And you said that statement to yourself like 5 times, right...
7:29 Yeah, "This time I'm going to do it right, I'm going to write ultimate set of tools" and then as you are doing this you know, as the 2000s roll along, ORM's you know hibernate suddenly was around and I was doing a lot of Perl and MOB and there were some ORMs things with Perl that looked terrible to me, they looked really simplistic.7:49By the time I was at MOBII was working, they were a big Oracle shop, I had already had a lot of gigs where I had worked a lot with Oracle, I had worked a lot with Postgres I had worked a lot with the Sybase Microsoft SQL server, I had worked a lot with not just MySQL but its earlier MSQL incarnations in the early 1990s.8:10So I had, and I'd written a database code in Perl, Java, and C, C++, you know, because back in the early 1990s you know, we didn't, Perl wasn't always available, so I really had iterate a lot on this whole back and forth with the database sending a8:27SQLstring getting the result back, getting data back from the result. I knew a lot about that, so and I always wanted to have a system that would be the last time I have to do this, you know,
8:40 Yeah, I know that feeling...
And for the long time it was like- I was writing things in Java I was going to write like a big toolkit in Java like the big you know web framework everything you need Java thing, and it never really happened, but eventually when I got into Python, that's when I really began doing it, 8:54 Python had kind of not a lot of libraries around, the only library that was feasible for database access in Python was SQLObject which was a great library and it was a huge influence in SQLAlchemy; 9:06 and I started doing it and it was kind of based on the latest techniques I had iterated out of Baseball like the ideas that if you want to talk to adatabaseto be able to make a data structurethat represents what the table looks like and we call that table meta data, but one critical thing I did with 9:25 SQLAlchemy was actually decided to read a book beforehand, so I had this book called "Patterns of Enterprise Architecture" by Martin Fowler-
9:32 Yeah, that's a great book
9:33 And I read that book and half of the book was like "Oh that, oh this, oh this" all these things that I've done and he kind of put names to them; and then, as he did all that there were some other patterns that I was not familiar with like 9:45 "The unit of work pattern" I was like "Wow, look at that, I've never heard of that, that's pretty cool",
9:48 And does that appear as the session in SQLAlchemy?
9:51 Yeah, the session, and actually Session is from Hibernate, Hibernate has got the same thing. So I've- you know, really made a go and I said "Mike you are going to read a book beforehand" and really try to get all these patterns right. 10:03 And that's kind of how it started and it took a very long time for SQLAlchemy to be any good, I mean I think when I first released it, it had a lot of the special things going on that may people like it, but it was not executed very well, but, over the years, over ten years, it's gotten very refined, but it was taking a long time.
10:23 Yeah, it's quite a mature product and that's cool. Did you use it internally before you actually released it or was it kind of "I'm building out in public"
10:31 Yeah, no, it was funny, when I was starting in Python I was running open source and the first thing I wrote wasthis template language called Mighty which was basically almost a line for line port of a template thing in Perl called HTML Mason, that was the first open source thing I published and then Mighty was kind of an embarrassment eventually and I wrote Makoto replace that. 10:50 But I was writing its libraries at first without having any job, the gigs I was doing were still Java jobs, I was still doing after I left Baseball I was still doing Java work. Python was certainly used in the mid 2000s but it was not as dominate as it is today. 11:06 I didn't actually get to use SQLAlchemy for real gig and so version 0.5, so maybe four years into it and I finally was on the gig where like "Hey, we can do this thing in Python, and we can do this" and everyone was like "Okay, use the Python thing"
11:20 Yeah, use that toy language yeah yeah yeah
11:23 I kind of liked that, it was actually the website for a TV show called "Charlie Rose" which is on PBS, yeah we wrote, I don't know if it's still up in that incarnation anymore but at the time, this was maybe 2007,11:34it was written using early version of Pylons, and that was the first time I used SQLAlchemy on the job myself and SQLAlchemy improvedon that gig because I had like "Oh this sucks, oh look at this thing, this thing is terrible, how can people been usingthis for3 years"
11:48 Yeah, that's excellent.
11:51 Yeah, there was written kind of in a vacuum, I only knew what my users told me for several years.
11:59 Yeah, very cool. The previous show that's not out yet but was recorded just before this, is a guy named Mahmoud Hashemi from eBay/ PayPal and the whole show was about sort of enterprise Python and sort of using it as a real proper language and I think, 12:12 you know, it's probably that time frame that there are these similar ideas about "Hey we can actually use it to build real websites and webscale professional apps with this language, it's great".
12:24 Yeah, I mean Python as enterprise was something that, I think you know, existed in the early 2000s, like I think a lot of people were using, I mean people were using like Zope or Plone I think was the early incarnation of enterprise Python;12:39but I think you know, really, you've got to hand it to like, I mean first Rubby on Rails got people working on scripting, languages for enterprise and then Django really helped a lot too,12:51Django really brought a lot of people to Python, and I think the critical mass kind of sort of happening for Python kind of in 2005
13:02 Absolutely. I think,you know to treat your horn and everybody else's a little bit the stuff that's in PyPy all those packages out there make it really hard to not consider Python. The fact that you can just pip install- magic. You know, and build stuff so quickly like why would I start from scratch?
13:21 Right, absolutely.
13:23 We now know what SQLAlchemy is if we didn't; can you kind of give us like- I know it's hard to talk about codes and don't get too detailed, but can you give me a little walkthrough of like what is involved in getting started, like what do I do in code to maybe make a basic query or connect to the database?
13:37 So, there's different ways that you might want to use the system, I mean there's the layer I want to just make a query and that's pretty easy, and the SQLAlchemy has a lot of different levels that it can be used at, like if you just want to connect to the database and make a query that's like a really two lines, just you make an object called an13:54which you give itaurl fordatabaseand then the engine can just accept query's strings directly, and right there you've already stated ten lines of code versus what it would be with the Python database
14:08 Yeah, that's fantastic. Maybe we should take a step back and talk about the layers and then we could talk about what this layer, that layer looks like, and such and such. So you have the core and you have the ORM, can you talk about that a little?
14:19 Yes,soactually14:22brought me in this inside out way where I didn't even worry about the- I kind of did what I wanted to do for the ORM but I didn't even worry about that for a few months,I wrote first just the engine just like theconcept thatconnects to a database using the Python db API and here's the way that we can execute the Query and get results back in a way that's a little bit nicer than what the raw Python database API gives you,14:44it's a little bit nicer and a little more consistent. Then the next thing I did which you know, was soon after, was I wrote the table meta data system which I had already done many times at different jobs, you know kind of looked in Fowler's book,14:56so you create this concept of a Python data structure that mirrors what structures you have in your database so when you have a relational database it kind of without using it at all it has fixed schema, you have like tables, we might call them physical tables but they are really not physical, and tables with columns and they are kind of like the things that are going to store your data.15:23You model a mirror of all that stuff in Python so that you can write Python code that refers to this object structure that can then relate directly to how the database is built. So that's called database metadata, so it's meaning, it's information about the structure of your database.15:41Then from that, the table metadata objects in SQLAlchemy have all these methods on them like select and update, and you know table.select.where, and when I say, you know, x.y.z the dot in Python means that we are calling a method on an object and if you have this pattern when you say an object.call(this) and when you get something back and then you say .call(that) - that's the pattern called method chaining which means that you keep calling methods on this object that keeps returning a new copy of itself.16:13So it's a common pattern using in database Query libraries because the method chaining is kind of like you are building up this data structure that looks like a syntax tree- a syntax tree basically represents like some code but it represents the code as connected together in a tree.
16:31 Yeah, that's fantastic. I really like the flew in API that you put together there where you basically you start the Query and whatever you do to it comes out in another Query and just lets you build that- here's a where clause, here's an order by, here's the distinct or whatever.
16:45 At this point, I mean that pattern is pretty common place these days and when I first did it it was not quite as common place that mean basically Java Hibernate kind of has that pattern but it's not the nicer way, Java is not as sleek language as Python.16:57It was the pattern that was around, I think SQL Object had it is well, you know. It builds an app but then we also did this thing I took this thing away from SQL Object which was that you over load Python operator's like the double equal operatoror the greater than operator, so when you have these objects that refer to columns and values in the database you can compare them with the Python comparison operators and kind of auto magically create another object, like a comparison.17:29And if you go through the ORM or the Core tutorialon SQLAlchemy docs, the Core tutorial would really show how this works at the expression level, the ORMs is kind of reverse to it but it's more about how to get the ORM going. But one time how the core query system works. So, basically these Python objects can kind of be combined together to create SQL Queries. 17:55 And that's the core, really, that's really what you get as well as the fact that it returns objects, it returns Python objects like if you Query for dates you'll get a Python date object back, if you Query for interolsyou'll get a Python timedelta back and these are all systems for building your own custom types, and if you want to get Json back things like that.18:14And then the ORM built on top of that, so if you are starting out I mean if you really want to learn the library from the ground up I would start with the Core and then move to the ORM if you want to get some quick results for a program you are writing you might want to start with the ORM first and then do a deeper dug later.18:30I've kind of switched those directions around over the years, and I think there is just two ways to learn- you can learn from the outside in or learn from the inside out, and it's kind of your choice, do you want immediate gratificationor or do you want more fundamental knowledge first.
18:45 Right, well, and how much are you going to be depending on the library? If it's your core business app and it's really important but this thing is working maybe you should deeply understand it, if it's just a little blog you turn together do it quickly and then come back and learn it more deeply, right, start with the ORM and then go to the core and understand that.
19:02 Yeah, well everyone I've seen, everyone's got a different way to do it, so I just try to present both; and also for a very long time the Core didn't have that name "Core" and that was a problem, it was just called SQLAlchemy and then there was SQLAlchemy ORM and then people didn't really, obviously understand that this whole thing SQLAlchemy- that is not the ORM. 19:20 So I introduced the word "core" probably about, you know, five years ago to say "hey this is whole other thing" and I divided the docs into two big left and right sections so that they could see that there is this whole other thing, like entire right half of the docs.
19:39 Yeah, you have columns
I do, I find them imitated I think Django documentation had that thing going on with the two columns so I did that to say 19:47 "Look, this is whole Core thing, if you don't like the ORM", because I knew the ORM was going to be- the ORM is more pinated I knew it would be controversial, I knwe that I personally didn't like the ORMs very much, and this was goingt o be the ORM that I was actually going to like but I knew all the other people like me are going t o hate it and those people are still around today, so I said like20:05"Look, I get it, don't use the ORM, use the Core and when you kind of use it for a long time you realize hey, we've done this thing over and over again, it could just kind of be automated by the ORM" then you can use the ORM, when you see there is a need for it. So, that's how I saw it being used, but other people got the other way.
20:21 Sure, so, you've sort of talked about the learning perspective of when to use the Core versus the ORM but are there like performance or other patterns, like what other considerations do I choose between the Core and ORM model?
20:34 There's a lot of performance, there's a lot, yeah, so, kind of the unspoken drama of SQLAlchemy for all the years that's been out has been the performance drama. Because, coming from Java background, in the 1990s, early 2000s, at least back then, when we did Java programming we really didn't pay too much attention to the way our code was written might be slower or faster;20:56I mean, obviously, you know, if you are doing like order event things like that but just if you do two nested loops or you do this special library call instead of the loop is that faster, slower, you don't see that much in Java because all the libraries in Java are written in Java as well, 21:13 but with Python is mostly C Python- there is this big gap where if you do something in a for loop it might be very slow and if you do the exact same thing with like the zip call is million times faster because the zip call is written in C. 21:31 So there is this whole thing in Python where you have to kind of use the standard library as much as you possibly can because you want as much as data processing to happen in native C functions rather than in Python so I didn't know that in the first few years writing SQLAlchemy and I didn't really know much at all what was fast and what was slow, and that's pretty common when someone starts with the Python.21:53So, up into version 3 the thing performed horribly, and then there was a lot of blog post another competitors came out to try to challenge the Perl in SQLAlchemy and I went on kind of a multiyear mission to improve the performance of the system so, these days and the most recent incarnation of that has been more performance improvements in version 1.o and also a lot of new documentation that I've added, in version 1.0 that refers specifically to the notion of performance, it's actually in22:23now there is a whole section in my performance there is entire section of how to profile an application and it's a lot of sample tests example suits in version 1.0 where you can compare different methods of doing the same thing for their performance/ ease of use trade off.22:41So I have examples of you know, if you want to do this query, how fast is this query using the Core by itself, how fast is this query using the ORM but loading individual call, how fast is this query if we use the ORM full blown, and what are the trade offs and how can we switch between those two systems?
23:00 So, that's really interesting, I have a couple of performance questions I would like to ask; first you said you went on this multiyear mission, to speed it up which is that's a lot of dedication and that's awesome. What kind of performance gains did you get, did you like double the speed, or what happened?
23:16 Over the course since version 3 I'm sure we have quadrupled the speed of it because it was really really slow. In version 2 and 3 I mean there thing is all kinds of Python idioms that change performance so dramatically and it's kind of a way of life for me now but if you don't know, if you come from like23:33C Sharp it's probably going to be really painful to learn like if you do things like if you want to check at the keys in the dictionary and add a value if you do that check with using, if you basically try to get a key that raises the key error and you catch the key error and you say "Ok let's put the23:50now" that's going to be wait wait wait like five times slower than doing it by just doing it if keyindectfirst.
23:58 Assuming that the dictionary is usually not going to have a value.
Yeah, because it's actually the exception, yeah it's super expensive.
24:06 That's one. Then there are things like if you have an object and you want this pattern called the proxy pattern, if you have an object it has a bunch of attributes and as you call the attributes it's actually sending the message to some other object that's kind of embedded into it, it's called the proxy and that's a very common pattern. There is a really easy way to do the proxy pattern in Python which is used this magic_getattr and your getattr is past the name of the attribute you are trying to get and you say "oh let's call that attribute from our little nested object". Crazy slow if you do it that way compared to if you actually put explicit descriptors on the proxy class that goes directly to the inner proxy class. In the SQLAlchemy I use a lot of the getattr and the people say hey this getattr is crazy slow, this thing is using getattr everywhere it's super slow.24:56Because they are all function calls and function calls are very expensive in Python. So,you do this procedure in programming using a lot of Python which is called inlining, inlining means if you have some operations where method a calls method b calls method c calls method d you inline them you take all the code and you unwrap it into one big function.25:17So that there's no function calls within it. Inlining is something that you kind of get somewhat for free if you use a newer interpreter like the PyPy interpreter which has just in time compiler that does lining for you
25:33 That's really interesting, I was going to ask you how SQLAlchemy works with like PyPy and have you tried Pyston from Dropbox projects
25:40 I haven't used Pyston yet. Pyston looks like kind of the next incarnation of a thing called Unladen Swallowwhich if you remember that was the Google project where they tried to use the LLVMcompiler and I think that's what they are doing with the Pyston thing.
25:59 Interesting, the same one that Apple uses, yeah
26:02 Yeah, I don't know what I'm talking about. There's a compiler just the time platform called the LLVM and I don't know much about it, I know that it's...
26:10 Yeah, that was Chriss Lattner's project , he is the guy who created swift and I think they are using it there for swift as well, so, interesting.
26:18 yeah, I haven't used Pyston, yeah, I see some news about it, I know how it works at dropbox now so I guess we'll see.
26:26 Yeah, That should give Pyston a little more credibility than maybe just a random project, right?
26:53 Michael here, Thank you so much for listening to and spreading the word about "Talk Python To Me" The respond to the podcasts continues to be wonderful and humbling. I have a quick comment about supporting and sponsoring the show. I'm still looking to line up stable corporate sponsorships, but I wanted to tell you about a community based campaign I'm launching to allow listeners to directly support the show. We are running a Patreon campaign. You might not have heard about Patreon, but it's kind of like kick start of the things like podcast which release frequent small deliverables rather than one off large engineering projects. Visit patreon.com/mkennedy and watch the video to see how you can donate as little as one dollar per episode to support "Talk Python To Me". This is your chance to ensure that the Python community continues to have a strong public voice. Consider supporting us today at patreon.com/mkennedy and thanks for listening.
It works with PyPy though pretty well, SQLAlchemy.
28:08 Yeah, it took a long time for PyPy because PyPy itself had a lot of quarks years ago and yeah PyPy continues the integration system. PyPy doesn't use the same kind of garbage collector that CPython does, but garbage collection is kind of a big deal when you write a database library because it affects when you have referent cycles getting garbage collected a lot of our tests for its fail because they were relying upon things being garbage28:38collected automatically and that doesn't really necessarily happen as much with PyPy so there is basically if you have a test that doesn't tear itself down correctly and you still have a connection open that access a table and you run Postgres the tear down of yourtestwill refuse to drop the table because Postgres still has a lock on it; so there is a lot of issues like that that PyPy keeps on so that's good. PyPy seems to use more and more memory but it definitely is much faster and I think it is great interpreter and really hopeful for PyPy. We are into it, for sure.
29:09 Cool. So one thing that I think is nice about the ORM version or the ORM layer is I can get these classes back from a database and make changes to them and just basically call commit on the session. So can you talk about the change tracking?
29:30 That's the unit of work, so that's the part of this whole thing that I was not even familiar with myself because I read Fowler's book and he was like you know enterprise software uses this unit of work thing where you just kind of accumulate the changes and then you just push them all in one shot and I was like "Wow, that's crazy" and you know, I can see how it works and I'm like- That's pretty easy, just track it so that all there are your inserts and your deletes and your updates get kind of put into a big bucket and you sort them out. And I wrote a very simplistic version of that, the SQLAlchemy 0.1 30:03, it was really bad because I really didn't quite know how to write it, I was like a, I knew how to write it but I did this kind of coding which I did more back then probably that I maybe would term "seat of your pants coding"where you kind of know that this part connected that part, this part connected that part, but my brain couldn't see the whole thing at once. You can get the code, go look at all 0.2 of the unit of work and you will be like "What is that!? It's crazy" But when I first saw it work I was like "That's amazing"
30:37 That's super nice. It's really easy.
30:39 Because I was logging the SQL yet I said that's when it got really- got me really jazzed because I was like "This thing is going to be really big" When I saw it commit the first time then I saw all the bros go "Wow, people are going to love this thing."
30:53 Yeah, that's really cool, You know, I actually do not know the answer to this question but do you have some sort of like optimistic concurrency tracking like if two people pull back one person makes changes and the other one takes the object and tries to save it what's the story there?
31:07 There is a little bit of pattern that we have and I lifted it entirely from what hibernate does; there is a pattern using a version counter that, I don't know that it is that widely used because it kind of interacts with the isolation level you are using in transaction, I mean really the optimistic concurrency is something you can get in your database just if you set your isolation levels the way you want, if you set them higher the versioncount on ideas that you load the row from the database and the row will have a special column that's the version and say that the version is five.
31:42 Do I have to make that especially on my class?
31:46 Yeah, there's a whole, yeah, it's in the mapping option, so you have a column that you declare as part of the versioning and then the map will actually load that in that number or it could be a date stamp if you want and it will add when you go to update the row later it will actually add that column to the criteria for the update, so it will be like update row set whatever where primary key is blah and version equals five.32:13And then if the update does not match any rows, that means version five is gone. It means someone else changed version five underneath you. And then I'll throw in airand that's basically one simple way of doing optimistic concurrency control. It only applies to updates and deletes. It doesn't apply to inserts, inserts is different people want to be able to insert some value and if another thread tries to insert the same value they want to use the one existing; that's a hard pattern.32:40There are ways to do that in SqlAlchemy but none of them are automatic.
Yeah, that's interesting, but I think even that level of updates is really a nice feature to have. 32:52 So you talked about the mapping a little bit; when I create these classes, basically what I do is I create a class that derives from some kind of declarative base and then I declare the column and how do I do that?
33:04 So, the declarative system is again that was the system that I added to SQLAlchemy back when I did version five and I worked at the Charlie Rose website and we didn't have any declarative system built and we had some cheap one that was built into the library and then we had a third part I think called Elixirwhich didn't quite do what I needed, so I wrote a new version kind of a third version of the declarative which is basically you have this base class that you can inherit from.33:29The original idea of SQLAlcheny was that you write you declare table meta data using this core you have a table object, table object has this column objects, the name of the table will match the table in your database and the manes of the columns will match the names of the columns they have datatypes like a string or a date time orinteger, and then you would take your class which will just be kind of a blank Python class and say33:52"Map the class to the table!" So the declarative idea is how most ORMs that we deal with today look where you don't see this two separate things, you see kind of this class that has like class/ my class and then it has like id= integer and then first thing = string and last name =string and that looks like Django looks like every other ORM you see.34:14So the declarative thing was added to do that so you basically use the constanceof the Python class you say class and then you my class and then you put a call in and then you indent because Python we are doing white space and then you basically declare variables that refer to the columns that will be in your database table. And then as far as how that creates the database or the database exists already that's kind of a separate concern, but, you are basically defining what your schema will look like in line with your class definition, and it's pretty common pattern that is throughout every of this point.
34:50 Yeah, I think it is pretty familiar pattern, it's really nice, it also ties in to when you are talking about doing the queries those columns basically you can do operations on them right? So I could create like a query of book and say order by you know book.name.des or something like that right and should they kind of play multiple roles letting you do the queries as well as define the schema and the constraints and all that, right?
36:52 It's one thing to sort of realize that that feature is there as the other to say let's use that to make a data access layer really works super well and make it - you define one thing and it has its true purpose which is really cool.
37:03 Yeah, there was I think, there was this guy I think his name was Azmo and he was working on SQLAlchemy a long time ago and he first showed me the pattern that had the hybrid thing and I was "Wow " and I really dove into itin the coming years and I realized it is a very handy pattern.
37:17 Excellent. so, you are talking about creating these columns and stuff, what if I have got a feel that a column is notnullable or needs a default value, how do I do that?
37:26 Yeah, so the table metadata parts, so say we are back in core we are not doing the ORM again, so we are back in core, one of the goals of the core was that it would be this comprehensive system of building a whole schema from scratch so meaning if you have a blank database, you build up all of your objects in Python and then say, you know, metadata.create_all and that will omit what we call data definition language, DDL, which you know as create table statement and 37:52 create sequence and create index. it will omit all those things for you so in order to make that feasible, we have to also have support for attributes like if the column is nullable or not, if a column is a primary key, if a column is index it is unique constraint on set of columns... So SQLAlchemy's schema system has objects and flags that represent all of these concepts. It's pretty comprehensive and we have a full support for all kinds of datatypes and the numerics and floats and we have server defaults and we have every common thing except for triggers; we don't have system for triggers, you can define your own system for triggers and we don't have procedures, so all the main things that you normally- your day to day database schema and stuff like tables, columns, constraints, indexes, nullability, and defaults are all first class. Other things you can define like sequence- sequences we have, other things like triggers you can define your own Pythonic little object set will do that for you but more in a database specific way.39:02So anyway, when you do the ORM and you are doing declarative mapping you are still using core column objects in your declarative mapping so when you make a column of this you can still specify all the table attribute as well as you do your declarative mapping, so that's kind of the answers, they kind of melt together. And it was not that way in the beginning but it evolved over the years to work that way.
39:23 Yeah, it's really smooth. So, I have these classes and I can map them to existing databases and you also said that it meant the DDL which I can create my tables from, right? That's just a function call, go to my declarative base and create something like that?
39:38 Yeah, so the metadata objects is just kind of this collection of tables and other things, mostly tables. And you could say metadata.create_all and you give it an engine to point at like a database connection and it will go on a database and see which tables don't exist and create all the tables that don't exist yet.
39:56 That will create them from scratch, what if I say add_a_column?
Right, so the whole thing with adding a column and doing that that we call that migration, so as we go this funny nothing in SQLAlchemy was there in day one, everything was like I had to figure it out, learn it, like I actually did a talk for SQLAlchemy version 4 and they asked me from the audience "What do you do about database migrations" and I was like "I do not really know much about migrations, I used to do them by hand, so I don't really have migration tools", so some people wrote a tool called SQLALchemy Migrate and that went pretty well for a while and it is still used today and then I eventually wrote a newer version called Alembic. So that is kind of a separate package that you can download and Alembic builds on top of40:37SQLALchemy as the SQLAlchemy might vary because it's still in production. And that defines a whole bunch of new operations on top of SQLALchemy that are specific to being able to add columns, to alter the types of columns and alter the nullability, you know to remove columns we name and things like that, and my database migrations is a whole big issue so it took me several years to really get my head around that41:01to again use that in production on the job, because the best way to write open source code is to be what they say "eating your own dog food" on the job so I used Alembic a lot, Alembic is now really widely used as very prominent within open sack which I work for and that's kind of the path to do migrations just kind of a separate from SQLAlchemy core itself, it's kind of built on top.
41:21 Yeah, that's cool. How about relationships, if I got a book store and I've got categories and there's books in the categories and reviews about the books and so on...?
41:29 So, that aspect of Object relational mapping is really where object relational mapping starts to look like this concept that is really going to change the way of relational database looks compared to the way anotherjoint of data model looks. So in relational schemas if we want to have a table that refers like a row in this table say it's an account row or something and it needs to refer to like some other related concept like like an address.42:01In the relational database we use a foreign key constraint which means that-- you don't have to actually use the constraint, but you would say that this particular column or set of columns in our account table actually refers to the primary key of a different table somewhere else and that's called the foreign key, and the constraint in the database makes sure that the value you put in those columns actually exists in the remote tale. So, and in the ORM, ORMs have this concept of either making the one to many, or many to many, or many to one relationship, which models in42:40terms what that looks like in terms of collection, so if I have in all of the tutorials we have this concept user object and user has one or more email addresses and it is a really old example but I will go back to it as I just know it really well. SO, your address table in your database will have a foreign key called here's an email address and what user is this what user is the owner of this address?43:04It'll have con called user id or something which will refer to the primary key of the user table. So the user table kind of refers to I mean the address table refers to the user table. In your model, you want to have the user object has a once and many to the address class. So, it's kind of if you define one and many it's kind of defining it in the other direction like user point to address. But in your relational database it looks more like address points to user but it turns out that everyone in many relationship in object jointed 43:38 is the same as the many to one in the other way, so SQLAlchemy basically lets you make the one-to-many and many-to-one at the same time using this concept called the backref. This is a little bit rambling but if you look at ORM tutorial it is laid in much more organized fashion. So yeah, there is basically an object there is a function in SQLAlchemy ORM called relationship where you associated with a parent class and then you give it a target class and relationship will make sure that the foreign key set up that is expected to exist between those two tables. And they will kind of mediate that pattern on behalf of the object model to the database schema.
44:20 Yeah, that makes it really easy to almost not even worry about that. So is that like a lazily evaluated thing, so if I pull back a book it's not going to go pull back all of its reviews instantly unless I touch that?
44:30 Right, so that was one of the patterns in Fowlers book called "Lazy loading" which is the concept that if you have an object and you have some kind of method or attribute that represents data from your database, you might have the object and it didn't actually load that other data, it's only when you touch that attribute or method that it suddenly goes44:48"Oh, I have to load the stuff and I had to go out of the database and load more information" That's inefficiency pattern called "Lazy loading" it means you can load a lot of objects and not have to go through all the time of loading all the related things. ORM is doing lazy loading just every ORM I've ever seen. But lazy loading is also at the core of this somewhat controversial concept in ORMs called the N+1 problem.
45:13 Right, that's probably the most common performance problem people run into it, they don't realize they are doing this N+1 query, so I get one book and I try to touch the other items, I loop through them every time I touch that property I'm doing one more Query so for N books I do N more Query to get into database, right?
45:35 Yeah, that's the N+1 problem and every article ever written that is against ORMs will have that entire example as the corner stone of its problems. So the other thing that SQLAlchemy had from day one besides the Unit of Work,the other cool thing it had was this thing called Eager-loading which45:53means that you can- if you kind of know you are letting these users and you want the addresses also, you can say to the Query "Hey, load the user but then also can you also load the addresses at the same time instead of me doing the n+1 problem" and in that way, you will only do if you do what's called join the eager loading, it will actually do the join, it will load your user row and while does the select for the user row it will do an46:21attrjointo the address table so that it gets zero or more address rows at the same time and then the query object knows how to organize the product that we get back, because if you do a join in SQL, from one table to a many, you'll get the one row repeated so the ORM knows how to de-duplicate those rows on the last side of the join and that's called eager-loading and one of the trade offs of eager-loading is that when we write a query we say46:50Hey, mister object database thing we don't know anything about, we want users and then later I want addresses, if I have to say hey I want users but I am also going to want in the future these addresses, that is what you call a leaky abstraction because it means that the way the system is going to do its load, you need to tell it upfront how to do its job to some degree; you have a little bit of awareness that this is not just like some weird database,47:17this is actually a relational database and these two things are in two different tables. One of the things SQLAlchemy embraced very early on in one way that is very different from most of the ORMs is that it did not try to get around that issue by pretending the database doesn't exist, you know what: this thing is to automate our work, it's to make our lives easier and write less code,47:40it's not there to hide that there is a database, so we need to be aware of things like if we want to load the one and not have the n+1 we need to give the query a hint upfront to do that, so that's eager-loading in a nutshell.
47:55 Yeah, I really like that philosophy that you guys take, looking you know, saying look we are not going to hide this from you, we are going to make your life easier and less error prone but we are not going to treat you like an idiot or hide the stuff or you know, you can still get what you need to, that's great.
48:12 How do I do that in code, so what do I say on my query to say I want to eager-load books and reviews.
So, modern SQLAlchemy, I mean, this is again APIs have changed dramatically modern SQLAlchemy you would say query for user class and then there is a call on the query which is a method chaining system.48:30You say query.options and then you add to the options function, a bunch of constructs called joined load, where subquery load or immediate load there is all these little directives we have that you can say "hey, query for user.options joined load user.addresses joined load address. related something else" and you can kind ofgive these directives in the options of the query as to what other things that should be loading on to the primary thing that you are loading from.
49:00 Very cool. You have been talking about the different versions of the API, how have you dealt with versioning that API over time?
49:07 Well, I mean, it's been ten years, and what I kind of did in the early days you know we were first 0.1 and then we were 0.2 and 0.3 and back around 0.3 someone wrote an O'Reilly book for us and were like "you should be 1.0" and I'm like "No way, we are so early" like this thing is totally not at all how it is going to be.49:30So, with 0.1 first of all we didn't even publicize the library at all in those early days I just maybe announced it on one mailing list, and people just came and kind of start using it, and they started using it in production, and 0.1, 0.2 I made enormous, enormous changes
49:52 Did it make you nervous to hear that they were like so heavily depending on this new thing you created?
49:56 People were way too dependent too fast. In early days I think Reddit started using SQLAlchemy maybe version 0.4 or 0.5 you know and there were some messages from them where they were kind of complaining about SQLAlchemy changing its API too much, and really, the API I made the biggest changes before version 0.5. 0.5 we really started selling down to this new kind of thing with the query, with the attributes link to the class, and then we had the key that API changes all the time and from 0.5 on forward50:35even from 0.3 and 0.4 I always made sure that there was a backwards compatibility path, I made sure that everything that change was always announced and if something happened that was unexpected, that was the regression and we would adjust it and it was almost never that I said to someone "You are stuck, you'll have to just change your code now" like I always made sure that there was a backwards compatibility path.50:57But, it makes you have the kind of watch when we did a next major number which is really the minor number but we call it the major number. You needed to watch what was changing, and you needed to test ahead of time. Things are better now, now that we have Pip is much better and we can put a beta releases that won't automatically install to people and I always put a very very comprehensive migration notes for every version that you can see them going back on site, probably back to version 0.4, 0.3 where I say51:29"Here are the things that are going to change, here's the things to look at here's the things to test... " and after version 5 version 6 version 7 it changes less and less each time. People went from version 7 to 8 with very few hiccups, you know, 8 to 9, 9 to 10 there is really nothing you have to change in your code anymore, we let people to test because there might be surprises that we didn't expect,51:52and we fix those but APIs very rarely break in any kind of backwards in compatible way, the only time that it's a problem is that if there is something that was broken in SQLAlchemy 8 or 9 and people were relying upon that broken behavior, that?s the hardest one, when people rely upon broken- and they don't know it's broken, so you can't blame them.52:11When we fix things that were broken, we'll usually make it raise a warning, in the current version of using like 0.9 has something that's broken we'll have an update, upgrade to 0.9.9 and this will be an exception in 1.0.52:27So, they'll kind of know the change is there, but you need to be watching the versions and the changes, I work really hard to make all these changes that were very easy to find on the site. I mean, other products you can never find what changes, it's very hard to find, I think our site is very transparent and you can see everything happening in many ways and it's very highly linked as far as the development log.
52:50 Yeah, that's excellent, it is super comprehensive. So, one thing I think is interesting is maybe talking about some of the notable users of SQLAlchemy, and there you've talked a little bit about a couple of them but you've got a whole list at SQLAlchemy.org/organizations right?
53:07 Yeah. So, yeah there it's I think I first saw like MongoDB was doing there was enormous list of people using their stuff, and like I want that, like a big wall of honors like oh my, you know, when some big company uses your product it's like wow, I'm important, you know, I did this important thing, so, up there, you know I had, there's a lot of big you know, Reddit was the huge one,53:28Reddit only uses the core, they don't use the ORM, their code is very unusual in the way they do things and I'm glad they don't use the ORM because they would really be killing it. Yelp uses a lot of SQLAlchemy, I've visited Yelp, they use the ORM a lot, Uber, you know uses SQLAlchemy,
53:53 Dropbox is there, right?
Yeah, Dropbox, Dropbox is another one that uses the Core, they are another you know, they actually wrote some blogs about ORM which I was kind of annoyed about, but they use the core still they use it yeah a lot animation studios use it, there is a lot of companies that use it who I know use it but I don't really exactly have permission. One of them is kind of mickey mouse operation anyway.54:20Fedora uses it, and of course now I work for red hat and I work for Open Stack, and Open Stack is this cloud computing platform that's open source to lot of companies like PayPal and IBM and AT&T use it, and that thing uses SQLAlchemy all throughout, it's super heavy SQLAlchemy throughout Open Stack code. So, it is very widely used library.
54:42 Yeah, that's really excellent. You must be proud of it?
Cool, and you should be so. What about the whole Async movement with like Twisted and Asyncio, how does that play with SQLAlchemy?
54:56 The Acync thing is something another Python story where it told me that I wasn't really that interested in but it's a big deal in Python, the Async, because when I first came to Python, the two a hundred pound gorilla things that existed were Zoup and Twisted. The old school, the old guard of Python is Zoup and Twisted; so Twisted is great, it's still out there.55:19The problem that we call explicit Async is that it's not really compatible with this idea of lazy-loading that we see Fowler first described, the concept that when you access something on a class it's going to automatically behind the scenes go and fetch data from the database it means that it's going to do IO and if you do things with the Async model, the Async model is all about that in put out put up operations should never be implicit.55:50They should always be something that you code into it and say hey this is about to do IO so stop what you are doing, do the IO. The newest version of Python's Async IO which is in Python 3 you do a Yield and it's very sleek, but it still is a statement that is an interruption in your code.
56:15 Right, you do explicitly say I'm waiting here, I'm waiting here, I'm waiting here, and any of these points you can take that thread and go do something else with it until I'm done talking to the database or something, right?
56:23 Right, so people have wanted SQLAlchemy for long time, I've looked into it, I actually had some clients pay me to look into it, and every time I look into it it's like "It's really not worth it because we have to completely rewrite the whole core engine part of this thing to be this kind of inside out call back routine" It would cost way more functions because remember, the internal source of56:52SQLAlchemy is the performance is that if something takes two functions instead of one, it's a big deal especially in the core where I literally count functions on the regular basis. I do a lot of profiling so just making one thing take three functions instead of one is a huge disaster for me within the core. And the classic approach would mean a lot of this going on.
57:14 How about the Asyncio variation?
Asyncio variation, there is this big blog post I just wrote in the past couple of months that is about, it's called Asynchronous Python in databases, and it's got a lot of views, and it kind of sums up the whole situation. 57:29 I went into Asyncio I looked at it, I think it's really great and it's definitely a huge improvement, I actually could write a lot of SQLAlchemy could be made to work very seamlessly with Asyncio and that it wouldn't require that much code to be replaced what's nice about Asyncio is versus the older call back style is that there is kind of some ways you can have the Async thing be explicit on the beginning of where a method does 57:57 something and then up the call chain at the end you can kind of get away with the calls in the middle not knowing that they are doing Async Python syntax helping here, so I looked into how I could make the core do Asyncio for real and actually it can be done I started writing parts of the poll and there will be a lot of work but it could be done. But in this blog post I wrote like "do you really need to do it" because when you use Async even when you use Twisted and Twisted has been doing this from day one for databases, you have code that really works better with threads and traditional threading,58:34you throw that code into what's called the thread pool, and then the asynchronous part of your library can actually call into routines that are invoked synchronously within a thread pool but they are given kind of an asynchronous to start. And I think that is the best way to do it, if you are writing a lot of database code, you really don't need to worry you really shouldn't have to be worrying about deferring to IO because in the database code it's not really a mystery when the IO have to happen. 59:06 One of the entire reasons the SQLAlchemy exists is to reduce the need to type repetitive boilerplate, it's the whole point of it, you don't need SQLAlchemy to do anything, you could do it all by hand but the point is to reduce boilerplate so to that extend I think you are better off putting your database code in a thread pool or using a system like eventlet gevent that is an implicit system. And the blog post really goes into detail as to choose benchmarks why I think it's better, and I think a lot of people agree with it.
59:36 Cool, ok, you have to send me the link to that blog post I'll put it in the show notes.
Excellent. So, I think we have been talking for a while, it's probably a good place to wrap up the show and question I ask the guests on my show on their way out the door is you know, there is a tone of packages on PyPy and GitHub and stuff, what ones are notable to you, what ones are your favorite? SQLAlchemy of course and...
59:59 Yeah, PyPy, I use PyTest quite a lot, I recommend Pytest, I recommend the mock library which is actually part of Python standard library now. Mock and PyPy are the ones I see myself tumbling most of the time, you know, what else? There are probably others but they are not coming to mind at the moment I would have to look and be like oh yeah I use that thing a lot. I mean obviously numpy is the big one.
Yeah, numpy is good, like I said I think SQLAlchemy is certainly one of the top ones out there.
Yeah, yeah, it is.
60:36 So before we call it a show, is there anybody or thing you want to give a shoot or two or call attention to?
Oh, well, my cats for helping me with the show today and I am really appreciative of the people that come to see my talks and the people that talk to me on Twitter and tell me how great I am doing and even the people that nay say me as well I get good feedback from them so I'm really glad that everyone is kind of following along with what I'm doing.
61:03 Excellent. Can people come and contribute to SQLAlchemy?
Yeah, I mean, it's a big codebase and people do contribute all the time, I get requests all the time, through both Bitbucket and GitHub, we haven't done any sprints in the long time I mean that is something we could do again, you know people in the Open stack community help out so it's a very open project if you go to the website there is a toolbar that has a community links61:32and it has link called develop and it will show you all the ways you could help develop SQLAlchemy.
Awesome. Mike thanks so much for being on the show, it was really interesting conversation.
Yeah, thanks for having me.
61:43 This has been another episode of Talk Python To Me. This is your host Michael Kennedy, I want to say thank you for listening, and if you want to support the show be sure to check out our patreon campaign at patreon.com/mkennedy or talkpythontome.com/sponsor. Remember to check out the show notes at talkpythontome.com where you can find all the links from the topics we discussed in this show. 62:10 Before you go, take a moment to become a friend of the show, visit talkpython.com, click on friends of the show, give us your email address and you'll be in the group. Smixx, take us out of here.