#151: Gradual Typing of Production Applications Transcript
00:00 Michael Kennedy: I hope you're using Python 3 these days, because one of its powerful new features is type annotations. These let you build and maintain large-scale Python projects with much more ease and confidence. This episode, you'll meet Łukasz Langa who helped migrate some very large Python projects. We'll discuss how Python uses the concept of gradual typing to slowly expand the sections of your Python code that are type-checked. This is Talk Python to Me, Episode 151, recorded January 31, 2018. Welcome to Talk Python to Me, a weekly podcast on Python, the language, the library, the ecosystem and the personalities. This is your host Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @TalkPython. Łukasz, welcome to Talk Python.
01:00 Łukasz Langa: Hello.
01:02 Michael Kennedy: It's great to be here with you again. We were just recently at PyCascades, a nice little conference where you get to pretty much meet everyone who's there.
01:10 Łukasz Langa: Oh yeah, for me it was a really personal trip since I used to live in Vancouver for quite a while. So, you know, Western culture doesn't really allow grown men to cry publicly. But I'm shedding a tear here all the time seeing all the places I used to visit and used to frequent, so, yeah, it was a great event. I enjoyed it a lot.
01:31 Michael Kennedy: Yeah, Vancouver's a wonderful city.
01:35 Łukasz Langa: It is just the perfect size. It's not as big and scary as New York City. But it's perfectly urban-like with proper public transport, unlike San Francisco. So, yeah, it's one of my favorite places on earth.
01:47 Michael Kennedy: Is that where you are now?
01:47 Łukasz Langa: Yes, I am now in the Bay Area. So, yeah, I miss Vancouver terribly.
01:52 Michael Kennedy: Yeah, the Northwest is a nice place indeed. Before we get into our topic of the day, the type annotations and all the amazing stuff and your story of how you applied this on a large code base, let's just get started with your story. How did you get into programming in Python?
02:08 Łukasz Langa: Programming is sort of a memory that I have from pretty early childhood since I remember the day when my dad bought me a Commodore 64. It was supposed to be a surprise Christmas present, but the package was big enough and he just wasn't able to hide it properly so I knew I was going to get it. That was, wow, I very vividly remember the moment when I realized I'm getting a computer. And even if you want to play games at that point you have to type in a bunch of commands. So it was very welcoming to just try out doing something more and I was pretty young at this point. I must have been like six or seven, so most of what I did was just retyping programs that were published in computer magazines at that point. But that instilled in me this realization that this is not some dark magic that normal people cannot do. To the contrary, I felt like, if only I had enough time, I could type, make any program like the WarGames movie's sort of artificial intelligence or whatever, like, you know?
03:16 Michael Kennedy: That's right, the WOPR core.
03:17 Łukasz Langa: Yes, exactly. Kids are very imaginative that way. It took me quite a while to actually get into real programming. I went through Pascal. I went through Java and whatnot during college. And in autumn 2004 I was starting computing science at the university in Poland and I had trouble with some courses I took. There was, particularly hard for me was linear algebra. And I was, in fact, scared that I would be just let go. Like I would not be able to pass exams that were just coming. So, a friend showed me some scripts he wrote in Ruby and some linear algebra library to check whether the results of the exercises that we were doing were correct. That helped with solving homework assignments. So I badly needed some reassurance, so I got excited about this, but, for some reason, for some random reason, the RubyInstaller just refused to work on my Windows XP box. It just crashed midway, couldn't install Ruby. So, as a test was scheduled the very next day or so, I started looking, pretty nervously, for alternatives. And I literally typed Ruby alternative in Google. And that's how I found Python. And this installed cleanly. I quickly found a functional lin algebra library. That was, I think, before NumPy, because what I was using at this point was Numerical. So either way, I got hooked. That was such a departure from Java and Pascal that I'd known before.
04:57 Michael Kennedy: Oh yeah, this is really an interesting story. Just like, all right, forget it, Ruby's not working. What else? I got to take a test. And then, you're like, "Wait a minute. This is pretty cool, this stuff." I'm pretty sure that must have been before NumPy, yeah. That's cool.
05:08 Łukasz Langa: Yeah.
05:09 Michael Kennedy: So one of the things that I, I know you didn't say that you were learning math for programming, per se, but a lot of times people feel like, if you are a programmer you have to know a lot of math. They conflate math and programming. It's really interesting how little math we do as programmers, even though they're very similar.
05:31 Łukasz Langa: Yeah, so that depends really on your particular niche in programming that you're into. I was always coming from this background of, like, building Legos essentially. So, composing smaller pieces into bigger tools. So, yes, I'm not one of those guys who come up with new, exciting algorithms that are way more efficient than something else that was done before. I'm mostly a re-user of things that were invented by other people. But, in this sense, complexity is something that I enjoy more than people that program on, in a notebook, essentially.
06:13 Michael Kennedy: It's a very different way of programming. I'm with you. I like thinking of the big architecture, how all the services and database stuff fit together. It's really fun. So, speaking of which, you work on a moderately-large project right now. What do you do day-to-day?
Łukasz Langa: I work for Facebook. When I started working there almost 4 1/2 years ago, what I did was, I was a production engineer, which is, essentially, like a person managing complexity and making sure that all the software written by, I don't want to say naive, but like the sort of feature-focused software engineers, actually runs at the scale that we need it to. So I started with the cache infrastructure. Then I moved to automated remediation of alarms that we were getting and whatnot. But, all of this was using Python. So, with my core development background, I was always just putting my nose into other people's problems to try to help them out with Python issues. Especially if you heard that Python sucks for some reason. Try to not take it personally, but if somebody really is commenting negatively on something that you personally worked on, then you want to know why. So, I was slowly making this my job and, two years ago, we actually got a team formed around this idea, and now I lead this team as a tech lead. It's called Python Foundation and it's essentially managing the runtime for both Facebook and Instagram.
Michael Kennedy: That is such a cool thing. That sounds like such a fun job.
07:43 Łukasz Langa: What we are doing is, we're trying to actually make Python, which is one of the most popular languages at Facebook, probably in the top three, to really feel like a first-class language. So, to do this, you cannot really stick to an ancient version of it that's going to actually be end-of-life in two years. So, one of the core missions behind this team is to move the entire company off of Python 2. That means we would like to move everything to Python 3, but also, if people move to other tools in the meantime that work better, we're fine with this. But we really want to get rid of Python 2 as a thing. We managed to help Instagram move to Python 3, which I think is a pretty big deal. It's over a million lines of code. And we did this in the year of the biggest growth for Instagram, both user-wise and feature-wise. This is when stories were launched, when ranked feed was launched and we didn't have any incidents. We didn't go down while doing it, so I think, if anything, that should be a very big reassurance to other people struggling with this now that it is, in fact, possible to do and it is worthwhile.
09:02 Michael Kennedy: Is this the same basic story that was told at the keynote at PyCon 2017 around Instagram upgrading?
09:10 Łukasz Langa: Yes, I was actually in the team that worked on this. So, when the keynote was prepared, I was one of the reviewers of the keynote when Lisa and Hui were practicing runs for it. So, yeah, I've known the keynote almost by heart before it went live at PyCon. Yeah, it was also a pretty personal story for us because we spent a pretty significant time and effort on it but it was totally worth it.
09:36 Michael Kennedy: It was such an inspiring story and, when I think of other companies and other people saying, "Well my project's too complicated to move to Python 3" or, "We can't possibly do this. It's too much effort." I look at what they did on a single branch, switching Django versions majorly and switching Python versions. That was just incredible, it was awesome.
09:58 Łukasz Langa: Yeah, pretty much you just need to know what you want to do and how, and I think the key to success there was the process that we took. So, pretty much, if you just have an idea to move to Python 3 and you just start randomly stabbing at it, you are more likely to fail. But if you actually figure out, "How are we going to do this?" It's a tractable problem. You can absolutely pull it off. And now, with enough projects actually going through this transition, there's a lot of resources, like the keynote that you mentioned, that other people can just access and, I don't know, listen to, watch to see what were the processes that actually worked out for a big project.
10:40 Michael Kennedy: Yeah, could you maybe just give people the really high-level steps they went through? So things like, kept it on Python 2 in production but started doing the testing in two and three. I think the steps were really nice. You remember them?
10:52 Łukasz Langa: We had to realize that we need to make all the code work on Python 2 and Python 3 at the same time. So, we embraced Six as a library to actually write polyglot code inside this delivery which is the big Instagram back-end repository. Once we had this, we pretty much had to start testing. And to start testing, we whitelisted a small amount of unit tests that we knew are passing on Python 3. And, gradually, we were just extending this whitelist when those modules that were being unit-tested were made compatible with both Python 2 and Python 3. We were gradually just extending this whitelist. What happened at some point is that this whitelist was big enough that we could switch to a blacklist. And so, there were just some straggling tricky places that we needed to address. What I missed is, before we even started, we had to upgrade to a newer Django version because the one that we were using was written long before anybody thought of Python 3 compatibility. So that was like a prerequisite. Fortunately there was a version that supported Python 3 and most of the other dependencies that we had were also already ported by the time we started doing our internal transition. So, pretty much the unit tests were important. Once those unit tests were in good-enough shape, we started pretty much running a Python 3 version on developer boxes. So, Instagram, and, I think, most of Facebook, doesn't really work on the laptops that you're getting. Everybody's working on their assigned developer server that they have somewhere. Mine is, for example, in North Carolina. So pretty far from where I am, but doesn't really matter. You just work on it. It's sort of your computer. You work on a terminal anyway. It doesn't matter where that console actually is. So we switched people to just run on Python 3. We did notify them that this is happening but, for them, it should be, it should be a no-op. It should just work.
Obviously it didn't for all of the cases, but that's the thing. If you treat Python 3 incompatibility as a bug that needs to be fixed, the approach to actually fixing it is quite different than when you're just seeing it as an intractable problem that is unlikely to ever work. Then you just complain and you throw your hands up in the air and say, "I don't know what to do." So when we said, "Hey, any bug that you see, we just need to fix it." By then we had a pretty extensive wiki page on typical issues, how to solve those. So once we were comfortable running the entire app on developer boxes on Python 3, we started shadow testing, so with this sort of fleet that we have in production for Instagram, obviously you can do A/B tests. You can release stuff in one cluster of machines and not the other and whatnot. That was important. But even if you have a smaller-scale deployment, I think you should never just release on a single server so you have some sort of load balancing. You have some sort of way of releasing gradually. So we just started minimally releasing Python 3 and seeing what happens. So we saw some tragic performance regressions and later we found out that either it was some library that was very poorly ported or something stupid that we did, or actual problems because Python 3 cached differently and we have to switch to do something a bit different. In the end, we cut down the memory usage of our Python processes by 1/3 and cut down the CPU usage by 12%. So, for just switching a Python version, I think, in our scale, that was a worthwhile investment right there.
14:41 Michael Kennedy: That's a really cool story and I think it definitely serves as a cool roadmap for people going forward. And it's only going to get better. Like these new web frameworks and API frameworks are, a lot of powerful Python 3-only ones. I'm thinking API Star and some of the async-enabled web frameworks that are only accessible to these newer platforms.
15:02 Łukasz Langa: Actually Instagram is now looking at marrying Django with asyncio in ways where we can utilize the two. This is not an easy problem to do because Django is just built around the idea of a single process per request whereas asyncio is totally the opposite, using coroutines to concurrently serve many requests. But in the end I think this is going to be a transition that is sort of radical because asyncio is viral in the sense that, to actually use it, you have to pretty much just give up using any blocking APIs. Like, if you want to have native asyncio, you have to switch to using coroutines and non-blocking APIs everywhere. But, the alternative is really just switching to Go or Rust or whatever else. And this is what many teams are pondering. We actually want to have better performance. Python doesn't give it to us, so let's just switch to a totally different infra and to a totally different language that has a different set of compromises that they might not even fully realize before they start using a language fully. So, I think, instead of burning all the bridges, we can burn some of the bridges and switch to asyncio to actually enable performance that we haven't really seen much in Python. There was, obviously, Twisted, but it was a pretty sort of separate community for the longest time. And I hope asyncio is going to be more mainstream than that.
16:32 Michael Kennedy: For sure and a lot of the libraries that are just out there, the packages, it's very likely they standardize on async and await and so they would just plug into these async things. It's definitely exciting. The final question on this before we get to the official topic. This is really, really interesting. You mentioned being radical. Do you happen to work with Jason Fried, as well? Is he on your team?
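To make the contrast with one-process-per-request concrete, here is a minimal sketch of coroutines serving several requests concurrently on one thread. The `handle_request` and `serve_all` names are hypothetical, and `asyncio.sleep` merely stands in for a non-blocking network or database call; this is not how Instagram or Django implement it.

```python
import asyncio
from typing import List


async def handle_request(request_id: int) -> str:
    # asyncio.sleep stands in for a non-blocking I/O call. A blocking
    # call such as time.sleep here would stall every other request.
    await asyncio.sleep(0.1)
    return f"response {request_id}"


async def serve_all() -> List[str]:
    # All three coroutines wait concurrently on a single thread.
    return await asyncio.gather(*(handle_request(i) for i in range(3)))


responses = asyncio.run(serve_all())
```

This also hints at why asyncio is described as viral: every function on the call path that waits on I/O has to be a coroutine itself.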
16:51 Łukasz Langa: Yes.
16:52 Michael Kennedy: He gave such a great talk called Rules for Radicals: Changing the Culture of Python at Facebook.
16:58 Łukasz Langa: He was actually one of the people that made me stand up and try to make the Python situation better at Facebook. He's been at the company two years longer than me. So, he was always one of those guys that we were just working with as a grass-roots movement before I had a team that does this full-time, which is why when my director asked me, "Who would you see on your team?" Jason Fried was like the first name that I gave him. That's the right person to do this job. He's just enough of a mixture of being rational and a fan of the language. So, I think you have to be invested, but at the same time, you have to recognize the limitations behind the technology. He, just he's the right sweet spot.
17:47 Michael Kennedy: Yeah, and that's cool. The reason I bring that up is I think his talk, which I'll link to, the Rules for Radicals, about changing the mindset to wanting Python 3, to making it sort of the default behavior and within a large organization. And then the keynote from Instagram is the concrete steps that you take to actually make that happen. I think put those two together and any organization can, that's on Python 2 can, pretty much find a roadmap there.
18:12 Łukasz Langa: Yeah, we hope so.
18:12 Michael Kennedy: Yeah, I do as well. All right, speaking of Python 3. One of the really cool features. What was this, 3.5 when it came out? The type annotations.
18:22 Łukasz Langa: Uh huh.
18:23 Michael Kennedy: All right, so this is PEP 484. Tell people, for those who don't know, what it is.
18:27 Łukasz Langa: Starting with Python 3.0, we had a feature, a syntactic feature to apply annotations to function arguments and return values. It was always envisioned by Guido to be the foundation on which to build static typing for Python. However, at the time, it was very unclear what that meant. So, he pretty much left this as an exercise to the reader to come up with a sensible syntax and a type-checker for Python. And there were a few toy attempts at this, but, fundamentally, nothing caught on and there was no big advancement there.
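For readers who have not seen the syntax, a small sketch of function annotations follows. The `double` function is a made-up example. It shows that annotations are stored on the function object and that the interpreter itself does not enforce them, which is why a separate type-checker is needed.

```python
def double(x: int) -> int:
    return x * 2


# Annotations are stored on the function object.
annotations = double.__annotations__  # {'x': int, 'return': int}

# The runtime does not enforce them: passing a str still "works",
# even though a type-checker would report this call as an error.
unchecked = double("ab")  # 'abab'
```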
19:08 Michael Kennedy: Yeah, like you could do like a docstring-type thing. There's a couple ways. You could do like a type colon and it works on some tooling and not others.
21:24 Michael Kennedy: Wow, okay.
21:25 Łukasz Langa: Yeah, and until this day, that's the type-checker that we're using for this. So that's like this sort of ancient history in the project. I visited Guido, working at Dropbox at the time. He visited the Facebook campus. That was like, I guess, four years ago or more now. Yeah, so we started actually filling this gap.
21:49 Michael Kennedy: This portion of Talk Python to Me is brought to you by Linode. Are you looking for bullet-proof hosting that's fast, simple and incredibly affordable? Look past that bookstore and check out Linode at talkpython.fm/linode. That's L-I-N-O-D-E. Plans start at just $5.00 a month for a dedicated server with a Gig of RAM. They have 10 data centers across the globe. So, no matter where you are, there's a data center near you. Whether you want to run your Python web app, host a private git server, or file server, you'll get native SSDs on all the machines, a newly-upgraded 200-gigabyte network, 24/7 friendly support, even on holidays, and a seven-day money-back guarantee. Do you need a little help with your infrastructure? They even offer professional services to help you get started with architecture, migrations and more. Get a dedicated server for free for the next four months. Just visit talkpython.fm/linode.
22:43 Łukasz Langa: The main goal behind this was always to provide annotations to people so that this sort of semi-formal docstring syntax would not be needed. We wanted to put it in the right place, and annotations are just the right place for it.
22:58 Michael Kennedy: Yeah, it definitely is. One of the things that I don't like about the docstring style is, if you have like four arguments, the docstring becomes, I don't know, like eight lines. It's got the name and then the type and it just, it gets really long. And if you've got a function that is three lines long and you put this huge docstring in it just so you can see the type, it's like you've almost made it less readable. It's a real big trade-off at that point anyway, to put that extra stuff in there. Whereas, if it's just a little bit, you know a colon int colon str type of thing at the end of your variables, it's much more compact.
23:35 Łukasz Langa: I agree with you, so that's one concern. But the bigger concern is really just, most comments in your code base are going to be wrong and are going to be essentially lies after some time. So, it doesn't actually take very long for those docstring-based types to get out of date with just small changes to your code base. Small diffs that people need to fix an issue or introduce a small feature. So, for us, the human factor was very important, but also without the help of technology to tell us that, "Hey, this annotation is out of date now." We would know that those annotations are bound to get useless after some point.
23:35 Michael Kennedy: Yeah, they're worse than useless 'cause you would trust them maybe, even if they're wrong. They're misleading.
23:35 Łukasz Langa: Oh, yes, yes, that's really true.
23:35 Michael Kennedy: Yeah, so I think maybe one place we should start a conversation is, what's the real benefit of these type annotations? On sufficiently small projects, maybe you don't need them. I do find them to be really helpful at certain parts of my code just to help editors and things like that, but you know I have this really nice example of a function. It was called processAll. It took in items, and it just had "for i in items: i.children.process" calls the function on it. Even though there's only three lines, it's PEP 8-compliant, it's completely, it is nearly impossible to make sense of what it's doing.
23:35 Łukasz Langa: Yes, so, Python programmers really like to have concise code. And that concise code pretty often uses very generic names for variables and methods. When your method is called "process" and then you grep for it in your big project, you're going to find that there's maybe 58 of them, and actually figuring out which one is exactly called, or what the argument that you pass that is just called items actually is, is hard. You can sometimes get it from context, you can sort of assume what it is, but you can never be sure.
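The three-line function mentioned above can be sketched with and without annotations. The `TaskQueue` and `Project` classes are hypothetical stand-ins invented for this illustration, since the original code base is not shown; the point is only that the annotated signature says which `process` is being called.

```python
from typing import Iterable


class TaskQueue:
    """Hypothetical type with one of the many `process` methods."""

    def __init__(self) -> None:
        self.processed = False

    def process(self) -> None:
        self.processed = True


class Project:
    """Hypothetical item type; `children` is a TaskQueue."""

    def __init__(self) -> None:
        self.children = TaskQueue()


# Unannotated: what is `items`, and which `process` is this?
def process_all_untyped(items):
    for i in items:
        i.children.process()


# Annotated: the reader and the type-checker both know it is
# Project.children, a TaskQueue, whose process() is called.
def process_all(items: Iterable[Project]) -> None:
    for i in items:
        i.children.process()


projects = [Project(), Project()]
process_all(projects)
```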
23:35 Michael Kennedy: Yeah, you might throw a print type of items. Print and it just, all right, what the heck is? I'm just going to do some print statements. This is out of control. What is this thing, right?
23:35 Łukasz Langa: Yes, totally. So those very generic functions tend to be misleading 'cause different people are reading this code and you are a different person six months from now. So even if you wrote this function, you might be misled by what you wrote some time ago. So, this is one of the fundamental problems with having a very dynamic language. And people will sometimes say that it doesn't matter for a small project. But, even a small project just gets out of your head after some time. And when you're coming back to it, like fixing a pull request that somebody gave you as a puppy, as a gift. So, you need to put all this information back in your head. And when you have to follow how the types actually work in your project, you have less space to actually review the change. So, fundamentally, like function annotations, type annotations are just a way to cut this short so that you don't really have to keep the entire program in your head to make informed decisions about what you're doing with your function.
23:35 Michael Kennedy: That's interesting. It's like a form of distributed cognition. Like more of the thinking is stored on the page and it leaves more space for algorithms and consequences and stuff. Yeah, yeah, pretty cool, so let's talk about first where these appear, where they're useful. So they're useful in editors. They're useful in continuous integration. They're useful in upgrading. What are some of the tools around all of this? For example, if I want to do a check to make sure my code is hanging together. You mentioned Mypy. That's pretty much the primary tool.
23:35 Łukasz Langa: This is not the only tool, but it's the, sort of the, I don't know, like all but the official type-checker for Python by now. It's in the Python organization on GitHub. It has the most, I guess, manpower behind it. It's the most mature. So pretty much everything standardizes around it. But it's not the only one. So the point of PEP 484 was not to create a small walled garden of a single technology, it was more of a standardized syntax so that any piece of technology that wants to do typing, can share. And we share it with, for example, PyCharm which is the most advanced IDE that we have for Python. It does implement its own form of type-checking that is using exactly this syntax. It's sharing the annotations for third parties in the same library that we keep in the Typeshed project. It is kept separate from Mypy just for this reason so that other projects can use it. There is a project by Google which generates types by inferring what your code base actually does, which is called pytype. And, again, that uses exactly the same syntax that we formed with PEP 484. So there's a number of projects that sort of revolve around typing, but as far as type-checking goes Mypy is the go-to type-checker that we have for Python.
23:35 Michael Kennedy: That's interesting. I didn't realize it was so baked into PyCharm through Typeshed and stuff like that. So, we'll talk about Typeshed. That's pretty cool. Yeah, so I find that it's useful for adding to your program, to add another level of check at certain levels, perhaps. As you cross, say, to a data layer, in and out of a data layer, for example. That's pretty helpful. The continuous integration is really important. Yeah, do you know what editors support, I know PyCharm does, but I don't know what other editors take this into account.
23:35 Łukasz Langa: Mypy, as a type-checker, was formed around the idea that it's an almost full-fledged Python interpreter. So it analyzes the entire program. And, for the longest time, it actually had to spend the time to analyze everything, which takes time. It is also written in Python so it is not the most efficient thing that we could actually come up with. But, it was very important that we can move it very fast because we were only really learning about all the edge cases of the Python-type system when we were working on PEP 484 and, later on the tongue. So, it is not the greatest technology to use within an editor, which is why PyCharm really implements its own thing. If you are using an editor, you really want a type comments base to tell you what available methods you have. And you don't want to wait 30 seconds for an answer there. You really need something right away. Same for just telling you whether you have any type errors. You would like the curly red line to appear right away as you're typing something wrong and not after three minutes. So, that way, Mypy was only being gradually made compatible with this use case. I don't think it's there yet. But there are features being implemented towards this goal. There was an incremental mode introduced at some point where modules that were analyzed for type information were kept in a database, essentially a bunch of JSON files, so that we didn't have to analyze them again if they didn't change. Now, there's a mode introduced to Mypy where it's going to live as a daemon that is running on your process so you don't even have to restart Python which, on its own, just starting up the entire Mypy type-checker takes around a second. So, just cutting down on this is already a win. And then reading all the types.
23:35 Michael Kennedy: Could it do like a continuous analysis, just watch all the files and just in the background, analyze it and periodically report its output or something like this?
23:35 Łukasz Langa: That would be the point. How that exactly works? I'm not sure yet. We are not, in fact, using this feature yet. We've only recently adopted incremental mode, which was pretty experimental and, at times, unstable. So, Mypy is a living project. But it's being actively developed by a group of, I think, four full-time developers at Dropbox and a bunch of volunteers are on the project from outside of Dropbox. So, this feature is pretty much very new. I do hope it's going to work like you're describing. Since this is exactly what then enables a language server protocol in say, Atom, or Visual Studio Code, to actually talk to the type-checker and get typing information right away. What we have today is, there is a Flake8 plugin that I wrote that is a sort of basic version of Mypy, which is, "Let me just tell how am I doing file." What does that mean? If you're doing full-program analysis and you're importing stuff from different files, you're going to know their types. You're going to be able to tell whether you're using an API right or wrong regardless of the file it's in. What that requires is full-program analysis that takes, sometimes, minutes, right?
23:35 Michael Kennedy: Yeah.
23:35 Łukasz Langa: So, instead, we can run Mypy in a special mode that just says, "Assume every import is fine. Whatever I'm importing I'm using it correctly, but just look at my functions in the file that I'm editing right now." And it turns out that this can be done in around a second but that's as long as Mypy's process starting pretty much. And because of Typeshed, which is our collection of types for the standard library and a bunch of third parties, we can still provide very meaningful information about how you're misusing some built-in type or some built-in library. So, for example, like the example I always give is, newbies very often confuse sorted and the sort method on a list. Like they would think that the sort method also returns something, but it doesn't. This very simple plugin for Flake8 will already tell you that, "Hey, you meant to actually use sorted and not use sort because that sorts in place and it doesn't return anything."
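The sorted-versus-sort confusion mentioned above looks like this in practice. This is a minimal sketch with made-up variable names; the exact wording of the type-checker's message is not reproduced here, only the runtime behavior that the check protects against.

```python
from typing import List

numbers: List[int] = [3, 1, 2]

# sorted() returns a new list and leaves the original untouched.
ordered = sorted(numbers)

# list.sort() sorts in place and returns None. Because the method is
# declared to return None, a type-checker can report this assignment.
mistake = numbers.sort()
```

At runtime nothing fails on the last line, and `mistake` is silently `None`, which is why the error tends to surface much later and far from its cause.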
23:35 Michael Kennedy: Yeah, and those are the kinds of things that can be really helpful to get picked up there. So, let's take a moment and talk just about the syntax real briefly. So, there's a real simple version, like I say, a variable, then colon, then the type. So I could say id:int or name:str. That's totally straightforward but as soon as it gets a little more interesting and you actually have to bring in the typing module. So if I want an, maybe I want to return a user, or it might be empty, it might be None because there's no user at that id. So you might have an optional user. You might have a list of optional strings. Like these are pretty interesting. So do you want to talk a bit about the typing module?
23:35 Łukasz Langa: Obviously we go beyond just simple classes. So, as annotations, you can use any built-in type, any user-defined class. But beyond this, you start having complex types like you mentioned, optionals, which is, this actually is usually an int, but maybe it's a None. Maybe the user just didn't provide it at all. Or, maybe it's bytes and maybe it's a string. So you want essentially what we call a union of multiple types. You can have other things like, "I want a list, but I want to specifically tell you that this is a list that holds just strings." So, these are collections with generics. And the built-in collections in Python don't support generics because the runtime doesn't really work like this. Compared to statically-typed languages, Python really implements classes as just factories of objects. Those objects just have attributes on them, and, as long as you are calling the right attributes, like calling the right methods, everything is fine and the runtime doesn't really care what particular type an object has. So, if you want to actually have this as a feature of the type system, then well we have to create our own versions of the built-in collections that includes ABCs, that includes things like all the things in collections, like ordered dictionaries, like named tuples. Everything that essentially you can instantiate in the standard library, there is a generic variant of it. So for this reason, we have the typing module that you import those complex types from. There's many other complex types like Any, which essentially tells the type-checker that, "I don't really know what this is." Obviously Any is a name that describes sort of your state of knowledge. It doesn't really say that any type is going to be fine. It just says that, "As far as I know, whatever is passed should be okay." That's pretty much a way to silence the type-checker. So this special Any type is also in the typing module.
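The complex types named above (optionals, unions, generic collections, and `Any`) can be sketched as follows. The `User` class, the `USERS` table, and the function names are hypothetical and exist only for this illustration; the imports from `typing` are the real names.

```python
from typing import Any, Dict, List, Optional, Union


class User:
    """Hypothetical user-defined class used as an annotation."""

    def __init__(self, user_id: int, name: str) -> None:
        self.user_id = user_id
        self.name = name


# A generic collection: a dict from int keys to User values.
USERS: Dict[int, User] = {1: User(1, "ada")}


def find_user(user_id: int) -> Optional[User]:
    # Optional[User] means "a User, or None if there is no such id".
    return USERS.get(user_id)


def to_text(value: Union[bytes, str]) -> str:
    # Union[bytes, str] accepts either type.
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


def shout(words: List[str]) -> List[str]:
    # List[str] is "a list that holds just strings".
    return [w.upper() for w in words]


def describe(thing: Any) -> str:
    # Any silences the type-checker for this argument.
    return repr(thing)
```

A caller of `find_user` has to handle the `None` case before touching `.name`, and that is exactly the kind of omission a type-checker can point out.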
23:35 Michael Kennedy: This portion of Talk Python to Me is brought to you by, us! As many of you know, I have a growing set of courses to help you go from Python beginner to novice to Python expert. And there are many more courses in the works. So please consider TalkPython Training for you and your team's training needs. If you're just getting started, I've built a course to teach you Python the way professional developers learn. By building applications. Check out my Python Jumpstart by Building 10 Apps at talkpython.fm/course. Are you looking to start adding services to your app? Try my brand new Consuming HTTP Services in Python. You'll learn to work with RESTful HTTP Services as well as SOAP, JSON and XML data formats. Do you want to launch an online business? Well, Matt Makai and I have built an entrepreneur's playbook with Python for Entrepreneurs. This 16-hour course will teach you everything you need to launch your web-based business with Python. And, finally, there's a couple of new course announcements coming really soon. So, if you don't already have an account, be sure to create one at training.talkpython.fm to get notified. And for all of you who have bought my courses, thank you so much. It really, really helps support the show.
23:35 Łukasz Langa: So there's a number of features there, so whenever you need a situation like a union, like an optional type, like generics or whatnot, you would use this. Generics are special because sometimes you really want to say, for example, "I don't care. We're taking it as an argument and I am returning this same type."
23:35 Michael Kennedy: Yeah, that one was surprising to me because I hadn't seen that before. I know you could have a concrete generic, like a list of strings, which sort of specifies like, "It is a list and its internal type is this." But, to say it takes a list of T and it returns a T, that was a pretty unexpected thing I saw coming out of the typing module. That's cool.
23:35 Łukasz Langa: That's pretty much like a very basic version of sort of templating for Python, but it's fundamentally very often used. So, it's very often that you would have a function that operates on a collection and, I don't know, like returns the first truthy value of it or whatnot. And, like just typing this, would be impossible without a type variable, so this is where they come in useful. There's a number of other more-advanced features. So, they're all documented on docs.python.org. But essentially the necessity for the typing module comes from the fact that there's more to types than just simple classes.
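The "first truthy value" function described here could be typed along these lines. It is a minimal sketch; the name `first_truthy` is made up for illustration.

```python
from typing import Iterable, Optional, TypeVar

# T stands for "whatever element type the caller passes in".
T = TypeVar("T")


def first_truthy(items: Iterable[T]) -> Optional[T]:
    # Return the first element that is truthy, or None if there is none.
    for item in items:
        if item:
            return item
    return None
```

With this signature a type-checker can work out that `first_truthy(["", "a"])` yields an `Optional[str]` and `first_truthy([0, 3])` yields an `Optional[int]`, without the function being written twice.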
23:35 Michael Kennedy: One of the things that also surprised me when I first started using these, and it was 'cause I was getting an error when I had a method that I said return. Like let's say a user, and I was returning None when the user wasn't found or if the id was improperly specified. And I was getting an error saying, "You can't return None when you say return a user." And then I realized, eventually, you have to do optional user if you're going to have None. And most languages don't distinguish between a pointer type and whether it's nullable or not. Maybe some of them do for value types. The one that I do know that does that is Swift. So, what was the thinking around this concept of just actually making it explicit that you have to say it's only the type but we guarantee it's not, or at least we proclaim that it's not None, it actually points to a real object.
23:35 Łukasz Langa: I personally knew about two languages that approach this problem from the opposite ends. So, Java, for example, doesn't type-check for null, and NullPointerExceptions are sort of the bane of existence of a Java programmer because the compiler is not really helpful. You need to figure it out on your own. And the opposite thing was Hack, which is the typed PHP version used at Facebook. It actually has this concept and it turns out that that is the most popular class of errors found by the type-checker, where the user of a function doesn't expect it can ever return null, but for some reason, it does. So, it was very natural to me to introduce this for Python, especially with the logging information we gather from running Instagram and other systems in Python at Facebook. We knew that AttributeError, 'NoneType' doesn't have some attribute, is a very, very popular exception that stems from the fact that sometimes, I don't know, an API call doesn't work or some helpful function tries to just not raise an exception. Actually, raising meaningful exceptions is the Pythonic way to do this. If you are unable to fetch a user, raising a LookupError is the more natural thing. It's going to read better when somebody is faced with this sort of problem. It is what all the internals of Python do themselves. So we have dictionaries doing exactly this and so on and so on. So this is sort of what the typing gently nudges you to do, because putting any sort of type union, including Optional, which is essentially a union of your type and None, as a return value actually makes using your function so much more difficult. Any user of your function now has to check whether the return value of your function was None or not. And this is pretty painful pretty fast, especially if the situation in which your function can return None is very unlikely. People are going to complain very loudly that, "You make me check for this stupid None value. I know that it will never be None in production. It can only ever be None with the mock-up database or whatever." So that actually makes you think, "Maybe I should change the API so that None doesn't appear there at all." And I think, ironically, the very verbose nature of the Optional type makes you think twice whether--
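The trade-off between returning an Optional and raising an exception might look like this in practice. This is a sketch under assumptions: the `_NAMES` store and every function name are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical data store standing in for a real database.
_NAMES: Dict[int, str] = {1: "ada"}


def find_name(user_id: int) -> Optional[str]:
    # Optional return: every caller must handle the None case.
    return _NAMES.get(user_id)


def get_name(user_id: int) -> str:
    # Raising instead: the return type is plain str, like dict lookups do.
    try:
        return _NAMES[user_id]
    except KeyError:
        raise LookupError(f"no user with id {user_id}") from None


def greeting_optional(user_id: int) -> str:
    name = find_name(user_id)
    if name is None:
        # Without this check, a type-checker should reject name.upper().
        return "HELLO, STRANGER"
    return "HELLO, " + name.upper()


def greeting_raising(user_id: int) -> str:
    # No None check needed; a missing user surfaces as LookupError.
    return "HELLO, " + get_name(user_id).upper()
```

The caller of `get_name` has less to write, which is the gentle nudge described above: the Optional version pushes a check onto every call site.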
23:35 Michael Kennedy: Yeah, it definitely does make it long. And if it's for parameters that you have many of 'em, it gets even worse. Yeah, yeah that's interesting. So you talked about finding errors in production and stuff and at your presentation you spoke about Instagram and the sort of success story that you guys had in terms of actual runtime errors. And maybe they're unexpected.
23:35 Łukasz Langa: Yes!
23:35 Michael Kennedy: The unexpected results you got. Could you maybe cover that real quick?
23:35 Łukasz Langa: When you're adopting types you want to see the value that they give you. You want to recognize whether it was worth the time. So, first of all, we, as the authors of PEP 484, believe that putting type information, even if you're not doing anything else with it, is already worthwhile because it's a form of documentation. But with additional type-checking you want to see that actually there is a change in the number of errors that you see in production.
23:35 Michael Kennedy: Right, you'd like to see runtime errors become continuous integration errors instead. Or, even before then.
23:35 Łukasz Langa: Yes, so you're looking for some sort of metrics that you can look at to prove that this entire effort makes sense. So the simplest thing that you can do, obviously, is just track the adoption. What is your adoption? So we obviously did that and now Instagram is close to 30% typed functions. So, pretty much at this point where we already see a lot of value, this is not something that you see from day one. If you just type a bunch of functions, you're going to maybe find a bit of problems in those particular functions, but for the typical Instagram developer, for the typical engineer, they will not really see how that changes their life. But as soon as you're north of 10% of functions, random people start noticing type errors that the type-checker tells them about before they ship something. So, a metric that I was very interested in is how is this going to affect the, I don't know, average number of attribute errors and type errors that happen in production? Are we going to see fewer exceptions?
23:35 Michael Kennedy: Right, and those two that you named, those would be the types you run into when you assume there's one type but it's actually the other. You thought it was a list but it's a dictionary or something like that.
23:35 Łukasz Langa: Yeah, so type errors, and the special case of the attribute errors, mostly the NoneType, right, when you try to do something with a None that it's not prepared to do. So we wanted to see whether there's going to be fewer exceptions after adopting typing and, sadly, this correlation was just not seen. We couldn't really detect this; it was not very easy to prove that, "Oh, typing helped us with lowering the floor of exceptions at runtime." But what I didn't personally notice, and Carl Meyer, who was pretty much spearheading the typing effort at Instagram, did notice, is that, "Yes, it's not about the sort of floor of exceptions that pretty much describe mostly very unlikely scenarios that happen for an unlucky user of Instagram. It's more about shipping a bad change." So, it's about those very short spikes of type errors and attribute errors that just go out after an unsuccessful change. So, we had some number of them in the past and now that number is almost 10 times lower. So, it's 10 times less likely that you're going to ship some bad diff to production that introduces a type error than it was before. And this is a metric that was hard for me to notice from just looking at graphs in a linear fashion. But, in fact, yeah, that pretty much proves that this actually impacts quality in exactly the right way.
23:35 Michael Kennedy: Oh yeah, that's a really interesting one. And those are the releases that you're like, "Oh no, it's crashing." And you're just freaked out and you're scrambling to roll it back and those are the worst kinds of errors, not the ones that happen in one in a million, but the one where it happens one for one.
23:35 Łukasz Langa: Yup.
23:35 Michael Kennedy: Yeah, pretty interesting, pretty interesting. So, let's talk for a moment about this concept of gradual typing. For example, you said you guys are really successful and you've got 30% of a million lines of code, the functions there have typing or something to that effect, and I find this in my code as well. I love type annotations but I don't annotate everything. There's like a core set of functionality and this is really what I want to annotate. This is really important that this is clear. But this other part it can kind of, just derive the benefits from having the other stuff really stable. So, you want to talk about the rules of gradual typing and how the order actually affects what is caught and what is not? That's surprising.
23:35 Łukasz Langa: Yes, but even if that were your dream, if you wanted to actually type the world, you have to start somewhere. And if you cannot reap the benefits until everything is fully typed, then, pretty much, the feature is useless for the longest time and I think people would get discouraged way quicker than they would see anything worthwhile from it. So, gradual typing essentially is this notion that you can slowly annotate function by function and by doing this you're just increasing the footprint of typing and increasing the usability, usefulness of the project. So, the ordering there is important in one important way. So, I would advise everybody to look at what their function call graph looks like in their program and start annotating from the functions that are most used, are very deep in the stack, like everybody calls them. The reason for it is that once a function like this is annotated, all users of it can be validated whether they are using this function correctly or not. If you didn't annotate this very central function and went on and annotated a bunch of leaf functions then you might not know whether they're correct or not. And the reason why not is that, as long as a function is not annotated, the type-checker necessarily has to assume that anything is fine. Any argument type passed is okay. The function can also return any type from it. So, pretty much, that means it's going to stay quiet regardless of what you're doing. So if you are annotating your core function first, you're going to get the benefit of being warned about invalid usage way faster than if you would actually wait with this core functionality to the very end. It gets even worse. If you do this, then that might cause errors to appear on some functions that you didn't touch. Like you annotated a core function and suddenly you see 40 new errors from the type-checker on functions that you don't even know about. But these are the functions that were using what you just annotated, and they were using it wrong. So now you are faced with the problem, "What am I supposed to do? Am I supposed to fix all those 40 functions? I didn't even know that we had functionality like this and now the type-checker yells at me." So, the right ordering can save you a lot of time and a lot of stress with actually making the adoption smoother.
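The ordering effect described here can be illustrated with a small sketch. All function names are hypothetical, and the comments describe how a checker such as mypy is expected to behave with its default settings.

```python
def parse_port_untyped(raw):
    # No annotations: the type-checker treats the argument and the
    # return value as Any, so it stays quiet about every call site.
    return int(raw)


def parse_port(raw: str) -> int:
    # Annotated "core" function: every caller can now be validated.
    return int(raw)


def leaf_a() -> int:
    # Checked against parse_port's signature: str in, int out.
    return parse_port("8080")


def leaf_b() -> int:
    # Accepted by the checker even though an int is passed, because
    # parse_port_untyped is unannotated. Swapping in parse_port(8080)
    # would be reported as an incompatible argument type.
    return parse_port_untyped(8080)
```

Annotating `leaf_b` alone buys very little here, since the function it depends on accepts and returns anything as far as the checker can tell.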
23:35 Michael Kennedy: It's a good point. You have a really nice graphic in your talk which I'll put in the show notes of course. And it looks a little bit like the game called Whac-A-Mole. You hit a thing and it pops up in another place. You know what I mean. You just, every one, you fix one and then two more errors pop up. You fix those, one goes away, another pops up. And it's sort of like, as you add this type-checking the pieces that were just ignored before are now actually getting validated. So, it can be a little bit funky like that, so, like you said, I think starting at the right levels, the important functions, and then sort of slowly build your way out, is pretty nice.
23:35 Łukasz Langa: Yeah, so there's ways to automate parts of this. So the pytype project that I mentioned can infer type annotations from just looking at your project. It is pretty Python 2-centric still, so it might not work on the latest Python 3.6 features or whatnot. Like that's sort of, your mileage may vary. Always patches accepted. But it can actually go a long way to create this initial body of annotations for your big project. It does some sort of magic that you might, or might not agree with like figuring that, "Oh you are using an append method. Within your entire program, the only type that has an append method is a list, so I guess what you're using here is a list." All sorts of things like this. So this is what inference is all about. But it's actually a very worthwhile project that sort of boosts adoption of types in new projects. What you can also do is you can maybe gather those types at runtime. At some point I thought it was a crazy idea that would just slow everything down and it would never work on unit tests because you're mocking stuff so types are different. You would also have issues with types being returned as those massive unions of 50 things and whatnot, so I had this pretty apocalyptic view of this that that would never work. And, usually, when somebody says that it's impossible, somebody else that doesn't know this is going to just go ahead and implement this. This is exactly what happened at Instagram. We had Matt Page and Carl Meyer working on this project. It's open source now. It's called MonkeyType and it does exactly this. It hooks into your program, records the types of arguments to functions. It records the return types as well. And then generates the typing stubs from what it gathered and you can apply those types back to your code base. So that way you can pretty much just remove a lot of the work, the initial work that has to be done. And even though I envisioned garbage being collected that way, in fact, it turns out that most people don't actually use Python in crazy dynamic ways all the time, because that's also very unreadable. And Python is all about being runnable pseudo code. It has to be readable. So the types are, for the most part, very sane. You can use them and, pretty directly, just apply them back and you're done. We had a very big spike in typing adoption at the time where we started using MonkeyType since it's actually producing very high-quality types. Sometimes it's funny. So sometimes it will tell you that this optional argument that has a default value of None, has a type of None, which essentially means that you have some very special optional argument and nobody ever uses it. In the entire code base, nobody ever actually populates this optional argument. So you might as well just remove it.
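The general idea of gathering types at runtime can be sketched with a toy decorator. To be clear, this is not MonkeyType's implementation or API; `record_types`, `OBSERVED`, and `add` are hypothetical names, and the sketch only handles positional arguments.

```python
import functools
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Set, Tuple

# Maps a function name to the set of (argument type names, return type name)
# combinations seen while the program ran.
OBSERVED: DefaultDict[str, Set[Tuple[Tuple[str, ...], str]]] = defaultdict(set)


def record_types(func: Callable[..., Any]) -> Callable[..., Any]:
    """Toy recorder: remember the concrete types of each call."""

    @functools.wraps(func)
    def wrapper(*args: Any) -> Any:
        result = func(*args)
        arg_types = tuple(type(arg).__name__ for arg in args)
        OBSERVED[func.__name__].add((arg_types, type(result).__name__))
        return result

    return wrapper


@record_types
def add(a, b):
    return a + b


add(1, 2)
add("x", "y")
```

After these two calls, `OBSERVED["add"]` holds one entry for the int call and one for the str call, which is the kind of raw material a tool could turn into candidate annotations.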
23:35 Michael Kennedy: All right, so you could just get rid of it. That's right, how funny. Yeah, there is a bunch of crazy ideas and those all do sound pretty interesting. Another one has to do with actual performance optimizations like actually going, "No, this is a list and so we're going to do some kind of shortcut or something to that effect."
23:35 Łukasz Langa: Yeah, so originally, not only me, but I think Guido as well, we thought that, "This is a dead end. We're not going to be able to do anything useful there." The reason for it was two-fold. First of all, we saw that Python, its runtime, doesn't actually utilize typing information at all, it just tries to find attributes on your objects and does things with them. And, actually, the most performant Python runtime that we have, PyPy, is all about dynamically finding what you are doing, and it's able to find this in ways that are way more precise than type information that you put in will ever be, because, very often, the types that you're going to describe is that, "I want to have an iterable of string. I don't care what iterable that is. It should be an iterable." So that is not very useful for PyPy. What would be useful is, "Yes it's an iterable, but ... or tuples." And so that way it can actually put guards and JIT things away and it becomes way faster. So we were very negative in terms of seeing value in this. But this is exactly what Cython does. And Cython can sometimes accelerate your function by like 20, 50 times by knowing that, "Oh, this is only ever a string," or, "This is only ever an int, so I can maybe not even box it in a PyObject and just do C-level computation that way." So, combining this information with an ahead-of-time compilation step is what is very interesting. And I talked with Jukka Lehtosalo, the original author of Mypy, about this idea during the last PyCon, and he has a project that is sort of spearheading this for Python. So, I really do hope that by this PyCon we'll probably hear from Dropbox that, "Hey, this actually works out. This actually accelerates Python in this sort of automatic way." So I don't know if you remember--
23:35 Michael Kennedy: Ah, that would be awesome.
23:35 Łukasz Langa: I don't know if you remember, but originally PyPy started out as this crazy import that you just put in your project, like import psyco. And, suddenly, your code became way faster. You didn't know why. So, we might actually be back into this world where, you don't maybe even have to perform any imports in some future Python version. But we are actually going to attempt to do some ahead-of-time compilation for you and type information is going to be useful there.
23:35 Michael Kennedy: That would actually be really, really interesting. So, I'm looking forward to that. All right, we're getting sort of short on our time, so I want to just cover one more really quick thing and maybe just leave it there for the type annotations. It's really awesome work and the more I use them, the more I like them. But you have one other interesting piece of news to do with just Python more in general and you.
23:35 Łukasz Langa: Yes.
23:35 Michael Kennedy: Yeah, so you were just chosen as the Release Manager for Python 3.8 and 3.9. And 3.7's coming really soon, right? So, you're on deck and you'll be up really quickly.
23:35 Łukasz Langa: Yes, so pretty much the development of Python 3.8 just started just yesterday. So, yes, it's going to be developed for the next 18 months, pretty much. And Python 3.7 is in beta stage now. What it means is we don't add new features to it anymore. We're going to pretty much harden it now. Find all the possible bugs and problems with whatever we implemented at this stage. Release four betas, then release, hopefully, a very nice release candidate that we can then bless as the gold version. If not, then there's going to be another release candidate. And at some point we're going to release Python 3.7. It sounds like this is very close now, but, in fact, that is going to happen late June. So, the beta stage actually takes quite a bit of time. But, yeah, this is how a mature project like Python operates, so like with Python 3.8, the beta stage and the later release candidates or whatever, are going to happen after PyCon in 2019. So, this is going to be quite a while from now unless we change how we do things which I might sort of influence a bit. Like, this is the timeline for the Python project. That sort of stability is good for the average programmer. Because the average programmer doesn't want to have backwards-incompatible changes all the time. He's not interested in some subtle new features all the time. Being able to run code that you wrote 10 years ago, is a very important feature. And I don't think Python did the greatest job at this with the Python 2 and Python 3 dichotomy. Like with a lot of smaller changes that end up being incompatibilities, I'm always amazed how, like Java was able to pull this off and was still able to just, perfectly fine, compile projects that I wrote in college and they still work perfectly fine all these years later. On a different platform, on a different Java version, it's still just okay. So, we do hope that from now on, there's not going to be a very far-off Python 4 that breaks compatibility in crazy ways again. So we pretty much learned from this experience that, "Hey, we don't want to do this to people anymore." That doesn't work for anybody, including the stress that it actually builds on core developers. So, yeah, I'm pretty happy that I'm going to be up for 3.8 and 3.9. If I know my luck, Python 3.9 is going to be the last Python 3 version, so, again, it's going to become like the new Python 2.7 and I'm going to release it for the next 15 years, so--
01:00:36 Michael Kennedy: Yeah, this is like into retirement you're going to be workin' on 3.9 release.
01:00:39 Łukasz Langa: Yeah, so that might happen, but hopefully not. Hopefully it's going to be a gig that is going to end like eight years from now. So you have to understand, because of all the security fixes that you still release for old versions or whatnot, it's a pretty long-time commitment. But it doesn't take too much time a week, so I do hope I'm going to have to, I'm going to be able to pretty much combine this with every other activity I'm doing.
01:01:03 Michael Kennedy: I want to be cognizant of your time and not taking it all up, but, just really quickly, what features would you like to see in these new versions, 3.8 and 3.9?
01:01:10 Łukasz Langa: In particular, I wrote the single dispatch, like generic functions, in Python and, ever since, I was just poked by everybody to actually go full-on on multiple generic dispatch. So, I think it's time for that and it would be nice for Python 3.8 to fully implement that. What else? Performance. It's sort of always a second-priority feature for Python.
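For readers who have not used the single-dispatch generic functions mentioned here, they live in the standard library as `functools.singledispatch`. A minimal sketch follows; the `describe` function and its implementations are made up for illustration.

```python
from functools import singledispatch


@singledispatch
def describe(value) -> str:
    # Fallback used when no more specific implementation is registered.
    return "something else"


@describe.register(int)
def _describe_int(value) -> str:
    return "an int"


@describe.register(list)
def _describe_list(value) -> str:
    return "a list"
```

The implementation is chosen from the type of the first argument only, which is the "single" in single dispatch; dispatching on several arguments at once is the multiple dispatch wished for above.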
01:01:36 Michael Kennedy: Maybe we'll see that performance optimization that you're talking about with types make its way into one of these.
01:01:41 Łukasz Langa: Oh, it would be great if that actually shipped in 3.8. That would be very optimistic for me to say that it will. But there's other areas of interest there like, for example, speeding up startup time. So, for command line utilities, for bigger projects that have thousands of files that are involved, startup time in Python is not great and could be improved. So, I would actually very much like to see progress on this. There were a bunch of crazy ideas, again, that came up during the core sprint last September. They didn't quite end up being ready for Python 3.7, but it is pretty likely that they're actually going to land.
01:02:23 Michael Kennedy: That sounds really awesome and I'm lookin' forward to your overseeing that whole process. That's great. Let me hit you with the last two questions, the two questions before we get out of here. If you were going to write some Python code, what editor would you open up?
01:04:32 Michael Kennedy: That sounds really cool. It definitely sounds like a good remote work setup instead of just like sshing over, 'cause that sounds tough in terms of latency. All right, so, and notable PyPI package.
01:04:43 Łukasz Langa: Notable PyPI package. I think this is super underutilized and you should all use it. It's called attrs. Hynek Schlawack wrote it. It is actually a way of creating full-featured types in Python, so full-featured classes without all the boilerplate. So, by just specifying, essentially like a schema for your class, saying, "This class is going to have those fields," what you're getting back is a ready-made init method, a ready-made repr, being able to compare those objects of this class meaningfully. You can configure it to use slots. You can configure it to create immutable classes and whatnot and whatnot. So, it is a very powerful package that sort of feels like next-generation Python. It removes a lot of the boring setting of arguments to self.argumentname in the init method and so on and so on. And, more importantly, it's always correct. So by removing the boilerplate that you create manually you make sure that it is going to be fine every time. So, I can't recommend this highly enough. If you wish to wait for Python 3.7, this is getting included, a rewrite of it, essentially, is getting included in the standard library called Data Classes. But attrs is out there now. It is pretty mature by now. It's been maintained by Hynek for a number of years now. We use it extensively. I can't recommend it highly enough.
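To give a feel for the boilerplate being removed, here is a minimal sketch using the standard-library Data Classes mentioned at the end of this answer (Python 3.7 or newer). attrs itself offers a comparable decorator-based approach, but the stdlib variant is used here so the example runs without installing anything; the `Point` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    # Declaring the fields is the whole "schema"; __init__, __repr__
    # and __eq__ are generated from it.
    x: int
    y: int = 0


origin_ish = Point(1)
same_point = Point(1, 0)
```

Here `origin_ish == same_point` compares field by field, the repr shows the field values, and because of `frozen=True` an assignment such as `origin_ish.x = 5` raises an exception instead of mutating the object.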
01:06:14 Michael Kennedy: Sounds really nice and it's definitely a cool, cool project. Thanks for recommending that one. All right, final call to action. People are excited about types and the benefits to their, sort of upgrading their code, finding all these bugs. How do they get started? What's the final call to action for you?
01:06:29 Łukasz Langa: If you're afraid of types and think they don't fit in Python and they're not Pythonic, you should think about this. This is information that you were already putting somewhere in your documentation. Maybe in your docstrings and whatnot. Type annotations are a piece of technology that only formalizes where you're supposed to do this. And on top of this, it will help you to fix bugs and find future ones. That's great. That only makes it more usable for you. So in the Pythonic thing, it's still very new, but the tooling is now mature enough for actual adoption by random, unsuspecting users. So, if you're afraid, just try it out, see how that actually looks for you. It's not going to cause your code to look like Java or Scala. It is still very much Python. It doesn't actually cause changes to how you code Python. I think you should make an informed decision basically by trying it out yourself.
01:07:26 Michael Kennedy: Yeah, it's great advice and I definitely second it. Łukasz, thank you for being on the show. It was great to talk to you about all this stuff.
01:07:32 Łukasz Langa: All right, happy to be here. Thank you very much.
01:07:34 Michael Kennedy: Yup, bye. This has been another episode of Talk Python to Me. Today's guest was Łukasz Langa and this episode has been brought to you by Linode and TalkPythonTraining. Linode is bullet-proof hosting for whatever you're building with Python. Get four months free at talkpython.fm/linode. That's L-I-N-O-D-E. Are you or a colleague trying to learn Python? Have you tried books and videos that just left you bored by covering topics point by point? Well, check out my online course, Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python. And, if you're looking for something a little more advanced try my Write Pythonic Code course at talkpython.fm/pythonic. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /iTunes, Google Play feed at /play and direct rss feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now, get out there and write some Python code.