Gradual Typing of Production Applications

Episode #151, published Fri, Feb 16, 2018, recorded Sat, Jan 27, 2018


I hope you're using Python 3 these days. One of its powerful new features is type annotations, which let you build and maintain large-scale Python projects with much more ease and confidence.

In this episode, you'll meet Łukasz Langa, who has helped migrate some very large Python projects. We'll discuss how Python uses the concept of gradual typing to slowly expand the sections of your code that are type checked.

Episode Deep Dive

Guest Introduction and Background

Łukasz Langa is a seasoned Python developer and member of the Python community who worked on migrating large-scale Python 2 codebases to Python 3 at Instagram and Facebook. He helped form and lead the Python Foundation team at Facebook, whose mission was to make Python a first-class language across the company. He co-authored PEP 484 and has been heavily involved in making Python typing workable in real-world, large-scale projects. Łukasz also served as the release manager for Python 3.8 and 3.9, playing a key role in shaping Python’s modern capabilities, especially around type hints and other language features.

What to Know If You're New to Python

If you're just beginning your Python journey and want to benefit from this discussion on type annotations and large-scale code, it's helpful to have a baseline understanding of Python 3 syntax and the concept of functions and classes. Knowing how docstrings work and having some experience reading or writing simple Python scripts will also help you follow the more advanced type-hinting conversation.

Python for Absolute Beginners: A complete introduction to Python, covering essential programming concepts, data types, and more.

Key Points and Takeaways

Gradual Typing in Python
- The central theme of the episode was how Python supports gradual typing, allowing teams to add type hints to a subset of their code without adopting it everywhere at once. This incremental approach is critical for large applications with millions of lines of Python code. Type hints help catch errors before they reach production and serve as inline documentation for developers.
- Links and Tools:
  - Mypy (github.com)
  - PEP 484 (peps.python.org)
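As a rough illustration of gradual typing, the hypothetical module below mixes an unannotated legacy function with an annotated one. Under mypy's default settings, unannotated functions have their parameters and return value treated as Any and their bodies are not checked (the --check-untyped-defs and --disallow-untyped-defs flags tighten this), so annotations can be added one function at a time. The function names are made up for this sketch.

```python
from typing import List


def legacy_total(prices):
    # Unannotated: mypy treats the parameters and return value as Any and,
    # by default, does not type check this body.
    return sum(prices)


def total_cents(prices: List[float]) -> int:
    # Annotated: mypy checks this body and calls to it from other checked code.
    return round(sum(prices) * 100)
```

Running mypy on such a module reports problems only in and around total_cents; legacy_total stays unchecked until someone annotates it.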
Benefits of Type Annotations
- Type annotations provide clarity in the codebase, offering immediate benefits like improved IDE auto-completion, static analysis for errors, and better maintainability. Although Python remains dynamically typed at runtime, type hints can prevent shipping bugs by shifting many errors from runtime to development time.
- Links and Tools:
  - PyCharm (jetbrains.com)
  - Atom Editor + Nuclide (nuclide.io)
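A minimal sketch of the error-shifting idea, using a made-up shout function: the annotation documents that a str is expected, an IDE can offer str methods on message, and a static checker can flag a bad call before it ships. Without a checker, the same mistake only surfaces as an exception when that line executes.

```python
def shout(message: str) -> str:
    # The annotation lets an IDE autocomplete str methods for `message`.
    return message.upper() + "!"


print(shout("hello"))  # prints HELLO!

# A checker such as mypy reports this call before it runs (an arg-type error
# roughly like: Argument 1 to "shout" has incompatible type "int"; expected "str").
# Without a checker, it fails only at runtime, with AttributeError:
try:
    shout(42)
except AttributeError as exc:
    print("runtime failure:", exc)
```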
Python 2 to Python 3 Migration Lessons
- Łukasz and his team at Facebook (specifically Instagram) migrated over a million lines of code from Python 2 to Python 3 while Instagram continued to grow rapidly. They demonstrated that even large codebases can transition without downtime or incidents by carefully planning the migration and writing code that runs on both versions (e.g., with the six library).
- Links and Tools:
  - Six Library (pypi.org)
Ordering Matters When Adding Type Hints
- Łukasz emphasized that you should start by annotating core functions or modules that many other parts of the application depend on. This surfaces issues across your code more effectively than starting with leaf functions: annotating a widely called function reveals incorrect usages at its call sites in one step, rather than chasing errors file by file.
- Links and Tools:
  - PEP 483 (peps.python.org) – Describes the theory behind gradual typing
Using Tools like MonkeyType and PyType
- Tools such as MonkeyType (created by the Instagram team) can infer types at runtime by analyzing the arguments and return types as your code executes. Google’s PyType uses a static inference approach, scanning the codebase to guess types. Both simplify the initial adoption of type annotations for large or unfamiliar projects.
- Links and Tools:
  - MonkeyType (github.com)
  - PyType by Google (github.com)
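To make the runtime approach concrete, here is a minimal sketch in the spirit of MonkeyType; it is not MonkeyType's API. The hypothetical record_types helper installs a profiling hook with sys.setprofile, runs a function on sample inputs, and records the argument and return type names it observes. Real tools also handle keyword-only arguments and generics and store traces for later; this sketch only looks at positional parameters.

```python
import sys
from types import FrameType
from typing import Any, Callable, Dict, List, Set, Tuple

# One observed call: ((param_name, type_name), ...) plus the return type name.
CallShape = Tuple[Tuple[Tuple[str, str], ...], str]


def record_types(func: Callable[..., Any], calls: List[Tuple[Any, ...]]) -> Set[CallShape]:
    """Call func(*args) for each args tuple and record observed runtime types."""
    target = func.__code__
    shapes: Set[CallShape] = set()
    pending: Dict[int, Tuple[Tuple[str, str], ...]] = {}

    def profiler(frame: FrameType, event: str, arg: Any) -> None:
        # Only look at frames running func's code; C-level events
        # (c_call, c_return, ...) fall through the checks below.
        if frame.f_code is not target:
            return
        if event == "call":
            # Positional parameter names only (a simplification).
            names = target.co_varnames[: target.co_argcount]
            pending[id(frame)] = tuple(
                (name, type(frame.f_locals[name]).__name__) for name in names
            )
        elif event == "return":
            # For "return" events, arg is the return value (None if an exception escaped).
            params = pending.pop(id(frame), ())
            shapes.add((params, type(arg).__name__))

    sys.setprofile(profiler)
    try:
        for args in calls:
            func(*args)
    finally:
        sys.setprofile(None)
    return shapes


def add(a, b):
    return a + b


print(record_types(add, [(1, 2), ("x", "y"), (3, 4)]))
```

Two distinct shapes for add suggest it is called with both int and str arguments, which is the kind of evidence a tool could turn into a suggested annotation or a question for a human reviewer.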
Python Typing vs. Performance
- Initially, many Python core developers believed type hints would never boost runtime performance because Python’s runtime doesn’t natively enforce or optimize for them. However, experimental projects that leverage ahead-of-time compilation (or frameworks like Cython) show promise for performance improvements when type information is known.
- Links and Tools:
  - Cython (cython.org)
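The runtime side of this is easy to check. In the sketch below (the function name is made up), CPython stores the annotations in __annotations__ but neither enforces nor optimizes based on them, so a call that violates the hint still runs.

```python
def double(x: int) -> int:
    # CPython records these hints but does not check or optimize based on them.
    return x * 2


print(double.__annotations__)  # {'x': <class 'int'>, 'return': <class 'int'>}
print(double("ab"))  # prints abab: the wrong type is not rejected at runtime
```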
Advantages of Type-Annotated Code in CI/CD
- While the total number of runtime exceptions (like AttributeError) in production might not drastically drop, typed code showed a 10x reduction in shipping major bugs or regressions. Catching errors in continuous integration before they ever hit real users was a primary motivation for adopting type hints.
- No direct link here, but references:
  - Continuous Integration with Mypy
Type Checker Ecosystem and TypeShed
- The Python community has rallied around the PEP 484 syntax for type annotations, allowing multiple tools (Mypy, PyCharm, PyType) to share the same stubs for common libraries. These stubs are maintained in TypeShed, ensuring consistent behavior across different type-checking solutions.
- Links and Tools:
  - TypeShed Repository (github.com)
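For a sense of what these shared stubs look like, below is a hypothetical .pyi stub for an untyped module called greetings (the module and its functions are made up). Stubs contain only signatures with ... bodies; type checkers read them instead of the implementation, which is how typeshed can describe the standard library and third-party packages without changing their code.

```python
# greetings.pyi: stub for a hypothetical untyped module "greetings".
# Only signatures matter; "..." stands in for bodies and default values.
from typing import List, Optional

def greet(name: str, excited: bool = ...) -> str: ...
def split_names(raw: str, sep: Optional[str] = ...) -> List[str]: ...
```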
Release Management and Python’s Evolution
- Łukasz also served as release manager for Python 3.8 and 3.9, which gives him a unique perspective on Python’s stability and the new features introduced into the language. He reiterated that Python 3 aims to maintain strong backward compatibility and that the language’s big break was the 2-to-3 transition, not anything within 3.x.
- Links and Tools:
  - Python 3.8 Release Notes (python.org)
  - Python 3.9 Release Notes (python.org)
Practical “Type-Friendly” Pythonic Patterns
- The conversation highlighted practical patterns for using Pythonic code along with type hints, from using library-based code generation (dataclasses or the third-party attrs library) to setting up “polyglot” code that runs on both Python 2 and Python 3. These patterns aim to minimize repetitive boilerplate and reduce confusion among team members.
- Links and Tools:
  - attrs (attrs.org)
  - dataclasses (python.org)
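As a small example of the code-generation pattern, the standard-library dataclasses module (Python 3.7+) builds __init__, __repr__, and __eq__ from annotated class attributes, so the type hints double as the field definitions. The Post class is made up; attrs offers a comparable decorator-based approach and also supports older Python versions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Post:
    # Each annotated attribute becomes an __init__ parameter.
    author: str
    text: str
    likes: int = 0
    # Mutable defaults need default_factory so instances don't share one list.
    tags: List[str] = field(default_factory=list)


post = Post("ann", "Hello, typing!")
print(post)  # Post(author='ann', text='Hello, typing!', likes=0, tags=[])
```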

Interesting Quotes and Stories

"I literally typed ‘Ruby alternative’ in Google and that's how I found Python." -- Łukasz Langa on discovering Python when Ruby wouldn’t install on Windows

"If you treat Python 3 incompatibility as a bug, then you just fix it. That is a totally different approach from assuming it's an intractable problem." -- Łukasz on how a bug-fixing mindset helped Instagram move to Python 3

"We didn't see fewer total exceptions, but shipping a bad change became ten times less likely." -- Łukasz discussing the real impact of static typing on production systems

"Kids are very imaginative that way. It’s gotten me quite a while to actually get into real programming, but that first Commodore 64 moment was huge." -- Łukasz recalling his early days

Key Definitions and Terms

Gradual Typing: A typing strategy where type hints can be introduced piecemeal into a Python codebase, rather than requiring complete coverage from the start.
Mypy: The de facto standard static type checker for Python (created by Jukka Lehtosalo) that enforces annotations defined by PEP 484.
Optional Type: Shorthand for a union that includes None. For example, Optional[str] means the value can be either a str or None (equivalent to Union[str, None]).
TypeShed: A collection of type stubs for the Python standard library and popular libraries, used by type checkers like Mypy and IDEs.
MonkeyType: A tool developed at Instagram to dynamically collect runtime type information and suggest function annotations.
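A short sketch of Optional in practice, with made-up function names: Optional[str] is the same type as Union[str, None], and a checker like mypy insists that the None case is handled before str methods are used.

```python
from typing import Dict, Optional, Union

# Optional[X] is shorthand for Union[X, None].
assert Optional[str] == Union[str, None]


def find_email(users: Dict[str, str], name: str) -> Optional[str]:
    # dict.get returns None when the key is missing.
    return users.get(name)


def email_domain(users: Dict[str, str], name: str) -> str:
    email = find_email(users, name)
    if email is None:
        # Without this check, mypy would flag email.split(...) below.
        return "unknown"
    return email.split("@")[1]


print(email_domain({"ann": "ann@example.com"}, "ann"))  # prints example.com
```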

Learning Resources

Python for Absolute Beginners: Perfect if you’re new to Python and want a solid foundation before diving into more advanced topics like typing.
Rock Solid Python with Python Typing: Focused on typing, covering best practices for type hints and how to use them effectively in your code.
Effective PyCharm: Learn how to leverage PyCharm’s powerful features for working with Python typing and static analysis.

Overall Takeaway

Adding type annotations to a Python codebase can yield significant benefits in both code clarity and catching errors early, especially as projects grow in complexity and size. The episode shows that teams, even at the scale of Instagram and Facebook, successfully and incrementally adopted Python typing without sacrificing performance or productivity. Whether you use tools like Mypy, PyCharm, MonkeyType, or PyType, the key is taking advantage of Python’s flexible approach to gradually integrating type hints. This approach ultimately leads to fewer production issues, improved maintainability, and a more standardized codebase for both current and future developers.

Links from the show

Łukasz Langa on twitter: @llanga
Łukasz's presentation: youtu.be
Instagram keynote talk: youtube.com

Where to get help
Read this first: mypy.readthedocs.io
#typing on gitter
For PEP 484/526/544/536 issues: github.com/python/typing
Type checker issues: github.com/python/mypy
Standard library and third-party annotations: github.com/python/typeshed
Episode #151 deep-dive: talkpython.fm/151
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript



00:00 I hope you're using Python 3 these days, because one of its powerful new features is type annotations.

00:05 These let you build and maintain large-scale Python projects with much more ease and confidence.

00:10 This episode, you'll meet Łukasz Langa, who helped migrate some very large Python projects.

00:16 We'll discuss how Python uses the concept of gradual typing to slowly expand the sections

00:21 of your Python code that are type-checked. This is Talk Python To Me, episode 151,

00:26 recorded January 31, 2018.

00:29 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the

00:47 ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter,

00:52 where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm,

00:57 and follow the show on Twitter via @talkpython. Łukasz, welcome to Talk Python.

01:02 Hello.

01:02 It's great to be here with you again. We were just recently at PyCascades,

01:06 a nice little conference where you get to pretty much meet everyone who's there.

01:10 Oh, yeah. For me, it was a very personal trip since I used to live in Vancouver for quite a while.

01:16 Western culture doesn't really allow grown men to cry publicly, but I was shedding a tear all the time,

01:23 seeing all the places I used to visit and used to frequent. So yeah, it was a great event. I enjoyed it a lot.

01:31 Yeah. Vancouver's a wonderful city, right?

01:34 It is just the perfect size. It's not as big and scary as New York City, but it's perfectly urban,

01:41 like with proper public transport, unlike San Francisco. So yeah, it's one of my favorite

01:46 places on earth.

01:47 Is that where you are now?

01:47 No. I am now in the Bay Area. So yeah, I miss Vancouver terribly.

01:52 Yeah. The Northwest is a nice place indeed. Before we get into our topic of the day,

01:58 the type annotations and all the amazing stuff and your story of how you sort of applied this on a

02:04 large code base, let's just get started with your story. How'd you get into programming in Python?

02:08 Programming is sort of a memory that I have from pretty early childhood, since I remember the day

02:15 when my dad like brought me a Commodore 64. It was supposed to be like a surprise Christmas present,

02:22 but like the package was big enough and like he just wasn't able to just hide it properly. So I knew

02:29 I'm going to get it. Like that was wow. Like I very vividly remember the moment when I sort of realized

02:35 I'm getting a computer and even if you want to play games at that point, like you had to type in a bunch

02:40 of commands. So it was very welcoming to just try out doing something more. And I was pretty,

02:47 pretty young at this point. I must've been like six or seven. So most of what I did was just retyping

02:54 programs that were published in like computer magazines at that point. But that sort of instilled

02:59 in me this realization that this is not some dark magic that like normal people cannot do. Like to

03:06 the contrary, I felt like if only I had enough time, I could, you know, make any program like the

03:12 WarGames movie, or the artificial intelligence or whatever, like, you know. That's right, the WOPR, of course.

03:17 Yes, exactly. Kids are very, I don't know, imaginative that way. It took me quite a while to actually get into real programming. Like I went through Pascal, I went through Java and whatnot, you know, during college.

03:32 And in autumn 2004, I was studying computing science at a university in Poland, and I had trouble with some courses I took. There was a particularly hard one for me, which was linear algebra.

03:47 And like, I was, in fact, like, you know, scared that I would be just let go, like I would not be able to pass exams that were just coming.

03:56 So a friend showed me some scripts he wrote in Ruby, using some linear algebra library, to check whether the results of the exercises that we were doing were correct, right? So that sort of helped with solving homework assignments.

04:10 I badly needed some reassurance, so I got excited about this. But for some random reason, the Ruby installer just refused to work on my Windows XP box; it just crashed midway and couldn't install Ruby.

04:25 So as a test was scheduled, like, you know, the very next day or so, like I started looking like, you know, pretty sort of nervously for alternatives.

04:33 And I literally typed Ruby alternative in Google. And that's how I found Python. It installed cleanly, and I quickly found a functional linear algebra library.

04:45 That was, I think, before NumPy, because what I was using at this point was Numeric. So either way, I got hooked. That was such a departure from Java and Pascal that I had known before.

04:57 Oh, yeah, that's a really interesting story. Just like, all right, forget it. Ruby's not working. What else? I got to take a test. And then, you know, you're like, wait a minute, this is pretty cool. This stuff. I'm pretty sure that must have been before NumPy.

05:08 Yeah, that's cool. So, I know you didn't say that you were learning math for programming per se, but a lot of times people feel like if you are a programmer, you have to know a lot of math.

05:21 It's really interesting how little math we do as programmers, even though they're very similar, right?

05:31 Yeah. So that really depends on your particular niche in programming. Like, I was always coming from this background of, like, building Legos, essentially, right?

05:45 So composing smaller pieces into bigger, like tools, right? So yes, like, I'm not one of those guys who come up with new, exciting algorithms that are like, are way more efficient than something else that was done before.

05:59 I'm mostly a reuser of things that were invented by other people. But in this sense, the complexity is something that like, I enjoy more than people that, you know, sort of program on, like in a notebook, essentially, right?

06:12 It's a very different way of programming. And I'm with you, I like sort of thinking of the big architecture, how all the services and database stuff fit together. It's really fun. So speaking of which, you work on a moderately large project right now, right? What do you do day to day?

06:26 I work for Facebook. When I started working there, like, almost four and a half years ago, what I did was I was a production engineer, which is essentially like a person managing complexity and making sure that like all the software written by, I don't want to say naive, but like the sort of feature focused software engineers actually runs at the scale that we needed to.

06:49 So I started with the cache infrastructure, then I moved to sort of automated remediation of alarms that we're getting and whatnot. But all of this was using Python. So with my core development background, I was always just like, you know, putting my nose into other people's problems to sort of try to help them like out with, you know, Python issues, especially if you know, you heard that Python sucks for some reason, you know, like,

07:14 try to not take it personally. But if like, somebody really is commenting negatively on something that you personally worked on, then you want to know why, right? So I was slowly sort of making this my job. And two years ago, we actually got a team formed around this idea. And now I lead this team, like as a tech lead, it's called Python Foundation. And it's essentially managing the runtime for both Facebook and Instagram.

07:40 That is such a cool thing. That sounds like such a fun job.

07:43 What we're doing is we're trying to actually make Python, which is one of the most popular languages at Facebook, probably in the top three, really feel like a first-class language. So to do this, you cannot really stick to an ancient version of it, right? That's going to be end of life in two years. So one of the core, I don't know, missions behind this team is to move the entire company off of Python 2.

08:13 That means we would like to move everything to Python 3, but also, if people move to other tools that sprang up in the meantime that work better, we're fine with this. But we really want to get rid of Python 2 as a thing. We managed to help Instagram move to Python 3, which I think is a pretty big deal. It's over a million lines of code.

08:35 And we did this in the year of the biggest growth for Instagram, both user-wise and feature-wise.

08:42 This is when stories were launched, when ranked feed was launched.

08:47 And we didn't have any incidents.

08:49 We didn't go down while doing it.

08:52 So I think, if anything, that should be a very big reassurance to other people struggling with this now

08:59 that it is, in fact, possible to do.

09:01 And it is worthwhile.

09:02 Is this the same basic story that was told at the keynote at PyCon 2017 around Instagram upgrading?

09:10 Yes, I was actually in the team that worked on this.

09:13 So when the keynote was prepared, I was one of the reviewers of the keynote when Lisa and Hui were practicing runs for it.

09:22 So yeah, I've known the keynote almost by heart before it went live at PyCon.

09:28 And yeah, it was also a pretty personal story for us because we spent a pretty significant time and effort on it.

09:35 But it was totally worth it.

09:36 Oh, it was such an inspiring story.

09:38 And when I think of other companies and other people saying, well, my project's too complicated to move to Python 3.

09:44 We can't possibly do this.

09:45 It's too much effort.

09:47 I look at what they did on a single branch, switching Django versions majorly and switching Python versions.

09:56 That was just incredible.

09:57 It was awesome.

09:57 Yeah.

09:58 Pretty much, you just need to know what you want to do and how.

10:02 And I think the key to success there was the process that we took.

10:06 So pretty much, if you just have an idea to move to Python 3 and you just start randomly stabbing at it, you are more likely to fail.

10:14 But if you actually figure out, how are we going to do this?

10:17 Then it's a tractable problem.

10:20 You can absolutely pull it off.

10:22 And now with enough projects actually going through this transition, there are a lot of resources.

10:27 Like the keynote that you mentioned, which other people can listen to or watch to see what processes actually worked out for a big project.

10:40 Yeah.

10:40 Could you maybe just give people the really high-level steps they went through?

10:44 So things like kept it on Python 2 in production, but started doing the testing in 2 and 3.

10:50 I think the steps were really nice.

10:52 Do you remember them?

10:52 We had to realize that we need to make all the code work on Python 2 and Python 3 at the same time.

10:59 Right?

11:00 So we embraced six as a library to actually write polyglot code inside Distillery, which is the big Instagram backend repository.

11:09 Once we had this, we pretty much had to start testing.

11:13 And to start testing, we whitelisted a small number of unit tests that we knew were passing on Python 3.

11:22 And gradually, we were just extending this whitelist when those modules that were being unit tested were made compatible with both Python 2 and Python 3.

11:32 We're gradually just extending this whitelist.

11:35 What happened at some point is that this whitelist was big enough that we could switch to a blacklist.

11:39 So there were just some struggling, tricky places that we needed to address.
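The whitelist-then-blacklist progression described here can be sketched as a small, hypothetical test-selection helper. The episode does not describe Instagram's actual tooling, so the function and module names are illustrative only.

```python
from typing import Iterable, List, Optional, Set


def python3_test_modules(
    all_modules: Iterable[str],
    whitelist: Optional[Set[str]] = None,
    blacklist: Optional[Set[str]] = None,
) -> List[str]:
    """Pick which test modules to run under Python 3.

    Early on, pass a whitelist of modules known to pass. Once most modules
    pass, switch to a blacklist of the remaining problem modules.
    """
    if whitelist is not None:
        return [m for m in all_modules if m in whitelist]
    excluded = blacklist or set()
    return [m for m in all_modules if m not in excluded]


modules = ["tests.test_feed", "tests.test_stories", "tests.test_billing"]
print(python3_test_modules(modules, whitelist={"tests.test_feed"}))  # ['tests.test_feed']
print(python3_test_modules(modules, blacklist={"tests.test_billing"}))  # ['tests.test_feed', 'tests.test_stories']
```

The whitelist grows as modules are made compatible with both versions; once it covers most of the suite, inverting it into a short blacklist keeps the remaining problem spots visible.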

11:45 What I missed is, yeah, before we even started, we had to upgrade to a newer Django version because the one that we were using was written long before anybody thought of Python 3 compatibility.

11:56 So that was like a pre-requirement.

11:59 Fortunately, there was a version that supported Python 3.

12:03 And most of the other dependencies that we had were also already ported by the time we started doing our internal transition.

12:10 So pretty much the unit tests were important.

12:13 Once those unit tests were in good enough shape, we started pretty much running a Python 3 version on developer boxes.

12:22 So Instagram and I think most of Facebook doesn't really work on the laptops that you're getting.

12:27 Everybody is working on their assigned developer server that they have somewhere.

12:32 Mine is, for example, in North Carolina.

12:34 So pretty far from where I am.

12:36 But it doesn't really matter.

12:37 You just work on it.

12:38 It's sort of your computer.

12:39 You're working on a terminal anyway.

12:41 It doesn't matter where that console actually is.

12:44 So we switched people to just run on Python 3.

12:47 We did notify them that this is happening.

12:50 But for them, it should be.

12:52 It should be a no-op.

12:54 It should just work.

12:55 Obviously, it didn't for all of the cases.

12:57 But that's the thing.

12:59 If you treat Python 3 incompatibility as a bug that needs to be fixed, the approach to actually fixing it is quite different

13:05 than when you're just seeing it as an intractable problem that is unlikely to ever work.

13:12 Then you just complain and you sort of throw your hands up in the air and say, you know, I don't know what to do.

13:18 So when we said, like, hey, any bug that you see, we just need to fix it.

13:22 By then, we had a pretty extensive wiki page on, like, typical issues, how to solve those.

13:27 So once we were comfortable running the entire app on developer boxes on Python 3, we started shadow testing.

13:34 So with this sort of fleet that we have in production for Instagram, obviously, you can do A-B tests.

13:41 You can release stuff in one cluster of machines and not the other and whatnot.

13:45 And that was important.

13:47 But even if you have a smaller scale deployment, I think, like, you should never just release on a single server.

13:53 So you have some sort of load balancing.

13:57 You have some sort of way of, you know, releasing gradually.

14:01 So we just started minimally releasing Python 3 and seeing what happens, right?

14:05 So we saw some, like, tragic performance regressions.

14:09 And later we found out that either it was some library that was very poorly ported or something stupid that we did.

14:16 Or actually, like, actual problems.

14:18 Like, you know, because Python 3 behaves differently and we have to switch to do something a bit different.

14:23 And in the end, we cut down the memory usage of our Python processes by one third and cut down the CPU usage by 12%.

14:33 So for just switching a Python version, I think, like, in our scale, like, that was a worthwhile investment right there.

14:41 That's a really cool story.

14:42 And I think it definitely serves as a cool roadmap for people going forward.

14:46 And it's only going to get better, right?

14:48 Like, these new web frameworks and API frameworks are, there's a lot of powerful Python 3 only ones.

14:54 I'm thinking APIStar and some of the async-enabled web frameworks that are only accessible to these newer platforms.

15:03 Actually, Instagram is now looking at marrying Django with asyncio in ways where we can utilize both.

15:11 Like, this is not an easy problem to do because Django is just built around the idea of a single process for a request.

15:19 Whereas asyncio is totally the opposite, using coroutines to concurrently serve many requests.

15:24 But in the end, I think this is going to be a transition that is sort of radical because asyncio is viral.

15:32 In the sense that to actually use it, you have to pretty much just give up using any blocking APIs.

15:38 Like, if you want to have native asyncio, you have to switch to using coroutines and non-blocking APIs everywhere.
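A minimal asyncio sketch of the point being made here (the handler names are hypothetical): each handler awaits a non-blocking operation, so one process can interleave many requests. Swapping await asyncio.sleep(...) for a blocking call such as time.sleep(...) would stall every other request on the event loop, which is why asyncio pushes you toward non-blocking APIs everywhere.

```python
import asyncio
from typing import List


async def handle_request(request_id: int) -> str:
    # Stand-in for non-blocking I/O (a database or HTTP call, say).
    # While this coroutine waits, the event loop runs other handlers.
    await asyncio.sleep(0.01)
    return f"response {request_id}"


async def serve(request_ids: List[int]) -> List[str]:
    # gather runs the handlers concurrently in a single thread and
    # returns their results in the order the coroutines were passed.
    return list(await asyncio.gather(*(handle_request(i) for i in request_ids)))


print(asyncio.run(serve([1, 2, 3])))  # ['response 1', 'response 2', 'response 3']
```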

15:45 But the alternative is really just switching to Go or Rust or whatever else.

15:50 And this is what many teams are pondering, right?

15:53 Like, we actually want to have better performance.

15:56 Python doesn't give it to us.

15:58 So let's just switch to a totally different infra, to a totally different language that has a different set of compromises that, you know, they might not even fully realize before they start using a language fully.

16:10 So I think, you know, instead of burning all the bridges, we can burn some of the bridges and switch to asyncio to actually enable performance that we haven't really seen much in Python.

16:23 And there was obviously Twisted, but it was a pretty sort of separate community for the longest time.

16:29 And I hope asyncio is going to be more mainstream than that.

16:32 For sure.

16:32 And a lot of the libraries that are just out there, the packages, if, you know, it's very likely they standardize on async and await.

16:39 And so they would just plug into these existing things, right?

16:42 Like, it's definitely exciting.

16:43 And final question on this before we get to the official topic.

16:45 This is really, really interesting.

16:47 You mentioned being radical.

16:48 Do you happen to work with Jason Fried as well?

16:51 Is he on your team?

16:52 He gave such a great talk called Rules for Radicals, Changing the Culture of Python at Facebook.

16:58 He was actually one of the people that made me sort of stand up and try to make the Python situation better at Facebook.

17:07 He's been at the company two years longer than me.

17:10 So, like, he was always one of those guys that we were just working with, like, as a grassroots movement before I had a team that does this full time.

17:20 Which is why when my director asked me, like, who would you see on your team?

17:25 Like, Jason Fried was, like, the first name that I gave him.

17:27 Like, that's the right person to do this job.

17:30 He's just enough, like, of a mixture of, you know, being rational and, like, a fan of the language.

17:38 So I think, like, you have to be invested.

17:40 But at the same time, you have to recognize the limitations, right?

17:43 Behind the tech.

17:44 He just, like, hits the right sweet spot.

17:47 Yeah, that's cool.

17:48 Yeah.

17:48 So the reason I bring that up is I think his talk, which I'll link to, The Rules for Radicals, about changing the mindset to wanting Python 3, to making it sort of the default behavior within a large organization.

18:00 And then the keynote from Instagram is the concrete steps that you take to actually make that happen.

18:06 I think if you put those two together, any organization that's on Python 2 could pretty much find a roadmap there.

18:12 Yeah, we hope so.

18:12 Yeah, yeah, I do as well.

18:14 All right.

18:15 So speaking of Python 3, one of the really cool features, what was this, 3.5 when it came out, the type annotations?

18:22 Uh-huh.

18:22 All right.

18:22 So this is PEP 484, right?

18:24 Tell people what, for those who don't know, what it is.

18:27 Starting with Python 3.0, we had a feature, a syntactic feature to apply annotations to function arguments and return values.

18:38 It was always envisioned by Guido to be the foundation for building static typing for Python.

18:44 However, at the time, it was very unclear what that meant.

18:48 So he pretty much left this as an exercise to the reader to come up with a sensible syntax and a type checker for Python.

18:58 And there were a few toy attempts at this, but fundamentally nothing caught on.

19:05 And there was no big investment there.

19:08 Yeah, like you could do like a docstring type thing.

19:11 I mean, there's a couple of ways.

19:12 You could do like a type colon, and it works on some tooling and not others, right?

19:17 At this point, the annotation syntax was Python 3 only.

19:21 So yes, like people wanting to sort of formalize argument types for any reason were using docstrings or whatnot.

19:28 But we actually discovered that in this case, the docstrings were pretty much sort of best effort.

19:35 They were mostly meant for human readers and not machines to check.

19:39 So very often they were out of date or incomplete or didn't even quite parse because the syntax was incorrect.

19:47 And like all of this caused there to be some sort of, I don't know, like the adoption was very low of this idea.

19:54 So PyCharm had its own syntax for this.

19:58 Sphinx, the documentation generator, had some sort of syntax that it accepted.

20:02 Doxygen and other systems had their own again.

20:06 But none of these really settled on using the annotations that Python 3 had.
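To illustrate the contrast being described, here is the same made-up function written twice: once with types in a Sphinx-style docstring (:type: and :rtype: fields) and once with PEP 484 annotations. Only the annotated version exposes its types through the standard typing.get_type_hints().

```python
from typing import List, get_type_hints


def scale_documented(values, factor):
    """Multiply every value by factor.

    :param values: numbers to scale
    :type values: list of float
    :param factor: multiplier
    :type factor: float
    :rtype: list of float
    """
    return [v * factor for v in values]


def scale_annotated(values: List[float], factor: float) -> List[float]:
    """Multiply every value by factor."""
    return [v * factor for v in values]


print(get_type_hints(scale_documented))  # prints {}: docstring types are invisible here
print(get_type_hints(scale_annotated))
```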

20:12 What didn't help is also like before Python 3.5, like the adoption of Python 3 was super low, right?

20:19 So nobody actually was using Python 3.

20:21 So nobody was thinking about using function annotations.

20:24 That pretty much meant your program is now Python 3 only.

20:27 And in this time, that was like around 2013, 2014.

20:32 Like that was a very radical idea.

20:35 Like most people would just not be ready for this yet.

20:38 So we were thinking about this, and I, in particular, after joining Facebook, saw how much this changed the culture in the move from PHP to Hack.

20:47 And how type annotations really made the code base so much better at Facebook.

20:52 We even extended this to JavaScript with Flow.

20:55 So for me, like having this syntactic feature that is not utilized in any way was just a call to action, right?

21:03 So when I found out that actually Guido is interested in pursuing this, I reached out to him and I drafted, like, PEP 484.

21:12 And then we started working on this more heavily, actually building on work by Jukka Lehtosalo, who wrote a prototype Python interpreter that then became the mypy type checker.

21:26 Yeah.

21:26 And until this day, that's like, that's the type checker that we're using for this.

21:30 So that's like this sort of ancient history in the project.

21:34 Like I visited Guido, like, you know, who was working at Dropbox at the time.

21:38 He visited the Facebook campus.

21:40 It's, that was like, I guess, four years ago or more now.

21:45 Yeah.

21:45 So we started actually filling this gap.

21:48 This portion of Talk Python To Me is brought to you by Linode.

21:51 Are you looking for bulletproof hosting that's fast, simple, and incredibly affordable?

21:55 Look past that bookstore and check out Linode at talkpython.fm/Linode.

22:00 That's L-I-N-O-D-E.

22:02 Plans start at just $5 a month for a dedicated server with a gig of RAM.

22:07 They have 10 data centers across the globe.

22:09 So no matter where you are, there's a data center near you.

22:12 Whether you want to run your Python web app, host a private Git server, or file server,

22:17 you'll get native SSDs on all the machines, a newly upgraded 200 gigabit network,

22:23 24-7 friendly support, even on holidays, and a seven-day money-back guarantee.

22:27 Do you need a little help with your infrastructure?

22:29 They even offer professional services to help you get started with architecture,

22:34 migrations, and more.

22:35 Get a dedicated server for free for the next four months.

22:39 Just visit talkpython.fm/Linode.

22:41 The main goal behind this was always to provide annotations to people, right?

22:49 So this sort of semi-formal docstring syntax or whatnot, we wanted to put it in the right

22:55 place, and annotations are just the right place for it.

22:58 Yeah, it definitely is.

22:59 One of the things that I don't like about the docstring style is if you have four arguments,

23:04 the docstring becomes eight lines.

23:08 It's got the name and then the type.

23:10 It's just, it gets, it gets really long.

23:12 And if you've got like a function that is three lines long and you put this huge docstring in

23:18 it just so you can see the type, it's like you've almost made it less readable.

23:22 It's a real big trade-off at that point anyway, to put that extra stuff in there.

23:27 Whereas if it's just a little bit, you know, a colon int, colon str type of thing at the end

23:32 of your variables, it's much more compact.
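To make the compactness point concrete, here is a sketch of one hypothetical function written both ways: with Sphinx-style `:param:`/`:type:` fields in the docstring, and with inline annotations. The function names and parameters are invented for illustration; only the second form is something a type checker such as mypy actually reads.

```python
def greet_doc(name, times):
    """Return a greeting repeated several times.

    :param name: who to greet
    :type name: str
    :param times: how many repetitions
    :type times: int
    :rtype: str
    """
    return " ".join([f"Hello, {name}!"] * times)


def greet(name: str, times: int) -> str:
    """Return a greeting repeated several times."""
    # Same behavior; the types now live in the signature, where tools can check them.
    return " ".join([f"Hello, {name}!"] * times)
```

The annotated version is also introspectable at runtime through `greet.__annotations__`, which the docstring version is not.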

23:34 I agree with you.

23:35 So that's one concern.

23:37 But like the bigger concern is really just like most comments in your codebase are going

23:42 to be wrong.

23:43 And they're going to be essentially lies after some time.

23:46 So it doesn't actually take very long for those docstring-based types to get out of date with

23:53 just like small changes to your codebase, small diffs that like people need to fix an issue

23:59 or introduce a small feature.

24:00 So for us, the human factor was very important, but also without the help of technology to tell

24:08 us that, hey, this annotation is out of date now, we knew that those annotations

24:13 are bound to get, you know, useless after some point.

24:16 Yeah.

24:17 Worse than useless, because you might trust them even if they're wrong.

24:20 They're misleading, right?

24:21 Yes.

24:22 That's very true.

24:23 Yeah.

24:24 So I think maybe one place we should start the conversation is like, what's the real benefit

24:29 of these type annotations on, you know, sufficiently small projects?

24:33 Maybe you don't need them.

24:35 I do find them to be really helpful at certain parts of my code just to help editors and things

24:40 like that.

24:41 But you had this really nice example of a function.

24:44 It was called like process_all.

24:45 It took items, and for each item in items it just called item.children.process().

24:50 Like, even though it's only three lines and it's PEP 8 compliant,

24:54 it's nearly impossible to make sense of what it's doing, right?
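The process_all example might look roughly like the sketch below. The episode doesn't show the code, so the `Item` and `Batch` classes are hypothetical stand-ins; the point is that the loop body stays identical while the annotated signature tells a reader (and a type checker) what `items` and `children` are.

```python
from typing import Iterable, List


def process_all_untyped(items):
    # Short and style-compliant, but what is `items`? What is `children`?
    for item in items:
        item.children.process()


class Batch:
    """Hypothetical batch of child jobs that can be processed together."""

    def __init__(self, jobs: List[str]) -> None:
        self.jobs = jobs
        self.processed: List[str] = []

    def process(self) -> None:
        self.processed = list(self.jobs)


class Item:
    """Hypothetical item whose children form a Batch."""

    def __init__(self, children: Batch) -> None:
        self.children = children


def process_all(items: Iterable[Item]) -> None:
    # Same body, but now it is explicit that each item is an Item and
    # that item.children is a Batch with a process() method.
    for item in items:
        item.children.process()


items = [Item(Batch(["resize", "upload"])), Item(Batch(["notify"]))]
process_all(items)
```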

24:59 Yes.

24:59 So like Python programmers really like to have concise code, right?

25:04 And like that concise code pretty often uses very generic names for variables and methods.

25:12 Like when you do, when your method is called process and then you grep for it in your big project,

25:17 you're going to find that there's maybe 58 of them.

25:20 And actually figuring out what exactly the argument you passed, which is just called

25:25 items, is, well, you can sometimes get it, you know, from context.

25:29 You can sort of assume what it is, but you can never be sure.

25:33 Yeah.

25:34 You might throw in like a print(type(items)), print(type(i)) and you're like, all right,

25:41 what the heck?

25:41 I'm just going to do some print statements.

25:42 Like this is out of control.

25:43 What is this thing?

25:44 Right?

25:45 Yes, totally.

25:45 So those very generic functions tend to be misleading, right?

25:50 Because like different people are reading this code and you are a different person six months from now.

25:56 So even if you wrote this function, you might be misled by what you wrote some time ago.

26:01 So this is one of the fundamental problems with having a very dynamic language.

26:07 People sometimes say that it doesn't matter for a small project, but even a small project just gets out of your head after some time.

26:15 Right?

26:15 And when you're coming back to it, like fixing a pull request, like, you know, that somebody gave you as a puppy, right?

26:22 As a gift.

26:23 So like you need to actually put all this information back in your head.

26:27 And when you have to follow how the types actually work in your project, you have less space to actually review the change.

26:35 So fundamentally, like function annotations, type annotations are just a way to cut this short so that you don't really have to keep the entire program in your head to make informed decisions about what you're doing with your function.

26:49 That's interesting.

26:50 It's like a form of distributed cognition.

26:52 Like more of the thinking is stored on the page, and that leaves more space for algorithms and consequences.

26:58 Yeah.

26:59 Yeah.

27:00 Yeah.

27:00 Pretty cool.

27:01 So maybe let's talk about first where these appear, like where they're useful.

27:06 So they're, they're useful in editors.

27:08 They're useful in continuous integration.

27:10 They're useful in upgrading.

27:12 Like what are some of the tools around all of this?

27:14 Like, for example, if I want to do, do a check to make sure my code is hanging together, you mentioned mypy.

27:20 That's pretty much the primary tool, right?

27:22 This is not the only tool, but it's the sort of, I don't know, like all but the official type checker for Python by now.

28:29 It's under the Python organization on GitHub.

27:31 It has the most, I guess, manpower behind it.

27:35 It's the most mature.

27:37 So pretty much everything standardizes around it, but it's not the only one.

27:41 So the point of PEP 484 was not to sort of create a small walled garden of a single technology.

27:47 It was more of a standardized syntax so that any piece of technology that wants to do typing can share.

27:52 And we share it with, for example, PyCharm, which is the most advanced IDE that we have for Python.

27:58 It does implement its own form of type checking that is using exactly this syntax, and it's sharing the annotations for third parties and the standard library that we keep in the typeshed project.

28:10 It is kept separate from mypy just for this reason so that other projects can use it.

28:14 There's a project by Google which generates types by inferring what your code base actually does, which is called PyType.

28:23 And again, that uses exactly the same syntax that we formed with PEP 484.

28:29 So there's a number of projects that sort of revolve around typing.

28:34 But as far as type checking goes, mypy is the go-to type checker that we have for Python.

28:40 That's interesting.

28:40 I didn't realize it was so baked into PyCharm through TypeShed and stuff like that.

28:45 So we'll talk about TypeShed.

28:47 That's pretty cool.

28:48 Yeah, so I find that it's useful for adding to your program to add another level of check at certain levels, perhaps, right?

28:59 As you cross, say, to a data layer in and out of a data layer, for example.

29:02 That's pretty helpful.

29:04 The continuous integration is really important.

29:06 Yeah.

29:06 Do you know what editors support it?

29:08 I know PyCharm does, but I don't know what other editors take this into account.

29:12 mypy as a type checker was formed around the idea that it's an almost full-fledged Python interpreter.

29:18 So it analyzes the entire program.

29:21 And for the longest time, it actually had to spend the time to analyze everything, which takes time.

29:27 It is also written in Python.

29:29 So it is not the most efficient thing that we could actually come up with.

29:34 But it was very important that we can move it very fast because we were only really learning about all the edge cases of the Python type system when we were working on PEP 484 and later on.

29:47 So it is not the greatest technology to use within an editor, right?

29:52 Which is why PyCharm really implements its own thing.

29:55 If you are using an editor, you really want to type command space to tell you what available methods you have, right?

30:01 And you don't want to wait 30 seconds for an answer there.

30:04 You really need something right away.

30:06 Same for just telling you whether you have any type errors, right?

30:11 You would like the squiggly red line to appear right away as you're typing something wrong and not after three minutes.

30:19 So in that respect, mypy has only gradually been made compatible with this use case.

30:25 I don't think it's there yet, but there are features being implemented towards this goal.

30:31 There was an incremental mode introduced at some point where modules that were analyzed for type information were kept in a database,

30:40 like essentially a bunch of JSON files so that we didn't have to analyze them again if they didn't change.

30:46 Now there's a mode introduced to mypy where it's going to live as a daemon, running as a separate process.

30:53 So you don't even have to restart Python, which on its own takes time: just starting up the entire mypy type checker takes around a second.

31:00 So just cutting down on this is already a win.

31:03 And then reading all the types.

31:05 Could it do like a continuous analysis and just watch all the files and just like in the background, analyze it and periodically report its output or something like this?

31:16 That would be the point.

31:17 Like how that exactly works.

31:19 I'm not sure yet.

31:20 Like we are not, in fact, using this feature yet.

31:23 We've only recently adopted incremental mode, which was pretty experimental and at times unstable.

31:29 So mypy is a living project, right?

31:31 It's being actively developed by a group of, I think, four full-time developers at Dropbox and a bunch of volunteers around the project from outside of Dropbox.

31:42 So this feature is pretty much like very new.

31:46 I do hope it's going to work like you're describing, since this is exactly what then enables a language server protocol in, say, Atom or Visual Studio Code to actually, like, talk to the type checker and get typing information right away.

32:00 What we have today is a flake8 plugin that I wrote that runs a sort of basic version of mypy, which just tells me how I'm doing per file.

32:13 What does that mean?

32:14 Like, if you're doing full-program analysis and you're importing stuff from different files, you're going to know their types.

32:20 You're going to be able to, like, tell whether you're using an API right or wrong, regardless of the file it's in.

32:28 But that requires this full-program analysis that takes sometimes minutes, right?

32:33 Yeah.

32:33 So instead, we can run mypy in a special mode that just says, assume every import is fine, like, whatever I'm importing, I'm using it correctly.

32:40 But just look at my functions in the file that I'm editing right now.

32:45 And it turns out that this can be done in around a second, about as long as mypy's process takes to start, pretty much.

32:52 And because of TypeShed, which is our collection of types for the standard library and a bunch of third parties, we can still provide very meaningful information about how you're misusing some built-in type or some built-in library, right?

33:07 So, for example, like, the example I always give is newbies very often confuse sorted and the sort method on a list, right?

33:16 Like, they would think that the sort method also returns something, but it doesn't.

33:20 This very simple plugin for flake8 will already tell you that, hey, you meant to actually use sorted() and not .sort(), because .sort() works in place and doesn't return anything.
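A minimal sketch of the sort/sorted mix-up described above. The function names are hypothetical, and the comment paraphrases the kind of error a checker such as mypy reports, since the exact message text varies between versions.

```python
from typing import List


def top_three_buggy(scores: List[int]) -> List[int]:
    # list.sort() sorts in place and returns None, so `ordered` is None and
    # the slice below fails at runtime. A type checker such as mypy flags the
    # assignment itself, because sort() does not return a value.
    ordered = scores.sort(reverse=True)
    return ordered[:3]


def top_three(scores: List[int]) -> List[int]:
    # sorted() builds and returns a new list, leaving `scores` untouched.
    return sorted(scores, reverse=True)[:3]
```

Calling `top_three([5, 1, 9, 7])` returns `[9, 7, 5]`, while the buggy version raises a `TypeError` at runtime when it tries to slice `None`.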

33:31 Yeah.

33:31 And those are the kinds of things that can be really helpful to get picked up there.

33:35 So let's take a moment and talk just about the syntax real briefly.

33:40 So there's a real simple version.

33:42 Like I say, variable, then colon, then the type.

33:44 So I could say id: int or name: str, right?

33:50 And that's totally straightforward.

33:52 But as soon as it gets a little more interesting, you actually have to bring in the typing module.

33:57 So like, maybe I'm going to return a user, or it might be empty.

34:01 It might be none because there's no user at that ID, right?

34:04 So you might have an optional user.

34:06 You might have a list of optional strings.

34:08 Like these are pretty interesting.
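The syntax being discussed can be sketched as follows. `User`, `USERS`, `find_user`, and `names_for` are hypothetical names for illustration; `Optional`, `List`, and `Dict` come from the standard `typing` module (function annotations work on Python 3.5+, variable annotations need 3.6+).

```python
from typing import Dict, List, Optional


class User:
    """Hypothetical user record used only for this example."""

    def __init__(self, user_id: int, name: str) -> None:
        self.user_id = user_id
        self.name = name


# Hypothetical in-memory "database" keyed by user id.
USERS: Dict[int, User] = {1: User(1, "Ann"), 2: User(2, "Bob")}


def find_user(user_id: int) -> Optional[User]:
    # Optional[User] means "a User or None": there may be no user at that id.
    return USERS.get(user_id)


def names_for(ids: List[int]) -> List[Optional[str]]:
    # A list of optional strings: one entry per id, None where no user exists.
    names: List[Optional[str]] = []
    for user_id in ids:
        user = find_user(user_id)
        names.append(user.name if user is not None else None)
    return names
```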

34:11 So do you want to talk a bit about the typing module?

34:13 Obviously, go beyond just simple classes.

34:16 So as annotations, you can use any built-in type, any user-defined class.

34:21 But beyond this, you start having like complex types.

34:25 Like you mentioned optionals, which say this is usually an int, but maybe it's None.

34:32 Maybe the user just didn't provide it at all.

34:34 Or maybe it's bytes and maybe it's a string, right?

34:38 So you want essentially what we call a union of multiple types.

34:43 You can have other things like I want a list, but I want to specifically tell you that this is a list that holds just strings.

34:51 So these are collections with generics.

34:54 And the built-in collections in Python don't support generics because the runtime doesn't really work like this, right?

35:01 Compared to statically typed languages, Python really implements classes as just factories of objects.

35:08 Those objects just have attributes on them.

35:11 And as long as you are calling the right attributes, like calling the right methods, everything is fine.

35:17 And the runtime doesn't really care what particular type an object has.

35:23 So if you want to actually have this as a feature of the type system, then, well, we had to create our own versions of the built-in collections.

35:34 That includes ABCs.

35:36 That includes things like all the things in collections, like ordered dictionaries, like named tuples.

35:42 Everything that essentially you can instantiate in the standard library, there is a generic variant of it.

35:48 So for this reason, we have the typing module that you import those complex types from.
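A short sketch of a union and a generic collection from the typing module. `to_text` and `join_words` are hypothetical helpers; the `isinstance` check is what lets a type checker narrow the `Union` inside each branch.

```python
from typing import List, Union


def to_text(data: Union[bytes, str]) -> str:
    # Union[bytes, str]: the caller may pass either; we normalize to str.
    if isinstance(data, bytes):
        return data.decode("utf-8")
    return data


def join_words(words: List[str]) -> str:
    # List[str] is a generic collection: a list that holds only strings.
    return " ".join(words)
```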

35:54 There are many other complex types, like Any, which essentially tells the type checker that I don't really know what this is.

36:02 Obviously, Any is a name that describes sort of your state of knowledge.

36:07 It doesn't really say that any type is going to be fine, right?

36:09 It just says that as far as I know, like whatever is passed should be okay.

36:14 That's pretty much a way to silence the type checker.

36:17 So this special Any type is also in the typing module.
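A sketch of how `Any` differs from a concrete annotation. The `Invoice` class and both functions are hypothetical. With `payload: Any`, a checker accepts the `.amounts` access no matter what is passed, so a bad call only fails at runtime; with `payload: Invoice`, the checker can flag such calls before they ship.

```python
from typing import Any, List


class Invoice:
    """Hypothetical object that has an `amounts` attribute."""

    def __init__(self, amounts: List[int]) -> None:
        self.amounts = amounts


def total(payload: Any) -> int:
    # `payload: Any` means "I don't know what this is", so a checker such as
    # mypy accepts `payload.amounts` without verifying that it exists.
    return sum(payload.amounts)


def total_checked(payload: Invoice) -> int:
    # With a concrete annotation, the checker can verify the attribute access
    # and flag callers that pass something other than an Invoice.
    return sum(payload.amounts)
```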

36:20 This portion of Talk Python To Me is brought to you by us.

36:25 As many of you know, I have a growing set of courses to help you go from Python beginner to novice to Python expert.

36:31 And there are many more courses in the works.

36:33 So please consider Talk Python training for you and your team's training needs.

36:38 If you're just getting started, I've built a course to teach you Python the way professional developers learn by building applications.

36:44 Check out my Python jumpstart by building 10 apps at talkpython.fm/course.

36:49 Are you looking to start adding services to your app?

36:52 Try my brand new consuming HTTP services in Python.

36:55 You'll learn to work with RESTful HTTP services as well as SOAP, JSON, and XML data formats.

37:01 Do you want to launch an online business?

37:03 Well, Matt McKay and I built an entrepreneur's playbook with Python for Entrepreneurs.

37:07 This 16-hour course will teach you everything you need to launch your web-based business with Python.

37:12 And finally, there's a couple of new course announcements coming really soon.

37:16 So if you don't already have an account, be sure to create one at training.talkpython.fm to get notified.

37:21 And for all of you who have bought my courses, thank you so much.

37:25 It really, really helps support the show.

37:28 So there's a number of features there.

37:30 So whenever you need a situation like a union, like an optional type, like generics or whatnot, you would use this.

37:36 Generics are special because sometimes you really want to say, for example,

37:41 I don't care.

37:42 I'm taking it as an argument and I'm returning the same type.

37:45 Yeah, that one was surprising to me.

37:47 That one was surprising to me because I hadn't seen that before.

37:51 Like I could, I know you could have like a concrete generic, like a list of strings would sort of specify like it is a list and its internal type is this.

38:00 But to say it takes a list of T and it returns a T like that.

38:06 That was a pretty unexpected thing I saw coming out of the typing module.

38:09 That's cool.

38:10 That's pretty much like a very basic version of sort of templating for Python.

38:15 But it's fundamentally very often used, right?

38:19 So like it's very often that you would have a function that operates on a collection.

38:24 And I don't know, like returns the first truthy value of it or whatnot.

38:28 And like just typing this would be impossible without a type variable.

38:34 So this is where they come in useful.
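The "takes a list of T and returns a T" idea, together with the first-truthy-value helper mentioned above, can be sketched with `typing.TypeVar`. The function names are hypothetical.

```python
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def first(items: List[T]) -> T:
    # Takes a list of T and returns a T: pass List[str], get back a str.
    return items[0]


def first_truthy(items: Iterable[T]) -> Optional[T]:
    # The TypeVar links the element type of the argument to the return type;
    # None is returned when no element is truthy.
    for item in items:
        if item:
            return item
    return None
```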

38:35 There's a number of other more advanced features.

38:38 So they're all documented on docs.python.org.

38:41 But essentially the necessity for the typing module comes from the fact that there's more to types than just simple classes.

38:50 One of the things that also surprised me when I first started using these was that I was getting an error when I had a method that I said returns,

38:58 let's say, a User, and I was returning None when the user wasn't found or if the ID was improperly specified.

39:03 And I was getting an error saying you can't return None when you say you return a User.

39:08 And then I realized eventually you have to use Optional[User] if you're going to have None.

39:12 And most languages don't distinguish between a pointer type and whether it's nullable or not.

39:20 Maybe some of them do for value types.

39:22 The one that I do know that does that is Swift.

39:25 So what was the thinking around this concept of actually making it explicit, so that you have to say it's not only this type, but we guarantee, or at least we proclaim,

39:35 it's not None.

39:36 It actually points to a real object.

39:38 I personally knew about two languages that approach this problem like from the opposite ends.

39:46 So Java, for example, doesn't type check for null and like null pointer exceptions are sort of the bane of existence of a Java programmer, right?

39:55 Because the compiler is not really helpful there.

39:58 You need to figure it out on your own.

40:00 And the opposite was Hack, the typed PHP version used at Facebook, which actually has this concept.

40:07 And it turns out that that is the most popular class of errors found by the type checker, where the user of a function doesn't expect it can ever return null, but for some reason it does.

40:20 So it was very natural to me to introduce this for Python, especially since, with the logging information we gather from running Instagram and other systems in Python at Facebook, we knew that AttributeError, 'NoneType' object has no attribute something, is a very, very popular exception, right?

40:40 That stems from the fact that sometimes, I don't know, an API call doesn't work or some helper function tries to just not raise an exception.

40:49 Actually, raising meaningful exceptions is the Pythonic way to do this, right?

40:54 Like if you are unable to fetch a user, raising a LookupError is the more natural thing.

41:02 Like it's going to read better when somebody is faced with this sort of problem.

41:05 It is what all the internals of Python do itself, right?

41:09 So like we have dictionaries doing exactly this and so on and so on.

41:14 So this is sort of what the typing gently nudges you to do, because putting any sort of type union, including Optional, which is essentially a union of your type and None,

41:26 as a return value actually makes using your function so much more difficult, right?

41:32 Any user of your function now has to check whether the return value of your function was None or not.

41:39 And this is pretty painful pretty fast, especially if the situation in which your function can return None is very unlikely.

41:47 People are going to complain very loudly that you make them check for the stupid None value.

41:52 "I know that it will never be None in production, right?

41:55 It can only be None with the mock database or whatever."

41:59 So that actually makes you think maybe I should change the API so that None doesn't appear there at all.

42:06 And I think, sort of ironically, the very verbose nature of the Optional type sort of makes you think twice.
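To illustrate the trade-off, here is a sketch contrasting an `Optional`-returning lookup with one that raises `LookupError`, as dictionaries do with `KeyError` (a subclass of `LookupError`). The user table and function names are hypothetical; the comment about the required `None` check assumes a checker running with strict `Optional` checking.

```python
from typing import Dict, Optional

# Hypothetical user table mapping ids to names.
USERS: Dict[int, str] = {1: "Ann", 2: "Bob"}


def find_name(user_id: int) -> Optional[str]:
    # Returning None pushes a check onto every caller.
    return USERS.get(user_id)


def get_name(user_id: int) -> str:
    # The alternative: raise, the way dict lookups do.
    try:
        return USERS[user_id]
    except KeyError:
        raise LookupError(f"no user with id {user_id}") from None


def greeting_optional(user_id: int) -> str:
    name = find_name(user_id)
    if name is None:  # the checker insists on this before name.upper()
        return "HELLO, STRANGER"
    return "HELLO, " + name.upper()


def greeting(user_id: int) -> str:
    # No None check needed; a missing user surfaces as a LookupError.
    return "HELLO, " + get_name(user_id).upper()
```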

42:14 Yeah, it definitely does make it long.

42:17 And if it's for parameters that you have many of them, it gets even worse.

42:20 Yeah, yeah, that's interesting.

42:22 So you talked about finding errors in production and stuff.

42:25 And at your presentation, you spoke about Instagram and the sort of success story that you guys had in terms of actual runtime errors.

42:35 And maybe they're unexpected.

42:37 Yes.

42:37 The unexpected results you got.

42:39 Could you maybe cover that real quick?

42:40 When you're adopting types, you want to sort of see the value that they give you, right?

42:45 You want to recognize whether it was worth the time.

42:49 So first of all, we as the authors of PEP 484 believe that putting in type information, even if you're not doing anything else with it, is already worthwhile because it's a form of documentation.

42:59 But with additional type checking, like you want to see that actually there is like a change in the number of errors that you see in production.

43:08 Right.

43:08 You'd like to see runtime errors become CI, like continuous integration errors instead or even before then, right?

43:16 Yes.

43:16 So you're looking for some sort of metrics that you can look at to like prove that this entire effort makes sense.

43:23 So the simplest thing that you can do, obviously, is just like track the adoption, right?

43:28 Like what is your adoption?

43:29 So we obviously did that.

43:30 And now Instagram is close to 30% typed functions.

43:35 So we're pretty much at the point where we already see a lot of value, but this is not something that you see from day one.

43:41 If you just type a bunch of functions, you're going to maybe find a few problems in those particular functions.

43:47 But for the typical Instagram developer, for the typical engineer, they will not really see how that changes their life.

43:55 But as soon as like you're, I don't know, north of 10% of functions, random people start noticing type errors that the type checker tells them about before they shipped something.

44:06 So a metric that I was very interested in is how is this going to affect the, I don't know, average number of attribute errors and type errors that happen in production, right?

44:18 And how many, like, are we going to see fewer exceptions?

44:21 Right.

44:22 And those two that you named, those would be the types you run into when you assume there's one type, but it's actually the other.

44:28 You thought it was a list, but it's a dictionary or something like that.

44:31 Yes.

44:31 So in the special case of attribute errors, it's mostly NoneType, right?

44:35 When you try to do something with None that it's not prepared to do.

44:39 So we wanted to see whether there's going to be fewer exceptions after adopting typing.

44:44 And sadly, like, this correlation was just not seen.

44:48 Like, we couldn't really detect it; it's not very easy to prove that, oh, typing helped us with lowering the floor of exceptions at runtime.

44:58 But what I personally didn't notice, Carl Meyer, who is pretty much spearheading the typing effort at Instagram, did notice: yes, it's not about the sort of floor of exceptions, which mostly describes very unlikely scenarios that happen for an unlucky user of Instagram.

45:17 It's more about shipping a bad change.

45:23 And now that number is almost 10 times lower.

45:34 So it's 10 times less likely that you're going to ship some bad diff to production that introduces a type error than it was before.

45:43 And this is a metric that was sort of hard for me to notice from just looking at graphs in a linear fashion.

45:51 But in fact, yeah, that pretty much proves that this actually impacts quality in exactly the right way.

45:58 Oh, yeah, that's a really interesting one.

45:59 Those are the releases that you're like, oh, no, it's crashing.

46:03 And you're just, like, freaked out.

46:05 And you're scrambling to roll it back.

46:07 And those are the worst kinds of errors.

46:09 Not the ones that happen one in a million, but the ones where it happens one for one.

46:13 Yep.

46:14 Yeah, pretty interesting.

46:15 Pretty interesting.

46:15 So let's talk for a moment about this concept of gradual typing.

46:20 I mean, for example, you said you guys are really successful and you've got 30% of the functions

46:27 in millions of lines of code with typing, or something to that effect.

46:30 And I find this in my code as well.

46:32 Like, I love type annotations, but I don't annotate everything.

46:36 There's, like, a core set of functionality.

46:38 Like, this is really what I want to annotate.

46:40 This is really important that this is clear.

46:42 But this other part, it can kind of just derive the benefits from having the other stuff really stable.

46:48 So you want to talk about the rules of, like, gradual typing and, like, maybe how the order actually affects what is caught and what is not?

46:56 That's surprising.

46:57 There is actually a separate PEP that describes how gradual typing works.

47:02 That is, I guess, PEP 483 that describes this.

47:06 So the reason for this is that in a language like Python, where we are using runtime objects without looking at their types to validate whether they're correct or not,

47:18 putting this feature out essentially means you're going to start with large projects that have never even thought about this feature.

47:26 So the concept of gradually exposing your code base to static typing was not something that we wanted to do.

47:34 It's something that we had to do.

47:36 There was just no other way around it.

47:38 Fortunately, many other languages like JavaScript with TypeScript or, like, Flow or Hack or whatnot, like, went exactly through the same path.

47:47 I think the reason is because the primary driving factor must be if you want to bring in, like, let's say TypeScript.

47:53 If you want to bring in the other JavaScript libraries, if you forced everything to be 100% typed, you would close off the entire ecosystem.

48:01 And same thing with Python, right?

48:02 Like, you want to use all the packages on PyPI and other pieces that are going to lag behind, right?

48:08 So you can't say everything has to be typed or this just doesn't work.

48:11 Yes.

48:12 But even if that were your dream, if you wanted to actually type the world, you have to start somewhere, right?

48:17 And if you cannot reap the benefits until everything is fully typed, then pretty much, like, the feature is useless for the longest time.

48:27 And I think, like, people would get discouraged way quicker than they would see anything worthwhile from it.

48:32 So gradual typing essentially is this notion that you can slowly annotate function by function.

48:40 And by doing this, you're just increasing the footprint of typing and increase the sort of usability, usefulness of the project.

48:50 So the ordering there matters in one important way.

48:57 So I would advise everybody to look at what the function call graph looks like in their program and start annotating from the functions that are most used, right?

49:09 Like, are very deep in the stack.

49:11 Like, everybody calls them, right?

49:13 The reason for it is that once a function like this is annotated, all users of it can be validated whether they are using this function correctly or not.

49:23 If you didn't annotate this very central function and went on and annotated a bunch of leaf functions, then you might not know whether they are correct or not.

49:34 And the reason why not is that as long as a function is not annotated, the type checker necessarily has to assume that anything is fine, right?

49:44 Like, any argument type passed is okay.

49:47 The function can also return any type from it.

49:50 So pretty much that means it's going to stay quiet regardless of what you're doing.

49:55 So if you are annotating your core function first, you're going to get the benefit of being warned about invalid usage way faster than if you would actually wait with this core functionality to the very end.
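The ordering advice can be sketched as a ranking over a call graph. This is a hypothetical helper, not a tool mentioned in the episode: it assumes you already have the call graph as a plain dict (getting one from a profiler or static analysis is out of scope here) and ranks functions by how many other functions transitively call them, so the most depended-on functions come first.

```python
from typing import Dict, List, Set

# Hypothetical call graph format: each function name maps to the
# names of the functions it calls directly.
CallGraph = Dict[str, List[str]]


def transitive_callers(graph: CallGraph) -> Dict[str, Set[str]]:
    """Map every function to the set of functions that reach it via calls."""
    callers: Dict[str, Set[str]] = {name: set() for name in graph}
    for callees in graph.values():
        for callee in callees:
            callers.setdefault(callee, set())

    for start in list(callers):
        # Iterative depth-first walk: `start` is a (transitive) caller of
        # every function reachable from it.
        stack = list(graph.get(start, []))
        seen: Set[str] = set()
        while stack:
            fn = stack.pop()
            if fn in seen:
                continue
            seen.add(fn)
            if fn != start:
                callers[fn].add(start)
            stack.extend(graph.get(fn, []))
    return callers


def annotation_order(graph: CallGraph) -> List[str]:
    """Suggested annotation order: most depended-on functions first."""
    callers = transitive_callers(graph)
    # Sort by number of transitive callers (descending), then by name.
    return sorted(callers, key=lambda fn: (-len(callers[fn]), fn))
```

For instance, if `handle_request` and `admin_page` both call `load_user`, which calls `db_query`, then `db_query` ranks ahead of `load_user`: annotating it first lets the checker validate every path that reaches it.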

50:10 It gets even worse.

50:11 If you do this, then that might pretty much cause errors to appear on some functions that you didn't touch, right?

50:20 Like, you annotated a core function and suddenly you see 40 new errors from the type checker on functions that you don't even know about.

50:27 But these are the functions that were using what you just annotated and they were using it wrong.

50:33 So now you are faced with the problem, what am I supposed to do?

50:36 Like, am I supposed to fix all those 40 functions?

50:39 I didn't even know that we had functionality like this and now the type checker yells at me.

50:44 So the right ordering can save you a lot of time and a lot of stress with actually making the adoption smoother, right?

50:52 It's a good point.

50:53 You have a really nice graphic in your talk, which I'll put in the show notes, of course.

50:56 And it looks a little bit like the game called Whack-A-Mole.

50:59 You hit one thing, it pops up in another place.

51:01 You know what I mean?

51:01 You fix one and then two more errors pop up.

51:05 You fix those.

51:06 One goes away, another pops up.

51:08 And it's sort of like as you add this type checking, the pieces that were just ignored before are now actually getting validated.

51:16 So it can be a little bit funky like that.

51:18 So like you said, I think starting at the right levels, the important functions, and then sort of slowly build your way out is pretty nice.

51:26 Yeah.

51:26 So there are ways to automate parts of this.

51:31 So the pytype project that I mentioned can infer type annotations just from looking at your project.

51:38 It is still pretty Python 2-centric, so it might not work with the latest Python 3.6 features or whatnot.

51:44 So your mileage may vary.

51:46 Patches are always welcome.

51:48 But it can actually go a long way to create this initial body of annotations for your big project.

51:56 It does some sort of magic that you might or might not agree with, like figuring that, oh, you are using an append method.

52:04 Within your entire program, the only type that has an append method is a list.

52:08 So I guess what you're using here is a list.

52:11 All sorts of things like this.

52:13 So like this is what inference is all about.

52:15 But it's actually a very worthwhile project that sort of boosts adoption of types in new projects.
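The "append means list" reasoning can be illustrated with a toy. This is a hypothetical sketch of the idea only, not pytype's actual algorithm: it collects the methods called on a parameter, then keeps only the candidate types that define all of them. The `KNOWN_TYPES` list is an assumed stand-in for "every type in your program".

```python
import ast
from typing import Iterable, List, Set


def methods_called_on(source: str, param: str) -> Set[str]:
    """Collect method names invoked as `param.method(...)` in the source."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == param
        ):
            found.add(node.func.attr)
    return found


def candidate_types(methods: Set[str], known: Iterable[type]) -> List[type]:
    """Keep only the known types that define every method that was used."""
    return [t for t in known if all(hasattr(t, m) for m in methods)]


SOURCE = """
def collect(items):
    items.append(1)
    return items
"""

KNOWN_TYPES = [list, tuple, dict, set, str]
used = methods_called_on(SOURCE, "items")   # {'append'}
print(candidate_types(used, KNOWN_TYPES))   # [<class 'list'>]
```

A real inferencer works over the whole program and much richer type information, which is exactly where the "magic you might or might not agree with" comes from.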

52:22 What you can also do is gather those types at runtime.

52:27 At some point, I thought it was a crazy idea: it would slow everything down, and it would never work on unit tests because you're mocking stuff.

52:35 So the types are different.

52:37 You would also have issues with types coming back as massive unions of 50 things and whatnot.

52:44 So I had this pretty apocalyptic view of this, like that that would never work.

52:49 And usually when somebody says that it's impossible, somebody else who doesn't know that is going to go ahead and implement it.

52:56 This is exactly what happened at Instagram.

52:58 Like we had Matt Page and Carl Meyer working on this project.

53:03 It's open source now.

53:04 It's called MonkeyType, and it does exactly this.

53:07 It hooks into your program and records the types of arguments to functions.

53:11 It records the return types as well and then generates the typing stubs from what it gathered.

53:18 And you can apply those types back to your code base.

53:22 So that way you can pretty much just remove a lot of the work, like the initial work that has to be done.
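MonkeyType itself is driven from the command line (`monkeytype run`, `monkeytype stub`, `monkeytype apply`). To show the underlying idea, here is a minimal sketch, not MonkeyType's code, that uses the standard `sys.setprofile` hook to record argument and return types and render them as a stub. `record_types`, `to_stub`, and `greet` are hypothetical names. The example also reproduces the "optional argument that is only ever None" observation mentioned later in the conversation.

```python
import sys
from collections import defaultdict
from contextlib import contextmanager
from types import FrameType
from typing import Any, Callable, Dict, Iterator, Set

Observed = Dict[str, Dict[str, Set[str]]]


def _type_name(value: Any) -> str:
    # Spell NoneType the way it appears in annotations.
    return "None" if value is None else type(value).__name__


@contextmanager
def record_types(*funcs: Callable[..., Any]) -> Iterator[Observed]:
    """Record argument and return types of `funcs` while the block runs."""
    codes = {f.__code__ for f in funcs}
    observed: Observed = defaultdict(lambda: defaultdict(set))

    def profiler(frame: FrameType, event: str, arg: Any) -> None:
        code = frame.f_code
        if code not in codes:
            return
        if event == "call":
            # On entry, the positional parameters are already in f_locals.
            for name in code.co_varnames[: code.co_argcount]:
                observed[code.co_name][name].add(_type_name(frame.f_locals[name]))
        elif event == "return":
            # For "return" events, `arg` is the value being returned.
            observed[code.co_name]["return"].add(_type_name(arg))

    sys.setprofile(profiler)
    try:
        yield observed
    finally:
        sys.setprofile(None)


def to_stub(name: str, types: Dict[str, Set[str]]) -> str:
    """Render observed types as a one-line stub, unions joined with ' | '."""
    params = ", ".join(
        f"{p}: {' | '.join(sorted(ts))}" for p, ts in types.items() if p != "return"
    )
    ret = " | ".join(sorted(types.get("return", {"None"})))
    return f"def {name}({params}) -> {ret}: ..."


def greet(name, punctuation=None):
    return "Hello, " + name + (punctuation or "")


with record_types(greet) as seen:
    greet("Ada")
    greet("Grace")

print(to_stub("greet", seen["greet"]))
# def greet(name: str, punctuation: None) -> str: ...
```

Because `punctuation` was only ever `None` in the recorded calls, the stub flags it as a parameter nobody populates, which is the kind of finding Łukasz describes.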

53:30 And even though I envisioned it producing garbage, it turns out that most people don't actually use Python in crazy dynamic ways all the time, because that's also very unreadable.

53:43 Right.

53:44 And Python is all about being runnable pseudocode.

53:47 It has to be readable.

53:49 So the types are, for the most part, very sane.

53:53 You can use them and pretty directly just apply them back and you're done.

53:57 We had a very big spike in typing adoption when we started using MonkeyType, since it actually produces very high-quality types.

54:08 Sometimes it's funny.

54:09 So sometimes it will tell you that an optional argument with a default value of None has a type of None, which essentially means that you have some very special optional argument that nobody ever uses.

54:22 In the entire code base, nobody ever actually populates this optional argument.

54:26 So you might as well just remove it.

54:29 All right.

54:29 So you could just get rid of it.

54:30 That's right.

54:31 How funny.

54:32 Yeah, there's a bunch of crazy ideas and those all do sound pretty interesting.

54:35 Another one has to do with like actual performance optimization.

54:39 Like actually going, no, this is a list.

54:42 And so we're going to do some kind of shortcut or something to that effect.

54:45 Yeah.

54:45 So originally, not only me, but I think Guido as well.

54:49 We thought that this is a dead end.

54:51 We're not going to be able to do anything useful there.

54:53 The reason for it was twofold.

54:55 Like, first of all, we saw that Python's runtime doesn't actually use typing information at all.

55:03 It just tries to find attributes on your objects and does things with them.

55:08 And actually, the most performant Python runtime that we have, PyPy, is all about dynamically finding out what you are doing.

55:15 And it's able to find this out in ways that are way more precise than the type information you put in will ever be.

55:24 Right.

55:25 Because very often, the type you're going to describe is: I want an iterable of strings.

55:31 I don't care what iterable that is.

55:33 It should be an iterable.

55:34 So that is not very useful for PyPy.

55:38 Right.

55:38 What would be useful is: yes, it's an iterable, but it's only ever tuples.

55:42 Right.

55:43 So that way, it can actually put guards in, JIT things away, and it becomes way faster.
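To make the PyPy point concrete: a typical annotation like `Iterable[str]` deliberately says little about the concrete object. This hypothetical sketch shows three different runtime types all satisfying the same annotation, which is why a JIT learns more from observing the actual objects than from the declared type.

```python
from typing import Iterable


def join_words(words: Iterable[str]) -> str:
    # The annotation only promises "something iterable that yields str".
    return " ".join(words)


# Three different runtime types all satisfy the same annotation.
inputs = [["a", "b"], ("a", "b"), (w for w in ["a", "b"])]
runtime_types = [type(x).__name__ for x in inputs]
results = [join_words(x) for x in inputs]

print(runtime_types)  # ['list', 'tuple', 'generator']
print(results)        # ['a b', 'a b', 'a b']
```

A type checker is happy with the broad annotation, and that is the point: it keeps `join_words` reusable, but it gives an optimizer no guarantee that it can specialize for, say, tuples only.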

55:50 So we were very negative, like, in terms of seeing value in this.

55:54 But this is exactly what Cython does.

55:56 Right.

55:57 And Cython can sometimes accelerate your function by, like, 20 or 50 times, by knowing that, oh, this is only ever a string or this is only ever an int.

56:07 So I can maybe not even box it in a PyObject and just do C-level computation that way.

56:14 So combining this information with an ahead-of-time compilation step is what is very interesting.

56:22 And I talked with Jukka Lehtosalo, the original author of mypy, about this idea during the last PyCon.

56:30 And he has a project that is sort of spearheading this for Python.

56:37 So I really do hope that by this PyCon, like, we'll probably hear from Dropbox that, hey, this actually works out.

56:44 This actually accelerates Python in this sort of automatic way.

56:48 So I don't know if you remember.

56:49 That would be awesome.

56:50 I don't know if you remember, but originally PyPy started out as, like, this crazy import that you just put in your project, like, import psyco.

56:58 And suddenly, like, your code became way faster.

57:02 You didn't know why.

57:03 So we might actually be back in this world, where in some future Python version you maybe don't even have to perform any imports.

57:11 But we are actually going to attempt to do some ahead-of-time compilation for you.

57:16 And type information is going to be useful there.

57:18 That would actually be really, really interesting.

57:20 So I'm looking forward to that.

57:22 All right.

57:23 We're getting sort of short on our time.

57:24 So I want to just cover one more really quick thing.

57:27 And maybe just leave it there for the type annotations.

57:30 It's really awesome work.

57:32 And the more I use them, the more I like them.

57:33 But you have one other interesting piece of news to do with Python more generally,

57:37 and with you, right?

57:38 Yes.

57:38 So you were just chosen as the release manager for Python 3.8 and 3.9.

57:44 And 3.7 is coming really soon, right?

57:45 So you're on deck.

57:47 And you'll be up really quickly, right?

57:48 Yes.

57:49 So pretty much, the development of Python 3.8 started just yesterday.

57:57 So yes, it's going to be developed for the next 18 months, pretty much.

58:01 And Python 3.7 is in beta stage now.

58:05 What it means is we don't add new features to it anymore.

58:09 We're going to pretty much harden it now, like find all the possible bugs and problems with

58:14 whatever we implemented at this stage.

58:17 We'll release four betas.

58:19 Then we'll release, hopefully, a very nice release candidate that we can then bless as the gold

58:24 version.

58:24 If not, then there's going to be another release candidate.

58:27 And at some point, we're going to release Python 3.7.

58:30 It sounds like this is very close now.

58:33 But in fact, that is going to happen in late June.

58:37 So the beta stage actually takes quite a bit of time.

58:40 But yeah, like this is how a mature project like Python operates.

58:46 It's the same with Python 3.8.

58:48 The beta stage and the later release candidates or whatever are going to happen after PyCon in

58:56 2019.

58:57 So this is going to be quite a while from now.

59:01 Unless we change how we do things, which I might sort of influence a bit.

59:05 Like this is the timeline for the Python project.

59:08 That sort of stability is good for the average programmer, right?

59:12 Because the average programmer doesn't want to have backwards incompatible changes all the

59:17 time.

59:17 They're not interested in subtle new features all the time either.

59:21 Like being able to run code that you wrote 10 years ago is a very important feature.

59:26 And I don't think Python did the greatest job at this with the Python 2 and Python 3 dichotomy.

59:32 Like with a lot of smaller changes that end up being incompatibilities.

59:37 I'm always amazed at how Java was able to pull this off.

59:41 I'm still able to compile projects that I wrote in college perfectly fine.

59:46 And they still work perfectly fine all these years later, on a different platform,

59:52 on a different Java version.

59:53 It's still just okay.

59:55 So we do hope that from now on, even a far-off Python 4 is not going to break compatibility

01:00:03 in crazy ways again.

01:00:04 So we pretty much learned from this experience that, hey, we don't want to do this to people anymore.

01:00:11 That doesn't work for anybody, including the stress that it actually builds on core developers.

01:00:16 So yeah, I'm pretty happy that I'm going to be the release manager for 3.8 and 3.9.

01:00:22 If I know my luck, Python 3.9 is going to be the last Python 3 version.

01:00:27 So, again, it's going to become the new Python 2.7.

01:00:31 And I'm going to be releasing it for the next 15 years.

01:00:35 So yeah, into retirement,

01:00:37 you're going to be working on 3.9.

01:00:39 Yeah, so that might happen, but hopefully not.

01:00:41 Hopefully it's going to be a gig that is going to end like eight years from now.

01:00:45 You have to understand that, because of all the security fixes you still release for old versions and whatnot,

01:00:51 it's a pretty long-term commitment, but it doesn't take too much time each week.

01:00:56 So I do hope I'm going to be able to combine this with every other activity I'm doing.

01:01:02 I want to be cognizant of your time and not take it all up.

01:01:05 But just really quickly, what features would you like to see in these new versions 3.8 and 3.9?

01:01:09 In particular, I wrote the single-dispatch generic functions in Python, functools.singledispatch.

01:01:15 And ever since, everybody has been poking me to go full-on multiple dispatch.

01:01:23 So I think like it's time for that.

01:01:25 And it would be nice for Python 3.8 to fully implement that.
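For readers who haven't used it: `functools.singledispatch` (added in Python 3.4 via PEP 443) picks an implementation based on the type of the first argument only. The sketch below uses hypothetical functions to show both the feature and the limitation that multiple dispatch would lift.

```python
from functools import singledispatch


@singledispatch
def describe(value: object) -> str:
    # Fallback used when no more specific implementation is registered.
    return "object"


@describe.register(int)
def _describe_int(value: int) -> str:
    return "int"


@describe.register(list)
def _describe_list(value: list) -> str:
    return "list of " + ", ".join(describe(v) for v in value)


@singledispatch
def collide(a: object, b: object) -> str:
    return "generic collision"


@collide.register(int)
def _collide_int(a: int, b: object) -> str:
    # Only `a` selected this implementation; `b` is never consulted.
    return "int hits " + type(b).__name__


print(describe(3), "/", describe([1, "x"]))   # int / list of int, object
print(collide(1, "x"), "/", collide("x", 1))  # int hits str / generic collision
```

With multiple dispatch, `collide` could register separate implementations for `(int, str)` and `(str, int)`. With single dispatch, the second argument has to be inspected by hand inside each implementation.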

01:01:29 What else?

01:01:30 Performance.

01:01:31 It's sort of always a second priority feature for Python.

01:01:36 Maybe we'll see the type-based performance optimization you were talking about make its way into one of these.

01:01:41 Oh, it would be great if that actually shipped in 3.8.

01:01:44 That would be very optimistic for me to say that it will.

01:01:47 But there's other areas of interest there.

01:01:49 Like for example, speeding up startup time.

01:01:53 So for command line utilities, for bigger projects that have like thousands of files that are involved.

01:01:59 Startup time in Python is not great and could be improved.

01:02:04 So I would actually very much like to see progress on this.

01:02:08 There were a bunch of crazy ideas,

01:02:11 again, during the core sprint last September, that are very likely to happen.

01:02:16 They didn't quite end up being ready for Python 3.7, but it's pretty likely that they're actually going to land.
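If you want to see where startup time goes in your own project, CPython 3.7 and later can log every import with `-X importtime`. The helper below is a hypothetical sketch that runs a child interpreter with that option and parses the log it writes to stderr, assuming the default line format `import time: self | cumulative | package`.

```python
import subprocess
import sys
from typing import Dict


def import_times(module: str) -> Dict[str, int]:
    """Return cumulative import time in microseconds per module (-X importtime)."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    times: Dict[str, int] = {}
    # Lines look like "import time:       164 |        164 |   json" on stderr.
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3 or not fields[1].strip().isdigit():
            continue  # skips the header row and anything unexpected
        times[fields[2].strip()] = int(fields[1].strip())
    return times


timings = import_times("json")
slowest = sorted(timings.items(), key=lambda kv: kv[1], reverse=True)[:5]
for name, micros in slowest:
    print(f"{micros:>8} us  {name}")
```

Cumulative times include nested imports, so the top entries point at the packages whose import chains dominate startup, which is where projects with thousands of files tend to lose time.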

01:02:23 That sounds really awesome.

01:02:24 And I'm looking forward to you overseeing that whole process.

01:02:28 That's great.

01:02:29 Let me hit you with the last two questions.

01:02:31 The two questions before we get out of here.

01:02:33 If you're going to write some Python code, what editor do you open up?

01:02:36 I used to be a Vim person.

01:02:38 Starting from my first Python conference, back in 2008, when I, you know, sat randomly next to a person who was like a Vim god

01:02:47 and like did crazy things with it that I never thought were possible with an editor.

01:02:52 It really looked like the code was just appearing.

01:02:55 There was no cursor that the person was fighting with.

01:03:00 It was just like organically forming new ideas.

01:03:04 So for me, that was like, oh, this is amazing.

01:03:06 So I've been using Vim for close to a decade.

01:03:11 But then I found out that Vim is always this thing where you can get it maybe to 90%

01:03:16 of what you want with every feature.

01:03:18 So nothing ever works perfectly with it.

01:03:23 Especially with the thing I told you about, where we have developer servers that are sometimes

01:03:30 very far from us.

01:03:31 The responsiveness and latency from running your Vim over several thousand kilometers,

01:03:38 was actively impacting my productivity.

01:03:41 So I looked for something that would be running locally on my box.

01:03:46 And Atom, which is a bit absurd for me to admit, but yes, this editor written in JavaScript is what I use now.

01:03:55 It has a nice set of plugins released by Facebook under an umbrella called Nuclide that

01:04:03 enable remote development, including a remote debugger for Python.

01:04:10 So you can just debug a process that is running in North Carolina and you just step through it

01:04:15 and it shows you where you are in your file and you can set watches and do like all the

01:04:19 things that you would expect from it.

01:04:21 I have Vim bindings for this.

01:04:23 I'm using it sort of like a primitive, almost functional Vim that way, but all the extra

01:04:29 functionality makes it totally worth it for me.

01:04:31 That sounds really cool.

01:04:32 It definitely sounds like a better remote work setup than just SSHing over, because

01:04:38 that sounds tough in terms of latency.

01:04:40 All right.

01:04:41 So a notable PyPI package.

01:04:43 Notable PyPI package.

01:04:45 I think this is super underutilized and you should all use it.

01:04:48 It's called attrs.

01:04:49 Hynek Schlawack wrote it.

01:04:50 It's a way of creating full-featured types in Python.

01:04:56 So, full-featured classes without all the boilerplate.

01:04:59 So by just specifying essentially like a schema for your class saying this class is going to have

01:05:06 those fields.

01:05:07 What you get back is a ready-made init method, a ready-made repr, str, being

01:05:15 able to compare like those objects of this class meaningfully.

01:05:19 You can configure it to use slots.

01:05:21 You can configure it to create immutable classes and whatnot and whatnot.

01:05:25 So it is a very powerful package that sort of feels like next generation Python.

01:05:32 Like it removes a lot of the boring, you know, setting arguments to self.argument name, like,

01:05:39 you know, in the init method and so on and so on.

01:05:41 And more importantly, it's always correct, right?

01:05:44 So by removing the boilerplate that you create manually, you make sure that it is going to be

01:05:49 fine every time.

01:05:50 So I can't recommend this highly enough.

01:05:52 If you wish to wait for Python 3.7, a rewrite of it, essentially,

01:05:59 is getting included in the standard library, called dataclasses.

01:06:02 But attrs is out there now.

01:06:04 It is pretty mature by now.

01:06:06 It's been maintained by Hynek for a number of years now.

01:06:10 We use it extensively.

01:06:11 I can't recommend it highly enough.
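attrs is a third-party package, so this sketch uses its standard-library cousin, `dataclasses` (Python 3.7+), to show the "declare a schema, get the boilerplate generated" idea described here. The `Point` class is hypothetical; attrs offers comparable switches, such as frozen and slotted classes.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True, order=True)
class Point:
    # Declaring the fields is enough: __init__, __repr__, __eq__, ordering
    # and (because frozen=True with eq) __hash__ are generated from this schema.
    x: int
    y: int = 0


p = Point(1, 2)
print(p)                           # Point(x=1, y=2)
print(p == Point(1, 2))            # True
print(Point(1, 2) < Point(2, 0))   # True (fields compared in order, like tuples)

try:
    p.x = 10                       # frozen instances reject assignment
except FrozenInstanceError:
    print("immutable")
```

None of the generated methods have to be written or kept in sync by hand, which is the "always correct" property Łukasz highlights.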

01:06:14 Sounds really nice.

01:06:15 And it's definitely a cool project.

01:06:16 Thanks for recommending that one.

01:06:17 All right.

01:06:18 Final call to action.

01:06:18 So if people are excited about types and the benefits of, you know, upgrading

01:06:24 their code and finding all these bugs,

01:06:26 How do they get started?

01:06:27 What's the final call to action for you?

01:06:28 If you're afraid of types and think they don't fit in Python and they're not Pythonic, you

01:06:35 should think about this.

01:06:36 This is information that you were already putting somewhere in your documentation, maybe in your

01:06:41 docstrings and whatnot.

01:06:42 Type annotations are a piece of technology that just formalizes where you're supposed to put

01:06:47 this.

01:06:47 And on top of this, it will help you find bugs and catch future ones.

01:06:53 Like that's great.

01:06:54 That only makes it like more usable for you.
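A short illustration of "information you were already writing down": the same contract expressed first in a docstring (Sphinx-style fields, used here only as an example) and then as annotations. The functions are hypothetical; `typing.get_type_hints` is the standard way for tools to read annotations at runtime.

```python
from typing import Dict, Optional, get_type_hints


def find_user_documented(users, user_id):
    """Look up a user's name.

    :param users: dict mapping int IDs to str names
    :param user_id: int ID to look up
    :returns: the name as a str, or None if the ID is unknown
    """
    return users.get(user_id)


def find_user(users: Dict[int, str], user_id: int) -> Optional[str]:
    """Look up a user's name, or return None if the ID is unknown."""
    return users.get(user_id)


# Only the annotated version exposes its types in a form tools can check.
print(get_type_hints(find_user_documented))  # {}
print(get_type_hints(find_user))
```

The function bodies are identical; only the annotated version gives a checker like mypy something it can verify at every call site.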

01:06:57 So is this Pythonic? It's still very new, but the tooling is now mature enough for

01:07:03 actual adoption by regular, non-expert users.

01:07:06 So if you're afraid, just try it out.

01:07:08 See how that actually looks for you.

01:07:11 It's not going to cause your code to look like Java or Scala.

01:07:14 It is still very much Python.

01:07:16 It doesn't actually cause changes to how you code Python.

01:07:21 I think you should make an informed decision basically by trying it out yourself.

01:07:25 Yeah, it's great advice.

01:07:27 And I definitely second it.

01:07:28 Łukasz, thank you for being on the show.

01:07:30 It was great to talk to you about all this stuff.

01:07:32 All right.

01:07:32 Happy to be here.

01:07:33 Thank you very much.

01:07:33 Yeah.

01:07:34 Bye.

01:07:35 This has been another episode of Talk Python To Me.

01:07:38 Today's guest was Łukasz Langa.

01:07:40 And this episode has been brought to you by Linode and Talk Python Training.

01:07:43 Linode is bulletproof hosting for whatever you're building with Python.

01:07:48 Get four months free at talkpython.fm/Linode.

01:07:52 That's L-I-N-O-D-E.

01:07:54 Are you or a colleague trying to learn Python?

01:07:56 Have you tried books and videos that just left you bored by covering topics point by point?

01:08:01 Well, check out my online course, Python Jumpstart by Building 10 Apps, at talkpython.fm/course

01:08:07 to experience a more engaging way to learn Python.

01:08:10 And if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic.

01:08:17 Be sure to subscribe to the show.

01:08:20 Open your favorite podcatcher and search for Python.

01:08:22 We should be right at the top.

01:08:23 You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.

01:08:32 This is your host, Michael Kennedy.

01:08:34 Thanks so much for listening.

01:08:36 I really appreciate it.

01:08:37 Now get out there and write some Python code.

01:08:39 Bye.