Grumpy: Running Python on Go

Episode #95, published Wed, Jan 18, 2017, recorded Thu, Jan 12, 2017

Google runs millions of lines of Python code. The front-end server that drives youtube.com and YouTube’s APIs is primarily written in Python, and it serves millions of requests per second!

On this episode you'll meet Dylan Trotter, who is working to increase performance and concurrency on these servers powering YouTube. He just launched Grumpy: a Python implementation based on Go, the highly concurrent language from Google.

Links from the show:

Grumpy home page (redirects): grump.io
Grumpy at GitHub: github.com/google/grumpy
Announcement post: opensource.googleblog.com/2017/01/grumpy-go-running-python.html
Dylan on GitHub: github.com/trotterdylan

Deep Learning Kickstarter: kickstarter.com/projects/adrianrosebrock/1866482244
Hired's Talk Python Offer: hired.com/talkpythontome

Episode Transcript

00:00 Google runs millions of lines of Python code.

00:02 The front-end servers that drive YouTube.com and YouTube's API are primarily written in Python,

00:09 and they serve millions of requests per second.

00:13 On this episode, you'll meet Dylan Trotter, who is working to increase the performance and concurrency

00:19 of these servers powering YouTube.

00:20 He just launched Grumpy, a Python implementation based on Go, the highly concurrent language from Google.

00:28 This is Talk Python to Me, recorded January 12, 2017.

00:56 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

01:04 This is your host, Michael Kennedy.

01:06 Follow me on Twitter, where I'm @mkennedy.

01:08 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.

01:15 This episode has been sponsored by Hired and, as a new sponsor, pyimagesearch.com.

01:21 They're announcing a Kickstarter campaign called Deep Learning for Computer Vision with Python,

01:26 launching on Kickstarter right now.

01:28 Thank both of these companies for supporting the show by checking out what they have to offer

01:32 during their segments.

01:33 Dylan, welcome to Talk Python.

01:35 Thanks.

01:36 Nice to be here.

01:37 Yeah, I'm really excited to talk about Grumpy, actually.

01:41 Grumpy, your Python project.

01:42 It's going to be fun.

01:43 Yeah, it's been a pretty exciting couple of weeks since the release.

01:47 So, yeah, I'm excited to talk about it, too.

01:48 Yeah, it's definitely gotten a lot of attention in the open source world on GitHub.

01:52 And we're going to dig into a lot of the details behind it.

01:55 But let's start with you and your story.

01:56 How did you get into programming in Python?

01:58 I started programming, I guess, when I was in high school.

02:01 I took, like, an intro programming course and kind of got the bug.

02:07 And I just kind of took it from there.

02:10 I was really into, like, programming little games and stuff like that back then.

02:13 I did not do CS in university.

02:17 I actually did physics, but I continued to work on programming in my own time a lot.

02:23 And after or during university, actually, I got a summer gig at a software company.

02:30 And that gave me a leg up when I graduated, which was pretty lucky.

02:35 And so I got a job at a visual effects company doing software there.

02:42 So there was a bunch of different things, like a lot of sort of proprietary languages for the different packages.

02:48 But Python sort of came out as a front runner in terms of integration with different visual effects packages and stuff like that.

02:56 And so that's where I started to dig into Python, especially not so much on the sort of, like, effect side,

03:03 but more on the pipeline data management kind of side of things.

03:07 So there's a lot of asset management and stuff going on in visual effects studios.

03:11 And Python's great for that sort of stuff.

03:14 Yeah, that's really cool.

03:15 I did a whole episode on Python and, like, game development studios and movies and production and stuff.

03:22 I was really surprised how much Python glues all the tooling together for those folks.

03:27 Yeah, it's really deep in there.

03:28 In fact, when I was working in that area, that's when Python sort of started to come to the fore.

03:35 And so, like, Maya, which is a big, like, modeling and animation package, built Python integration around that time.

03:44 And Houdini is another one, similar use cases that integrate.

03:48 Or actually, I think from pretty early on, Houdini had Python integration.

03:53 So, yeah, it sort of became the de facto visual effects integration language.

03:58 Okay.

03:58 Yeah, yeah, very cool.

04:00 And I think that's only growing.

04:01 It seems like there's a couple areas where Python is sort of past critical mass.

04:06 It's kind of like a black hole now.

04:09 It's just sucking everything into it.

04:10 Totally.

04:10 Yeah, yeah.

04:11 And that's a good thing.

04:12 So you don't do visual effects anymore, although you kind of work in the video world these days.

04:19 Why don't you tell everybody what you're up to?

04:20 Yeah, sure.

04:21 So I'm at YouTube now.

04:23 I started there about seven years ago.

04:25 It was kind of a big shift for me.

04:28 Visual effects was a fun environment, but it was always kind of a dream to work at Google and stuff.

04:35 So I took a gig at YouTube.

04:36 And I've worked on a number of different teams there, actually.

04:40 I've worked on sort of user-facing features.

04:43 I was on the channels team for a long time, working on YouTube channels and stuff around that.

04:50 And eventually got more into the infrastructure side.

04:54 And so now I'm working on what's called, I guess, the application infrastructure group.

05:01 And our team specifically looks after the application server that serves YouTube.com and YouTube APIs and those sorts of things.

05:10 Excellent.

05:10 So when we're all watching various things on YouTube, be it cat videos or something educational, we have you to thank for keeping those servers running.

05:19 Yeah, well, me and a lot of other people.

05:22 Yep.

05:22 Yeah, yeah, I'm sure.

05:24 Well, I'll thank you individually.

05:26 So, yeah, awesome.

05:27 Yeah, it sounds like a really fun place to work.

05:30 Where's YouTube?

05:31 Where's the center of the universe for YouTube?

05:33 Is that Mountain View or somewhere else?

05:34 San Bruno actually is where the main YouTube campus is.

05:38 So there's a few different offices around the world.

05:42 But the biggest geographical concentration is in San Bruno.

05:49 So there's a few buildings there that are YouTube.

05:52 Okay.

05:52 Yeah.

05:53 Nice.

05:53 That sounds like so much fun.

05:55 So Python actually plays a really important role at YouTube these days.

05:59 Let's talk about how it's used now and then how that kind of came to be.

06:02 Sure.

06:02 Yeah.

06:03 So Python is what is running the main application server and a lot of the application code for

06:10 the YouTube front end that serves the website and the service APIs that,

06:18 you know, serve your phone and those sorts of things.

06:21 So it's sort of like the gateway for most user traffic.

06:26 Right.

06:27 And then maybe the Python code branches back into all sorts of Google services behind the

06:31 scenes that are in a variety of technologies or something like that.

06:34 Right.

06:34 Yeah.

06:34 There's a, there's a lot of different technologies and servers involved in the whole thing.

06:38 Yep.

06:39 Okay.

06:39 So YouTube wasn't initially a Google creation, right?

06:43 It was created by some other folks.

06:44 It was founded in, in 2005, I think by three guys.

06:50 One of them, who was still there when I joined in 2009, was Chad Hurley.

06:55 I think at the time he was the president or something.

06:59 He left shortly after I joined.

07:01 But yeah, they, they built it in 2005 and it gained a lot of traction really early on.

07:08 And I guess Google took an interest at some point in 2006 and, and ended up buying YouTube in

07:16 November, 2006.

07:17 Yeah.

07:18 I'd say that was a great move for them because it's, it's such a central part of the internet.

07:22 Yeah.

07:23 I feel like YouTube, the idea,

07:25 was something that a lot of people probably had.

07:30 It was a thing that clearly should exist.

07:33 But when you think of the infrastructure and the bandwidth costs and just the actual act

07:39 of creating such, such a huge video network seems prohibitive.

07:44 But, you know, once it came into existence, you know, I guess Google jumped on it.

07:47 That's cool.

07:48 I actually remember thinking what a simple idea it was and, and how like, it seemed so crazy

07:54 at the time that I think the acquisition cost was like 1.6 billion or something like that.

07:58 And, and I remember reading about that and I was like, good Lord, like, you know, you

08:02 could, how, how is something so simple worth so much?

08:06 But now that number seems so quaint compared to recent, recent stuff.

08:11 So, yeah, yeah, of course, of course.

08:13 I mean, with a lot of companies, it's easy to go through the thinking that you

08:18 could have just built that yourself.

08:19 Yeah.

08:20 Right.

08:20 I mean, Facebook bought Instagram for like an insane amount of money and that's like a team

08:24 of 12 people, right?

08:25 For whatever it was, like a billion or something.

08:28 They could have easily paid 12 people to build another Instagram, but it's also

08:32 got people's interest.

08:35 It's got the users.

08:36 It's got the momentum.

08:37 And that, that's the thing I think people buy.

08:40 Absolutely.

08:40 That's what you're paying for.

08:41 Yeah.

08:42 But they didn't write it in Python at first, did they?

08:44 No, they, the first implementation I believe was PHP.

08:47 And I don't think that lasted very long.

08:49 I think it was, most of that was rewritten in Python pretty early on.

08:53 Well, before I was there.

08:54 Sure.

08:54 Sure.

08:56 I suspect the way YouTube looks today with the growth of cloud computing and all the

09:01 different APIs and services is probably super different from what you guys got back

09:06 in 2006, right?

09:07 Yeah.

09:08 It's, I mean, I think, you know, the company has grown a lot.

09:11 The, the use cases have grown a lot.

09:14 It's just, I mean, it's kind of night and day.

09:16 When I first joined, you know, everyone was kind of on one floor of one building.

09:21 And since then, you know, it's distributed all over the world.

09:24 And so, yeah, it's, it's, it's changed a lot.

09:27 Sure.

09:27 Wow.

09:29 Okay.

09:29 So that brings us to today, to YouTube and this project called Grumpy. Was this something

09:37 that you created, or where did it come from?

09:40 Yeah.

09:41 I talked about some of the challenges we were having with, you know, running Python at scale

09:46 in the blog post.

09:48 And basically, there are a few different aspects that affect our ability to run, you know,

09:56 that many Python servers.

09:58 The CPython runtime, well, it's really great and it's highly optimized, and it

10:05 does a lot of things really well. But our use case

10:09 has never really been a focus for CPython as a project.

10:14 You know, we thought, you know, maybe it makes sense to rethink how the runtime is built with

10:22 a focus on concurrency and, and running large server applications.

10:27 Yeah.

10:27 And you're not, you guys are not the first people to have this idea of, well, maybe we

10:31 could replace the CPython runtime interpreter with something else.

10:37 There's like Jython, there's IronPython, there's PyPy, there's the Pyjion plugin JIT.

10:44 So there's a lot of stuff happening there, but nobody's gone in the direction that you went

10:48 in, right?

10:49 Yeah.

10:49 Yeah.

10:49 It was, it was, it was an interesting, I mean, it's, you know, in a lot of ways it's kind

10:53 of crazy.

10:53 And the thing about Go, the Go runtime that Grumpy is based on is that it is kind of designed

11:01 for very similar use cases to what we are interested in.

11:06 So Go tends to be, tends to be used for writing highly concurrent server applications with,

11:14 you know, a lot of like sort of message passing and things within the, within the application

11:20 between threads.

11:21 It seemed like kind of a good fit.
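Go programs typically coordinate goroutines by passing messages over channels. As a rough Python analogue, offered only as an illustration and not as Grumpy code, here is a minimal sketch of a worker thread that receives jobs and sends results over `queue.Queue` objects; the `worker` and `square_all` functions and the `None` shutdown message are assumptions of this sketch.

```python
import queue
import threading


def worker(jobs: queue.Queue, results: queue.Queue) -> None:
    """Receive numbers on `jobs` and send their squares on `results`.

    A `None` message is this sketch's (hypothetical) shutdown signal,
    loosely like closing a channel to end a `for ... range ch` loop in Go.
    """
    while True:
        item = jobs.get()
        if item is None:
            break
        results.put(item * item)


def square_all(numbers):
    jobs: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    t = threading.Thread(target=worker, args=(jobs, results))
    t.start()
    for n in numbers:
        jobs.put(n)
    jobs.put(None)  # tell the worker there is no more work
    t.join()
    # With a single worker and FIFO queues, results keep the input order.
    return [results.get() for _ in numbers]


print(square_all([1, 2, 3]))  # [1, 4, 9]
```

The threads share no mutable state directly; they only exchange messages, which is the style Go encourages with channels.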

11:25 And once I started to flesh things out and to build out some of the core functionality,

11:30 some of the pieces started to fall into place and it started to look actually really compelling.

11:35 And you're like, Hey, we could actually do this.

11:37 We should stop for just a second.

11:39 I don't think we've explicitly said your project is called Grumpy, which is a replacement for

11:45 the CPython implementation with an entirely different Go implementation, right?

11:51 Yeah, that's right.

11:52 Yeah.

11:52 Yeah.

11:53 So, so very interesting.

11:54 I think, you know, Go, obviously it makes sense for Google to be the ones experimenting

12:00 with Go, right?

12:01 Go comes from Google, doesn't it?

12:03 It does.

12:03 Yep.

12:03 It was developed, I think, originally by, well, Rob Pike and I'm going to mix it up.

12:11 It's either Ken Thompson or no, it's, yeah, it's Ken Thompson, I believe.

12:15 Yeah.

12:16 It was developed because, similar to, you know,

12:23 the observations that we made about running Python server programs,

12:29 they had made more general observations about writing server applications and how the languages

12:36 that existed didn't quite fit our use cases.

12:40 Yeah.

12:40 Go is really quite, it's one of the newest languages out there that I would consider a mainstream

12:45 language.

12:46 It's not as mainstream as C, but it's definitely getting there, and version 1

12:51 officially came out in 2012.

12:52 So it's born within this world of multi-core microservices, distributed cloud computing stuff,

13:01 right?

13:02 Yeah.

13:03 Yeah.

13:04 Okay.

13:04 So let's dig into it: what is Grumpy?

13:08 Let's dig in a little bit.

13:09 Like how do I take, so I can take my Python code.

13:13 I can write some, presumably some web app or something in a web service, and then I can run

13:20 that on Grumpy.

13:20 Like what, what does Grumpy do?

13:22 How does it take my Python code and run it?

13:24 So Grumpy takes a little bit of a different tack than CPython.

13:29 It's actually a transcompiler and a runtime, whereas you can kind of think of CPython as

13:36 a, it's like a virtual machine bytecode interpreter and runtime.

13:40 And in that sense, it's kind of like a combination of Cython and, you know, a bundling of Cython and

13:48 CPython, except that it's all in Go.

13:50 Right.

13:51 So Cython takes a flavor of Python and then compiles it to C directly.

13:58 And like C, Go is a statically typed compiled language.

14:04 And so it's no longer interpreted.

14:07 It's not even like JIT compiled like Java or .NET.

14:11 It's full on compiled, right?

14:12 That's correct.

14:13 Okay.

14:14 So on the runtime side of things, the counterpart to the Python

14:18 C API

14:20 is actually a Go Grumpy API.

14:22 And so what it's compiling is code that uses that API to mutate objects, to pull out state

14:31 and those sorts of things.

14:32 And so whereas vanilla CPython uses a bytecode interpreter to actually drive those

14:40 API calls, Grumpy and Cython actually generate code that drives those API calls.
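To make that distinction concrete, here is a toy sketch in Python, not Grumpy's actual API: the hypothetical runtime functions `rt_add` and `rt_mul` stand in for an object API like CPython's C API or Grumpy's Go API. A bytecode interpreter decides at run time which API call to make for each instruction, while a transcompiler emits straight-line code that makes the same calls directly. The opcode names and instruction format are invented for illustration.

```python
# Hypothetical runtime API. In CPython this role is played by the C API,
# in Grumpy by its Go API; here it is just two Python functions.
def rt_add(a, b):
    return a + b


def rt_mul(a, b):
    return a * b


# Style 1: a tiny stack-based interpreter drives the API calls.
def interpret(code, args):
    stack = []
    for op, arg in code:
        if op == "LOAD_ARG":
            stack.append(args[arg])
        elif op == "LOAD_CONST":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(rt_add(a, b))
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(rt_mul(a, b))
        elif op == "RETURN":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")


# Toy bytecode for: def f(x): return (x + 1) * 2
F_CODE = [
    ("LOAD_ARG", 0),
    ("LOAD_CONST", 1),
    ("ADD", None),
    ("LOAD_CONST", 2),
    ("MUL", None),
    ("RETURN", None),
]


# Style 2: what a transcompiler might emit for the same function:
# no dispatch loop, just the API calls in order.
def f_generated(x):
    return rt_mul(rt_add(x, 1), 2)


print(interpret(F_CODE, [5]), f_generated(5))  # 12 12
```

Both styles end up making the same runtime calls; the generated version simply removes the per-instruction dispatch work.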

14:49 Okay.

14:49 Yeah.

14:50 Very, very cool.

14:51 Now in your GitHub repo or the blog post, I don't remember where I got this, but you said

14:56 it's intended to be a near drop-in replacement for CPython 2.7.

15:00 How's that going?

15:02 How far are you towards that goal?

15:04 That's a pretty big set of APIs to cover.

15:07 Yeah.

15:07 I'm learning every day like how big Python is.

15:10 Nobody told me about this weird case I'm going to have to support.

15:14 Oh yeah, totally.

15:15 Yeah.

15:16 I mean, the amount of spelunking I've done in CPython internals, I did not

15:22 expect all that.

15:24 But yeah, so it's going pretty well.

15:27 The core functionality is there.

15:29 So like the basic semantics of the language in terms of attribute access and how types work

15:35 and how method dispatch works, all of that functions basically fine.

15:42 The basic types are all there.

15:43 So lists and dictionaries and things all kind of work.

15:47 Do those mostly map directly to the underlying Go structures?

15:53 Like does a list in Python map to a slice in Go and things like that?

15:58 Or do you have to do more complicated things to map it?

16:00 It's more complicated.

16:01 And the reason is that Python is so dynamic, right?

16:06 Like method dispatch is so dynamic and attribute access.

16:10 You can put attributes on just about anything.

16:12 You know, if it were just native Go types, then you wouldn't be able to put an attribute

16:20 on a list or on a slice, right?

16:23 Right.

16:23 So it's actually, there's sort of wrapper types, basically structures that actually map very closely

16:29 to CPython's object structures.
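A short CPython session illustrates the dynamism being described. A plain `list` instance rejects new attributes, but an instance of a `list` subclass accepts them, and special methods can be replaced on a class at run time. A bare Go slice has nowhere to keep that extra state, which is one reason to wrap values in object structs. The class name below is made up for illustration.

```python
class TaggedList(list):
    """A list subclass; its instances get a __dict__ for arbitrary attributes."""


plain = [1, 2, 3]
try:
    plain.tag = "x"  # built-in list instances have no __dict__
except AttributeError:
    plain_accepts_attrs = False
else:
    plain_accepts_attrs = True

tagged = TaggedList([1, 2, 3])
tagged.tag = "x"  # fine: stored in the instance __dict__

# Dispatch goes through the type at call time, so patching the class
# changes behavior for instances that already exist.
TaggedList.__len__ = lambda self: 42

print(plain_accepts_attrs, tagged.tag, len(tagged), len(plain))
# False x 42 3
```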

16:32 Okay.

16:33 Yeah, I can see that because you're working with a non-dynamic language and yet it has

16:38 to support dynamic capabilities.

16:40 So you got to somehow put a shim in there for that, right?

16:43 That's right.

16:44 Okay.

16:45 I guess the biggest kind of gaps in terms of supporting or being a drop-in replacement are

16:50 the standard library still needs a lot of work.

16:53 A lot of CPython's standard library is actually written as C extension modules, which

17:00 Grumpy does not support.

17:02 So that's one area of significant divergence between the two worlds.

17:06 And we could talk about that more.

17:08 That's turned out to be sort of a big kind of beast to slay.

17:13 The nice thing is that with, you know, all those other Python runtimes out there, there's

17:20 actually, you know, you can find pure Python versions of most things.

17:24 So like PyPy, for example, implements in pure Python a number of libraries that are implemented in C

17:30 in CPython.
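CPython's own standard library already uses a fallback pattern like this in places; `heapq`, for instance, defines pure-Python functions and then tries to replace them with the C accelerator `_heapq`. A runtime without C extension support can simply let that import fail and keep the Python versions. In the minimal sketch below, the module name `_fastsum` is hypothetical and deliberately does not exist, so the pure-Python definition is the one that gets used.

```python
def running_total(values):
    """Pure-Python implementation: list of cumulative sums."""
    total = 0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


try:
    # Hypothetical C accelerator; it does not exist, so this import fails.
    from _fastsum import running_total
    ACCELERATED = True
except ImportError:
    ACCELERATED = False  # keep the pure-Python definition above

print(ACCELERATED, running_total([1, 2, 3]))  # False [1, 3, 6]
```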

17:31 Right.

17:32 So you could start this backfilling of APIs by just moving to

17:39 pure Python implementations that then get sent through Grumpy and actually get compiled

17:45 and run on Go, right?

17:46 Yep.

17:47 That's exactly right.

17:48 And maybe do some profiling and say, well, you know, people use lists a lot.

17:51 Let's write that directly in Go or something like this, right?

17:55 You can optimize later.

17:56 Exactly.

17:56 Yep.

17:57 Okay.

17:57 Yeah.

17:57 I suspect that there's a long tail of stuff

18:00 that doesn't really need to be optimized, that last 5%.

18:03 Whereas these are the few things that we really should focus on, right?

18:06 Yeah.

18:06 So right now, you know, I'm kind of focused on getting support for, like, I want

18:13 to be able to run some common libraries that are written in Python.

18:17 I want some Python programs that are out there, like open source programs, to

18:22 be able to just use Grumpy.

18:24 So like just getting it to the point where everything runs is the first step and then

18:30 you make it fast.

18:30 Okay.

18:31 Yeah, of course.

18:31 Making it work and then making it fast seems like the right order to me as well.

18:35 So you said in your blog post that there's going to be some things that Grumpy will never

18:40 support and then there's things that it doesn't support yet, but you're working towards.

18:44 Yeah.

18:45 So one of the things I mentioned already is the C extension support.

18:51 The API for CPython is a bit different than the API for Grumpy because it's, well, for one

18:58 thing it's a different language, but also the data structures are a little bit different.

19:02 The function return values and things are a little bit different.

19:05 And so there wasn't a good mapping between those APIs, and it would be too constraining

19:14 to try to make Grumpy map perfectly to the C API.

19:20 Sure.

19:20 Have you looked at the CFFI stuff that PyPy was using?

19:25 Right.

19:25 So that's, I have not looked very closely at that.

19:29 That is something that we've looked at internally for other reasons as well.

19:33 But that is an interesting way to approach the problem.

19:38 And potentially, you know, there are ways to bridge the two APIs, and CFFI may

19:45 be one of those.

19:46 Yeah.

19:46 Okay.

19:46 Go must have a C integration option somewhere, right?

19:51 It does.

19:51 Yeah.

19:52 Yeah.

19:52 Okay.

19:52 And the other thing you said it's not going to support is things like eval.

19:55 And again, this is like, it is possible to implement something that's a little bit hokey to support

20:03 eval or exec.

20:04 Shell out and compile.

20:06 Oh yeah, exactly.

20:07 I mean, like that's, well, I mean, it's funny you think about it and like, that's, that's actually

20:11 what Python is doing, right?

20:13 It's like, except that it's a bytecode compiler and then it's executing in a VM.

20:18 If you instead are doing an actual static compilation and then executing

20:26 that,

20:26 it's not conceptually that much different, except that the toolchain that you have to use to

20:32 do the compilation and stuff is much heavier.

20:33 So it's going to be slower and it just, it kind of doesn't make a lot of sense.

20:38 I think I could see maybe supporting it for, you know, debugging use cases and things like

20:43 that.

20:44 I don't think I, I kind of want to avoid having to worry too much about like, you know,

20:50 making that performant or whatever.
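In CPython, the two stages described here are visible from Python itself: `compile()` turns source text into a code object (bytecode), and `exec()` runs that code object in the interpreter. A statically compiled implementation would need a full compiler toolchain for the first step, which is why eval and exec are awkward there. The source string and namespace below are just an example.

```python
source = "result = sum(i * i for i in range(4))"

# Stage 1: compile source text to a bytecode code object (cheap in CPython).
code_obj = compile(source, "<generated>", "exec")

# Stage 2: execute that code object in the VM against a namespace dict.
namespace = {}
exec(code_obj, namespace)

print(type(code_obj).__name__, namespace["result"])  # code 14
```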

20:52 Yeah, sure.

20:52 I, for one, don't think I would miss it.

20:56 I think it's fine.

20:56 Yeah.

20:57 The other thing about exec and eval is there are very few cases I've ever come across in all

21:04 my years of programming Python where exec or eval was a good idea.

21:08 So actually like, I kind of think that it's an unnecessary aspect of that language.

21:14 Yeah.

21:14 That's interesting.

21:15 And you know, it is kind of in keeping with Go in the sense that Go is very strict about

21:20 conventions and some of the best practices that it believes in.

21:23 Like for example, if you have an import of a package and you're not using that package,

21:29 that's a compilation error, right?

21:30 Things like that.

21:31 Right.

21:31 Absolutely.

21:32 Yep.

21:32 Yeah.

21:32 So skipping eval seems all right.

21:35 This portion of Talk Python to Me is brought to you by Hired. Hired is the platform for

21:51 top Python developer jobs. Create your profile and instantly get access to 3,500 companies

21:56 who will work to compete with you.

21:58 Take it from one of Hired's users who recently got a job and said, I had my first offer on

22:02 Thursday after going live on Monday and I ended up getting eight offers in total.

22:06 I've worked with recruiters in the past, but they've always been pretty hit and miss.

22:09 I tried LinkedIn, but I found hired to be the best.

22:12 I really liked knowing the salary upfront.

22:14 Privacy was also a huge seller for me.

22:17 Sounds awesome.

22:18 Doesn't it?

22:18 Well, wait until you hear about the signing bonus.

22:20 Everyone who accepts a job from Hired gets a $1,000 signing bonus.

22:24 And as Talk Python listeners, it gets way sweeter.

22:27 Use the link hired.com/talkpythontome and Hired will double the signing bonus

22:31 to $2,000.

22:32 Opportunity's knocking.

22:33 Visit hired.com/talkpythontome and answer the door.

22:37 Then you said there's a set of things that you're going to support, but it doesn't yet.

22:49 What are those?

22:50 We talked a little bit about some of this stuff, but like the standard library is not there yet.

22:56 A subset of the standard library is available currently.

23:00 Can you give us like a percentage of what, how far you are down that path?

23:04 I mean, for everyone listening, this whole project has been up on GitHub for, like, three

23:09 or four weeks.

23:09 So it's not like you should have implemented it all.

23:12 I'm just curious how far you've gotten.

23:14 It's really hard to put a percentage on it.

23:16 I guess, I mean, I probably could like, you know, compare lines of code or something, but

23:19 I think that what's going to happen is you're going to get sort of a core set of libraries

23:25 that all the other libraries run on, and everything will just kind of fall into place.

23:29 So I think it's, it's sort of more important to count those core libraries.

23:33 And you know, that's, that's things like types and collections and operator and all those

23:38 things.

23:38 And some of those are already there.

23:42 I mean, I, I feel like, yeah, it's hard to put a number on it.

23:46 Yeah, sure.

23:47 Maybe it's one of those things where it's, it seems like you're not very far and then all

23:52 of a sudden it kind of unlocks and things go really quick.

23:55 That's the dream.

23:56 That's a good way to, like, it's an optimistic view of the future.

24:02 That's right.

24:03 If you're going to clone the repo off GitHub and try things out, like you may be disappointed

24:09 that, that your favorite libraries aren't there.

24:12 There's a good chance that if you have a program that's, that's at all, you know, complex that

24:18 there are some libraries that are missing for you.

24:20 I'd say, I don't know, maybe 20% or something like that.

24:23 Okay.

24:24 Well, that's good.

24:25 And you said you also want to support all the built-ins.

24:27 That's right.

24:28 Yeah.

24:28 That's obviously a good idea.

24:30 Yeah.

24:30 Those are important.

24:32 And again, you know, there's a bunch of functions

24:37 like map and reduce and things like that that I haven't gotten around to, haven't needed

24:45 to support yet, but they're actually pretty straightforward to implement by and large.

24:50 So, so I think we're, we're pretty far along on, on that stuff.
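As a rough indication of why built-ins like these are straightforward, here are pure-Python sketches of a single-iterable `map` and of `reduce` (a built-in in Python 2.7, `functools.reduce` in Python 3). They are simplified assumptions, not Grumpy's implementation: `py_map` returns a list as Python 2.7's `map` does, but ignores the multi-iterable and `map(None, ...)` forms.

```python
import functools
import operator

_MISSING = object()  # sentinel that distinguishes "no initializer" from None


def py_map(func, iterable):
    """Single-iterable map that returns a list, as Python 2.7's map does."""
    return [func(item) for item in iterable]


def py_reduce(func, iterable, initializer=_MISSING):
    """Left fold with the same behavior as functools.reduce."""
    it = iter(iterable)
    if initializer is _MISSING:
        try:
            acc = next(it)
        except StopIteration:
            raise TypeError("reduce() of empty sequence with no initial value") from None
    else:
        acc = initializer
    for item in it:
        acc = func(acc, item)
    return acc


# Cross-check against the real implementations.
nums = [1, 2, 3, 4]
assert py_map(str, nums) == list(map(str, nums))
assert py_reduce(operator.mul, nums) == functools.reduce(operator.mul, nums) == 24
assert py_reduce(operator.add, [], 10) == 10
```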

24:54 So how much of your focus on Grumpy is going to be to make this a project that you guys could

25:01 use for your specific use cases at Google and then make that a skeleton or base and people can

25:09 come along and add other features and contribute to the open source project to make it more broad

25:14 versus how much you're trying to entirely replace CPython?

25:19 So I think that we, I want to see, like, okay.

25:24 So I'd put it this way.

25:25 I'm interested in, you know, solving some of these concurrent use cases that don't have a great answer

25:33 in CPython.

25:34 That's the primary focus.

25:35 But, again, it might be my optimism showing, but like, I feel like once you kind of have

25:41 some of those use cases locked down, now people start to use it for, for things you didn't expect

25:48 right away.

25:49 I know that scientific computing is an area where Python has really well-established

25:57 libraries, and NumPy is sort of crucial to some of this stuff.

26:01 And that's got, that's C, you know, involves C extensions.

26:05 And I think in the near term, I don't see Grumpy being useful for numerical analysis.

26:11 And, you know, that's kind of compounded by the fact that Go doesn't have too many sort of inroads in that

26:17 direction either.

26:19 But on the other hand, you know, some of the advantages of being

26:25 statically compiled and type inferencing and compiling down to native operations, that is

26:33 potentially useful for, you know, scientific computing and those sorts of things.

26:37 So I kind of see, you know, I want to focus on our immediate use cases, but I have this

26:42 kind of idea that there are,

26:44 you know, more opportunities out there once things are kind of working.

26:48 Okay.

26:49 Yeah.

26:49 Yeah.

26:49 That, that seems like a good roadmap to me.

26:51 It makes a lot of sense.

26:52 So let's talk about the execution engine, which effectively is the execution engine

26:59 of Go versus CPython.

27:01 So in CPython, the Python code gets converted to bytecode.

27:06 Those bytecode instructions are fed to, like, a super large for loop with a switch statement, sort of thing.

27:13 And those are interpreted and run.

27:15 How does Go work?

27:17 Go has a runtime, which is to say that there's code running that manages things

27:25 like goroutines, which are the equivalent of threads in Go programs, and garbage collection

27:32 and things like that.

27:33 But much of what is actually happening throughout a Go program is, you know,

27:39 low-level machine instructions.

27:41 So the Go program, much like a C program, is compiled down to machine code and actually

27:47 executed natively.
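For contrast with Go's native machine code, the CPython bytecode instructions that feed the interpreter's dispatch loop can be listed with the standard `dis` module (`dis.dis(add_one)` prints a full listing). Opcode names differ across CPython versions; newer releases use `BINARY_OP` where older ones used `BINARY_ADD`, so this sketch checks for either rather than assuming one fixed listing.

```python
import dis


def add_one(x):
    return x + 1


# Each Instruction is one step the interpreter's dispatch loop executes.
opnames = [ins.opname for ins in dis.get_instructions(add_one)]
print(opnames)

has_add = any(name in ("BINARY_ADD", "BINARY_OP") for name in opnames)
has_return = any(name.startswith("RETURN") for name in opnames)
assert has_add and has_return
```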

27:48 Right.

27:48 That makes sense.

27:49 So you say Go has garbage collection, which is awesome.

27:55 Do you know what kind it is?

27:57 Is it reference counting, or is it like mark and sweep, or what kind of garbage collection? Is it

28:02 deterministic?

28:03 How does the garbage collector work in Go?

28:04 This is not my area of expertise.

28:07 Nor mine.

28:07 But it is not reference counted.

28:10 So I believe that it is a, and actually this has changed significantly.

28:15 I believe in 1.7 they significantly retrofitted the garbage collector,

28:22 mostly around the way that garbage for particular goroutines is managed, garbage that

28:31 is sort of local to particular goroutines.

28:33 But otherwise it's kind of a traditional garbage

28:40 collector, much like what Java has.

28:46 But it's actually much simpler.

28:47 Java has a, a number of different algorithms it supports and a lot of tuning parameters.

28:52 Go's garbage collection is much simpler and is targeted for the use case of,

28:59 you know, handling requests in a server application and those sorts of things.
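For comparison with Go's tracing collector, CPython, the runtime Grumpy replaces, uses reference counting plus a separate cycle collector. This sketch relies on CPython-specific behavior and uses `weakref` to observe when objects are freed: an unreferenced object disappears immediately, while a reference cycle survives until `gc.collect()` runs. A tracing collector makes no such immediate-reclamation promise, so code that depends on prompt cleanup can behave differently there. The `Node` class is only for illustration.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def make_single():
    # One object; its refcount drops to zero as soon as this call returns.
    return weakref.ref(Node())


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # each keeps the other's refcount above zero
    return weakref.ref(a)


gc.disable()  # stop the cycle collector from running on its own
try:
    single = make_single()
    cycle = make_cycle()
    freed_single_immediately = single() is None  # True in CPython
    cycle_alive_before_gc = cycle() is not None  # refcounting alone can't free it
    gc.collect()  # explicit collection still works while automatic gc is disabled
    cycle_freed_after_gc = cycle() is None
finally:
    gc.enable()

print(freed_single_immediately, cycle_alive_before_gc, cycle_freed_after_gc)
# True True True
```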

29:03 Yeah.

29:04 It makes sense.

29:04 I suspect they highly parallelize that, knowing Go.

29:07 One thing you said is nice about ultimately executing on Go is that the deployment story

29:13 is a little bit simpler.

29:15 You know, when you deploy a Python program, you are actually including your

29:23 .py files, or at least your .pyc files, in the deployment.

29:28 And so you have to have some way to sort of package them together and ship them off to

29:33 production or wherever you're running your program.

29:36 Right.

29:36 And beyond that, also the dependencies and the runtime, right?

29:39 So you got to have all of those things.

29:41 That's right.

29:42 Which can make it really tricky.

29:44 And there's things like py2app, py2exe, cx_Freeze, the BeeWare project.

29:48 There are a lot of projects trying to make that something you can ship around, but it's not simple.
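As one standard-library example of this kind of packaging, `zipapp` bundles a directory of .py files into a single .pyz archive. The result still needs a compatible Python interpreter on the machine that runs it, unlike a statically linked Go binary. The directory layout, file contents, and the `build_and_run_zipapp` helper below are invented for this sketch.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run_zipapp() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "app"
        src.mkdir()
        # zipapp runs __main__.py when no explicit entry point is given.
        (src / "__main__.py").write_text('print("hello from a zipapp")\n')
        target = Path(tmp) / "app.pyz"
        zipapp.create_archive(str(src), str(target))
        # The archive is one file, but it still needs an interpreter to run.
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


print(build_and_run_zipapp())  # hello from a zipapp
```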

29:53 Yeah, that's right.

29:54 And, you know, so I'm sure people who have run Python in production have run into, you

29:59 know, version mismatches, things like that, using the system Python version, which was,

30:04 you know, different than the one they were developing on and so on.

30:07 The nice thing about statically compiled programs in general is that you produce a binary

30:13 and you can put that just about anywhere and it'll run.

30:18 And that's very true for Go programs.

30:21 There are few dependencies in most cases.

30:24 Most of the runtime is actually linked into the executable.
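For contrast with a single Go binary, here is one standard-library way to bundle Python sources into a single file: `zipapp` (Python 3.5+). It is only a sketch, and the directory layout and file names are illustrative. Note that the resulting archive still needs a compatible interpreter on the target machine, which is exactly the dependency the statically linked Go approach avoids.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_and_run_bundle():
    """Bundle a tiny app into a .pyz archive, then run it with the current interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "app"
        src.mkdir()
        # __main__.py is the entry point zipapp uses when no `main` is given.
        (src / "__main__.py").write_text('print("hello from a bundled app")\n')

        target = pathlib.Path(tmp) / "app.pyz"
        zipapp.create_archive(src, target)

        # The .pyz is one file, but it still needs a Python interpreter to run.
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
```

Calling `build_and_run_bundle()` returns `"hello from a bundled app"`. Third-party dependencies and the interpreter version remain the deployer's problem, which is the gap tools like py2exe and cx_Freeze try to close.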

30:30 Yeah, that's really cool.

30:31 What's the size of like a Hello World compiled output?

30:35 Do you know?

30:35 I have not looked at the size myself.

30:38 I think I saw some comment somewhere that said it was something like three megabytes.

30:44 So it's pretty substantial, but, you know, that includes a lot of overhead for

30:49 the runtime that wouldn't increase significantly if your program grew.

30:54 Right.

30:54 Absolutely.

30:55 Like, you know, the next 10,000 lines add 10K or something.

30:58 Right.

30:58 Exactly.

30:59 Yeah.

30:59 I think three megs is totally fine to get a good deployment story, stability.

31:04 You run what you shipped, all those things.

31:06 Like if this was 1994, three megs would be a problem, but it's not today, right?

31:11 Yeah, that's right.

31:12 Nice.

31:13 So what sort of optimizations do you think are possible if you run Python on Go, rather

31:20 than as an interpreted system?

31:22 This is not an area I've dug into significantly yet.

31:27 My thinking is that if you can determine that, for example, a particular integer

31:33 counter in a function is only ever an integer type and it only, you know, uses integer operations

31:40 like increment or whatever, then there's no need to go through the whole Python method

31:46 dispatch and create new integer objects

31:50 every time you increment that counter. You can actually just use a native integer and increment it

31:56 using native operations.

31:57 So that's a really simple example, but not uncommon.

32:02 I think once you kind of broaden that to a whole program optimization, that's when things

32:08 start to get interesting because then you can think about like, well, if you know that a

32:12 function is only ever called with particular parameters or parameters of a particular type,

32:18 then you can make some assumptions and, again, maybe use native data types.
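Here is a toy version of the analysis Dylan describes: before replacing a Python counter with a native integer, a compiler has to prove the name is only ever bound to ints. The sketch below uses the standard `ast` module (the same module grumpc is built on) and is deliberately conservative. `is_int_only_counter` is a hypothetical helper, not part of Grumpy, and it ignores scoping, function parameters, and `global`/`nonlocal`.

```python
import ast


def _is_int_literal(node):
    # bool is a subclass of int, so compare the exact type.
    return isinstance(node, ast.Constant) and type(node.value) is int


def is_int_only_counter(source, name):
    """Return True if every binding of `name` in `source` is `name = <int>`
    or `name += / -= <int>`. Any other binding (tuple unpacking, a for-loop
    target, a walrus, ...) makes the answer False."""
    tree = ast.parse(source)
    approved = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and _is_int_literal(node.value):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == name:
                    approved.add(id(target))
        elif (isinstance(node, ast.AugAssign)
              and isinstance(node.op, (ast.Add, ast.Sub))
              and isinstance(node.target, ast.Name)
              and node.target.id == name
              and _is_int_literal(node.value)):
            approved.add(id(node.target))

    # Every place the name is stored to must be one of the approved bindings.
    stores = [n for n in ast.walk(tree)
              if isinstance(n, ast.Name) and n.id == name
              and isinstance(n.ctx, ast.Store)]
    return bool(stores) and all(id(n) in approved for n in stores)
```

For a loop such as `count = 0` followed by `for item in items: count += 1`, the check returns `True`; adding `count += 0.5` anywhere makes it `False`, so the counter would have to stay a full Python object.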

32:25 Sure.

32:25 What about type annotations?

32:27 And I know that's more a Python 3 thing, but would you be able to, or be interested in, having some flavor that

32:35 takes type annotations and then uses that for certain types of optimizations?

32:39 Yeah.

32:40 I thought about this, and I'm a little ambivalent because, you know, type annotations, the way that

32:46 they are sort of used today, they're not intended to actually, you know, raise or anything if they're

32:52 not respected.

32:53 It's mostly for analysis before you ship your program, to, you know, make the

33:01 linter's job easier and things like that.

33:02 Right.

33:03 And so in CPython, once you're actually

33:09 running your program, the type annotations basically have no effect.
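This is easy to check in Python 3 (annotation syntax does not exist in Python 2.7, which Grumpy targets): CPython stores annotations on the function but does not enforce them when it runs. `scale` is an illustrative function.

```python
def scale(value: int, factor: int) -> int:
    # The annotations say int, but CPython never checks them at call time.
    return value * factor


# A str argument is accepted; str * int is simply string repetition.
repeated = scale("ab", 3)

# The annotations are kept as plain metadata for linters and type checkers.
metadata = scale.__annotations__
```

Here `repeated` is `"ababab"` and `metadata` is `{"value": int, "factor": int, "return": int}`: the information is available to tools, but nothing stops the "wrong" call.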

33:12 And so I'm a little hesitant to say that Grumpy should use these in a sort of more

33:21 strict way, because I think that might affect programs' compatibility and stuff like

33:27 that.

33:27 Yeah.

33:28 It would absolutely do that,

33:29 wouldn't it?

33:29 Yeah.

33:29 There's some real advantages there.

33:31 If you do make them strict, then when you say that an argument is an integer,

33:36 yeah, it makes the optimizer's job way easier because it doesn't have

33:40 to do any inferencing to determine that relationship.

33:43 Obviously it would break the sort of contract with type annotations that these are just for editors

33:49 and linters and to help you, but not actually meant to affect the runtime.

33:53 That's right.

33:54 On the other hand, if you could make some really critical part of your code go,

33:59 you know, 10 or a hundred times faster by putting a type annotation that's strict, you know,

34:05 you might be willing to make that trade off.
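To make the trade-off concrete, here is a hypothetical `strict` decorator that turns plain-class annotations into runtime checks on arguments. It is not a Grumpy feature or any real library's API, only a sketch of the "strict annotations" idea: callers that violate an annotation now get a `TypeError`, which is precisely the compatibility break discussed above, while a compiler that enforced the same contract could, in principle, rely on it when optimizing.

```python
import functools
import inspect
import types

# types.GenericAlias (list[int] etc.) only exists on Python 3.9+.
_GENERIC_ALIAS = getattr(types, "GenericAlias", ())
_VARIADIC = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)


def strict(func):
    """Hypothetical decorator: enforce argument annotations that are plain classes."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            param = sig.parameters[name]
            expected = param.annotation
            # Skip unannotated params, *args/**kwargs, and anything that is
            # not a simple class (string annotations, generics, unions, ...).
            if (expected is inspect.Parameter.empty
                    or param.kind in _VARIADIC
                    or not isinstance(expected, type)
                    or isinstance(expected, _GENERIC_ALIAS)):
                continue
            if not isinstance(value, expected):
                raise TypeError("%s must be %s, got %s" % (
                    name, expected.__name__, type(value).__name__))
        return func(*args, **kwargs)

    return wrapper


@strict
def add(a: int, b: int) -> int:
    return a + b
```

`add(2, 3)` still returns `5`, but `add("2", 3)` now raises `TypeError` instead of failing later (or silently doing something else), which is the behavior change a strict mode would impose on existing programs.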

34:06 So I don't know which way would be the right way to go either, but it's interesting

34:10 to think about.

34:11 Yeah.

34:11 I'm very curious how that sort of evolves.

34:15 Yeah.

34:16 Yeah.

34:16 Yeah.

34:16 I'm going to keep an eye on it.

34:18 That's cool.

34:18 So let's talk about when you launched.

34:20 So this should be pretty fresh in your mind, right?

34:22 Yeah.

34:23 It's not a very old project.

34:25 It's about a week, week and a day.

34:27 Yeah.

34:28 Yeah.

34:28 So we, well, I guess we migrated the code to GitHub in mid-December, and I spent

34:37 the next few weeks cleaning up the code and adding

34:42 some functionality for the build system, since we obviously weren't able to use the internal

34:50 Google build system in the open source project.

34:53 So I had to build some of that out, and then we launched, I guess, January 4th.

34:59 Yeah.

34:59 I guess it was the 4th.

35:01 That's eight days ago, as of the day of the recording.

35:03 Yeah.

35:04 Yeah.

35:04 We did.

35:05 We sort of coordinated an open source blog post with actually making the GitHub

35:11 repo public, and got a little bit of traction on Hacker News, and yeah, it was kind of

35:18 astonishing how great the reception was.

35:21 Yeah.

35:21 It's going like crazy.

35:22 Like when I took notes for this conversation, like four or five days ago, I had said there

35:27 were 5,000 stars on GitHub.

35:28 Now, maybe that was three days ago.

35:31 Now there's 6,000, almost 6,300, and 17 contributors.

35:35 That's a pretty serious uptake for a project that's been out for eight days.

35:40 Yeah.

35:40 I think the thing that kind of blew me away most was the number of pull requests that

35:46 I got.

35:46 I mean, right on day one, people were digging into the code, and, you know,

35:52 there are tricky parts to the code and it's not necessarily obvious how you

35:56 ought to write certain features.

35:59 And people, you know, really dug in and started filling out some of this functionality that's

36:04 missing and started talking about, you know, well, how are we going to support programs or

36:09 third-party Python libraries out of the box and stuff like that.

36:15 So it's been great.

36:16 I've had a really good time working with some of these people that have been contributing.

36:20 Yeah.

36:21 Yeah.

36:21 I would say that's really cool.

36:22 You talked about the code a little bit. Looking on GitHub, GitHub thinks it's 77% Go

36:28 code, 22% Python code and a bit of a Makefile.

36:31 Yeah.

36:32 That's about right.

36:32 Yeah, that's about right.

36:33 And a lot of that Python code is actually just tests and benchmarks and things.

36:38 So most of it is Go.

36:41 And actually, I guess in the standard library, most of which is copied from

36:47 other places like CPython, there's a pretty substantial amount of Python, but, you know,

36:51 I don't think too much about that code since we don't have to write it or maintain it.

36:55 Yeah, absolutely.

36:56 So how do you ensure compatibility in this?

36:59 Like, are you running the standard CPython tests?

37:01 That's something that we're working toward.

37:03 So that's sort of milestone number one.

37:05 I haven't published a roadmap document yet, but getting to the point where we can run

37:11 the unittest library is going to be a huge milestone because it means we can then run

37:16 the unit tests that are written for CPython.

37:19 That would be a huge milestone just on compatibility.

37:21 Exactly.

37:22 Before we get there, we've been writing small tests that, you know, demonstrate

37:30 compatibility concerns and stuff like that.

37:32 And then running those in both Python and Grumpy.
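The "run it in both" approach can be sketched as a small differential test harness: feed the same snippet to several interpreter commands and compare their output. `outputs_match` is a hypothetical helper, and the commands in the demo are placeholders. In real use one entry would be CPython 2.7 and the other whatever command runs code under Grumpy (the repository's own instructions cover that; it is not assumed here). The snippet uses `print(1 + 1)`, which prints `2` in both Python 2.7 and 3; the harness itself needs Python 3.7+.

```python
import subprocess
import sys


def outputs_match(source, commands):
    """Run `source` (fed on stdin) under each command; return (all_equal, outputs)."""
    outputs = []
    for cmd in commands:
        proc = subprocess.run(cmd, input=source, capture_output=True,
                              text=True, check=True)
        outputs.append(proc.stdout)
    return all(out == outputs[0] for out in outputs), outputs


# Demo: both entries are the current interpreter reading the program from stdin ("-").
# A real compatibility check would swap one of them for the Grumpy command.
ok, outs = outputs_match("print(1 + 1)\n", [[sys.executable, "-"], [sys.executable, "-"]])
```

Here `ok` is `True` and `outs` is `["2\n", "2\n"]`. A mismatch points at a compatibility gap worth turning into a regression test.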

37:36 Okay.

37:36 Yeah.

37:37 Very cool.

37:37 So let's talk about why you chose Go. Aren't there three sort of officially blessed languages

37:44 at Google?

37:45 There's Python, there's Go and Java.

37:47 Is that?

37:48 And C++.

37:48 Is that the story these days?

37:49 Yeah.

37:50 Right.

37:50 Of course.

37:50 And C++.

37:51 So four.

37:51 So why did you choose Go?

37:53 Like you could have tried Jython or something, right?

37:55 Jython is something that we've looked into.

38:00 Jython is a really great, mature product.

38:03 It's our experience that it's better to start a project on Jython than to migrate

38:11 to Jython.

38:12 There are a number of compatibility issues, not so much the kinds of compatibility issues

38:18 like, oh, on CPython this function returns a different type or something like

38:24 that; more that there are certain constraints of running in the JVM that make certain programs

38:31 not work very well, or those sorts of things.

38:34 So like performance issues that sort of crop up in those sorts of things.

38:37 It sounds like running on the JVM wasn't as good a concurrent server story as it might've

38:45 been running on Go because Go is more focused on concurrency from the beginning and things

38:49 like that.

38:50 That might be more important to you guys.

38:52 Yeah.

38:52 I think that was part of it.

38:53 I mean, lightweight goroutines are definitely a big advantage to Go.

38:59 So Java has native threads, which have large stacks.

39:02 And so it has somewhat different performance characteristics for concurrent workloads.

39:09 And so you have to write parallel programs in a slightly different way

39:15 for Java. But also, for real-time server applications, the JIT can actually be a liability.

39:23 It becomes difficult to reproduce certain kinds of issues,

39:31 debug certain kinds of problems, and consistency in how requests are handled

39:39 is really important in these kinds of applications.

39:42 And the JIT can make, you know, identical requests behave very differently depending on

39:48 where in the life cycle the program is.

39:50 Sure.

39:50 Yeah.

39:50 That makes a lot of sense.

39:51 Being statically typed, you get a little more predictability.

39:54 Absolutely.

39:55 Well, not, sorry, not statically typed: compiled to machine instructions rather than JITted.

39:59 Yeah.

39:59 Yeah.

40:00 Yeah.

40:00 That's right.

40:01 Yep.

40:01 Hey everyone.

40:02 Let me take just a moment and tell you about a new sponsor with a cool and timely offer.

40:06 This portion of Talk Python to Me is brought to you by Deep Learning for Computer Vision

40:11 with Python, a new book from pyimagesearch.com launching on Kickstarter right now.

40:16 Have you ever wondered how Facebook can not only detect your face in an image, but also recognize

40:21 and tag you as well?

40:22 It's not magic.

40:24 Facebook uses specialized machine learning algorithms called deep learning, and PyImageSearch wants

40:30 to pull back the curtain and show you how these algorithms work.

40:32 Their new book is designed from the ground up to help you reach expert status,

40:37 even if you've never worked with machine learning or neural networks before. Inside Deep Learning

40:41 for Computer Vision with Python, you'll find super practical walkthroughs, hands-on tutorials

40:46 with lots of code and a no fluff teaching style that is guaranteed to cut through all the cruft

40:50 and help you master deep learning for visual recognition.

40:53 To learn more about this book and back the Kickstarter campaign, just head to

40:59 pyimagesearch.com/kickstarter.

41:01 Yeah.

41:01 So how do you run apps on Grumpy?

41:03 Like if I have Python code and I want to make it, make it go, how do I make it go on Grumpy?

41:09 This is sort of a hot topic right now in the issue tracker on GitHub, because the

41:14 build system that I have is strictly focused on, you know, getting the internal libraries working.

41:20 And so it doesn't have good support for building a program that's outside that directory structure

41:25 or using libraries that are in your Python path or anything like that.

41:29 And so we're debating kind of how exactly it should be supported.

41:32 So right now, if you want to run a program or compile a library, you have to kind of drop

41:38 it into that directory structure and the make system will pick up on it and try

41:45 to compile it into Go.

41:46 But ideally, you know, you'd have some kind of PYTHONPATH-style construct where it can find

41:53 Python code and build it in a sort of standard way.

41:58 That's something that we're working towards.

42:00 Okay, cool.

42:01 Now, if people want to contribute to Grumpy, there are like three major areas that make it up.

42:08 You want to talk about those three areas so they maybe can use it as a roadmap?

42:11 You can kind of think of it as the transcompiler, which is the tool called grumpc, and that takes

42:19 Python code. It's actually written in Python and it uses the ast module.

42:23 So it's kind of cheating.

42:25 Another milestone will be when grumpc can compile grumpc. And so it takes the Python

42:33 code and spits out some Go code.

42:36 And then the second part is the Grumpy runtime, which is kind of the parallel

42:41 of the C API.

42:42 The transcompiled Go code will depend on that runtime.

42:48 So it imports the runtime and uses the constructs and functions and things in the runtime.

42:53 And so that's another sort of component that's written strictly in Go.

42:58 And that's where all the sort of data structures and things are defined.

43:02 And finally there's the standard library, which is mostly, or actually exclusively,

43:09 written in Python, but uses some of the Grumpy native extensions to actually

43:15 interface directly with Go packages and functions and things.

43:19 So those are sort of the three areas, and there's a lot of work to do in

43:23 all of those different areas.

43:24 I'd say the standard library is the biggest chunk of work to do at this point.
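To illustrate the transcompiler idea (Python source in, Go source out, built on the `ast` module), here is a toy that handles only integer assignments and `+`, `-`, `*`. It is not how grumpc works: mapping Python ints straight onto Go ints ignores Python semantics such as arbitrary-precision integers and dynamic typing, which is why, as Dylan explains, the real generated code depends on the Grumpy runtime instead. `transcompile` and `expr_to_go` are hypothetical names.

```python
import ast

# Python operator node types mapped to Go operator spellings (toy subset).
_GO_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def expr_to_go(node):
    """Translate a tiny subset of Python expressions into Go expression text."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return str(node.value)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp) and type(node.op) in _GO_OPS:
        left, right = expr_to_go(node.left), expr_to_go(node.right)
        return "(%s %s %s)" % (left, _GO_OPS[type(node.op)], right)
    raise NotImplementedError(ast.dump(node))


def transcompile(source):
    """Turn simple `name = expr` statements into Go assignments."""
    lines = []
    declared = set()
    for stmt in ast.parse(source).body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise NotImplementedError(ast.dump(stmt))
        name = stmt.targets[0].id
        # Go needs := for the first declaration and = for reassignment.
        op = "=" if name in declared else ":="
        declared.add(name)
        lines.append("%s %s %s" % (name, op, expr_to_go(stmt.value)))
    return "\n".join(lines)
```

For `x = 1 + 2`, `y = x * 3`, `x = y - 1` this produces `x := (1 + 2)`, `y := (x * 3)` and `x = (y - 1)`. Anything outside the subset, even `print(x)`, raises `NotImplementedError`, a small reminder of how much ground a real transcompiler has to cover.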

43:29 Presumably you guys chose Go because of the concurrency story, right?

43:34 And if you have Python code running on Go, you want to leverage that concurrency.

43:40 Do you have to use a different API?

43:43 This is Python 2.7.

43:44 So you don't have things like async or await.

43:47 How do I interact with the concurrency model of Go?

43:50 Currently, the way that goroutines are made available is through the threading library.

43:57 So with the standard Python threading library, you create a thread and start it.

44:00 And that actually starts a goroutine instead of a native thread, so that will work pretty seamlessly with existing code.

44:09 I don't foresee huge problems there in terms of the differences between those kinds of threads.

44:15 And similarly, Go has the concept of channels, which are sort of a message passing mechanism.

44:22 Whereas in Python, you have the Queue data structure. This isn't actually implemented yet, but I plan to implement Queue using channels.

44:31 And so you should be able to just write concurrent Python code like you always have.

44:36 But I think to really take advantage of the concurrency model, eventually I'd like to implement the async and await Python constructs.

44:48 I think that would be a huge win.
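As an example of "concurrent Python code like you always have", here is an ordinary `threading` plus queue worker pool. It runs unchanged on CPython 2.7 and 3. Per Dylan, under Grumpy each `Thread.start()` would start a goroutine, while backing the queue with Go channels was still planned at the time of recording. `sum_squares_concurrently` is an illustrative name.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2.7, the version Grumpy targets


def sum_squares_concurrently(numbers, workers=4):
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            n = tasks.get()
            if n is None:          # sentinel: no more work for this thread
                return
            results.put(n * n)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()                  # under Grumpy, per Dylan, this starts a goroutine
    for n in numbers:
        tasks.put(n)
    for _ in threads:
        tasks.put(None)            # one sentinel per worker
    for t in threads:
        t.join()

    # All workers have finished, so draining the results queue is safe here.
    total = 0
    while not results.empty():
        total += results.get()
    return total
```

`sum_squares_concurrently(range(10))` returns `285`. Nothing in the code refers to Go, which is the point: the mapping onto goroutines would happen underneath the standard API.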

44:50 Yeah, that would be a huge win.

44:52 And it seems to me like using the threading API is much more coarse-grained concurrency than Go is really built for.

45:02 And while it would work, it's not taking full advantage.

45:06 The idea with Go is that starting a goroutine is extremely lightweight, and passing messages back and forth is the way to share state, rather than sharing memory.

45:20 Or sharing objects.

45:21 So I think that programs that are written with sort of heavyweight threads in mind aren't necessarily going to be the best possible way to express that functionality.

45:34 And so, you know, long-term I could see, maybe... well, actually, you can access native Go constructs.

45:44 For example, in a Grumpy program you will be able to use Go channels directly.

45:50 You know, that has upsides and downsides.

45:52 It starts to diverge from the Python language and those sorts of things.
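The "share state by passing messages" style can already be approximated in plain Python without reaching for Go channels: each stage below owns its data and talks to the next stage only through a queue, the way Go code would connect goroutines with channels. It is a sketch with no error handling; `_DONE` plays the role of closing a channel, and the names are illustrative rather than taken from Grumpy.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2.7

_DONE = object()  # sentinel standing in for "channel closed"


def _stage(func, inbox, outbox):
    # Each stage only receives from inbox and sends to outbox; no shared state.
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate the "close" downstream
            return
        outbox.put(func(item))


def run_pipeline(items, *funcs):
    """Connect one thread per function with queues and stream items through."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=_stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(funcs)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

`run_pipeline([1, 2, 3], lambda x: x + 1, lambda x: x * 10)` returns `[20, 30, 40]`; order is preserved because each stage is a single thread reading a FIFO queue.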

45:55 Yeah, but it's not unlike IronPython or Jython or those things, right?

46:00 Where you can reach down into the underlying JVM or CLR or something like that.

46:05 That's right.

46:06 Yep.

46:06 Absolutely.

46:07 Okay.

46:08 So if you're going towards async and await, what's the story on Python three?

46:13 Since I feel like the concurrency story is a lot better in Python 3.

46:17 Yeah.

46:17 I'd love to support Python 3.

46:19 The long-term goal is definitely to support it.

46:22 The reason for 2.7 is that YouTube had a large existing Python code base, and

46:29 that was 2.7.

46:31 So that was the main reason for choosing 2.7 out of the gate, but certainly long-term,

46:37 I'd like to see Python 3 supported.
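For reference, this is what the async/await style looks like in CPython 3.7+ with `asyncio`. Grumpy targets Python 2.7 and does not support it, and the episode does not describe how Grumpy might map coroutines onto goroutines, so treat this purely as an illustration of the syntax being discussed. `handle` and `serve_all` are illustrative names, and `asyncio.sleep` stands in for real I/O.

```python
import asyncio


async def handle(request_id):
    # Stand-in for non-blocking I/O such as a database or network call.
    await asyncio.sleep(0.01)
    return request_id * 2


async def serve_all(n):
    # Many coroutines can be in flight at once; gather returns results in input order.
    return await asyncio.gather(*(handle(i) for i in range(n)))


def run_demo(n=1000):
    return asyncio.run(serve_all(n))
```

`run_demo(100)` returns `[0, 2, 4, ..., 198]` after roughly one sleep interval rather than a hundred, since all the waits overlap on a single event loop.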

46:39 Right.

46:40 Oh, that'd be fantastic.

46:41 I'd like to see that as well.

46:42 I mean, it certainly makes sense if you're working on the YouTube team.

46:45 YouTube has a tremendously large and widely adopted deployment of Python 2.7.

46:51 Like you want to, you know, work where you can have the biggest impact locally, right?

46:55 Absolutely.

46:56 Yeah.

46:56 So reading the tea leaves, does this mean that Grumpy might someday run YouTube?

47:02 I want to hedge a little bit on that.

47:04 I think there's a sort of a long road ahead before Grumpy's ready to handle the kinds of

47:10 large applications that we run on YouTube.

47:13 So I wouldn't want to speculate about the long-term outcomes there.

47:18 Sure.

47:18 Yeah.

47:19 Yeah.

47:19 Of course.

47:19 You know, let's imagine a world where it did.

47:23 Probably the first few weeks after it switched to Grumpy would be a little

47:30 bit nerve-wracking, right?

47:31 Yeah.

47:31 It would definitely.

47:33 If YouTube goes down and it's your fault, that's going to be a problem.

47:36 Yeah, exactly.

47:37 I don't want to be that guy.

47:39 Exactly.

47:40 Exactly.

47:40 Here's the four pages we're giving you.

47:42 No, just kidding.

47:43 But if someday that came to be, that would be a really cool outcome of

47:47 this project.

47:47 Yeah, absolutely.

47:48 That's sort of the dream.

47:50 Excellent.

47:50 Okay.

47:51 So maybe that's a good place to leave it.

47:52 Let me ask you just a couple of questions before we let you out of here.

47:56 If you're going to write some code, what editor do you use?

47:58 Vim.

47:59 Vim.

47:59 All right.

48:00 Yeah.

48:00 Very cool.

48:01 And there's over 96,000 packages on PyPI these days.

48:06 And I'm sure you've come across some that are kind of unique.

48:09 You're like, Hey, have you heard about this package?

48:11 It's pretty cool.

48:11 You should check it out.

48:12 You got any coming to mind?

48:13 You know, it's funny.

48:14 I mean, because I do most of my development inside Google, you know, we

48:20 kind of have a different set of tools we tend to use.

48:26 I don't have a ton of experience with a lot of PyPI packages.

48:30 Yeah.

48:30 So it's a little bit more dark matter.

48:33 We out here in the larger universe don't get to see a lot of the cool stuff you guys get

48:38 to use.

48:38 I'm sure it's pretty neat though.

48:39 Absolutely.

48:40 All right.

48:41 Awesome.

48:41 So how about a final call for action?

48:43 Like, how can people get started with Grumpy?

48:44 What can they do if this resonates with them, things like that?

48:48 Yeah.

48:48 I mean, we're super interested in seeing where the project goes.

48:52 Like I said, I would like to see where Grumpy can be useful

48:57 besides just, you know, large concurrent server applications.

49:01 Community feedback around that is great.

49:04 People have been filing issues asking about, you know, support for different things.

49:08 And that's been really illuminating seeing where people are thinking about where this might

49:12 be useful.

49:12 So that's huge.

49:13 If you have the time and the inclination, try it out: just clone the repo, type make

49:20 run, try out Python on Go, and report any issues.

49:25 That's really useful to us.

49:27 And obviously there's a ton of work to do.

49:30 We talked about some of the different areas, and contributions via

49:36 pull requests on GitHub are really appreciated.

49:38 It's been kind of amazing how much effort people have put in already.

49:42 So that's been really exciting for us.

49:45 Yeah.

49:45 It's a cool project.

49:47 And I think if we have yet another powerful, flexible runtime that has some different

49:54 trade-offs that we can make for Python, that's great for everyone.

49:56 So congratulations on your project and thanks for sharing it with everyone.

50:00 Yeah.

50:00 Thanks very much, Michael.

50:01 You bet.

50:01 Talk to you later.

50:02 This has been another episode of Talk Python to Me.

50:06 Today's guest has been Dylan Trotter.

50:09 And this episode has been sponsored by Hired and PyImageSearch.

50:12 Thank you both for supporting the show.

50:14 Hired wants to help you find your next big thing.

50:17 Visit hired.com/talkpythontome to get five or more offers with salary and equity

50:22 presented right up front and a special listener signing bonus of $2,000.

50:27 Struggling to get started with neural networks, deep learning and image recognition?

50:31 PyImageSearch.com can help with that.

50:33 To learn more about their new book, Deep Learning for Computer Vision with Python, and back the

50:39 Kickstarter campaign,

50:39 just head to pyimagesearch.com/kickstarter.

50:44 Are you or a colleague trying to learn Python?

50:46 Have you tried books and videos that just left you bored by covering topics point by point?

50:51 Well, check out my online course Python Jumpstart by building 10 apps at talkpython.fm/course

50:57 to experience a more engaging way to learn Python.

50:59 And if you're looking for something a little more advanced, try my Write Pythonic Code course

51:04 at talkpython.fm/pythonic.

51:08 Be sure to subscribe to the show.

51:10 Open your favorite podcatcher and search for Python.

51:12 We should be right at the top.

51:13 You can also find the iTunes feed at /itunes, Google Play feed at /play and direct

51:19 RSS feed at /rss on talkpython.fm.

51:22 Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.

51:28 Corey just recently started selling his tracks on iTunes.

51:31 So I recommend you check it out at talkpython.fm/music.

51:34 You can browse his tracks he has for sale on iTunes and listen to the full length version of the theme song.

51:40 This is your host, Michael Kennedy.

51:42 Thanks so much for listening.

51:43 I really appreciate it.

51:44 Smix, let's get out of here.

