#21: PyPy - The JIT Compiled Python Implementation Transcript

Recorded on Wednesday, Jul 8, 2015.

00:00 Is your Python code running a little slow? Did you know the PyPy runtime could make it run up to 10 times faster? Seriously. Maciej Fijalkowski is here to tell us all about it.

00:00 This is episode number 21, recorded Wednesday, July 8th, 2015.

00:00 [music intro]

00:00 Welcome to Talk Python to Me. A weekly podcast on Python- the language, the libraries, the ecosystem, and the personalities.

00:00 This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy

00:00 Keep up with the show and listen to past episodes at talkpython.fm and follow the show on twitter via @talkpython.

00:00 This episode, we'll be talking to Maciej Fijalkowski about the amazing alternative Python implementation PyPy.

00:00 This episode is brought to you by Hired and Codeship. Thank them for supporting the show on twitter via @hired_hq and @codeship

00:00 Before we get to Maciej, let me share a little news with you:

00:00 First off, Talk Python To Me has a new domain: talkpython.fm. I put the idea of a shorter .fm domain out on twitter and about 80% of the people who responded said they liked it better than the long .com domain.

00:00 A month ago I moved all the MP3 file traffic out of Amazon S3 and onto a dedicated audio cache server. It's a light-weight Flask / Python 3 app running through nginx and uWSGI. A few listeners expressed an interest in seeing the code so I did some work to generalize it a bit and open-sourced it.

00:00 I'm calling the project Cache-Tier, and you can find the blog post as well as the link to the GitHub project in the show notes.

00:00 Next up, we have a new Python podcast. I'm super happy to announce a new Python podcast by Brian Okken: the Python Test podcast. You can find it at http://pythontesting.net/category/podcast/

00:00 Now, let's get onto the show.

00:00 Maciej, welcome to the show.

02:19 Thanks for inviting me.

02:21 Yeah. I'm super excited to talk about our topic today, which is PyPy. I think what you guys are doing at PyPy is so incredibly cool- taking these JIT compilation and GC concepts from sort of semi-compiled languages and applying them to Python. So, I am really happy to talk about that.

02:43 The story of compiling dynamic languages is really sort of old and half forgotten; we know these days that you can do this with JavaScript, but the original work on Smalltalk dates back to at least the mid-nineties if not earlier, and that's what we are building on top of anyway. It's nothing new; the new part is just applying this to Python.

03:08 That's right, I think it's great. Before we get into the details of what you guys are doing, maybe you could give listeners who are not familiar with PyPy a little history and an introduction to it.

03:21 So, PyPy is a Python interpreter which works very similarly to the normal thing that you would call Python, which technically is called CPython- the Python interpreter written in C. We have a different Python interpreter which is implemented slightly differently, and for the most part it should run faster on most examples, because it can dynamically compile Python all the way down to the assembler level. So it's like a normal Python interpreter, except sometimes faster- most times faster, in fact. That's it; it sounds very simple, but it's actually quite a big project, and it has been around more or less ten years by now.

04:11 Wow, it started ten years ago. And when did you get involved with it?

04:14 I got involved, I think, in 2006 or 2007. I got interested in Python static analysis, and part of PyPy does that: it takes a restricted subset of Python, which PyPy is implemented in, and compiles it down to the C level. So I was interested in Python static analysis, I glanced over the PyPy project, and sort of started getting involved. Then I got a spot at Google Summer of Code to work on PyPy for the summer, and that's essentially how it all started.

04:53 How many people work on PyPy? Or contribute to PyPy?

04:57 Depending on how you count- anything between 3 and 30. PyPy is a big umbrella project for a vast variety of things, anything from, as I said, a Python interpreter to very research-oriented stuff that people at various universities experiment with. For example, there are a couple of people working on running Python and PHP in the same process, so you run PHP code on the server but you can still call Python functions in that process. There are people working on software transactional memory. So it's a big umbrella project that is valuable as research for a lot of people, in addition to being a Python interpreter.

05:41 Yeah, I can see how that would work for people doing some sort of academic research; especially something around JIT and GC, then it makes a lot of sense.

05:41 I think one of the things that people- either those who are new to Python, or who have kind of dabbled in it but are not deeply working with it and thinking about its internals every day- don't realize is that there is actually a whole variety of different interpreters out there.

06:06 There are a bunch, and they are all slightly different, so let's glance over them because I think it's important to know them. There is CPython, the normal Python interpreter, probably used by 99% of people using Python-

06:23 Yeah, if I open up Linux or my Mac and I type the word Python and hit enter, that's CPython, right?

06:28 That's CPython, so that's what most people use. What you need to know about CPython's internals is that it's implemented in C, and also, importantly, that it exposes a C API which goes quite low-level, so it is possible to write C extensions for Python. You write a bunch of C code using a special API on Python objects, and then your C functions can be called from Python code.

06:58 Then we have Jython, which is quite old actually, a Python interpreter written in Java, and a similar project called IronPython, which is a Python interpreter written in C#. Those two interpreters are quite widely used by people who want to write Java (or C#) in one better language- their main draw is integration with their underlying platform, so Jython is very well integrated with Java and IronPython with C#. So if you are writing C# but you would really love to write some Python, you can do that these days.

07:40 And then there is PyPy, which is another Python interpreter, written slightly differently, with a Just In Time (JIT) compiler. So those are the four main interpreters, and there are quite a few projects that try to enter this space, like Pyston, which is another Python interpreter, written by Dropbox people.

08:01 Yeah, I wanted to ask you about Pyston, because that seems to me to be somewhat similar to what you guys are doing. And the fact that it comes from Dropbox, where Guido is and where there is a lot of gravity in the Python world, made it more interesting to me. Do you know anything about it, or can you speak to how it compares, or its goals, or anything like that?

08:24 So, well, I know that it's very similar to a project that once existed at Google called Unladen Swallow. The main idea is that it's a Python interpreter that contains a Just In Time compiler that uses LLVM as its underlying assembler platform, let's call it that. And this is the main goal: the main goal is to run fast. Now, the current status is that it doesn't run fast, that's for sure; it runs roughly at the same speed as CPython, from the stuff that I have seen on their website. As for the future, I don't know; predicting the future is really hard.

09:04 Especially if you don't have much visibility into it, right.

09:08 Yeah, like, I can tell you that PyPy probably has a bunch of different problems from Pyston. For example, we consciously chose not to implement the C API at first, because the C API [inaudible 9:26] a lot into the CPython model. We chose not to implement it at first; we implemented it later, as a compatibility layer. So the first problem there is that it's quite slow- it's far slower than the one in CPython. And as far as I know, right now Dropbox uses the same C API, which gives you a lot of problems, a lot of constraints on your design. But it also gives you a huge benefit, which is being able to use the same C modules, which are a huge part of the Python ecosystem-

10:01 Yeah, especially some of the really powerful ones that people don't want to live without, things like Numpy, and to a lesser degree SQLAlchemy- the things that have C extensions and are really popular as well. So you guys don't want to miss out on that, right?

10:14 Right, and you brought up two interesting examples. For example, Numpy is so tied to the C API that it's very hard to avoid- and not just Numpy, it's the entire ecosystem. In PyPy, we reimplemented most of Numpy, but we are still missing out on the entire ecosystem. We have some stories for how to approach that problem, but it is a hard problem to tackle, one we chose to make harder by not implementing the C API.

10:48 However, for example the SQLAlchemy stuff- SQLAlchemy is Python, but it uses database drivers which are implemented in C, a lot of them. So our answer to that is CFFI, which is a very simple way to call C from Python. For most things like database drivers, there is a CFFI-ready replacement that works as well, and usually a lot better, on PyPy, which makes it possible to use PyPy in places where you would normally not be able to. CFFI is really popular; it gets over a million downloads a month. It's quite crazy.
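
For reference, here is a minimal CFFI sketch in ABI mode, calling sqrt from the C math library. This is just an illustration of the style, not code from the show, and the library name shown is Linux-specific:

```python
from cffi import FFI

ffi = FFI()
ffi.cdef("double sqrt(double x);")   # declare the C signature we want to call
libm = ffi.dlopen("libm.so.6")       # load the C math library (Linux-specific name)
print(libm.sqrt(2.0))                # 1.4142135623730951
```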

11:40 And CFFI is not just a PyPy thing, it also works in CPython, right?

11:44 Yeah, it works on CPython from 2.6 up to 3-point-whatever is the latest, I think, and it works on both PyPy and PyPy3. And since it's so simple, it will probably work one day on Jython too.

12:02 You said you have a plan for the Numpy story and these other heavy, C-based ones. Currently, the way you support it- this is a question, I don't know- is that you've kind of re-implemented a lot of it in Python?

12:15 So we- to be precise, we reimplemented all of it in RPython. RPython is the internal language that we use in PyPy.

12:26 Right, that's the Restricted Python that you guys actually target, right?

12:29 Yes. But we generally don't encourage anybody to use it. Unless you write interpreters- then it's great. But if you are not writing interpreters, it's an awful language. The problem with Numpy is that it ties in so closely that we added special support in the JIT for parts of it, and things like that- parts that we decided are important enough that you want to have them implemented in the core of PyPy.

12:56 So most of Numpy actually works on PyPy, and sometimes that's not good enough, because if you are using Numpy, chances are you are also using SciPy, scikit-learn, Matplotlib, and all this stuff. We have a story for how to use those, and the simplest thing is just to embed the CPython interpreter inside PyPy and call it using CFFI. It's great, it works for-

13:25 Really, you can like fall back to regular CPython within your PyPy app?

13:31 Yeah, it's called PyMetabiosis.

13:34 It's awesome.

13:37 I am pretty sure there is at least one video online with the author talking about it. It works great for the numeric stack, which is its goal. So, this is our start; we are still raising funds to finish implementing Numpy- it's a very long tail of features. And once we are done with Numpy, we'll try to improve the story of calling the other numeric libraries on top of PyPy, to be able to use stuff like SciPy and Matplotlib. That will still take a while; I am not even willing to give an estimate.

14:19 Sure. It's great, and it does look like there is a lot of support there. We'll talk about that stuff in a little bit, because I definitely want to call attention to that, and let people know how they can help out.

14:19 Before we get into those kinds of details though, could we talk just briefly about why and when I would use PyPy over, say, CPython or Jython- what you guys excel at? A person out there has just realized, "Oh my gosh, there is more than one interpreter?!" How do they choose? Can you give us some guidance around that?

14:52 So typically, if you've just discovered "oh, there's more than one interpreter", you just want to use CPython. That's the simplest answer. You want to use CPython, but if you are writing a library, you want to at least support PyPy. Which is what most people are doing: they are using CPython, and the libraries support PyPy, for the most part. Our typical user- and this is a very terrible description, but this is our typical user:

14:52 [music]

14:52 This episode is brought to you by Hired. Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.

14:52 Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company. Typically, candidates receive 5 or more offers in just the first week and there are no obligations, ever.

14:52 Sounds awesome, doesn't it? Well did I mention the signing bonus? Everyone who accepts a job from Hired gets a $2,000 signing bonus. And, as Talk Python listeners, it gets way sweeter! Use the link hired.com/talkpythontome and Hired will double the signing bonus to $4,000!

14:52 Opportunity is knocking, visit hired.com/talkpythontome and answer the call.

14:52 [music]

16:24 You have a large Python application that's spanning servers, serving millions of users, and you are running into problems: you can't serve requests quickly enough, you can't serve enough users from one machine. Now, your application is too big to say "we'll rewrite it in C", or that's just too scary for whatever reason, so you look at what it would take to run your stuff on PyPy.

16:56 It usually takes a bit of effort- your code should run, but it usually takes a bit of effort to see what sort of libraries you use: do you use any C extensions; if there are C extensions, are they crucial; can you replace them with something else? So yeah, this is our typical user. I run a consulting company that does this, and people come asking, "OK, I have this setup, it's impossible to do anything with it now- can I just swap the interpreters to make it run faster and make the problems go away?" This is our typical user.

17:33 The way you described it may not be the most flattering way, but you are right. If you have 100,000 or half a million lines of Python and you really just need to make it a little faster, and switching to a different interpreter like PyPy will solve that, that's great. So speaking of faster, can you talk about performance comparisons? I have a little example I'll share, but I'll let you go first.

17:58 As usual, performance comparisons are very hard to do-

18:05 Yes, absolutely. The thing everybody cares about is never exactly what you are measuring, so it might be totally misleading. But give it a shot.

18:14 One good heuristic is: if you don't have benchmarks, you don't care about performance. If you never wrote benchmarks for your applications, chances are you don't actually care all that much. That's the first step- make sure you know how fast your applications run. Once you know that, you can measure them on different interpreters.
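
As a concrete starting point, a baseline measurement can be as simple as the stdlib timeit module; run the same snippet under CPython and under PyPy and compare the numbers. The work function here is just a stand-in for your own hot path:

```python
import timeit

def work():
    # stand-in for the hot path of your application
    return sum(i * i for i in range(10000))

# seconds to run `work` 1000 times; run under CPython and PyPy and compare
print(timeit.timeit(work, number=1000))
```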

18:34 As far as expectations go, PyPy tends to run heavy computations a lot faster, where "a lot" is anything between 10 and 100 times, depending on the workload. And again, what is a typical Python program? The typical Python program is probably "Hello world", and PyPy runs "Hello world" at roughly the same speed as CPython- you won't notice. But for a typical web application, the speedup, if you are not heavily relying on C extensions, will be around 2x. And two times faster makes a lot of difference for a lot of people.

19:13 Absolutely. It also depends on where you are bottlenecked; like you said, you should profile and figure this out. If your Python web app is slow because 80% of the time you are waiting on the database, well, it doesn't really matter how fast your Python code is- your database is the problem, or something like that, right?

19:30 Exactly. And the thing is- so let's narrow it down to, say, web applications. Let me first talk about other stuff and then get to web applications. Where people find PyPy incredibly useful is things like high frequency trading- not the crazy high frequency where you have to make decisions multiple times per millisecond, but the sort of frequency where you want to make decisions within a few milliseconds, or tens of milliseconds. Then you want to be able to modify your algorithms fast, which is a lot easier in Python than, say, C++, and you run into fewer ways to shoot yourself in the foot [inaudible 20:16]. So that's when people tend to use PyPy, because in those sorts of scenarios it can be like 10 times faster.

20:24 So super low latency stuff, where ten milliseconds makes a huge difference to you- something like that?

20:29 Yeah. Another example: there is a project called MyHDL, which is a hardware description language, and tools like that tend to emit sort of low-level Python code that just does the computations the hardware would do- and again, on PyPy it's over ten times faster. So those are the very good examples. The very bad example, as you said: if your stuff is waiting on the database, then you're out of luck, no matter how fast your interpreter is.

21:03 But yeah, on typical web server loads, if there even is such a thing, it would be around a 2 times speedup- sometimes more, sometimes less, depending on the setup, really. But as I said, you should really measure it yourself. The cases where PyPy is quite bad: if you spend most of your time in C extensions, it's either not helping or actually making things worse. And the second case where it's not that great is short-running programs. Because it's Just In Time compilation, each time you run your program the interpreter has to look at what's going on, pick things to compile to assembler, and compile them to assembler, and that all takes time.

21:50 Right. There is a little more initial start up when that happens.

21:54 Yeah, the warm-up time is usually quite bad. Well, I like to think that the warm-up time of PyPy is quite a bit better than Java's, which is absolutely [inaudible 22:03]

22:06 It's a relative statement.

22:09 It's relative. Compared to CPython, PyPy's warm-up time is really terrible, and compared to [inaudible 22:12], again, the warm-up time is terrible- but compared to Java it's not that bad. So yeah, it really depends on your setup, and it typically matters less for long-running applications; and again, that's the typical PyPy user, with stuff like server-based applications where your programs run for a long time.

22:34 Right. You start it up and it's going to serve a million requests an hour until it gets recycled or something, yeah?

22:40 Something like that. I mean, these days even JavaScript is a long-running app- how long do you keep your Gmail open? Usually longer than a few seconds, so-

22:51 Yeah, that's for sure. So let's talk a little bit about the internals. Could you describe just a little- if I take a Python script and it's got some classes and some functions and they are calling each other and so on, what does it look like in terms of what's happening when that code runs?

23:11 Ok, so I'll maybe start from how PyPy is built and then get back to your question directly.

23:16 Yeah, great.

23:17 So, PyPy is two things, and it has been very confusing because we've been calling both of them PyPy, and calling two things which are related but not identical by the same name is absolutely terrible. We'll probably fix that at some point. PyPy is mostly two things: one thing is a Python interpreter, and the other is a part that I would call RPython, which is a language for writing interpreters.

23:44 It tends to be similar to Python in the sense that it's a restricted subset of Python, but this is largely irrelevant to the architectural question. So an interpreter written in RPython can be PyPy, but we have a whole variety: there is Hippy, which is a PHP interpreter, there are a bunch of Scheme interpreters, there is even a Prolog interpreter, and a whole bunch of other interpreters written in RPython. And then-

24:15 Is RPython a compiled language?

24:18 Yes. And the other part is essentially the translation toolchain- a compiler for RPython. It contains various things, like the garbage collector implementation for RPython, the built-in types like strings, unicode and all the things that RPython supports. But it also contains a Just In Time compiler for RPython, and so for interpreters written in RPython- which is one level of indirection compared to what you usually do.

24:49 So the Just In Time compiler is sort of generated from your RPython interpreter, not implemented directly, which is very important for us, because Python, despite looking simple, is actually an incredibly complicated language. If you are trying to encode all of the descriptor protocol, or how functions and parameters are actually called, chances are you'll make a mistake. So if you are implementing an interpreter and then, separately, a just in time compiler, it's very hard to get all the details right.

25:19 So we implement the Python semantics once, in the PyPy interpreter, and then they either get directly executed or compiled to assembler. Coming back to your question: if you have a Python program, the first thing that happens is it gets compiled to bytecode, and bytecode is quite high level. There is a thing called the dis module- you can just call dis.dis on any sort of Python object and it will display the bytecode. And the basic idea, which is what CPython does and what PyPy does too at first, is to take the bytecodes one by one, look at each one, and then execute it, like-
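
For example, here is what dis.dis shows for a trivial function; the exact opcodes vary by interpreter version:

```python
import dis

def add(a, b):
    return a + b

dis.dis(add)
# On CPython 2.7 this prints something like:
#   0 LOAD_FAST      0 (a)
#   3 LOAD_FAST      1 (b)
#   6 BINARY_ADD
#   7 RETURN_VALUE
```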

26:08 Yeah, and is that what's in the __pycache__ folders and things like that- those .pyc files?

26:14 Yeah, the .pyc files are essentially the serialized version of Python bytecode.

26:18 Ok.

26:20 It's just a cache, so you don't have to parse the files each time you import a giant project.

26:26 Right, ok. And so then CPython takes those instructions and executes them via the interpreter- but what happens on PyPy?

26:33 That's what happens on PyPy initially, so all your code will be executed like on CPython- except if you hit the magic number of function calls or loop iterations, I think it's 1037 for loop iterations, then you compile this particular loop, in fact this particular execution of the loop, into assembler code. Then you have a mix of interpreted code and assembler code, and the assembler code is a linear sequence of instructions that contains so-called guards. A guard can be anything from "an if statement in the Python source" to "is the type of this thing still the same". If you happen to fail one of those guards, then you say, "OK, I failed this guard, I'm going to go and start compiling assembler again."

27:29 I mean, at first you jump back to the interpreter, but if you hit the magic number again, you compile assembler again starting from this guard. And then you end up with a tree of executions that resembles both your Python code and the type structure that you are passing through it, and a few other things that are automatically determined. So at the end of the day, you end up with a Python function- or multiple Python functions- that got compiled to assembler, if you run your stuff for long enough.
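
To make the shape of that concrete, here is a toy, self-contained sketch of the hot-counter-plus-guards idea. This is not PyPy's actual code; every name in it is invented for illustration, and the threshold constant just echoes the number mentioned above:

```python
HOT_THRESHOLD = 1037  # the real threshold is an internal PyPy detail

def interpret(x):
    """Stand-in for the generic, bytecode-by-bytecode slow path."""
    return x + x

def make_fast_path(tp):
    """Stand-in for 'compile this loop to assembler, specialized for type tp'."""
    def fast(x):
        return x + x  # imagine type-specialized machine code here
    return fast

def run_hot_loop(items):
    counter, compiled = 0, None
    for item in items:
        if compiled is not None:
            assumed_type, fast = compiled
            if type(item) is assumed_type:  # the guard: is the type still the same?
                fast(item)                  # stay on the compiled fast path
                continue
            compiled = None                 # guard failed: fall back to the interpreter
        interpret(item)                     # slow, generic path
        counter += 1
        if counter >= HOT_THRESHOLD:        # the loop got hot: "compile" it
            compiled = (type(item), make_fast_path(type(item)))

run_hot_loop([1] * 2000 + [1.5] * 10)  # ints run "compiled"; the first float fails the guard
```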

27:58 Ok, that is super interesting, I didn't expect that it would have this initial non-assembler version, that's very cool. Do you know what the thinking around that was, is it just better performance?

28:09 There is a variety of reasons. One is that if you try to compile everything up front, it will take you forever. But also, you can do some optimizations- a lot of the optimizations done in PyPy are sort of optimistic. We are going to assume that special things like sys.settrace or sys._getframe just do not happen, and as long as they don't happen, things can run nicely and smoothly.

28:43 But you are trying to figure out on the fly what's going on, and then you compile the pieces that you know about. At the moment when you are compiling a Python loop or a function or something like that, you tend to know more about the state of the execution than just the source tells you- you tend to know the types, the precise shape of objects: is this an object of class X that has two attributes A and B, or is it an object of class X that has three attributes A, B and C? And those decisions can lead to better performance, essentially.

29:17 So on your website you say that PyPy may be better in terms of memory usage as well. How does that work?

29:24 It's a trade-off, right? First of all, PyPy does consume memory for the compiled assembler and the associated bookkeeping data- that depends on how much code you actually run. But the representation of Python objects is more compact in PyPy, so the actual amount of memory consumed on your heap tends to be smaller: all PyPy objects are as memory-compact as CPython objects using __slots__.

29:59 Right, ok.

29:59 So it's the same optimization, except it's transparent. Also, a list of integers would not allocate entire objects- it stores just the small integers. And the objects are smaller themselves, because we use different garbage collection strategies- not reference counting as the garbage collector.
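
The __slots__ optimization referenced here looks like this on CPython, where you have to opt in explicitly; the point above is that PyPy gives you the compact layout transparently, without the opt-in:

```python
import sys

class Point(object):              # normal class: attributes live in a per-instance dict
    def __init__(self, x, y):
        self.x, self.y = x, y

class SlottedPoint(object):       # __slots__: fixed attribute layout, no per-instance dict
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

p, s = Point(1, 2), SlottedPoint(1, 2)
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))  # the dict dominates the cost
print(sys.getsizeof(s))                              # noticeably smaller
```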

30:22 Right. So, let's talk about the garbage collector just for a moment. Is it a mark and sweep garbage collector?

30:22 [music]

30:22 This episode is brought to you by Codeship. Codeship has launched Organizations: create teams, set permissions for specific team members, and improve collaboration in your continuous delivery workflow. Maintain centralized control over your organization's projects and teams with Codeship's new Organizations plan.

30:22 And, as Talk Python listeners you can save 20% off any premium plan for the next 3 months. Just use the code TALKPYTHON.

30:22 Check them out at codeship.com and tell them "thanks" for supporting the show on Twitter where they are at @codeship.

30:22 [music]

31:18 It's a very convoluted variant of mark and sweep. It has two generations of objects, young objects and old objects; the old objects are mark-and-sweep, and the young objects use bump-pointer allocation. So the net effect is that if you have a lot of small objects that get allocated all the time and forgotten really quickly, allocation takes, on average, around one CPU instruction. It's an average because it takes slightly more than that, but then you have pipelining, so sometimes it effectively does less.

31:55 Ok. Do you guys do compaction and things like that as well?

31:58 No, but we do copy surviving objects from the young generation to the old generation. We don't compact the old generation, but it's usually more compact than your normal setup, where you have lots of objects scattered all over the place, because you only have to deal with objects that survive a minor collection.

32:18 Right, and that's the majority of objects that we interact with.

32:23 Vast majority.

32:23 Absolutely.

32:24 For the most part.

32:25 Ok, that's very cool. One of the things that is not super easy in regular Python is parallelism and asynchronous programming, and so on. And you guys have this thing called the stackless mode, what's the story with that?

32:42 It's the same thing as Stackless Python: it gives you the ability to have coroutines that can be swapped out without an explicit yield keyword. So it's not like Python 3 coroutines; it's normal coroutines where you can swap them at any point. For example, gevent uses stackless mode for swapping the coroutines.
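
As a sketch of that coroutine style, here is the greenlet package (which gevent builds on): control switches explicitly between functions, with no yield keyword anywhere in their bodies. This is an illustration of the model, not PyPy-specific code:

```python
from greenlet import greenlet  # pip install greenlet

def task_a():
    print("A: start")
    gr_b.switch()        # jump into task_b; no yield keyword anywhere
    print("A: resumed")

def task_b():
    print("B: running")
    gr_a.switch()        # jump back into the middle of task_a

gr_a = greenlet(task_a)
gr_b = greenlet(task_b)
gr_a.switch()            # prints: A: start / B: running / A: resumed
```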

33:07 Ok. So, you said that you can get better concurrency- can you describe or speak to that, what are your thoughts there?

33:15 I personally don't use stackless all that much, but the net effect is that you can write code like with Python 3 coroutines, without the yield keywords- you just call a function, and then the functions can be swapped for other things. It's a bit like implicit Twisted: you don't get better concurrency than Twisted, but you don't need to write your programs in the style that Twisted requires. So-

33:43 I was going to say, it's just a little more automatic, you don't have to be so explicit that you are doing threading.

33:50 Yeah, yeah, exactly. Normal threads, especially in Python with the global interpreter lock, don't scale all that well, and the solution is usually Twisted. But Twisted requires you to have all the libraries and everything written to be Twisted-aware, which stackless does not generally require. I don't have any particular feelings towards all of that, to be honest.

34:16 Sure. Does it also support Twisted, running on PyPy?

34:19 Yeah, obviously- Twisted is a Python program. From the very early days we had good contact with the Twisted people, and people who use Twisted tend to be in the same category as people who use PyPy: people who have long-running code bases that are boring but have problems because they are actually huge. I mean, not huge in terms of code base, but huge in terms of the number of requests they serve and stuff like that, so they tend to be very focused on how to make their stuff work both reliably and fast.

34:56 For example, the typical answer to your Python performance problems is "oh, just rewrite pieces in C". Well, that's all cool if you have a few small loops that you can write in C and have everything be fast, but most web servers are not like this. If you look at the profile of [inaudible 35:15], it's tons of dictionaries and things that are not easy to rewrite in C; and C introduces security problems.

35:29 No definitely not, or even reliability, right?

35:32 Yeah, so for all those problems, Twisted people tend to like Python better than C, and they've been very supportive of PyPy from the very first day. So PyPy has been running Twisted, and running it quite fast, for quite a few years.

35:51 Yeah, that's excellent. It seems like if you have a problem that Twisted would solve you also would probably want to look into PyPy.

35:58 Exactly- it's the same category of problems you are trying to solve. Another interesting thing about concurrency, which I guess I am slightly more excited about, is software transactional memory, which I am working on right now. This is one of our fundraisers, just like Numpy-

36:16 Yeah, so this is one of your three sort of major going forward projects if you will. And-

36:22 Yeah, those are the three like publicly funded projects.

36:24 Right. And if you go to PyPy.org, right there on the right it says "donate towards STM", and you guys have raised quite a bit of money towards this project, which is excellent. What is software transactional memory, for the listeners?

36:39 There are two ideas, related but not identical. The first problem is that Python has the global interpreter lock. The global interpreter lock essentially prevents you from running multiple threads on multiple cores on one machine. So if you write a Python program and you write it multithreaded, it will only ever consume one CPU- which is not great if you want to compute anything.

37:06 So that's one problem that STM is solving, and I will explain in a moment how it solves it. But another problem is that it's trying to provide a much better model for writing programs with threads. If you start using threads, the Python mutable data model makes it so hard to write correct programs. You essentially run into problems like, OK, I will have to think about who will be modifying what, and in what order, and about all the possible combinations-

37:37 ...make sure that every bit of code that is going to work with this segment of data is taking the right locks, and all that kind of stuff- it gets really tricky, right?

37:47 Yes. Essentially, in the model where you write your program in C: you write the program, fine, then you switch to threading because you need that performance immediately. If you write the threads correctly, your program will run, say, 4 times faster on 4 cores, whatever. But it will likely crash, and it will likely crash in the next couple of weeks, months, years, whatever, because you need to get 100% correctness first. STM works slightly differently: you essentially write programs in a mode where it looks like you put a gigantic lock around everything that matters in your program. So you write one event loop and you say, OK, this loop will consume some sort of data from an unordered queue- and you can add to the queue in an unordered way- and then you put the giant lock over the whole processing. If you write that sort of program with normal threads and normal locks, it will be correct, but it won't run fast, because everything will be inside the giant lock.

39:04 It would run more or less serially, but with all the complexity in your code of doing parallelism anyway.

39:09 Yes. STM stands for Software Transactional Memory, and it works roughly like a database, where you run multiple transactions: if you don't touch the same memory from 2 threads at the same time, then it's all cool, and if you do, one of the transactions gets aborted and reverted- you can only commit a transaction if the memory accesses were right. So, thinking again about the model where you have one gigantic lock: it will optimistically run a few versions of the same code in parallel on different data, and if they tend not to conflict- if they could have run serially, in the sense that they modify some global data but not in a conflicting manner- then you get parallelism for free.

40:04 But if they do conflict every now and again, then one of them gets reverted back to the start. The net effect is that, to the programmer, it looks like you are running stuff serially, and you get correctness for free. If you write it naively, you won't get performance, because your stuff will collide all the time- but then you can use tools to look at where it collides, remove those contention points, and get more and more performance. It's almost the same goal, but the difference is this: if you have 100% performance and 99% correctness, your program is still incorrect and you can't run it. If you have 100% correctness and 99% performance, you are mostly good to go.
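
As a rough illustration of that model, here is the "one giant lock" version written with ordinary threads and a real lock; with a real lock it is correct but fully serialized. The STM idea is that blocks like the locked region below would run optimistically in parallel and only be re-executed when they actually touch the same memory:

```python
import threading
try:
    from queue import Queue       # Python 3
except ImportError:
    from Queue import Queue       # Python 2

giant_lock = threading.Lock()     # under STM, think of this as an atomic block
work_queue = Queue()
results = {}

def worker():
    while True:
        item = work_queue.get()
        if item is None:          # sentinel: no more work
            break
        with giant_lock:          # one coarse lock around all shared state:
            results[item] = item * item   # correct, but serialized with real locks

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for i in range(100):
    work_queue.put(i)             # items can be added in any order
for _ in threads:
    work_queue.put(None)
for t in threads:
    t.join()
print(len(results))               # 100
```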

40:50 Yeah- would you rather be fast and wrong, or slow and right? You know, there is a really interesting class of those problems that you only see very, very rarely: the race condition, the timing-related threading problem. I've heard people describe those as heisenbugs, because as you interact with the program trying to see the problem, you might not be able to observe it, but when you are not looking- all of a sudden, boom- the timing realigns and it's a problem again. It's very frustrating-

41:23 It's important to look at that. The usual answer for those problems in Python is just to use multiple processes.

41:31 Yeah.

41:31 And using multiple processes works for a category of applications, and web servers tend to be one of those, because they only ever share data that's either in caches or in the database-

41:43 Usually that's another process anyway, like Redis, or it's in a database like Mongo or SQL or something like that?

41:49 Right, so there you don't care. But there is a whole set of problems where this is not what you have: you have data that's mostly not contentious, but you still have to share it and work on it; you can't afford to serialize it and parse it between processes, and yet you want correct results. This is what STM is trying to address- the set of problems that can't be solved by just splitting stuff into processes.

42:18 Right. Maybe something very computational or scientific, where it's iterative or something- that would be way harder.

42:26 Well, essentially anything where you have data that mostly does not conflict and you can work on it in parallel, but it's a big data set and every now and again you do conflict. A graph algorithm is a great example: you have this large, complicated data structure in memory, and most of the time you are working on different parts of the graph, so you don't care. But every now and again you'll find contention on the graph, because two parts are doing stuff on the same [inaudible 42:58], and then that's wrong. And writing that sort of stuff using threads is really hard.

43:02 Yeah so that has a lot of promise. Do you know when it might start to show up as a thing people can use? Is it there yet?

43:08 So it's already there to an extent: you can download the STM demo, and the STM demo works. It might not scale how you want it, it might not work how you want it, but it should generally work and scale. The current version only scales to like 2 or 3 cores, and given that it comes at the quite hefty cost of being 1.5 to 2 times slower on each core, it's not that useful yet. So the next version will try to reduce the single-core overhead and scale to more cores.

43:47 And then we'll see how it goes. It's going quite well. I mean, there are successive prototypes that are usable to some extent- we managed to get some performance improvements running on multiple cores, but they are in the 20-30% range, which is just not that exciting; on the other hand, they came mostly for free. Which again- you might say, "what if I rewrite-" no, the point is you don't have to rewrite. It's a very simple change, and then you might get some performance benefit.

44:22 Yeah, that's fantastic. One of the other projects that you have on your donation list as a major thing you guys are working on is Py3k in PyPy. What's that?

44:34 It's the Python 3 implementation of PyPy. As I said before, we have various interpreters implemented in RPython; one of those is a Python 2 interpreter, and one, which is less complete, is a Python 3 interpreter- it's at about 3.2 by now. So we need money, and help I guess, to push it forward to 3.3 or 3.4, or even 3.5, to bring it more up to speed. One thing that we don't do in PyPy is debate the Python language choices, and that has served us well. For example, I don't work much on the Python interpreter itself; I work a lot on the RPython side of things, and most of those improvements help all of the interpreters, not just the Python ones. So I personally don't care if it's Python 2 or Python 3- the improvements are all the same to me.

45:36 Right, that's great. Then you also have a section towards just general progress, and the last one is Numpy. What are you guys trying to accomplish with that sprint, or whatever you call it?

45:49 As I said before, with the Numpy stuff we want to reimplement Numpy itself- the numeric part, the operations on arrays- and there is a very exciting Summer of Code project that does vectorization, using SSE, for Numpy. Then we want a way to call more of the whole ecosystem of numeric Python- SciPy, Matplotlib, all the stuff that's outside that scope. We want the core of Numpy implemented in PyPy, because those parts are too low level to just call an external library, and then we want to have a way, or multiple ways depending on the setup, to call all the rest of the ecosystem. That's essentially what those goals are.

46:42 Those are three ambitious and very cool goals, very nice.

46:46 Well, they have been around for a couple of years, I think. We are working towards them, and we have people working right now on all three proposals, as far as I can tell.

46:57 Yeah, that's great. So one thing that's related to PyPy that you've done individually is the- what is it called- the JIT viewer? Can you talk about that briefly?

47:07 So the JIT viewer is a bit of an internal tool for visualizing the assembler representation of your Python program; it's very useful if you are really interested in how PyPy compiles your program. One related project that I've been working on quite a lot recently is called vmprof, and vmprof is a low-overhead statistical profiler- for Python, for VMs in general, but we are starting with CPython and PyPy. Those are tools that help developers find the problem spots in their code and figure out how to improve performance- because once you can understand it, you can usually improve it.
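
A hedged usage sketch of vmprof's programmatic API as documented around that time (pip install vmprof); run_my_workload is a hypothetical stand-in for your own code, and the API may have changed since:

```python
import vmprof

def run_my_workload():            # hypothetical stand-in for the code you want profiled
    sum(i * i for i in range(10 ** 6))

with open("profile.log", "w+b") as fp:
    vmprof.enable(fp.fileno())    # start statistical sampling into the file
    run_my_workload()
    vmprof.disable()              # stop sampling; inspect the log with the vmprof tools
```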

47:54 Yeah, that's excellent. We have been talking a lot about how PyPy makes stuff faster; but before you just say "well, we are switching to some new interpreter", maybe it makes sense to think about your algorithms, whether they are slow, and whether that switch would even help.

48:09 So, it really depends on the situation. Sometimes you switch without thinking about it and sometimes it doesn't make sense even if you think about it first. It really depends on your program and what are you actually trying to achieve.

48:25 Yeah at minimum you probably want to measure and profile your app and then try it on PyPy and profile it again and just compare and see how you are doing, right?

48:34 You definitely want to measure, you definitely want to know how fast your application is actually running before attempting anything.

48:40 It felt a little faster, let's do it.

48:43 You are laughing, but we've seen people like that: "my application runs faster on my local machine but not on the server." OK, how did you benchmark? "Oh, I looked at the loading time in the Chrome developer tools." That's usually not good enough. It might be slower because your network is slower- I don't know...

49:07 Exactly.

49:08 I don't know what your setup is.

49:10 Maybe the ping time is 100 milliseconds and the request time is 10 milliseconds- so of course it looks really slow, right?

49:10 Awesome, all right, Maciej this is probably a good place to wrap up the show, this has been such an interesting conversation, I'm really excited about what you guys are doing and I hope you keep going.

49:10 I want to make sure that people know that the source code is on Bitbucket, go to Bitbucket.org/pypy, that's the main repo?

49:34 The main way to contact us is usually through either the mailing list or IRC. We hang out on IRC a lot; it's #pypy on Freenode, and we are usually quite approachable when you come with problems. One interesting thing: if you find your program running slower on PyPy than on CPython, it's usually considered a bug, unless you are using a lot of C extensions.

49:59 Right, so if people run into that maybe they should communicate with you guys and-

50:03 They can definitely file a bug and complain.

50:06 Excellent. Two quick questions I typically ask people at the end of the show- what is your favorite editor, how do you write code during the day?

50:12 I use Emacs, actually. It does all kinds of weird stuff, and I am way more proficient in [inaudible 50:24] than I ever wanted to be.

50:26 A skill you didn't really want to earn, but you've done it anyway, huh?

50:30 Something like that.

50:31 Yeah. And then also, what's a notable or interesting PyPI package that you want to tell people about?

50:40 That's a tough one for me, because I don't actually write all that much Python code that's using libraries.

50:46 You can't import too much into like the core, right?

50:49 Right- and it is self-promotion, but definitely CFFI is something that I would recommend people look at as a way to call C, because it is something very low level that has been very successful as a simple, simple, simple way to call C.

51:06 That's cool. And if I was writing some program in Python and I had some computational bits I'd written in C, I could wire them together with CFFI?

51:17 You would be surprised how few people actually do that. Most of the time it's, "I have this Python program and I have this obscure C library that accesses this weird device that nobody has heard of, and I need to call it somehow"- and that's why you call C. The computational-bits case is actually quite rare, but that would be an option too.

51:37 Yeah, sure, sure.

51:37 Ok, awesome. And then, finally- you said that you do some consulting. Do you want to talk a little bit about what you do, in case people want to contact you or anything like that?

51:48 So, the website is baroquesoftware.com, and essentially what we do is make your Python programs run faster- the same thing we do in open source, just on the commercial side. So typically, if your open source software is running too slow, just come to IRC, and if your commercial software is running too slow, we can definitely do a contract with you to make it run faster.

52:11 Yeah, that's awesome. So, I am sure people who are having trouble might be interested in checking that out, so great.

52:11 Maciej, this has been super fun. I've learned a lot, thanks.

52:19 Thank you Michael, have a good day.

52:22 Yeah, you too.

52:24 This has been another episode of Talk Python To Me.

52:24 Today's guest was Maciej Fijalkowski, and this episode has been sponsored by Hired and CodeShip.

52:24 Thank you guys for supporting the show!

52:24 Hired wants to help you find your next big thing. Visit hired.com/talkpythontome to get 5 or more offers with salary and equity right up front and a special listener signing bonus of $4,000 USD.

52:24 Codeship wants you to ALWAYS KEEP SHIPPING. Check them out at codeship.com and thank them on twitter via @codeship. Don't forget the discount code for listeners, it's easy: TALKPYTHON

52:24 You can find the links from the show at talkpython.fm/episodes/show/21

52:24 Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes and direct RSS feeds in the footer on the website.

52:24 Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. You can hear the entire song on our website.

52:24 This is your host, Michael Kennedy. Thanks for listening!

52:24 Smixx, take us out of here.
