#172: Nuitka: A full Python compiler Transcript

Recorded on Monday, Jul 16, 2018.

00:00 Michael Kennedy: Quick, name some ways to make your Python code faster. Did you think of PyPy, the JIT-compiled version of Python? Maybe some async and await parallelism? How about Cython, where you write in a Python-esque language that compiles to machine instructions? Well, I'm here to add a new one to your vocabulary. Nuitka. Nuitka is like Cython in that your Python code is compiled into true machine instructions rather than interpreted. But, unlike Cython, you can take standard Python3 that runs just in regular old Python, CPython, without changing the syntax at all and still compile that to machine instructions. And Kay Hayen is here to take us on the journey of Nuitka, a project he's created and has been overseeing for some time. This is Talk Python to Me, Episode 172, recorded July 16th, 2018. Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython. This episode is brought to you by Cox Automotive and Rollbar. Please check out what they're offering during their segments. It really helps support the show. Hey, everyone. Thanks for tuning in like always. I have two things I want to share with you real quick before we get to the interview. First, I owe you an apology. You don't realize it yet, but you will pretty soon. When I recorded this, the microphone cable that connects my good microphone, the one I'm talking on now, to my computer, got bumped or something and then the computer just dropped the microphone and fell back to something terrible like my iPhone headphones or something like that. Anyway, the audio is not up to where I would like it. It's not terrible, it's somewhat like a lot of my guests, I guess. But it could be better and I really work hard on that as some of you know. 
And this one, it's not up to my standards. So hopefully you enjoy the conversation. You could still hear it just fine. It's just not as smooth and nice as this microphone. So please bear with me. It's still a great conversation and absolutely one worth listening to. Secondly, I just released a new course and I'm really excited about it. This is one that's been in the works for a really long time. It's called Building data-driven web apps with Pyramid and SQLAlchemy. And we cover all sorts of cool stuff around building web apps, but also things like Alembic for migrations and unit testing web apps and deploying to Linux and a whole bunch of things that I think you'll find useful. If that course sounds interesting to you, please check it out at talkpython.fm/pyramid. Have a look through the course page, watch the course intro video, and see what it's all about. Hopefully you love it. I really am glad that I got it out there for everyone who's been waiting for it. All right, now with that stuff out of the way, it's time to chat with Kay. Kay, welcome to Talk Python.

03:02 Kay Hayen: Yes, hello Michael. I'm glad to be here.

03:04 Michael Kennedy: Yeah, it's good to have you here. We've been trying for a while to set this up. We've both been on and off traveling and things like that, but it's great to have you here. And I'm really looking forward to speaking to you about your projects and your efforts to make Python faster, which is...

03:18 Kay Hayen: Yeah, yeah. As you can imagine, I'm very excited. Very excited too.

03:22 Michael Kennedy: Yeah. It looks like you're having some really great success, so I'm excited to share that with everyone. Let's start with your story. How'd you get into programming in Python?

03:31 Kay Hayen: Yeah, I started as a kid, so I think that was a long time before the internet. So when I was 11, the first thing I ever saw was C64, Commodore 64, my brother who brought it home. I had like one or two days of learning BASIC on it. And then he went away with it and I didn't have a computer anymore, so what I did was programming on paper.

03:54 Michael Kennedy: Wow, that's some dedication.

03:57 Kay Hayen: Programs with my mind, yeah. That was, it was bizarre.

04:00 Michael Kennedy: It's hard, but I suspect that that's actually really good training because a lot of people, especially...

04:06 Kay Hayen: We editors want good balance. Paper is usually a lot more flexible.

04:12 Michael Kennedy: That's probably true, actually. But a lot of people, I feel, just poke at the program when they're new. They'll write something that doesn't work. Well, let me just change it and see if it works now. I'll change it in a different way. You don't actually think it through so deeply. If you do it on paper, you've got to think it through.

04:32 Kay Hayen: Yeah. I didn't learn everything from my brother. I wanted to write something, I think Hangman, something like this. And I came up with the idea that there should be something like a loop. But I didn't know what it is. And so in the schoolyard I pestered all the kids who I knew were also programming. There were maybe three other guys that I knew of. I pestered them and asked them questions. I learned about the goto statement.

05:00 Michael Kennedy: Like the early loops. The early unstructured wild days of programming before functions and things like that.

05:06 Kay Hayen: It was, you were an island. So, the cursor was blinking and you would figure out everything on your own. There were barely people to talk to. Also, I grew up in a small village. Went to school in a small city. Very small city. Six thousand people, maybe. And I think in my village there was not a kid who had a computer that could actually do some programming. At least I wasn't aware. I was very isolated.

05:38 Michael Kennedy: Yeah. I think that's one of the bigger differences now. That you're connected, right?

05:44 Kay Hayen: That's what made me self-taught; I don't know. I was autodidactic all my life. That's probably stemming from there. Because later I learned Stack Overflow programming. You type what you want. You copy-paste your code, you're done. Initially it wasn't like that. So I recall learning assembler. Coming from BASIC, it had these line numbers. And I thought addresses are like line numbers. And I wrote down my program. And when I wanted to enter a statement I just took a number between two addresses. And, yeah. If you know assembler, you know that corrupts the program. I didn't get it for a long time what this was all about. So, yeah. But eventually I learned the 68K series of processors, which was almost like a high-level language. The C64, it had no registers. You couldn't learn that. That was too crazy. I couldn't get anything done. I think it couldn't multiply.

06:53 Michael Kennedy: That was crazy.

06:54 Kay Hayen: But this one was actually pretty good. And I learned assembler and C. Later I did C++ for some real programs. Actually had a day job. Met again with my brother, who was studying Mathematics and Computer Science. And he's quite a bit older; he's 13 years older. He learned computers from these, I would call them, punch cards. And he told me, learn Perl. If you know Perl, you will always be the only guy who can do stuff. And he was right. So, Perl was absolutely my first love. I had learned a lot of languages until then. But Perl was amazing. And I read the book by Larry Wall, who was a linguist designing a language. And he designed the language in a way that it fits your mind. So you would write what you think. However unstructured it is. So, unlike Python, where you are forced into a certain outline of the code, you could write unless at the end of statements. You could basically write down as you think, and you would never have to go back. You could always just continue writing. That was a really productive way of programming.

08:17 Michael Kennedy: I haven't done much with Perl. But that sounds quite interesting.

08:21 Kay Hayen: Yeah. Python was upcoming. But it was very weak in terms of, this is 2002 we're talking. It was very weak in terms of supported stuff. With Perl you had modules for everything, much like with Python today. Nothing can compete with Python anymore. But back then, with CPAN, if you wanted to do stuff, it was already there. So why bother with Python, which could not even send an email? You couldn't get stuff done. But it was much more readable. Yeah, 16 years ago when I switched jobs we were working on a product which had a lot of sed scripts and awk scripts and it was pretty ugly. And we wanted to improve on it. And Perl was of course an idea, but a better idea was to use Python. So I switched from Perl to that. I've since fallen a bit in love with Python, actually a lot. But it's not my first love. It's never the same.

09:24 Michael Kennedy: Yeah, I hear you.

09:26 Kay Hayen: Perl was the first to enable me to do what I think a lot of people have perceived Python to be.

09:31 Michael Kennedy: Right, yeah. That's really cool. But I think...

09:34 Kay Hayen: So, teamwork it's absolutely much better.

09:37 Michael Kennedy: Yeah, for sure. And I think what's really interesting, you brought it up here. Often people debate languages on the syntax. This syntax is better than that syntax. What most of the time rules the day is, yeah, but I have all these libraries that actually let me do stuff when I put the syntax in place, right? And that's so important and often forgotten in these language debates.

10:02 Kay Hayen: Yeah. I think Java also has this rich ecosystem. C++ sort of, but not as productive I think. Yeah, getting things done and being quick. For many tasks with Perl, I've done things for my day job that were just, pick a name.

10:22 Michael Kennedy: Yeah, great. Nice.

10:25 Kay Hayen: And Python, too. And, I think now everybody knows this. And everybody that's using Python. That's pretty much, that's a lot.

10:33 Michael Kennedy: That is a lot of people today; it's pretty amazing, the growth. So, speaking of your day job, what are you doing these days?

10:39 Kay Hayen: Currently I'm on leave. So, I have a one-and-a-half-year-old kid now. And there's this regulation in Germany which forces your employer to give you time off when you have kids. And you get paid by the state for 14 months total, for both of us. So my wife took 12. I took two of them. You get a reduced payment for that. And on top of that I added four months of unpaid parental leave so I could focus on Nuitka. That's what I currently do. I have free time, two months to spend on Nuitka, and I'm at the start of a chunk of that right now. In reality, that parental leave, I mean, you're a parent. Yeah, it's a lot of work to care for a small kid. And especially with my wife, she's a doctor. And I'm very emancipated because for roughly 10 days per month she's not here for 24 hours. So, actually I get a fair share of work. But when I return I will be what I've been for 16 years now, I will be an engineer for management software. Also did a bit of project management. But mostly I've been an engineer on the side.

12:00 Michael Kennedy: That sounds like software that you have to be really careful with.

12:03 Kay Hayen: Yes. That's actually. We review everything. The requirements, the code, tests; everything gets looked at. At least for us. Through several traces and, it's amazing. I was very lucky with the project that I'm working on; it's very old. Is it 30 years now? I think it's getting 30 years old or is already 30 years old. I was blessed with the opportunity to lead the project to replace the middleware, because it was closed-source software. And the idea was to be able to distribute it to parts of the world where we, or the owner, didn't have a license for it. And so I got to replace the middleware. And I got to introduce Python into safety-critical software. That was also an amazing experience and a huge success, and it's also a reason why a lot of my day job now is Python. So, that's the angle because, obviously, that's not easy. Java can't go there.

13:11 Michael Kennedy: Yeah, that's cool.

13:12 Kay Hayen: Because of the virtual machine doing its cleanups whenever it wants to. But for Python we managed.

13:20 Michael Kennedy: Wow, that's really interesting. I definitely forgot the JIT, GC languages. The unpredictability of the pauses is just too much sometimes. Yeah, interesting.

13:30 Kay Hayen: Yeah, but Python is much more predictable with its garbage collection. JITing actually can become a problem. Threading is also something we're not allowed to do. So, we are using multiple processes and that fixes Python's weaknesses pretty well.

13:50 Michael Kennedy: That's right. You have to do it the way that Python has to programs, the way we...

13:54 Kay Hayen: Yeah. Because, actually, threading is terrible. I recall writing an editor. And this editor was also a file system. So I would type in my editor and launch a compiler. There was no save. It wasn't necessary. I was just using the file system that was the editor, and I had a script in my work as well. And these were all working on the same data. And I learned an important lesson from there. You don't do threading. It's terrible with locking. It's such a time killer. I'm not a fan of threading.

14:31 Michael Kennedy: It sounds like it. With modern processors sometimes it's necessary. But yeah, maybe not the first thing to jump to.

14:40 Kay Hayen: Depending on your problem, you don't have a choice. When it's about safety. I think a process and a thread is not much of a difference depending on how you work with it. It's roughly the same. It's just more difficult. Because with threads you get access easier but also careless.

15:00 Michael Kennedy: Yeah, exactly. This portion of Talk Python To Me is brought to you by Cox Automotive. They're leading the way in cutting-edge, industry-changing technology that's transforming the way the world buys, sells and owns cars. And they're looking for software engineers and technical leaders to help them do just that. You hate being stuck in one tech stack? Well, that's not a problem at Cox Automotive. Their developers work across multiple tech stacks and platforms. They give you the room you need to grow your career. Bring your technical skills and coding know-how to Cox Automotive. You'll create real-world solutions to today's business problems alongside some of the best and brightest minds. Are you ready to challenge today and transform tomorrow with Cox Automotive? Go to talkpython.fm/cox, C-O-X, and check out all the exciting positions they have open right now. So, speaking of performance, let's talk about your project, Nuitka. Yeah, so maybe let's just start with a really quick overview of what it is and then how you got started on it.

16:01 Kay Hayen: The first thing is, it's a fully compatible Python compiler. So it is everything and it's compatible. So the idea is, if you run a program with the standard CPython interpreter, or you run it with Nuitka, it does the same thing. So that's basically only a clone of CPython that is capable of producing binaries just faster.

16:25 Michael Kennedy: Yeah, that's really neat. And one of the things that I thought was cool about it is you don't have a separate language, right? It's not like, say Cython where you write a slight variation of Python that then can do this, right? I can take something I wrote yesterday and just run it through here.

16:41 Kay Hayen: That's exactly my point. It's like you have no lock in. You can, anytime something is behaving strange you have these other implementations or the other Python implementations and you can switch to them. And, it's a drop in replacement for as long as it works. And when it doesn't, you can use something else. That's the point. And you can, and this is very important, you can drop a million lines of code on it and have it converted. That's the important point because for small programs which you fully write on your own you can manually. But for libraries from third parties that you have no clue about, you cannot just live with any kind of limitations. A big part of the problem was to get everything working.

17:33 Michael Kennedy: Yeah, to give people a sense during your EuroPython talk, you showed compiling Mercurial which is; is it a million lines of Python?

17:43 Kay Hayen: I don't know.

17:45 Michael Kennedy: It's large, though.

17:46 Kay Hayen: It's not a million lines, I think. But it's a substantial project. I think a million line is overstatement. Python is not that bad.

17:54 Michael Kennedy: It's true. But it's a seriously large project. It's not just, oh look, I can take this and I can write a calculator app.

18:03 Kay Hayen: To make no changes to my Mercurial. And actually I was capable of doing this in 2012. This was my first demo. When Nuitka was more like a feasibility study back when I had. It was more like a templating from Python to C++. It was just going to demonstrate that this is achievable. Since then it has evolved into an actual real compiler under the hood. Still capable of doing the same things and becoming more compatible. So, yes. People are using this in real life on real programs. Obviously, some of the things are not working. But I get a lot of positive feedback on the level of compatibility. I was always thinking that for such a compiler to be acceptable it needs to run foreign code and guaranteed to behave the same. And otherwise you just cannot trust it.

19:03 Michael Kennedy: So how do you do, I'm probably getting things a little out of order, but I'm wondering how you ensure compatibility? You run the tests that they run against CPython itself, or things like that?

19:14 Kay Hayen: I'm taking the CPython test suite and, for example, also the Mercurial test suite, and run them compiled and uncompiled. And they must pass or fail the same way. Actually, with the CPython test suite, for me it never really passes all things. And then I run the CPython 3.6 test suite with 3.5, and then I get a lot of exceptions. So I get extra coverage for error cases and these things. So that's one way of doing it. Obviously then, users report back to me with incompatible behaviors that they still encountered.
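
[Editor's sketch] The "must pass or fail the same way" idea can be illustrated with a small comparison harness. This is only a sketch under assumptions, not Nuitka's actual test runner: the helper names `run` and `same_outcome` are hypothetical, and in real use the second command would point at a compiled binary rather than at the interpreter again.

```python
import subprocess
import sys


def run(cmd):
    """Run a command and return the observable behavior we compare."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def same_outcome(reference_cmd, candidate_cmd):
    """True if both commands exit with the same code and print the same text."""
    return run(reference_cmd) == run(candidate_cmd)


if __name__ == "__main__":
    # Stand-in commands: both use the current interpreter so the sketch runs
    # anywhere. A real check would compare the interpreter against the
    # compiled program (hypothetical path, not shown here).
    interpreted = [sys.executable, "-c", "print(1 + 1)"]
    candidate = [sys.executable, "-c", "print(2)"]
    print("same behavior:", same_outcome(interpreted, candidate))
```

A failing program counts as a match too, as long as both sides fail identically, which is what gives the extra coverage for error cases mentioned above.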

19:54 Michael Kennedy: Yeah. Maybe something that wasn't covered in the test, right?

19:57 Kay Hayen: Yeah. Sometimes that happens. Often, they describe the behavior of things in the test pretty well. And sometimes I just add another test to see where it is. But there's many times or instances where for example running Mercurial tests would misbehave and I would trace it down to incompatible behavior of Nuitka. This has happened in the past.

20:25 Michael Kennedy: Yeah it's cool they have these large programs that are open source. You can go and just try them.

20:30 Kay Hayen: Actually, I would like to go further. Because right now, running the Mercurial tests, which is something I do, is a lot of CPU cycles. I want to talk with these kinds of projects and get people to just switch on running their own tests with Nuitka and see if they pass as well. That's something I need to work on, to get out and get even more coverage.

20:54 Michael Kennedy: Right, if you could get all these other projects with all of their continuous integration automatically running their tests on your stuff, that would be great.

21:03 Kay Hayen: Maybe not automatically, but they have to enable that. They shouldn't just say much more. If you want, you're running the test with PyPy, I think it's as simple as saying do it. List PyPy in enumeration and Nuitka should also be there. But that's just a plan. Right now I'm relying on people doing things manually and reporting back. And some people run the test suites with compiled stuff, and I am still pretty early with integration of Pytest. Running your tests, the compiled code and then uncompiled tests with Pytest. That's working with some tricks, but it's not documented yet. And it's not complete. There's another field where I want to get more exposure is test suite integration. It's something that people have started through pull requests on GitHub, and that's pretty awesome because you don't even have to learn how Nuitka works. You just install Nuitka, enable a flag and it outputs a wheel which is a binary for the given platform instead of a no-arch wheel. And that's ridiculously easy to use for existing projects. Right now it's a manual rerun of what setup.py does that you have to specify on the command line. And that's another field of integration. So it should also be simple to use. That's sort of my goal.

22:37 Michael Kennedy: Yeah, that's really great. The more you can just make it just another command you give things, you don't have to really understand it. That's cool. So maybe we can start looking at what you built by comparing against other things people might know. So, people might know Cython. Let's see if I can summarize it.

22:56 Kay Hayen: I think Cython is a lot more successful in terms of adoption because there are people willing to write this hybrid language code and get the optimization benefits of manual work. And it's used quite a lot. Actually I worked with Cython for a while. I was listed as a contributor. But I have since asked them to remove it. Initially I wanted to turn Cython into compatible Python as well, but this didn't work out. It's a pretty good solution for what it does, but I think it's bad for the ecosystem for all the reasons that we said: the IDE integration and this lock-in. You cannot switch. Cython has a lock. There's no way of comparing to anything else.

23:52 Michael Kennedy: Right. You're stuck with it. You got to just work around it sometimes.

23:57 Kay Hayen: Yes. I have a lot of respect for Cython. I'm an even bigger fan of PyPy, which I also contributed a patch to. That was RPython specifically, the reduced Python, something which I tried out on actual code of mine. Long, long time ago. It's such an amazing project. And obviously very little commonality. It's a totally different approach. It's also a very active project. I think they've been a lot more successful at getting funding. So I'm not getting that; I think they got money from the European Union initially and they also got fundraisers and PSF endorsement and all these things. And their huge benefit is many of the benefits that Nuitka also has. It's a drop-in replacement. You don't do anything. It works when it works. Sometimes it doesn't. It requires too much memory. Can happen. I'm not so much using it. But I think it's a great alternative or maybe even... I don't know if Nuitka does succeed, I think it's a great thing, but PyPy is here...

25:22 Michael Kennedy: Yeah. That's really cool. Cython to me feels a little bit like inline assembler it used to be for C, C++. Right? So I've got most of my code the way I like it. But this little part we've got to make this a lot faster. Let's re-write it here to be able to go faster. And so that's Cython. And then PyPy is P-Y-P-Y. The JIT compiled CPython runtime mostly compatible alternative to Python which sometimes is faster, sometimes it's not. Pretty interesting project.

26:01 Kay Hayen: I think they've made progress with the extension modules but it was a large holding back factor. I think they might maybe do now PyQt. I'm not so sure.

26:13 Michael Kennedy: Yeah, I think that's right, they did. Yeah.

26:15 Kay Hayen: Yeah. But, with Nuitka I can do all the extension modules. I think they can do some or all now. I'm not so sure what the current state there. I don't follow it, actually. But it's...

26:28 Michael Kennedy: I've had mixtures also with it as well. I really think the project is awesome and I'm glad it exists. I know some places it makes things a lot faster. I switched out one of my larger websites to run on top of it, and it did, but it was two and a half times slower. And well, that's not helping me. So I guess I'll just switch back to CPython. And I think it was probably C extensions and the data. I'm not entirely sure. It was compatibility.

26:57 Kay Hayen: Yeah. If you're on Django or something. That's really something where I need to go to get Nuitka to work out of the box for Django projects, which are their own mess. The manage.py is doing all sorts of nasty things that you cannot know about at compile time. But I think the Django even by bit will be very welcome to many people. And that's also something somebody should do eventually. But I'm not focusing on that right now. I'm focusing on optimization. So, for Nuitka I was observing the landscape for many years, and all this time I was thinking a compiler for Python, that's totally possible. And why is nobody doing it? And that was, I think, a couple of years that I was observing it and wondering. And I think around the 2.5 time frame I thought I'm going to make a prototype. Make a proof of concept and show that it's possible. And that turned out to be pretty much the case because the Python design is very welcoming to being compiled, actually. It's not a trivial task. It's very hard work. But you can do it for Python if anything. So I have a compiled function type, and there is the uncompiled function type in Python. And all I have to do is make them behave the same. And then you're in. So you can have a compiled function. That's basically a lot of how you get it to work. And then you don't have bytecode but you have C code behind.

28:45 Michael Kennedy: Yeah. And then the C code compiles to machine instructions, and then you're off to the races. Yeah.

28:50 Kay Hayen: Yes. And the Python engine really doesn't care all that much. Because extension modules are first class citizens.

28:58 Michael Kennedy: Right. So let's talk about the architecture just a little bit. So, the way it works is we take regular straight Python code. We feed it to Nuitka. Nuitka translates that into a reduced Python. The reduced Python is then translated into C. The C is compiled basically to a C extension and that runs on top of CPython. So, Nuitka is not its own special runtime like, say, PyPy is.

29:28 Kay Hayen: No.

29:29 Michael Kennedy: It's a thing that... Do you have to install the runtime for it. Or does it just create. You have to install CPython 3.5 in order to run it? How is it put together?

29:38 Kay Hayen: There's two modes. There's a standalone mode where you create something which will be self-contained and then contains the Python runtime. And there's also an accelerated mode, and then it just links against a Python installation and loads libpython like every program that embeds Python needs to do. And that's how it goes. And I try to avoid the Python runtime as much as possible. And that's how I accelerate things. Not having bytecode, and then having knowledge from static optimizations. Trying to go into the Python runtime as little as possible. That's basically the acceleration.

30:24 Michael Kennedy: Yeah. But you're still, like that C code I talked about being generated in the architecture. That still uses the PyObject types and stuff like that, right? It's as if your compiler is basically writing the C extension from our Python code.

30:41 Kay Hayen: That's true, except for; did you say R Python? The PyPy.

30:45 Michael Kennedy: No, no. Our Python. Like the Python that I write. I give to Nuitka. You write, yeah.

30:50 Kay Hayen: Yes, exactly.

30:51 Michael Kennedy: Sorry. It sounds the same, of course.

30:54 Kay Hayen: No, RPython would be reduced. Yes, exactly. It takes your Python and turns it into an extension module, and a very hacky one. It's really nasty. I'm really a big friend of the dictionary implementation. I take advantage of all the internal knowledge that I'm not supposed to use.

31:14 Michael Kennedy: Nice. But that's what a compiler is supposed to do, right?

31:17 Kay Hayen: Yes. It's a responsibility to undo all these nasty things. The older the Python version, the safer it is. So 2.7 is not going to change a lot.

31:29 Michael Kennedy: Yeah. You can be sure of that, yeah. This portion of Talk Python To Me has been brought to you by Rollbar. One of the frustrating things about being a developer is dealing with errors. Ugh! Relying on users for reporting errors, digging through log files trying to debug issues, or getting millions of alerts just flooding your inbox and ruining your day. With Rollbar's full-stack error monitoring you get the context, insight and control you need to find and fix bugs faster. Adding Rollbar to your Python app is as easy as pip install rollbar. You can start tracking production errors and deployments in eight minutes or less. Are you considering self-hosting tools for security or compliance reasons? Then you should really check out Rollbar's compliant SaaS option. Get advanced security features and meet compliance without the hassle of self-hosting, including HIPAA, ISO 27001, Privacy Shield and more. They'd love to give you a demo. Give Rollbar a try today. Go to talkpython.fm/rollbar and check them out.

32:33 Kay Hayen: This is the task.

32:33 Michael Kennedy: Yeah. So let's talk about compatibility a little bit. So it works on Windows, Linux and macOS.

32:39 Kay Hayen: It works on not just that. Because it's creating C code, I think it will work on anything that you can run Python on. That's the idea. I obviously can only test Windows and Linux, and I know that macOS is working because other people are using it. And sometimes I fix something, but I don't have a macOS machine.

33:02 Michael Kennedy: Yeah, you made an interesting point around that as well. Well, I think iOS and Android probably are also candidates here.

33:10 Kay Hayen: Yes. Android. I think people have done Android. Obviously it has been done. I think I have done it myself; there's a cross-platform toolkit. I've done some Android compilations myself. It's non-based Linux that has a Python, and obviously the compiling machine might be too slow. That's a limitation there. But in principle there's nothing which prevents it from working. And I think it has been done occasionally, on and off. But practicality is currently just... But I think once the speedups are increasing I would expect that people... There's a lot of people currently doing games in Python. I met a couple of those and it's very interesting to have something like Nuitka for these use cases.

34:04 Michael Kennedy: Right. You mentioned the PyPy folks getting the 3D stuff accelerated and working faster and that was pretty cool. So, your project is really right in there as well for making these things go faster.

34:18 Kay Hayen: Yes. But, it's not happening.

34:23 Michael Kennedy: Yeah. So, lots of OS's. Python 2 and Python 3 both?

34:27 Kay Hayen: Yes. 3.7 actually, at the time we are releasing this it will be working. And 3.2, I just discontinued the support for that. Mostly because it's impossible to get it running. And 3.3 to 3.7 and 2.6 to 2.7, they all work. In my design I have what I call reformulations. I think you can put up the link to these language conversions that make things simpler, from my developer manual, which is very instructive and explains a lot of things. Actually, to me this is baby Python. Or this commonality of Python internals that I'm using. And the with statement, for instance, is translated into many statements, actually, and the assert statement is just an if: if not condition, raise AssertionError with the arguments. These kinds of things. And that means that most of the time the Python versions do not make as much difference as you would think they do.
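
[Editor's sketch] The assert reformulation described here can be written out by hand in plain Python. This is an illustration of the idea only, not Nuitka's internal representation; the function names and the `outcome` helper are hypothetical.

```python
def check_with_assert(value):
    # Original form: the assert statement.
    assert value > 0, "value must be positive"
    return value


def check_reformulated(value):
    # Hand-written equivalent: "if not condition, raise AssertionError
    # with the arguments". Unlike a real assert, this version is not
    # removed when Python runs with the -O flag.
    if not value > 0:
        raise AssertionError("value must be positive")
    return value


def outcome(func, arg):
    """Describe what a call did: its result, or the exception it raised."""
    try:
        return ("ok", func(arg))
    except AssertionError as exc:
        return ("raised", type(exc).__name__, exc.args)


if __name__ == "__main__":
    for arg in (5, -1):
        print(arg, outcome(check_with_assert, arg), outcome(check_reformulated, arg))
```

Once the statement is expressed with simpler building blocks, the same optimizations and code generation can apply regardless of which Python version introduced the original syntax.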

35:36 Michael Kennedy: Right, you've got some stuff that goes, oh, I look at a with statement, I'm going to translate that to something else. Regardless of which level.

35:44 Kay Hayen: There's a huge chunk of code dedicated to reformulating the with statement into a series of try, except, finally stuff. Doing lookups of enter and exit and implementing the Python mechanics in explicit terms, with temporary variables and so on. Which then can be optimized at compile time. Maybe sometimes we know that it's not going to raise an exception and then we can drop this. That's basically the reason why the language versions are not that much of a problem.
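
[Editor's sketch] A hand-written version of that with-statement reformulation, loosely following the expansion given in PEP 343. It is a sketch of the technique, not the code Nuitka generates; `Recorder`, `use_with` and `use_reformulated` are hypothetical names chosen for the example.

```python
import sys


class Recorder:
    """A tiny context manager that logs when it is entered and exited."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append("exit")
        return False  # do not swallow exceptions


def use_with(log, body):
    with Recorder(log):
        body()


def use_reformulated(log, body):
    # Explicit lookups of enter and exit, held in temporary variables.
    manager = Recorder(log)
    enter = type(manager).__enter__
    exit_ = type(manager).__exit__
    enter(manager)
    completed_normally = True
    try:
        try:
            body()
        except BaseException:
            completed_normally = False
            # Re-raise unless __exit__ asks to suppress the exception.
            if not exit_(manager, *sys.exc_info()):
                raise
    finally:
        if completed_normally:
            exit_(manager, None, None, None)


if __name__ == "__main__":
    first, second = [], []
    use_with(first, lambda: first.append("body"))
    use_reformulated(second, lambda: second.append("body"))
    print(first, second)
```

If a compiler can prove that the body never raises, the inner try/except branch becomes dead code and can be dropped, which is the kind of compile-time simplification mentioned above.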

36:22 Michael Kennedy: Yeah. I suspect that the async stuff was more interesting.

36:25 Kay Hayen: This was terrible. And especially imagine coroutines, I've seen this being used. I personally never knew why I would use it. I don't use it. Because it's like, you get a lot of disadvantages of threading without the advantages. I'm not so sure why it is good.

36:47 Michael Kennedy: Well I was just thinking of you. How difficult that must be to implement in this scenario? And, how your comments in the beginning how you don't really like them.

36:56 Kay Hayen: I had no idea what it is about. It's like I'm trying to find out what it does and use it and the semantics, and especially the coroutines, for example in the minor release, 3.5.2. They changed their minds on how it worked. And they introduced a compatibility layer, and then in 3.5.2 it was like a totally new implementation compatible with another one, which is a total mess. Generators are taking it to the extreme. You know, when I started Nuitka there was a yield statement. A generator could yield the execution. There was no return value. And then this got edited. Actually, it was through Nuitka that I learned that this changed at all. So you can imagine that what happened there is amazing, and they have a full team of people implementing this stuff and throwing out a new release. And I get to reimplement that. That's basically what happens. One guy against the whole team. And then you try to be faster with your implementation. That's actually ridiculous. But that last release wasn't so bad. I didn't encounter anything crazy.
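
The generator change mentioned here, where a generator gained the ability to return a value, can be shown with a short example. This uses standard Python 3.3+ behavior (PEP 380); the function names are invented for illustration.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1
    return "done"  # allowed in generators since Python 3.3


def drain(generator):
    # Collect yielded items and pick up the return value, which
    # travels on the StopIteration exception.
    items = []
    while True:
        try:
            items.append(next(generator))
        except StopIteration as stop:
            return items, stop.value


def relay(n):
    # "yield from" forwards the items and evaluates to the return value.
    result = yield from countdown(n)
    yield result
```

A compiler that wants to stay compatible has to reproduce both paths: the value carried on `StopIteration` and the value that `yield from` hands back.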

38:20 Michael Kennedy: That's cool. I'm pretty impressed that you got 3.7 going already 'cause that just came out a couple of weeks ago.

38:27 Kay Hayen: Yeah. Actually, with 3.6 I was even reporting back, and so I did with 3.7.0. So I had a bug report now about dictionaries. I was using release candidates and reporting my findings. So, I have a couple of tests which demonstrate that Nuitka is not misoptimizing certain things badly. And I found out CPython was. Nuitka wasn't, but CPython was. And I could turn this into reports. And 3.6 was already pretty good at the time it was released. But for 3.7 they made a huge change to exception handling also. Absolutely terrible. I think in Python 2 it had pretty crazy semantics where a frame had an exception and restored it. And it was terrible. And in Python 3 they changed it again. And now in 3.7 they changed it yet again, but mostly on the implementation side. So the internal structure, the way it's stored and so on, changed, and this prevented me from compiling with the new Nuitka without a lot of changes. So I'm a bit late actually. I normally try to be quick with this because the early adopters will pester me about lack of support for the latest, greatest Python release.

39:50 Michael Kennedy: Yeah, sounds like you're pretty much on top of it. That's really good. So, one thing that I was thinking about as I was looking at this is, would it basically hand me the output of Nuitka? It's like a C extension, right? A compiled C extension.

40:04 Kay Hayen: That's one of the modes.

40:05 Michael Kennedy: Yeah. So, if I have regular Python code that I just want to run not compiled or anything. Just standard. But there's part of my program that I would like to make faster and optimize with what you're doing, I can put those two things together, right?

40:20 Kay Hayen: Yes, you can. That's absolutely true. And actually, one of the things I'm aiming at is making bindings an easy task. So my vision of bindings is for you to use ctypes or something. For middleware implemented in Ada, we use ctypes to make the bindings, but you can think of C code. And that's a pretty neat way of doing the bindings, and my idea is for Nuitka to optimize that away, so it's not ctypes but direct C calls, and to do bindings like this in a comparable fashion. What I also want to achieve is that Nuitka makes available the original code. So if I use that binding extension and try to compile a program, I would like for the compilation process to be able to go and inline code from there. I'm not so sure that was clear or not, but the idea was... Right now there's a barrier. So if it's an extension module, it's an extension module. I can't look into PyQt at all, if it's created with it. And I want to replace this with the ability to make a call into a bound C function in my main program. And normally I would use this compiled extension module, compiled with Nuitka, and if I compile the program it would be able to inline that extension module's call.
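
For readers who have not seen bindings written with ctypes, here is a minimal sketch of the style being described: the C signature is declared in pure Python and the call goes through ctypes at runtime. It assumes CPython and binds `Py_GetVersion` from the interpreter's own C API purely as a convenient, always-available C function; the wrapper name `c_level_version` is made up. How Nuitka would turn such a declaration into a direct C call is not shown and is not something this sketch claims to know.

```python
import ctypes

# Declare the C signature: const char *Py_GetVersion(void)
_get_version = ctypes.pythonapi.Py_GetVersion
_get_version.restype = ctypes.c_char_p
_get_version.argtypes = []


def c_level_version():
    # The call is dispatched by ctypes at runtime; a compiler that
    # understood this declaration could emit a plain C call instead.
    return _get_version().decode("utf-8")
```

Because the binding is ordinary Python, the same file could in principle be used uncompiled, compiled as an extension module, or inlined into a compiled program, which is the dual-mode idea discussed in this part of the conversation.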

41:55 Michael Kennedy: Right, so not just optimizing the Python code you feed it, but actually optimizing the underlying C code that's being used. Because ultimately it gets down to C on C.

42:04 Kay Hayen: Yeah. Generate C code for the compiled program again. And I think the ecosystem would benefit from having this dual mode code. You have this original Python binding and you have the compiled extension. You can use the compiled extension where you want and you can also use the original code if you compile. So that's my vision. And I think that will be healthy for the ecosystem and that's something that Cython doesn't do. And then it would I think be a much more attractive to do bindings in pure Python.

42:42 Michael Kennedy: Yeah, I think that sounds really great. That's awesome.

42:45 Kay Hayen: That's my vision and I think I will be getting there.

42:49 Michael Kennedy: Yeah, I think you will as well. You've definitely got some impressive stuff already working. Let's talk about some of the optimizations. It sounds to me like one of the most important things you could do in Python is to inline functions. Because functions in Python are almost unreasonably slow. It's actually quite a high penalty to call a function in Python. Just the mechanics of setting up that stack and call and whatever, compared to, say, other languages where it's slow, it has an effect, but it's not super significant.

43:23 Kay Hayen: Yeah. I did a lot of hacks to accelerate function and method calls actually. A lot of the acceleration in Nuitka is coming from being willing to do all kinds of nasty tricks to call functions in a faster way. That's a huge overhead. And this inlining, that will, of course, make all the difference. But it requires whole-program optimization. That's a rather difficult task. So, as you know, with Python, code can change everything behind my back without me noticing.

43:57 Michael Kennedy: Right.

43:57 Kay Hayen: It will always; I will always have to do both things. So, I will have to check that it's actually what I'm thinking it is much like a PyPy JIT. It's having got... Is it really the same as last time? And is it really what I expect it to be? And then I can be super fast. Or I need a fall back because something crazy happened behind my back.
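
The "fast path plus fallback" idea can be sketched by hand. The example below imitates what a compiler might emit for a call site: a guard checks that the callee is still the function that was analyzed, and only then is the inlined body used. Everything here (`Helpers`, `_ANALYZED_DOUBLE`, `call_site`) is a hypothetical illustration, not code generated by Nuitka.

```python
from unittest import mock


class Helpers:
    @staticmethod
    def double(x):
        return x * 2


# Snapshot of the function as it looked when the call site was analyzed.
_ANALYZED_DOUBLE = Helpers.double


def call_site(x):
    # Guard: is the callee still what we expected it to be?
    if Helpers.double is _ANALYZED_DOUBLE:
        return x * 2  # inlined body, no call overhead
    # Fallback: something replaced the function behind our back.
    return Helpers.double(x)


def patched_result(x):
    # Replace the function the way a test would, then call the site.
    with mock.patch.object(Helpers, "double", lambda value: -value):
        return call_site(x)
```

With the patch active the guard fails and the generic call is used, so behavior stays compatible even though the common case skips the call entirely.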

44:22 Michael Kennedy: Right. Like unittest patch, for example.

44:25 Kay Hayen: Yes. That kind of thing will have to work. This duality, it's my vision of how things will then eventually work, but inlining code will definitely do a lot of things. I'm currently working on classes and optimizing the classes and the dictionaries and tracing the values in there. Nuitka, I hope in the next release, will be capable of statically optimizing most classes into a simple call to type with a dictionary of stuff. But that's the first step, obviously, to understanding types, because I will be able to precisely know the dictionaries; then I have to see through the metaclass mechanics and see where it's actually mostly harmless, most of the time, and take it from there. This is a... I think it's a huge undertaking. It's one of the directions that I want to go. So this global program optimization is one thing, and then locally, for cases where I do know that something is an integer, I want to get to using alternative integers and Python integers or just a C type where possible. That's a direction I intend to work on in the next two months.
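
The "call to type with a dictionary of stuff" refers to the fact that, for a class with the default metaclass, a class statement is essentially equivalent to calling `type(name, bases, namespace)`. The sketch below shows the two spellings side by side; the class and function names are invented, and metaclasses, `__prepare__` and class decorators are deliberately left out.

```python
class PointA:
    dims = 2

    def norm1(self):
        return abs(self.x) + abs(self.y)


def _norm1(self):
    return abs(self.x) + abs(self.y)


# The same class built directly: name, bases, and a namespace dictionary.
PointB = type("PointB", (), {"dims": 2, "norm1": _norm1})


def make_point(cls, x, y):
    point = cls()
    point.x, point.y = x, y
    return point
```

If the compiler knows the namespace dictionary exactly, the class body no longer has to be executed as general code, which is where knowing the dictionaries precisely comes in.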

45:49 Michael Kennedy: Basically, that is the magic sauce that makes Cython fast: you explicitly type your stuff and it lets an int just be four bytes on the stack in C rather than all the indirect stuff. Well, it could be this reference type. We don't really know what it is. Treated as a full-on PyObject.
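
The size difference being described can be observed from Python itself. This small sketch compares the memory footprint of a Python int object with that of a C `int` as reported by ctypes; exact numbers depend on the platform and CPython build, so none are asserted here.

```python
import ctypes
import sys

# A Python int is a full heap object: reference count, type pointer, digits.
boxed_int_size = sys.getsizeof(1)

# A C int is just the raw value (commonly four bytes).
raw_int_size = ctypes.sizeof(ctypes.c_int)

print(f"Python int object: {boxed_int_size} bytes, C int: {raw_int_size} bytes")
```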

46:10 Kay Hayen: Yes.

46:11 Michael Kennedy: You're working on this as well?

46:10 Kay Hayen: Yes. And this is where I think C-level performance is possible. And then I have the idea that, unlike Cython, I want to make type annotations that actually behave in Python at runtime, too. So I would like you to declare that a function takes integer arguments, and it will raise an exception if they aren't integers. And then I want to have Nuitka see through that, take these assertions, and generate specialized code much like Cython does, but in a lot more complex way, and with the benefit of actually enforcing these things at runtime. That's the vision there, which is also a lot of work. You would have to write some decorators. But I think other people can do this. And just today I asked somebody who offered to volunteer if he could do it. It would be great if people joined such an effort. Because that's only about CPython, right? You can write it without any knowledge of Nuitka. I want to have a decorator which says: I declare this class and now it's frozen. You will not be able to change anything after the fact about this or else you get an exception, and that's a hint to the compiler to take advantage of that for static optimization.
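
A decorator of the kind described, one that enforces integer annotations at runtime in plain CPython, might look roughly like this. It is a sketch under assumptions: the name `enforce_int_annotations` is hypothetical, it only handles parameters annotated exactly as `int`, and nothing here is an existing Nuitka API.

```python
import functools
import inspect


def enforce_int_annotations(func):
    """Raise TypeError when an argument annotated as int is not an int."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            annotation = signature.parameters[name].annotation
            if annotation is int and not isinstance(value, int):
                raise TypeError(
                    f"{name} must be int, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_int_annotations
def add(a: int, b: int) -> int:
    return a + b
```

In uncompiled CPython the checks cost time on every call. The idea in the conversation is that a compiler could treat them as guarantees and generate specialized integer code behind them.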

47:37 Michael Kennedy: Yeah. Really like slots, right?

47:39 Kay Hayen: Yeah, exactly. Like slots. But in a more general fashion. And enforcing these things. These kinds of decorators I think make total sense. And, yeah. This is another avenue where I think Nuitka will also be beneficial to the ecosystem, by making it viable, because what it will do is slow down your program terribly, because it will do all these checks that typically are not useful at runtime, if you do it with CPython decorators. But then with Nuitka it's actually better. And that's the idea.
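
One way a "frozen class" decorator could be written in pure Python is with a metaclass that rejects attribute changes on the class object. This is a hypothetical sketch: `frozen` and `_FrozenMeta` are invented names, instance attributes are left alone, and methods using zero-argument `super()` would need extra care because the class is rebuilt.

```python
class _FrozenMeta(type):
    """Metaclass that rejects attribute changes on the class object."""

    def __setattr__(cls, name, value):
        raise AttributeError(f"class {cls.__name__} is frozen; cannot set {name!r}")

    def __delattr__(cls, name):
        raise AttributeError(f"class {cls.__name__} is frozen; cannot delete {name!r}")


def frozen(cls):
    # Rebuild the class under the freezing metaclass, dropping the
    # per-class descriptors that type() recreates on its own.
    namespace = {
        key: value
        for key, value in vars(cls).items()
        if key not in ("__dict__", "__weakref__")
    }
    return _FrozenMeta(cls.__name__, cls.__bases__, namespace)


@frozen
class Config:
    retries = 3

    def describe(self):
        return f"retries={self.retries}"
```

After decoration, an assignment such as `Config.retries = 5` raises `AttributeError`, so a compiler could in principle rely on the class contents staying fixed.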

48:20 Michael Kennedy: Better that you clear a much higher performance boost. Look, I ran it Nuitka, it's 20 times faster. It just can, it would be slower.

48:27 Kay Hayen: Yeah. I don't know if you're getting this but Nuitka is a crazy compromise free project. And, if it fails which it probably will, that's why. It's all about trying to do the best thing possible.

48:42 Michael Kennedy: Yeah, the compromise free part is what makes it so challenging, right?

48:46 Kay Hayen: Yes. It's so challenging but it's also, yeah, it's rewarding, and everything I've achieved was a challenge. When, in 2003, I made this public release of Nuitka, it was usable. It was usable from day one. And it has remained usable all this time. And it only ever became more usable. What it didn't become, for I think at least two years, is faster. Because I'm doing these big transformations on internals, for example the local dictionary work that I'm doing on classes. Class declarations aren't performance critical at all. You run that code once. It's not in a loop, but for scalability it's hugely important to get this. Because if you compile a million lines of code and you optimize it better, it's much less code. And for these plans we really need to understand the global picture. So I'm spending a lot of time on stuff that doesn't actually improve performance at all most of the time. And I really look forward to finally having fun running benchmarks again. Because I really don't enjoy running benchmarks if the numbers aren't all good. It's sad. It's crazy. And it's totally anti what everybody else has done in the field so far. So there have been compilers from Google. Forgot the name now. But they had their own project. And the first thing they had was incredibly good numbers on the benchmarks. Yeah, but it wasn't doing what it should be doing. I think it was Unladen Swallow.

50:37 Michael Kennedy: Yeah, I think you're right. Unladen Swallow, I think.

50:41 Kay Hayen: Just went away. It's not sustainable because you can't use it for anything but benchmarks.

50:47 Michael Kennedy: Right.

50:48 Kay Hayen: That's basically the idea. And I'm doing something which you can use outside of benchmarks, but there will always be code which is currently slower in Nuitka because I didn't look at it. Actually CPython has a lot of tricks. I recall that it took me a long time until I was at a point to emulate their in-place assignment tricks. So if you do in-place assignment to a string, it does some things to avoid allocating a new object unless a reference was held on the outside. Which in many instances made it infinitely faster than Nuitka was. I'm behind in some optimizations even. And that's pretty tough. Obviously, I would like to also have cool benchmarks. But what I would like to have is also the ability to say, and you can have that too in your program. That's obviously, I don't know. Forever it has felt like it's right around the corner that I can do this now. And then I discover something else that I need to do first.
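
The constraint behind the in-place string trick can be seen in a small example. When nothing else refers to the string, an implementation is free to extend it in place; when another reference exists, the old value must stay visible through it. The code below only demonstrates the observable semantics, not CPython's or Nuitka's internals, and the function names are made up.

```python
def build(parts):
    text = ""
    for part in parts:
        # Only "text" refers to the string, so an implementation may
        # grow it in place instead of allocating a new object.
        text += part
    return text


def build_with_snapshots(parts):
    text = ""
    snapshots = []
    for part in parts:
        # An outside reference is held here, so the old string must
        # remain unchanged and a new object is required.
        snapshots.append(text)
        text += part
    return text, snapshots
```

Both functions must produce the same final string; only the second forces the old values to survive, which is why the optimization depends on whether a reference is held on the outside.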

51:57 Michael Kennedy: Yeah. That's why I brought up Mercurial 'cause that's a pretty serious application to process.

52:05 Kay Hayen: Yeah, carrying the full weight. So doing all these refactorings. I'm always doing these refactorings in a non-breaking way. So, I try to not stray too far away from a working state in all the changes that I make. So, I made gradual changes. Like I said, initially it was more like a templating language and I turned it into a static single assignment form compiler gradually, one by one, bit by bit. And trusted it more here and there. Right now, this value tracing is very reliable but still not used for everything. I think right now the major thing that I'm using it for is: do I need to check whether the value is assigned or not. And that's mostly what it's being used for. And for static optimization. And that's working bug-free, but I'm always expanding this. And then I find something which is not yet working. That's the crux of it.
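
The question of whether a variable needs an "is it assigned?" check at the point of use can be illustrated with two small functions. In the first, every path assigns the variable, so a compiler tracing values could skip the check; in the second, one path leaves it unassigned and Python must be able to raise `UnboundLocalError`. The function names are invented for this sketch.

```python
def always_assigned(flag):
    result = 1
    if flag:
        result = 2
    # Every path reaching this line has assigned "result".
    return result


def maybe_unassigned(flag):
    if flag:
        result = 2
    # When flag is false, "result" was never assigned, so this use
    # needs a runtime check and raises UnboundLocalError.
    return result
```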

53:13 Michael Kennedy: Yeah, well. It sounds like it's getting better and better. And there's a lot of things that you could unlock with it. So speaking of stuff that's faster, stuff that's slower. When should I use this and maybe when shouldn't I use it?

53:29 Kay Hayen: You should use it if you don't care about performance.

53:31 Michael Kennedy: Yeah. If compatibility is just like mostly what you care about, yeah.

53:35 Kay Hayen: Actually, I have a secret pipe dream of being better than Pylint. But that's a whole other issue. Because my static analysis may uncover things that Pylint wouldn't. The program analysis. If you don't care about performance, and I think that's roughly 99% of all the people. You should use it if you are into NumPy and these things. Although, I was in discussions with a university. It would take somebody to work on specializing code generation for NumPy. So, NumPy is this hugely great scientific library where you would just provide very little Python code on your own, for it to do something for each cell, for example. That would be Python code. It would be massively cool if we could have this little bit compiled, and still run it. And I think NumPy would then practically not be entering the Python runtime a lot, and that would be something, but I didn't quite manage to get some university or something involved. So scientific applications, and financial applications are also that way: you want to do financial stock something. You want to buy as fast as possible. And make your decisions as fast as possible. Real-time applications in Python.

55:09 Michael Kennedy: Algorithmic trading or something.

55:10 Kay Hayen: Some people are crazy enough to do this in Python because of the development turnaround. And they want to throw something easy on it to get it accelerated. And, on an update, they want to change it simply. And that's the kind of uses. I think most people do not care that much about acceleration; some people care about deployment. And there's a standalone mode, especially for Windows, where you do not normally have any Python on an arbitrary user machine. And definitely not the version you want to have. And definitely not the modules. You can run Nuitka and use it to create this distribution folder. And send that people's way and have things work. I think that's what a lot of people use it for.

56:04 Michael Kennedy: I think that is really excellent. And so, basically, you compile it down to an .exe that does not depend on CPython. It bundles it up, like contain that within itself, right?

56:14 Kay Hayen: Not only in installation. And, although you could use Anaconda Python, too. It's, I think, a convenient package. I think a lot of people are using it to hide their source code from people. And that's actually something which... that's what compilers allow for. It's not my daily joy, being part of depriving people of the ability to change source code. But people own their source code. So, they can give it to you or they can't. But yeah, that's one other mode of operation which I think is unfortunately the majority of users, I guess. And it's also causing a lot of trouble because extension modules often do imports of stuff which they don't tell me about and then they crash when it's not present. And it's sometimes very nasty.

57:15 Michael Kennedy: Yeah, I can imagine that's tricky. So, though, if I'm using external packages off of PyPI that's no problem, right? You can deal with that? So if I work with requests, SQLAlchemy, whatever, I can run that through Nuitka?

57:28 Kay Hayen: Mm-Hmm.

57:29 Michael Kennedy: That's awesome.

57:30 Kay Hayen: Yes. Actually, yeah. Just arbitrary code. It's supposed to work. I think Django, some of its really dynamic, crazy stuff would need a plugin created to pass it. Because I think Django does some dynamic imports of stuff. Some people are getting it to work. I don't know, that takes some modification.

57:57 Michael Kennedy: Yeah. Sure, sure. Interesting. The standalone mode is pretty interesting as well. So I think we're getting kind of near the end of our time. So, one thing I did want to highlight, I think. It sounds like you're looking for contributors and there's a lot of places that people can contribute. You talked about writing some standard Python code but then Nuitka could use. I'm sure, if there was a long list. These are all the optimizations we're looking for. Could anybody who's good at this kind of stuff in C. Could you help here? Things like that. Are you looking for contributors along those lines?

58:33 Kay Hayen: Yes, of course I am. I actually have to think if I did. I should have a list of issues on GitHub that are marked as help wanted. And there's a lot of these things. And, obviously I could and should create a couple more from these ideas that I just told you about. That's totally something I should do. I have relatively bad experience with people joining the project because it's tough to get progress. It's also a complex design. Not easy to get into. It takes a lot of skill in Python and C. And, occasionally I get contributors. But I think it's not rewarding enough for many people, unfortunately.

59:30 Michael Kennedy: Yeah it takes a lot of effort to see the rewards through, right? I'm just thinking there must be many people who are working on compilers and stuff who just left university and are looking for some kind of project. Maybe they can jump in.

59:43 Kay Hayen: Yes. I welcome everybody. But I have to be honest with you. I also don't have time to actually mentor much. I can't guide people through doing things and they lose interest. That's happened a couple of times. So there's a lot of people who want to find out if this is something they would be willing to do. I have a family. I have a day job. And this is my spare time. It needs a lot of work. Communication is like, often it takes second place after coding myself.

01:00:21 Michael Kennedy: Yeah. That's the challenge of doing this as a part time project, right?

01:00:25 Kay Hayen: Yeah. It's totally impossible to keep up with. So suppose somebody really wants to get into this. Yeah. I cannot respond quick enough. Or sometimes. And that's already frustrating.

01:00:41 Michael Kennedy: I guess people can probably start by watching your EuroPython presentations and then go from there.

01:00:46 Kay Hayen: Yeah. Also, I think what I really would need is somebody with community skills who would be capable of doing a fundraiser for me. It's a long-term project. And a lot of people have short-term needs. I need to get Nuitka to the point where people with short-term needs will feel that investing their time into Nuitka is worthwhile, and then it will be a different story. Yeah, but to come back to it, I would like to have somebody who organizes some fundraiser and does this so I could get time off my day job and work on Nuitka. I've invested my own money right now. Four months, unpaid.

01:01:34 Michael Kennedy: That is a big commitment.

01:01:36 Kay Hayen: It's a lot of effort. I'm buying hardware. I'm getting some donations but it's not enough to buy the hardware that I buy for Nuitka. And it starts with something people could do is send donations my way. I think a concerted effort that would gain visibility and raising goal and me being able to put a couple of months again into Nuitka and pushing really forward. That would be great. Probably that's something more than I need than actual contributors. Because, I think I need to reach a breaking point where I have something that is attractive enough for people who have actual problems to just add the bit that they are missing. At the point where you have a lot of optimization and then they run into some construct which is not properly optimized and then they can hack it. And then it is. That's a feat for them. Right now that's not the case. Right now you would have to be somebody with long-term vision and no concrete other problems that require your attention. So, that's it.

01:02:46 Michael Kennedy: I think the couple months sprint would really make a big difference. I would see it not make a difference in other projects more than donations or other things as well. So, certainly that sounds like a good thing. Before we go, let me ask you the final two questions. If you're going to write some Python code what editor do you use?

01:03:05 Kay Hayen: I'm using Eclipse and PyDev specifically. I occasionally use Vi on the command. Just for quickness. But Eclipse and PyDev it is.

01:03:15 Michael Kennedy: Right on. And then just some notable PyPI packages, maybe people haven't heard of them or aren't using them yet.

01:03:23 Kay Hayen: Obviously I have a lot of machines running and I'm using Ansible, which somebody introduced me to at a conference. And I am forever grateful for the tutorial I got there. It's a very useful tool, like Salt, if you know that. Ansible is written in Python. You can easily deploy all your machines in a similar fashion and that's great. And Nikola. Actually, I forgot my contributor number. It might be contributor number five or something. Static website generation. That's what people are using a lot these days. And I really like the idea. And I joined that. And I'm using it for my website. It's great. Obviously Buildbot is also a tool I heavily rely on. I absolutely despise Jenkins, which I have to use at work. I like Buildbot because you get to write your configuration as Python code. Actually there's no web interface needed to configure stuff. You just do your for loops and stuff and everything's consistent, which is great. And I think something which should get more exposure is pipenv, which combines the virtues of virtual environments and pip install, making it easy to run stuff. I'm not using that myself a lot. But I think it's very notable.

01:04:49 Michael Kennedy: Yeah, absolutely. Alright Kay, thank you for being on the show. This was a really interesting project that you've been working on a long time and it's definitely a cool exploration of this compiled space and it's its own take.

01:05:03 Kay Hayen: And I would like to thank you for hosting me and giving me your opportunity. I hope I wasn't rambling too much.

01:05:10 Michael Kennedy: Yeah, it was some interesting stories. Thank you. Yeah. Take care.

01:05:15 Kay Hayen: Okay. Bye-bye.

01:05:17 Michael Kennedy: This has been another episode of Talk Python To Me. Our guest for this episode was Kay Hayen and it's been brought to you by Cox Automotive and Rollbar. Join Cox Automotive and use your technical skills to transform the way the world buys, sells and owns cars. Find an exciting, technical position that's right for you at talkpython.fm/cox. C-O-X. Rollbar takes the pain out of errors. They give you the context and insight you need to quickly locate and fix errors that might have gone unnoticed until your users complain, of course. As Talk Python To Me listeners, track a ridiculous number of errors for free at rollbar.com/talkpythontome. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps. Or our brand new 100 Days of Code in Python. And if you're interested in more than one course, be sure to check out the Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
