#113: Dedicated AI chips and running old Python faster at Intel Transcript

Recorded on Saturday, May 20, 2017.

00:00 Michael Kennedy: Where do you run your Python code? No no no, not Python 3, Python 2 or PyPy or any other implementation. I'm thinking way lower than that. This week, we're talking about the actual chips that execute our code. We catch up with David Stewart and meet Suresh Srinivas and Sergey Maidanov from Intel. We talk about how they're working at the silicon level to make even Python 2 run faster, and touch on dedicated AI chips that go beyond what's possible with GPU computation. This is episode 113 of Talk Python To Me, recorded live at PyCon 2017 in Portland, Oregon on May 19th 2017. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem and the personalities. This is your host, Michael Kennedy, follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at and follow the show on Twitter via @talkpython. This episode is brought to you by Talk Python Training and Hired. Be sure to check out what we both have to offer during our segments, it helps support the show. David, Suresh, Sergey, welcome to Talk Python.

01:32 David Stewart: Thank you very much.

01:32 Suresh Srinivas: Thank you.

01:33 David Stewart: Great to be here, thank you very much.

01:35 Michael Kennedy: David, you were, we talked in the early days of the Intel Python distribution. And you guys have a lot of new things to discuss that you've done in the whole Python space. With the Intel Python distribution, but also with other things, right?

01:49 David Stewart: Yes, that's right, yeah. We have a lot of, I mean, yeah, about a year ago when we talked the first time, we had really put the plans in place for not only our upstream contribution to Python, CPython, we were also doing a lot of work with PyPy, the JIT interpreter for Python as well as the Intel Python distribution. Since then, we've released the Intel Python distribution and we had some very significant, you know, upstream contributions and some proof points with some customers that have been showing some very positive things. So yeah, not to repeat maybe last time, but just to say, generally speaking, if you're doing scientific computing, the Intel Python distribution, particularly things like using Pandas or scikit-learn or NumPy, SciPy, these are all things that really work together very well with this Intel Python distribution, it's sort of a one distribution, just download and install the whole thing as one, right, so not a lot of messing around with it. We also have significant proof points with PyPy in particular, we were showing off a doubling of throughput of like OpenStack Swift with the PyPy contributions we had made and that's, you know, so that means faster throughput, more users being able to be maintained and things like that. So that was a year ago.

03:06 Michael Kennedy: Yeah, that was a year ago. And when you say you're doubling the speed with PyPy, does that mean the contributions you've made back to PyPy now resulting in it going faster, or is that somehow running the Intel Python distribution on PyPy?

03:17 David Stewart: It's actually, so the Intel Python distribution is separate from PyPy. We have two major efforts from a year ago that we're doing, upstream open source contributions and the Python distribution, which is a proprietary product, right. So what we had, the doubling actually was just initially out of the box. Let's see what PyPy gives us, and we were stunned that we got twice the throughput and 80% better response time on Swift. Just out of the box, and that said, oh, let's start doing some work, you know, to actually optimize this thing and make it better and try more sort of proof points with other real customers, right. And since then as well, the Python distribution has had a lot of proof points with customers. I think we've had financial organizations, right.

04:02 Sergey Maidanov: Financial, oil and gas, government organizations. A lot of different usages.

04:07 David Stewart: Yeah, it's great.

04:09 Michael Kennedy: Okay, so that's really cool that you guys are working on that. I know we're not just talking about the Intel Python distribution, but let's dig into that just for a minute, is that basically CPython forked with some changes in the middle or like, is it a from scratch implementation? How does this relate back to CPython in terms of--

04:26 David Stewart: The beauty of Python, right, is it's a language specification and sort of the, sort of standard implementation, CPython, right. But it's very open to any number of other interpreters or implementations of the language, PyPy being one of them, and it's basically a JIT, which means a just in time compiler, which means that instead of just interpreting the bytecode from Python, it actually generates native code any opportunity it can for the hot parts of the program. And that's incredibly helpful, strategic because then we can, you know, make use of a lot more processor instructions, make use of more processor parallelism.

05:07 Michael Kennedy: Yeah okay, that sounds great. Suresh, were you involved in this Intel Python distribution work?

05:12 Suresh Srinivas: As Dave was saying, right, we have Intel Python distribution, which is for the high performance computing. And then we also have Python optimizations for the data center. So I've been involved a lot more in the Python optimizations for the data center.

05:27 Michael Kennedy: Okay. Is that at the chip level or?

05:30 Suresh Srinivas: Both at the chip level as well as being able to optimize some, being able to deliver some workloads, and then optimize, start optimizing the CPython. So we have things like profile guided optimizations and link time optimizations that have now become default in CPython.

05:50 Michael Kennedy: Tell us a little bit about the profile guided optimization? So what you do is you run in a profiler? And then you somehow feed that back to the compiler?

05:59 Suresh Srinivas: So profile guided optimization is very critical since a lot of these runtime languages, they have both large code footprint and then they have a lot of branch mispredictions. Which essentially stall the front end of the CPU. And by profiling the code, then you're able to then lay out the code better so that it's friendly to the CPU and that it's also more efficient, and that's--

06:27 Michael Kennedy: So that PGO is now a default with Python.

06:30 Suresh Srinivas: With CPython, and PyPy, the Py performance project, that is how they're measuring it now.
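
As a small illustration of the build options mentioned here: CPython's configure script accepts `--enable-optimizations` (which turns on profile guided optimization) and `--with-lto` (link time optimization). The sketch below checks whether the running interpreter records those flags. The helper name `built_with_flag` is hypothetical, and the check assumes a POSIX-style build that stores its configure arguments; on other platforms the value may be missing, in which case the helper simply reports False.

```python
import sysconfig


def built_with_flag(flag: str) -> bool:
    """Return True if the interpreter's recorded configure arguments contain `flag`."""
    # CONFIG_ARGS is recorded by POSIX builds; it can be None elsewhere (e.g. Windows).
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return flag in config_args


# --enable-optimizations turns on PGO; --with-lto turns on link time optimization.
for flag in ("--enable-optimizations", "--with-lto"):
    print(f"{flag}: {built_with_flag(flag)}")
```

A False result only means the flag was not recorded for this build; it says nothing about how a distribution may have optimized the interpreter by other means.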

06:38 Michael Kennedy: Wow, that's really cool. And Sergey, how about your involvement?

06:41 Sergey Maidanov: They are solving really critical problem making interpreter or JITing really fast on Intel architecture. Intel distribution for Python, it also solves the problem of making numerical and machine learning running faster. And Python is known and loved for really nice numerical packages, NumPy, SciPy, scikit-learn.

07:06 Michael Kennedy: Right, all the stuff that we saw in the keynote today, it was just like here's why people that do numerical analysis love and use Python. For those people listening who didn't get a chance to watch the keynote, you deserve to go on to YouTube and watch, all right? So yeah, absolutely, those groups of people, the scientists, data scientists, it's great, right?

07:25 Sergey Maidanov: That's why we focus on this area, and we optimize these numerical packages, not the interpreter itself, but rather the packages. And for that, we rely on high performance libraries, native libraries, that Intel has developed for decades. Intel Math Kernel Library, Intel MPI, Intel Data Analytics Acceleration Library. These are all good high performance libraries used underneath to accelerate NumPy, SciPy, scikit-learn.

07:53 Michael Kennedy: I see, so you take, like let's say NumPy. You take NumPy and you recompile it against these high performance Intel libraries. And that, because the foundation is faster, basically makes NumPy itself faster.

08:04 Sergey Maidanov: Exactly. It makes almost as fast as native code.

08:08 Michael Kennedy: How much do you think the fact that you guys control the hardware and build these libraries, that you can make them compile to exactly what they need to be, or could anybody do this? Is it a big advantage that you guys control the chips and understand the chips?

08:20 Sergey Maidanov: Absolutely. I can tell you the example, I was with Intel Math Kernel Library team for 15 years. And we started optimizing MKL for new processor 3-5 years in advance of its launch. That's really huge benefit. So by day one of processor launch, we had MKL optimized for that processor. Same with Intel Python distribution. We had Knights Landing Xeon Phi processor launch last summer, and by that time, Intel distribution for Python was already optimized for KNL.

08:52 Michael Kennedy: I see. 'Cause you guys have a long lead time--

08:55 David Stewart: Yeah, and I think that's a good, the other side of this is not just being able to, you know, have these libraries for, if you're using those for scientific computing, but there's a ton of usage of Python in the data center that is not scientific computing. You know, a great example is the site Instagram. Any number of other sites that are out there that are using Python to, OpenStack itself is implemented in Python. So one of the things that, in terms of working with the chip architects, is being able to actually help them design the chip so it runs all Python better and just not this library code, but all Python as well.

09:28 Michael Kennedy: Right, and that's, you said there's some really interesting performance improvements that you got for running old Python, and we'll dig into that in a little bit, 'cause that's, as fun as it is to look at the data science stuff and the machine learning performance and all that, most people are running old Python, maybe Python 2 stuff that they don't even wanna touch, much less optimize away, right. If somehow you guys can just magically make it run faster--

09:52 David Stewart: That would be good for us to do, wouldn't it? And that would make sense, yeah.

09:55 Michael Kennedy: It would, it would. So I mean, we're talking about performance in terms of speed, but when you're optimizing like the data center, one of the major measures of efficiency in the data center is how much do I have to pay to run this in electricity and cool it? So it's just like, pure efficiency in terms of energy, right? How much of a difference have you guys seen in that?

10:14 Suresh Srinivas: That's really huge because part of the challenge in the data center is all the cooling costs and all the space costs and things like that. So Intel and Facebook worked together to create a new server architecture, right, that many of the Python programs can run in the data center on that architecture, and that runs at 65 watts compared to--

10:37 Michael Kennedy: Compared, yeah, give me an example. Like, what is that relative to?

10:39 Suresh Srinivas: Compared to a server that runs at 150 watts. And so it's really efficient and then it has a lot of technologies that we are adding to the silicon itself to make it perform well at the same time, 'cause people want both the power and the performance.

10:57 Michael Kennedy: Obviously, you want the speed. But you can get the double density in a data center, so if you're AWS or Azure or Google or Facebook, you can have twice as much computing power and the same amount of energy and cooling out.
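
The density claim can be checked with quick arithmetic using the two figures quoted above (65 watts versus 150 watts per server). The rack power budget below is a made-up number for illustration, and the sketch assumes cooling overhead scales with power draw and that both server types do comparable work, neither of which is stated in the conversation.

```python
# Figures quoted in the conversation: 65 W for the new server design
# versus 150 W for a conventional server.
NEW_SERVER_WATTS = 65
OLD_SERVER_WATTS = 150

# Hypothetical rack power budget, chosen only for illustration.
RACK_BUDGET_WATTS = 3000


def servers_within_budget(budget_watts: float, watts_per_server: float) -> int:
    """Whole servers that fit in a power budget (cooling overhead ignored)."""
    return int(budget_watts // watts_per_server)


old_count = servers_within_budget(RACK_BUDGET_WATTS, OLD_SERVER_WATTS)
new_count = servers_within_budget(RACK_BUDGET_WATTS, NEW_SERVER_WATTS)
ratio = OLD_SERVER_WATTS / NEW_SERVER_WATTS

print(f"150 W servers in budget: {old_count}")
print(f"65 W servers in budget:  {new_count}")
print(f"power ratio: {ratio:.2f}x")
```

Under these assumptions the ratio comes out a little above two, which is in line with the "double density" remark.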

11:08 David Stewart: That's a real win. And not only that, there's something that we've observed, you know, an extra processor generation of performance improvement with some of this optimized software. That's something that's an advantage going that route, yeah.

11:21 Michael Kennedy: Yeah, and so what's really cool, I think, is some of this work that you guys are doing is being pushed upstream to CPython, it's being pushed upstream to PyPy. It's one thing to say, well, we have our own distribution and that one's really fast so please use ours instead. But you guys are also giving a lot back by making Python itself faster for everybody, or more efficient in energy terms.

11:41 David Stewart: It's really sort of the not a one size fits all sort of philosophy, it's really doing data science using these libraries Sergey was mentioning. That's, the Intel Python distribution is a great one stop shop for all of that stuff. If you're not using necessarily the libraries, then we're working in the sort of the upstream areas that make sure that any use of Python that you would download will run faster.

12:04 Michael Kennedy: Yeah, are there any notable specifics about what you've contributed to CPython or PyPy that you can think of off the top of your head?

12:11 David Stewart: Yeah, one of the things that's been interesting for us is making sure we have really customer sort of relevant workloads. When we talk about a workload, what this means is, you know, you have software just sort of sitting there, you install Python and, well, that's not particularly interesting, right. What's more interesting is if you can run some code that represents what everyone else is doing, right, and hopefully not like just a simple sort of micro, right. It's like something that's actually sort of realistic. And so one of the things we're really excited about is we just open sourced, with Instagram, a new workload that represents what not only Instagram is doing with Django, but also represents a lot of other Django usage out there. And so that one is really why open sourcing it by both companies contributing to it, I think it's gonna help everybody sort of drive performance better, right. We also do a lot of monitoring of the sources, so for Python 2, Python 3 and PyPy, we actually do a nightly download of the sources, run a bunch of benchmarks and then report the results out to the community. So anybody can go to and see a complete read out of a bunch of different workloads with the different versions of Python and PyPy, and so you can see exactly on a day to day basis how the performance changes. Now, the reason why this is important is someone can do a pull request that slows things down by 10 or 20%. We've seen this in some cases where a single pull request will really slow things down. And so we're not only monitoring this thing, we have engineers that are jumping on it and being able to say hey, if we have a real regression in performance, we wanna jump on it very quickly and get it back. So this is one of the earliest things that we did in these languages to try to help with this.
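
The nightly monitoring described here boils down to comparing tonight's benchmark timings against a baseline and flagging anything that slowed down by more than some threshold. The following is a minimal sketch of that idea, not Intel's actual tooling: the function name `find_regressions`, the 10% threshold, the benchmark names, and all timings are invented for illustration.

```python
def find_regressions(baseline, current, threshold=0.10):
    """Return {benchmark: slowdown} for runs slower than baseline by more than threshold.

    Both arguments map a benchmark name to a run time in seconds (lower is better).
    """
    regressions = {}
    for name, old_time in baseline.items():
        new_time = current.get(name)
        if new_time is None or old_time <= 0:
            continue  # benchmark missing tonight, or unusable baseline
        slowdown = (new_time - old_time) / old_time
        if slowdown > threshold:
            regressions[name] = slowdown
    return regressions


# Made-up timings for illustration only.
last_night = {"django_workload": 2.00, "json_dumps": 0.50, "regex": 1.20}
tonight = {"django_workload": 2.46, "json_dumps": 0.51, "regex": 1.19}

for name, slowdown in find_regressions(last_night, tonight).items():
    print(f"{name}: {slowdown:.0%} slower")
```

Only the benchmark whose slowdown exceeds the threshold is reported; small run-to-run noise such as the 2% change in the second entry is ignored.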

13:55 Michael Kennedy: That's a big deal because it's very hard to be intuitively accurate about performance, isn't it?

14:00 David Stewart: Correct. Or it could be, your intuition might say one thing, but it might be absolutely wrong, you'd go, "Whoa, this should run faster," and it's like, wow, it only improved like half a percent or maybe it degraded 5% 'cause of other things that might've gotten pulled in or just assumptions that were missing.

14:18 Michael Kennedy: Right, right. The code looks tighter, but it actually, that's something different with memory. Some example, I think that if you look at say, list comprehension versus a for loop that adds to a list. I think the list comprehension is faster, even though they're effectively doing the same type of thing, right, these types of things are pretty interesting.
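
The list comprehension versus for loop comparison is easy to measure rather than guess at. This sketch uses the standard library `timeit` module; the function names and the input size are arbitrary choices for illustration, and the actual timings will vary by machine and Python version, so none are shown.

```python
import timeit


def squares_loop(n):
    """Build the list with an explicit for loop and append."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """Build the same list with a list comprehension."""
    return [i * i for i in range(n)]


N = 10_000
# Both forms must produce identical output before their timings are comparable.
assert squares_loop(N) == squares_comprehension(N)

# Take the minimum of several repeats to reduce noise from other processes.
loop_time = min(timeit.repeat(lambda: squares_loop(N), number=50, repeat=5))
comp_time = min(timeit.repeat(lambda: squares_comprehension(N), number=50, repeat=5))

print(f"for loop:      {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```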

14:33 David Stewart: And by the way, if you're a programmer, I think I made the comment last year on the podcast, the best runtime in the world, the best libraries in the world and poor Python code, right, will still run poorly. And so one of the things I think I'm really also very excited about is that we have a great profiler called VTune. It's from Intel, the group Sergey is from, and you're actually able to see where the hot spots are in your Python code. And I think this is really powerful because, you know, I think both the runtime and the user code are really important to optimize, or else you may not get nearly what you think you're gonna get in terms of performance.

15:10 Michael Kennedy: Right, even if you adopt the fast libraries.

15:11 David Stewart: Exactly.

15:12 Michael Kennedy: If you have an O, if you have some sort of exponential order of magnitude algorithm, you're still in trouble, right?

15:18 David Stewart: Or an order log n or something like that, right, an order n squared or something like that, then you wanna make sure you actually can identify some of those things and correct them.
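
To make the point about algorithmic order concrete, here are two ways to answer the same question, "does this list contain a duplicate?". The example is illustrative and not taken from the conversation: the first version compares every pair (order n squared), the second remembers what it has seen in a set (order n on average, assuming hashable items). No runtime or library can rescue the first one on large inputs.

```python
def has_duplicates_quadratic(items):
    """O(n^2): compare every pair of items."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """O(n) on average: remember seen items in a set (items must be hashable)."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


sample = [3, 1, 4, 1, 5]
print(has_duplicates_quadratic(sample), has_duplicates_linear(sample))
```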

15:26 Michael Kennedy: Yeah. So Sergey, tell us a little bit like how would I take, say we're talking about Django, Could I take a Django app and apply this VTune profiler to it and get some--

15:35 Sergey Maidanov: Absolutely. This is what we suggest essentially as a first step. You run your application on a new architecture, you want to understand what affects the performance and how I can improve this performance. The first step is to run it with a profiler like VTune, and VTune, this has existed for many years, this is a product known for profiling native tools.

15:57 Michael Kennedy: Yeah, I remember VTune from my C++ days.

16:00 Sergey Maidanov: Yeah. The only challenge with that, when you ran VTune in old days with Python code, it didn't show your Python-specific code, you saw these weird symbols.

16:11 Michael Kennedy: Yeah, like this C, eval.c is really slow, it seems to be doing a lot of stuff in here.

16:16 Sergey Maidanov: Tells nothing.

16:16 Michael Kennedy: Yeah, tells nothing.

16:17 Sergey Maidanov: So what we added to VTune, it now understands the Python code, and it can show exactly the Python function or Python line of the code, the loop which consumes the most cycles. So you can really focus on optimizing this piece of the code using a variety of technologies. Either libraries or PyPy or other technologies.
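
VTune's own interface is not shown here. As a stand-in for the same idea (attributing time to Python functions instead of interpreter internals), the sketch below uses the standard library's `cProfile` and `pstats`, which report at function granularity only and have no access to hardware counters. The workload functions are invented for illustration.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately heavy function so it stands out in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_work():
    """Small function that should barely register."""
    return sum(range(100))


def main():
    cheap_work()
    return slow_sum_of_squares(200_000)


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time and write the top entries into a string buffer.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report names the Python functions that consumed the time, which is the kind of answer that tells you where to focus.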

16:39 David Stewart: Or maybe just changing your code. As you were saying, Michael, a for loop versus a comprehension, right.

16:43 Michael Kennedy: Yeah, exactly. Exactly, yeah, that's pretty interesting. Does it require some kind of GUI thing, can I make this like a CLI part of my automated build?

16:52 Sergey Maidanov: You can run command line, if you like nice GUI, you can run GUI, like yeah.

16:57 David Stewart: So either a CLI or a GUI.

16:59 Michael Kennedy: Yeah, yeah. So any of you guys can take this one. Suppose I'm sitting, you know, I got my MacBook Pro here, and I've written my code and it runs a certain way. And then I want to like push it out to some hosted place, DigitalOcean, Azure, AWS, whatever. How much would I expect the performance to vary on say like one of those VMs versus say on like my native machine, and could I use something like VTune to like test it there so I test it in its home environment, not--

17:25 David Stewart: I think it's a great question. You know, so much code is being run in the public cloud now. My recommendation on that, and you know, performance, here's the other thing, there's nothing against any of the public cloud providers, but one of the things, if you're sharing compute resources, you're not necessarily getting the purest performance. There's some sort of performance trade-off for the fact that you do not have a dedicated machine.

17:46 Michael Kennedy: And it varies, right, you don't know what, are they doing machine learning or do they have an unpopular website, they just happen to pay for a VM.

17:52 David Stewart: Or even, you know, in some instances, we have a noisy neighbor. You know, maybe you'll have some VM that's destroying the cache, right. By the way, we have a feature that we've added to our processor to detect noisy neighbors and manage them. Which is a separate thing we're doing for cloud service providers. But anyway, for Python, so yeah, I would recommend running it native and doing most of your tuning there. By the way, I've noticed that not all cloud service providers would let you run VTune.

18:17 Michael Kennedy: Oh, really? It's like--

18:19 David Stewart: Well, it's not that it's not running VTune, it's just, they sort of sometimes mask some of the registers that let you detect the performance. And so that's some of the things I think you've gotta either a private cloud setup or an on-prem, it's much easier to really tune the performance and figure out what's going on.

18:35 Michael Kennedy: Maybe if you're doing OpenStack, you control the thing a little bit better.

18:37 David Stewart: Exactly right, you can say, you know, hey, give people the ability to actually monitor the performance of what they're doing and figure out how to make it better, right.

18:46 Suresh Srinivas: And also our silicon has these advanced features called the performance monitoring unit. Which, like when you're profiling on your MacBook Pro, VTune can really take advantage of that and it can tell you where your cache misses are coming from, where your problems are coming from. Whereas sometimes if you try to do it on a public cloud, it becomes harder for you to figure out. So we would definitely recommend like what Dave is saying, to be able to profile and get your code optimized and then deployed.

19:14 Michael Kennedy: I see, yeah, so maybe test both, right. On one hand--

19:17 David Stewart: Yeah, I think that's, yeah.

19:18 Michael Kennedy: You get the best, most accurate answers on the real hardware, but it actually has to live over there so you wanna know also what it does.

19:25 David Stewart: Certainly see what the experience is, it's, you're expecting some throughput measurement, you know, set things up. By the way, for performance work, we sort of recommend that people have something that they can run their code against that's repeatable, you know, repeatable results, and then just change one thing at a time to kinda see what the changes, so it's a very scientific approach, right, as opposed to changing a bunch of things and gee, things, a lot of change but I don't know what it was that affected it.
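
The "repeatable results, change one thing at a time" advice can be sketched as a tiny harness: fix the input, fix the measurement settings, and vary exactly one detail between the baseline and the candidate. Everything named here (`measure`, `baseline`, `candidate`, the data size) is a hypothetical example, and the timings depend on the machine, so none are quoted.

```python
import statistics
import timeit


def measure(func, *, number=10, repeat=5):
    """Run func under fixed settings; return (best, median) seconds per call."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    per_call = [t / number for t in totals]
    return min(per_call), statistics.median(per_call)


DATA = list(range(10_000))  # fixed input so every run does the same work


def baseline():
    return sorted(DATA, key=lambda x: -x)


def candidate():
    # The one change under test: reverse=True instead of a key function.
    return sorted(DATA, reverse=True)


# Same result from both, so the timings are comparing like with like.
assert baseline() == candidate()

for label, func in (("baseline", baseline), ("candidate", candidate)):
    best, median = measure(func)
    print(f"{label:9s} best={best:.6f}s median={median:.6f}s")
```

Reporting both the best and the median run gives a rough sense of how noisy the machine was during the measurement.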

19:48 Michael Kennedy: Right, make a hypothesis, make some measurements.

19:51 David Stewart: Exactly right, it's the scientific method, right, that we were taught in school.

19:54 Michael Kennedy: I think Aristotle and those guys were, they were onto something, and it's for profiling too.

19:58 Suresh Srinivas: My previous manager used to say, "Measure it twice, cut once."

20:02 Michael Kennedy: Yes, exactly, exactly, yeah, very much. So another area that you guys are working in that's, it seems to be like the last year or so this has become real, is AI and machine learning. I remember thinking for like 10 years, like yes, AI and machine learning, this type of stuff, especially AI, was like one of those always 30 years in the future sort of technologies, it's like people are working on it, but it doesn't ever seem to do a thing.

20:25 David Stewart: Flying car and jet pack.

20:26 Michael Kennedy: Yes, exactly, like as soon as I have my, you know, teleporter, I'll be able to do machine learning and stuff. But over the last I'd say two years, it has become super real, right, we have self-driving cars, we have all sorts of interesting things going on.

20:41 David Stewart: Lots of applications of AI, just in recommendation engines, facial recognition, all these sort of things that are just practical every day things.

20:49 Michael Kennedy: Yeah, it's gonna have some interesting societal effects, I think, in some very powerful ways. I think we as a world need to think about what that means for us.

20:59 David Stewart: I totally agree.

21:00 Michael Kennedy: I mean, I'm thinking of like breast cancer analysis, used to think radiology was like a super high end job that like, you're safe if you are a doctor. And now it's like, well, or you feed it to this machine and it's actually a little more accurate.

21:13 David Stewart: You could talk about other social impacts like are you gonna use past performance to indicate which is the best candidate to hire? Well, if you did that, you might eliminate a lot of people of color or women because they haven't been as much in the workforce, right, so you gotta be very careful with some of this social impact of these things. However, I will say this, one of the things we have been very, you know, there are a lot of systems on the internet that Intel's provided the chips for, and there's a ton of data that's out there, and so one of the things we did that's very interesting from a Python standpoint is, since a lot of companies have this data accessible through Hadoop and Spark, what we've done, we've recently, just in March, upstreamed or open sourced what we call BigDL. BigDL, it's sort of a big deal. Yeah, thank you, got the laugh. Anyway, so BigDL has a Python interface, so what it does is deep learning, so when you're doing training of a deep learning algorithm, and then inference analysis, right, what a lot of times that data that you're using to do the training on is accessible out of Hadoop and Spark. So a lot of people have said to us, hey, we would like to be able to do deep learning on our Spark data lakes. Or you know, Hadoop big data. It's like, yeah, so that's what BigDL does, but it's like, a lot of people said, we don't wanna have to use Java to go into that stuff, we'd like to be able to use Python. So that's what, one of the things that got released in March was our first Python interface to BigDL. So this is one of the ways where a lot of organizations, they already have a big data lake already that they can access through Hadoop and Spark, that can use Python and the BigDL project to do their deep learning experiments and then products.

22:47 Michael Kennedy: Yeah, that sounds really really cool, and it sounds like you guys are doing a lot of almost reorganization of Intel around this AI research and work.

22:57 David Stewart: That's a very good observation, in fact, we started up a new product group, the AI platform group, product group, platform group? The AI product group. We acquired a company called Nervana, which is spelled with an E, N-E-R-V-A-N-A, and we actually put the CEO of Nervana in charge of this new product group, reporting directly to the CEO. So these are chips that they're making, that Nervana is making, that actually does this deep learning, inference training and inference much much faster, order of magnitude better than anything else that's out there.

23:30 Michael Kennedy: Wow, okay, so I know about deep learning on CPUs and training and machine learning, that's pretty good. You move it to a GPU and it gets kind of crazy. These chips, these are not just GPUs, these are something different?

23:43 David Stewart: Correct, yeah, they're specifically designed for the problem set that deep learning presents to the CPU, so it's not like, yeah, I mean, our main Xeon processors actually do deep learning pretty well compared to the GPUs that are out there, but something to actually like turbo charge it and really take it to the next level, a chip that's specifically designed for that, not for that plus graphics or that plus something else, right.

24:08 Michael Kennedy: 'Cause traditionally, graphics card just coincidentally are good at machine learning.

24:12 David Stewart: Well, with a ton of effort. I remember, you know, the first time looking at, well, how do you get a GPU to actually do general purpose computing? Let's see, if you do a matrix operation, right, it's a texture, and so let's see, we'll get a couple of textures as matrices, we'll feed them into the GPU, and then you can do texture, your lighting transform on the textures, and it's like, well, that happens to be a matrix operation, read out the resulting matrix, and it's like, from a programming standpoint, that's why you need a lot of libraries and things to help you through that process.

24:39 Michael Kennedy: Can I express this general programming problem as a series of matrix multiplications?
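
One reason deep learning maps so well onto matrix hardware is that a fully connected layer (ignoring bias and activation) is just a matrix multiplication: each output is a weighted sum of the inputs. The sketch below shows that in plain Python with made-up numbers. It is only meant to illustrate the shape of the computation, not how a GPU, MKL, or a dedicated chip would actually execute it.

```python
def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n), both given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


# Two input samples with three features each (illustrative values).
inputs = [[1, 2, 3],
          [4, 5, 6]]

# A layer with three inputs and two output units (illustrative weights).
weights = [[1, 0],
           [0, 1],
           [1, 1]]

outputs = matmul(inputs, weights)
print(outputs)  # [[4, 5], [10, 11]]
```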

24:43 David Stewart: Exactly.

24:44 Michael Kennedy: Not in my own mind, but maybe those, yeah.

24:45 David Stewart: Texture, OpenGL texture processing and things like that. So this is one of the things I think is very exciting about moving this into the mainstream in terms of either the x86 Xeon processors, and then as we bring Nervana's chips, we bring them into the Xeons, we have actually FPGAs as well. These are special purpose, you can program to do a bunch of accelerations and they have multiple acceleration units built in, and so we can actually accelerate a lot of things along with the CPU, so there are a ton of options that we're bringing to the table that will really accelerate a lot of specific workloads.

25:18 Michael Kennedy: Yeah, that sounds really interesting. I wanna dig into that some more. This portion of Talk Python To Me is brought to you by us. As many of you know, I have a growing set of courses to help you go from Python beginner to novice to Python expert, and there are many more courses in the works. So please consider Talk Python Training for you and your team's training needs. If you're just getting started, I've built a course to teach you Python the way professional developers learn, by building applications. Check out my Python jumpstart by building 10 apps at Are you looking to start adding services to your app? Try my brand new Consuming HTTP Services in Python. You'll learn to work with the RESTful HTTP services, as well as SOAP, JSON and XML data formats. Do you wanna launch an online business? Well, Matt Makai and I built an entrepreneur's playbook with Python for Entrepreneurs. This 16 hour course will teach you everything you need to launch your web based business with Python. And finally, there's a couple of new course announcements coming really soon. So if you don't already have an account, be sure to create one at to get notified. And for all of you who have bought my courses, thank you so much, it really really helps support the show. Just on the general machine learning stuff. Suresh, you were working in the data center and optimizing that space, right? Over the next five years, how do you see machine learning contributing to that? Like, can you take a trained up machine learning system and say here's my data center, here's what we doing, can you make it better? And just ask it these questions, like, is that something that could happen?

26:47 Suresh Srinivas: No, that's definitely happening because it's all about like, what are the inputs that you can take in? And the more inputs you can take and learn some specific things, then you're able to start optimizing the system. So we'll start seeing this kind of technology becoming kind of more prevalent in a lot of things that we do. It's a very exciting time to be in this field.

27:10 Michael Kennedy: It's, every day I wake up going, it's even more amazing than yesterday. So same question to you, Sergey, like--

27:17 Sergey Maidanov: The big deal in this new area is cross-team productivity. You cannot solve the modern complex problems without involving domain specialists, programmers, data scientists. This is all new collaborative environment, so productivity is a key. This is what we're trying to offer through Intel distribution for Python. We provide out of the box performance and productivity to our customers so they can focus on solving their domain problem in deep learning, in machine learning, in general. And then with Intel distribution for Python, to scale this to real problem in data center.

27:55 Michael Kennedy: How about parallel distribution, multi you know, sort of grid computing type stuff, like, how do you see, what do you see out there and what do you see working for that in the Python space?

28:06 David Stewart: I think one of the things that, it's like I said, we have an array of things, so to speak that you can bring to bear on different problems. One of the ones that Sergey mentioned is something we call Xeon Phi, P-H-I, Xeon Phi. And it actually, as opposed to maybe 18 cores on a chip, it might have up to 80-90 cores per chip, right. So think about that, I mean, think about these all x86-compatible CPUs, all available to do a variety of things in parallel. So that's an interesting model to think about, it's like if you have parallelism, you can express it in a number of different ways. You can express it in terms of the vector, we have vector processing within the CPUs, we have this parallel processing, and I think Python has a lot of, you know, certainly some of the things that Sergey was mentioning in terms of these libraries that can be, make use of the vector operations within the CPU and really turn up the performance, right. Traditionally, Python has sometimes had a few challenges relative to parallel programming. And so one of the things that's really cool about, thinking about one of these libraries like MKL that Sergey mentioned is it can automatically take advantage of the parallelism that's available, right. And so if you have one of these, by the way, the Xeon Phi, if you go to the top 500 supercomputers, there's a significant number that you can look at and it's, oh, it uses the Xeon Phi as part of that, right. So the top, you know, supercomputers in the world are using this chip to basically achieve incredible results.

29:29 Michael Kennedy: It just keeps going, it's really really amazing, all the stuff that people are doing there. So back to the AI chip, it sounds to me like what you're telling me is you have this custom chip which makes a lot of sense because like GPUs, they were meant to do video processing, if you could make a special purpose chip for that, you're in a good place. What about other things? Do you guys have other specialized chips coming in addition to AI? Is this a trend, right? Going to have more specialized chips.

29:58 David Stewart: A couple of things I would talk about there, one of them is, you may have heard of a new memory technology that we've actually, it's incredibly revolutionary, and I say that as an Intel guy, but I gotta tell you, it's just mind blowing is that it's memory that sits, you know, you think about DRAM. You know, your regular memory in your Mac or whatever versus flash memory, right? The flash memory is, you can get a lot of it, it's lower cost but it's slow. Main memory, the DRAM, is like super fast, but it's expensive, right.

30:25 Michael Kennedy: And volatile.

30:26 David Stewart: And volatile. What if you could have memory that was non-volatile if you want it to be and sit in between flash memory and DRAM, right? And so we've come up with this, we call it 3D Cross Point. It's a memory technology that's coming out in SSDs now. And think about it from a Python standpoint, being able to make use of memory that's, it's actually chips in the DIMMs in the computer itself. So when you power on the computer, it actually has, you know, this persistent memory already available without going to the SSDs, right. It's instantaneously available.
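Editor's sketch: the conversation does not describe how Python programs would actually reach this persistent memory, so the following is only an analogy. It assumes a memory-mapped file as a stand-in for byte-addressable storage that survives a restart, and uses an ordinary temporary file rather than real persistent-memory hardware. The function name `bump_counter` and the 8-byte counter layout are hypothetical.

```python
import mmap
import os
import struct
import tempfile


def bump_counter(path):
    """Increment an 8-byte counter kept in a memory-mapped file; return the new value."""
    # Create the backing file with a zeroed counter if it is missing or too small.
    if not os.path.exists(path) or os.path.getsize(path) < 8:
        with open(path, "wb") as f:
            f.write(struct.pack("<Q", 0))

    with open(path, "r+b") as f:
        # Map the first 8 bytes; reads and writes now look like memory access.
        with mmap.mmap(f.fileno(), 8) as view:
            (value,) = struct.unpack_from("<Q", view, 0)
            struct.pack_into("<Q", view, 0, value + 1)
            view.flush()  # ask the OS to push the change to the backing store
            return value + 1


if __name__ == "__main__":
    # Assumption: a temp file plays the role of the persistent region.
    with tempfile.TemporaryDirectory() as folder:
        counter_path = os.path.join(folder, "counter.bin")
        print(bump_counter(counter_path))
        print(bump_counter(counter_path))  # the earlier write is still there
```

The point of the sketch is only the programming model: state is updated in place through a memory view and is still there the next time the region is mapped.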

30:59 Michael Kennedy: The choice previously, and it's been better with SSDs, I remember when it was not, is we've got this regular DRAM, and then we've got swap. And that's like 100 times worse or something to go to swap, and if it's a slow spinning cheap laptop disk, maybe it's way worse than that still, right.

31:16 David Stewart: But think about a data center where you have maybe a few terabytes of DRAM in a system and then multiple terabytes of this, it's just right as more memory DIMMs in the computer, right? This is amazing, and not only is it super fast in terms of latency, access latency, but it can also be persistent. So these are things which are, from a Python standpoint, we'll actually be able to make some of this stuff available to Python programmers when these products start rolling out, so this is a very interesting future. The other thing from a future chip standpoint that I think is very interesting is to look at we're now, because we're partnering up with the chip designers, you're talking about Intel controlling the chips, right. One of the things we're able to do is, folks like Suresh, Sergey are able to partner up with the chip designers and say, let's take a look at how Python runs on the chips. So you're running this stuff and you go, oh, hmm, looks like from the size of the code footprint, actually we're spending a lot of time just twiddling our thumbs in the processor because it's waiting for instructions to get fetched--

32:16 Michael Kennedy: Exactly, because it's too big to fit in the smallest cache.

32:20 David Stewart: Correct. And this is true of a lot of interpreted languages, if you look at PHP, Node.js, et cetera, there are all of these massive code footprints, and we've analyzed the internal pipelines within the CPU, we see this idling effect, right. And now with the next generation of chips that are coming on, they've actually taken a look at this and actually we're amazed at how much they've been able to improve on this instruction level parallelism, so in fact, even with a single instruction stream without parallel instruction streams, they're actually able to run old Python code faster. So if you think about it, if you've got a data center and got a bunch of Python running there, one of the best things you can do, now, you know, we as software guys would say, oh, we want you to use all of this good software goodness.

33:05 Michael Kennedy: Why are you running on this old version of Python, you should be running on this.

33:07 David Stewart: Use the new upstream version or use the Python distribution, et cetera. But the good news is, as an IT decision maker, you can now think about, well, upgrading to the latest Intel CPU actually runs Python, it's more than just like is it a different clock speed. It's not the frequency that matters, it's not even really the number of CPUs, the CPU itself actually at the same frequency can actually process Python much much faster 'cause it's making use of more of the CPU. Does that make sense?
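Editor's sketch: the claim that a chip at the same frequency can run Python faster follows from the usual back-of-the-envelope formula, run time = instructions × cycles per instruction ÷ clock rate. The numbers below are hypothetical and chosen only to show the arithmetic; they are not measurements of any Intel CPU.

```python
def run_time_seconds(instruction_count, cycles_per_instruction, clock_hz):
    """Classic CPU time estimate: instructions x cycles per instruction / clock rate."""
    return instruction_count * cycles_per_instruction / clock_hz


if __name__ == "__main__":
    instructions = 3_000_000_000  # hypothetical work done by a Python service
    clock_hz = 3_000_000_000      # 3 GHz on both chips, so frequency is held equal

    # Hypothetical: the older chip stalls more often waiting on instruction fetch,
    # so it averages more cycles per instruction than the newer one.
    older = run_time_seconds(instructions, 2.0, clock_hz)
    newer = run_time_seconds(instructions, 1.5, clock_hz)

    print(f"older chip: {older:.2f} s, newer chip: {newer:.2f} s, "
          f"speedup: {older / newer:.2f}x")
```

With the instruction count and clock held fixed, the only term that changes is cycles per instruction, which is where fewer fetch stalls would show up.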

33:35 Michael Kennedy: Yeah yeah, that makes a lot of sense. Well, you know, you make your comment about as a programmer, it's great to use all the new stuff. I personally as a programmer would like to work on new code that is adding new value and not go, you know that crummy thing that's been there for 10 years? We need to rewrite that so we can save on computers, that is not where I wanna spend my time, like you guys don't, right?

33:55 David Stewart: Right, oh yeah.

33:56 Michael Kennedy: Yeah. So if you can just make it run faster without me touching it, then I can go write stuff that I wanna write, like that new REST framework.

34:05 David Stewart: By the way, I would say one of the things that's cool about either PyPy or the Intel Python distribution or the other upstream work that we're doing is those typically don't require code changes either. So that's the other thing, is that if you make, you know, that's sort of the goal. We sort of feel like Python's a powerful enough language and an attractive enough way for programmers to work, productive way for programmers to work. Why should they be hobbled by performance, right? Why not provide something that will immediately give a boost? Now, we'd sort of like to think you ought to get a new processor too, I think that's a good idea, I think all of us would appreciate that, yeah. I think, good. But then some of these other things, our goal really is to make it so you, like taking a few actions, you don't have to change the code. Now, there are some new things, by the way, if you wanna get into your code to let me play with some new features, right. That's where we got some of these things like some accelerators or BigDL which will let you use Python to do more deep learning sort of things or maybe accessing this 3D Cross Point memory. So there's a lot of stuff that's gonna be very powerful to bring this stuff to bear if you wanna change the code and if you don't, you know, we have these other things to help you out with it.

35:11 Michael Kennedy: Sure. You know, if it's your core product, right, if you're Instagram and these are your APIs or whatever, like, you probably wanna spend some time to make those faster.

35:19 David Stewart: Yeah, absolutely.

35:20 Michael Kennedy: Right, and things like that. Interesting, okay, so what about Cython? Have you guys thought about how Cython works on the chips, and for those people listening, maybe they don't know, Cython is like Python language with a few little tweaks that compiles basically down to C or the way C compiles, right.

35:37 David Stewart: In fact, Intel Python distribution is making both Cython and--

35:42 Sergey Maidanov: Numba.

35:43 David Stewart: And Numba, which are a couple of these, you know, moving to C code, basically, right. And then there are trade offs, as engineers know, there are trade offs for everything. The nice thing about that is you can get either Cython or Numba, you know, as part of that package, right. Some people will go, well, I don't wanna have to give up on the quick turnaround of being able to change code and have it interpreted, right, so that's where some of those trade offs go, right. Python 2, Python, CPython, PyPy, we tend to say, hey, you can still have the same development methodology. Numba, Cython are more--

36:13 Michael Kennedy: There's a build step which is weird to all of us, right.

36:15 Sergey Maidanov: It's all about choice. If we don't have Cython or don't have Numba, what choice do we have? Going to native language or staying with Python. So we are just providing choices people can make in trade offs to get what they need.

36:28 David Stewart: That's a great point. If you choose any of these things, we wanna make sure Intel are the best option to use for it.

36:33 Michael Kennedy: Yeah, that's cool. So let me ask you this, Sergey, about maybe your workflow. So I write my code all in pure Python. Maybe run it on CPython, right. See how it works, maybe it's not quite as fast as I want or maybe it should be optimized because it's better to have it faster, like you can scale it, put it more density or whatever. Then I run VTune against it, figure out where it's actually slow, that might be like 5% of my code or less, right, in like a large application, it's actually these three parts that kind of kill it. If I look at my website right now, which is pure CPython talking to MongoDB. The slowest part of the site is the deserialization of the traffic back from the database into Python objects. That's literally 50% of the workload on my website. And so I'm not gonna change that 'cause that's not my library, that's like a different ODM. But if I did control that, would it make sense to go write that in say Cython, that little 5%, and then somehow bring that in, what do you think?
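Editor's sketch: the workflow described here (run the code, profile it, find the small part that dominates) can be tried with the standard library alone. This uses `cProfile` as a swapped-in substitute for VTune, which is the tool actually named in the conversation. The functions `deserialize`, `render`, and `handle_request` are hypothetical stand-ins for a web request that spends most of its time turning database results into Python objects.

```python
import cProfile
import io
import json
import pstats


def deserialize(rows):
    # Hypothetical hot spot: turning raw database text into Python objects.
    return [json.loads(row) for row in rows]


def render(records):
    # Hypothetical cheap step: formatting the already-decoded records.
    return ", ".join(str(record["id"]) for record in records)


def handle_request(rows):
    return render(deserialize(rows))


def profile_request(rows):
    """Run one request under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(handle_request, rows)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the functions that dominate the request come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    rows = [json.dumps({"id": i, "payload": "x" * 200}) for i in range(5000)]
    _, report = profile_request(rows)
    print(report)
```

Reading the report top-down shows which function carries most of the cumulative time, which is the candidate for a rewrite in something like Cython.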

37:30 Sergey Maidanov: Optimizing the last 5%, if you make it zero, even zero.

37:35 Michael Kennedy: Yeah, the 5% that's spending, where almost all the work, the 5% of my code base where I'm spending 80% of my time or 50% of my time.

37:41 Sergey Maidanov: Yeah, totally it makes sense. Totally it makes sense.

37:44 Michael Kennedy: Okay.

37:45 Sergey Maidanov: You really focus how do I optimize the biggest hot spot with minimum code change. 5% is a nice nice hot spot.

37:53 Michael Kennedy: Right, if I rewrite 5% of my code in Cython, but that's what was mostly slow, you could probably get a big bang for the buck, right?

38:00 Sergey Maidanov: Right.
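Editor's sketch: the "big bang for the buck" reasoning is Amdahl's law. If the hot spot accounts for a fraction p of the run time and is made s times faster, the whole program speeds up by 1 / ((1 - p) + p / s). The 50% and 80% figures come from the conversation; the 10x factor for the rewritten code is a hypothetical value picked for illustration.

```python
import math


def overall_speedup(hot_fraction, hot_speedup):
    """Amdahl's law: whole-program speedup when only the hot part gets faster."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    if hot_speedup <= 0:
        raise ValueError("hot_speedup must be positive")
    remaining_time = (1.0 - hot_fraction) + hot_fraction / hot_speedup
    return 1.0 / remaining_time


if __name__ == "__main__":
    # Hot spot is 50% of run time; hypothetical 10x faster after a rewrite.
    print(f"{overall_speedup(0.5, 10):.2f}x")
    # Hot spot is 80% of run time; same hypothetical 10x.
    print(f"{overall_speedup(0.8, 10):.2f}x")
    # Upper bound if the hot spot cost nothing at all.
    print(f"{overall_speedup(0.5, math.inf):.2f}x")
```

The last call covers the "if you make it zero" case: with half the run time in the hot spot, the ceiling is a 2x overall gain no matter how fast that code becomes.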

38:01 David Stewart: You know, it's like, I was one day just lunch time, and I got this call on my cellphone. It happens to be this Intel executive that I kinda know, an acquaintance, right. And she said, "Oh, my daughter is working on this project in school with Python, it's running really slow." This is hilarious. How did you know that I was, you know, "I heard you had something to do with Python performance." And so I got--

38:20 Michael Kennedy: I've got an insider at Intel, I'm gonna figure out why my code's slow.

38:22 David Stewart: That's it. And oh yeah, trust me, I've learned many things sitting down with people at lunches, like people who created all manner of things in our world, it's like, oh, that's why that works that way, okay, interesting. Anyway, so I said, well, have her try PyPy as an example, it's a very easy step to try and see if it speeds things up, right. And so I didn't hear back from her, so I suspect that probably either worked for her or she got frustrated, I don't know. But I've talked to, there are plenty of architects, CPU architects and people who, there are people who have this massive lake of instruction traces. So we're actually able to take millions of instructions and record them and figure out what's going on, this is how we analyze future chips and analyze performance on them, running these existing instruction traces. And so they will have billions of instructions floating around in Python scripts that will actually go figure out what's going on and categorize them and help develop what's going on. But if that stuff runs really slow, it was actually one of those architects that mentioned PyPy to me the first time, and he was like, I think he's actually here, retired, lucky dog, and so I gotta find him and thank him again for having helped us get more insight into this stuff, yeah.

39:28 Michael Kennedy: Yeah, that's really cool. So coming back around to your AI focus, you guys see AI helping you design chips in the future?

39:35 David Stewart: That's a very interesting question. I'm sure a lot of engineers that I've worked with might be considered artificial intelligence, no, I-- No, I'm sorry. I am an engineer, so what can I complain about? I think there's already a lot of machine learning being employed in the design of the chips. We have a building, there's a particular building, I can't tell you where it is.

39:53 Michael Kennedy: Is that in Portland? Like, it's--

39:55 David Stewart: It's an undisclosed location. I will say there is a building that's stuffed full of CPUs, and it's got the most amazing structure. It was built as a really interesting structure, but that thing is running, essentially using machine learning to analyze simulations of chips continuously 24/7/365. So that place, it's really kind of fun to think about all that's going on, and I've actually taken a tour, it's super cool.

40:20 Michael Kennedy: This portion of Talk Python To Me is brought to you by Hired. Hired is the platform for top Python developer jobs. Create your profile and instantly get access to thousands of companies who will compete to work with you. Take it from one of Hired's users who recently got a job and said, "I had my first offer within four days, and I ended up getting eight offers in total. I've worked with recruiters in the past, but they were pretty hit and miss. I tried LinkedIn but I found Hired to be the best, I really like knowing the salary upfront and privacy was also a huge seller for me." Well, that sounds pretty awesome, doesn't it? But wait until you hear about the signing bonus. Everyone who accepts a job from Hired gets a 300 dollar signing bonus and, as Talk Python listeners, it gets even sweeter. Use the link and Hired will double the signing bonus to 600 dollars. Opportunity is knocking, visit and answer the door.

41:12 David Stewart: You know, we have been using machine learning essentially to design CPUs and validate them. A lot of what we're doing, by the way, is not waiting for the silicon to be baked before we figure out whether it works or not, we actually have a lot of simulation that we're doing, you can actually buy it, it's something called Simics which we actually are able to produce simulations of all the things that are going on on the chips, right. And so we're actually able to run a ton of workloads and programs through this thing before the chip ever appears, right. And so we're able to run essentially whether it's Python, Java, you know, any number of things through these simulators so that, by the time the silicon comes out of the fab, it actually already runs all of this stuff. So there's a lot of stuff that we're doing to accelerate the design of the chips.

41:56 Michael Kennedy: Yeah. I think it's gonna be, 10 years from now we're not even gonna predict it, the majority of the stuff that's happening, right.

42:02 David Stewart: Well, think about what happened 10 years ago, wasn't, you know, I mean, Facebook or any of these other things on the Internet, Google, all these things were around but it's like, the concept of how they've affected our lives now.

42:12 Michael Kennedy: Yeah, it was just the dawn of Internet as a usable thing for everyone, right.

42:18 David Stewart: And it's been fun to be a part of Intel to have really helped fuel this thing. And now I think from our standpoint, one of the things that's very exciting is to say hey, how can we project the future better? 'Cause you're talking about how to figure out how things run better in the future, one of the things we're doing is a tremendous amount of work in the whole area of benchmarking and performance. Right, 'cause you think about, we talked about various things like this Instagram Django benchmark that we're working on. There are other various codes that we're working on for the Python distribution. But one of the things that we're doing is kind of really looking at the whole area of AI as an area, and it's like, how do you benchmark that? Or think about big data, think about if you maybe have, you're standing up Cassandra and Kafka and NodeJS and all of these things in a system. How do I figure out what the performance is today? And then how do I project forward performance on some of these things, right? And so there's a whole area, I'm incredibly excited about this, it's that you're gonna start seeing more and more of this from us, I'm working on a lot of it myself. I've seen us really take a much stronger position out there to try and help contribute some of this stuff to the industry. And so you can take your Instagram Python Django benchmark, for example, and evaluate what is this gonna work against this CPU versus that CPU or this vendor system versus that one, this public cloud versus that public cloud. These are all things that I think are incredibly powerful to think about. Well, the control now is with you as a user to figure out what kind of choices do I make? So we're doing a lot in that sort of space, 'cause we sort of believe that, in the data center, performance is king, right. It's like, and people have come to expect from us every CPU generation to have a good whatever it is, 30 or 40% boost at the right same price point. 
So performance is king as far as we're concerned in the data center, and we're doing a ton of stuff to try and drive the future and use this whole area benchmarking and workload. We would love, by the way, from the community standpoint, if they have representative sort of workloads that they'd like to work with us on, we would love to get involved with that, 'cause that's something we're incredibly excited about.
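Editor's sketch: comparing one CPU, vendor system, or cloud against another starts with a repeatable measurement of a representative workload. The following is a minimal sketch using the standard library's `timeit`; it is not the Instagram Django benchmark mentioned above. `sample_workload` is a hypothetical stand-in (a JSON round trip), and the repeat counts are arbitrary.

```python
import json
import statistics
import timeit


def sample_workload():
    # Hypothetical stand-in for a representative workload: a JSON round trip.
    payload = {"user": "example", "items": list(range(200))}
    return json.loads(json.dumps(payload))


def benchmark(func, repeats=5, calls_per_repeat=200):
    """Time func several times and summarize per-call latency and throughput."""
    timings = timeit.repeat(func, repeat=repeats, number=calls_per_repeat)
    best = min(timings)  # the least-disturbed run is the usual headline number
    return {
        "best_seconds_per_call": best / calls_per_repeat,
        "median_seconds_per_call": statistics.median(timings) / calls_per_repeat,
        "calls_per_second": calls_per_repeat / best,
    }


if __name__ == "__main__":
    for name, value in benchmark(sample_workload).items():
        print(f"{name}: {value:.6g}")
```

Running the same script unchanged on two machines gives numbers that can be compared directly, which is the kind of user-controlled comparison described in the conversation.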

44:20 Michael Kennedy: Yeah, I think that having realistic workloads makes a super big difference.

44:25 David Stewart: Take your MySQL, your website, right, and think about the data marshaling issue that you're going, would love to be able to have that as kind of a standard piece of what we're looking at to make sure either the CPU runs it really fast so we can go in with the library providers to make sure that stuff gets accelerated, right, so those are the kinds of things we absolutely wanna stand up, and we think there's a dearth of these things, actually, representative benchmarks that will help people visualize what's going on in the data center today, because it's not just like your old database, there's still a lot of database out there, you know, your big SQL databases running relational database, transaction processing. All this stuff exists. But there's a ton of new stuff in the data center today, and we believe that Intel will be contributing strongly to this area.

45:07 Michael Kennedy: So you guys, I feel like over, broadly across the industry, there's a mind blowing opening into open source from all sorts of companies that you just wouldn't expect, right. I mean, the stuff that Microsoft are doing, Facebook with their, some of the open--

45:26 David Stewart: HHVM and the open data center project.

45:29 Michael Kennedy: Yeah, the data center stuff that people have, just, so you see Intel contributing more to these open source projects in order to make your story back at the data center better potentially.

45:38 David Stewart: Absolutely, I mean, Intel has been, for the past few years, the top one or two contributor to each Linux kernel release. So you go back in time, who are the top contributors to the kernel? Intel has been like number one or number two for years now for each kernel release. So that in and of itself represents a very strong commitment to open source at least at the core, right. All of the work that we're doing on open source code, right, so whether it's Python, whether it's open source databases, this is a very strong commitment to open source, absolutely.

46:08 Michael Kennedy: That's awesome. We're kinda getting near the end of the show, I have two questions. And I'm gonna mix it up a little bit, 'cause normally I have the same two questions.

46:15 David Stewart: I kinda biffed the last time on your standard questions.

46:17 Michael Kennedy: It's all right. So the two questions are, Sergey, I'll start with you. If you're gonna write some Python code, what editor do you open up, what do you usually write your code, your Python code in?

46:25 Sergey Maidanov: I usually don't write Python code.

46:29 Michael Kennedy: You're analyzing how it runs.

46:30 Sergey Maidanov: I am Outlook guy.

46:31 Michael Kennedy: Outlook.

46:34 Sergey Maidanov: I typically use Spyder.

46:35 Michael Kennedy: Spyder, okay, yeah, sure, Spyder's good. The Continuum guys, I don't know if they're here, I'm sure they are, I haven't been able to do the rounds yet, but that's a cool thing that comes with Anaconda. David?

46:44 David Stewart: Ask Suresh.

46:45 Suresh Srinivas: I recently took a class at university, it's a local organization. I've been loving Jupyter, Jupyter--

46:51 Michael Kennedy: Oh yeah.

46:52 Suresh Srinivas: Then writing Python code as well as writing script documentation and also taking other people's codes and forking it and modifying it.

47:01 Michael Kennedy: Yeah, if you wanna visualize and play with code, then Jupyter is amazing, yeah yeah. David?

47:06 David Stewart: My fingers are programmed with Vi, I'm sorry, I'm an old guy, my fingers are programmed with Vi, it's the only way muscle memory works with me, so yeah.

47:12 Michael Kennedy: So yeah, there you go, awesome. Then I guess I'll ask you the standard questions, well I have one more. Suresh, there's a ton of packages on PyPI. Over 100,000 now, which, it's partly why Python is such an amazing community, like all these different packages you can just install and use. Think of a notable one that maybe people don't know about that you've come across.

47:29 David Stewart: I should've prepared you guys for this question. But it's good for you to be surprised, I think.

47:35 Suresh Srinivas: I think some of these lightweight web development ones like Flask.

47:41 Michael Kennedy: Yeah, Flask is amazing.

47:42 Suresh Srinivas: Django is like really popular, but people are using Flask for some lighter weight things.

47:48 Michael Kennedy: Yup. A lot of APIs built with Flask, like we have the, we also have the Django REST framework guys here, so yeah, for sure. How about you, David?

47:55 David Stewart: I'm gonna suggest people check out, I don't know if it's in PyPI or not, but BigDL. It's a great thing to check out, absolutely. It's a big deal.

48:03 Michael Kennedy: It's a big deal, it's awesome. All right, so here I wanna throw one more in as a mix. Since you guys have a special vantage point towards the future. Predict something interesting that will come out in 5 years that we would be maybe surprised by. Like, just in computing in general. Suresh, go first.

48:22 Suresh Srinivas: I think AI is going to be really really pervasive. Much more, from your glasses to the clothes you wear to all kinds of things, car you drive.

48:33 Michael Kennedy: Yeah, I can definitely see on automobile AI processing for sure. Yeah, this edge processing stuff, yeah. David?

48:41 David Stewart: I would love to see a more organic approach to computing. You know, our artifacts are slick and carbonized or aluminized or what have you. I would actually like to see computers made out of natural wood cases with maybe some mother of pearl or, you know, something that would just actually be more human, I mean, almost a steampunk kind of approach or more organic approach, I'd love to actually see it become a more organic part of our lives as opposed to dehumanizing.

49:10 Michael Kennedy: Sure, well, as it goes into this IoT of everything and we have these little chips that run Python, MicroPython and other things, it's much more likely that we'll have little computing things that are more adept rather than beige boxes or aluminum boxes. Sergey?

49:27 Sergey Maidanov: I think whatever direction industry will go, Intel will become, will stay relevant and be at the core of all these transformations. That's my prediction.

49:35 Michael Kennedy: Yeah, you guys, you will be there. So here at PyCon, in Portland, Oregon, you guys have a big presence here. Just one quick fact that I think people might like to hear is how many Intel employees do you guys have in this general area?

49:48 David Stewart: The exact number as of whenever your audience listens to this may be different, but it is true that as, you know, Intel is the biggest chip maker in the world, Oregon is actually our largest site. So we have sites really all over the world, but it's kind of, Oregon from that sort of standpoint is, we're growing not only the new fab processes, the new absolute micro things that are going on into design or the manufacturing, making millions and millions of things that are a few nanometers big, you know, it's amazing. We also have kind of the center of a lot of our software work going on here, as well as the circuit design itself is going on here. So there is nothing against the other parts of the world where Intel does business, but it's, we absolutely have a lot here in Oregon.

50:28 Michael Kennedy: Yeah, it's like over 10,000, right?

50:29 David Stewart: I can't actually give a number, we'd probably be, I would probably be shot if I did, so I don't, no no no, no one would shoot me, but I couldn't tell you that.

50:37 Michael Kennedy: So I guess the point is it's really surprising what a presence you guys have here, right, this is--

50:42 David Stewart: Most of it is in Hillsboro, Oregon to the west of the West Hills from Portland.

50:45 Michael Kennedy: You guys drive traffic jams I'm sure with your workforce.

50:49 David Stewart: We try and stay outside of the traffic jams if we can, so yeah.

50:52 Michael Kennedy: All right, well, thank you so much for meeting up with me and sharing what you guys are up to with everyone on the podcast.

50:58 David Stewart: Thank you, Michael, it's been great, you have a great listenership, I know people who've come up to me, amazingly, said, "Oh, you were on Michael's show," so I was like, here's a shout out to all the great Python programmers out there, really appreciate everything you're doing with Python.

51:12 Michael Kennedy: Yeah, David, Suresh, Sergey, thank you guys, pleasure as always.

51:15 Sergey Maidanov: Thank you for inviting us.

51:16 Suresh Srinivas: Thank you for all your work that you're doing.

51:18 Michael Kennedy: Yeah, thank you. Bye. This has been another episode of Talk Python To Me. This week's guests have been David Stewart, Suresh Srinivas and Sergey Maidanov. This episode has been brought to you by Talk Python Training and Hired. Hired wants to help you find your next big thing. Visit to get five or more offers of a salary and equity presented right upfront, and a special listener signing bonus of 600 dollars. Are you or your colleagues trying to learn Python? Well, be sure to visit We now have year long course bundles and a couple of new classes released just this week. Have a look around, I'm sure you'll find a class you'll enjoy. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python, we should be right at the top. You can also find the iTunes feed at /iTunes, Google Play feed at /play, and direct RSS feed at /rss on Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. Cory just recently started selling his tracks on iTunes, so I recommend you check it out at You can browse his tracks he has for sale on iTunes and listen to the full length version of the theme song. This is your host, Michael Kennedy. Thanks so much for listening, I really appreciate it. Smixx, let's get out of here.
