
#59: SageMath - Open source is ready to compete in the classroom Transcript

Recorded on Tuesday, May 10, 2016.

00:00 What do you do when you are a high caliber mathematician or a scientist and you want to share your algorithms and code? This sounds like a job for GitHub, but the problem is that often this work is done on proprietary platforms such as Magma, Matlab, Mathematica or others.

00:00 Not only can you not share your licenses for say, Matlab, but there are often proprietary separate libraries and tools for specialized work. These are expensive products. One example from my distant past was using the Wavelet toolbox on Matlab. Matlab is 2,000 euros and the Wavelet library is another 1,000 euros! So to share my code, you must have both licenses for yourself. This is a problem.

00:00 Well, if you're William Stein you take this problem and turn it into an opportunity to build an open source competitor to Matlab and related platforms. This episode is all about SageMath, an open source, feature rich option for scientists and mathematicians built by over 500 contributors and consisting of over 500k lines of Python and Cython code.

00:00 This is Talk Python To Me, episode 59, recorded May 10th 2016.

00:00 [music intro]

00:00 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem and the personalities. This is your host, Michael Kennedy, follow me on Twitter where I am at @mkennedy, keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython.

00:00 This episode is brought to you by Hired and Snap CI, thank them for supporting this show on Twitter via @hired_hq and @snap_ci.

00:00 During the conversation, we talked about running SageMath on Windows and some of the challenges there. We mentioned that the new Ubuntu subsystem coming to Windows 10 this summer may make this easier. Well, just after we recorded this episode someone did indeed post a message to the SageMath mailing list showing it running on the Ubuntu subsystem on Windows 10, so look for more goodness in that area down the line.

00:00 Now, let's chat with William.

02:18 Michael: William, welcome to the show.

02:20 William: Thank you.

02:21 Michael: It's super exciting what we have to talk about today, we are going to talk about building quite a large ecosystem of computational mathematical tools with Python and a bunch of other related technologies, Cython, C, IPython, lots of stuff. It'll be a lot of fun. But before we get into it of course, tell me your story, how did you get into programming in Python? You are a mathematician by training, right?

02:47 William: Yes, so I've been into programming since I was like five years old, but I got very much into mathematics. As an undergraduate I was a computer science major for a little while and then I switched to mathematics since I loved it, especially number theory. And then I went to graduate school at UC Berkeley from 1995 to 2000, and while I was in graduate school I got heavily involved in computing with various objects that come up in number theory, and so I had to write a lot of code in order to compute with those objects. So I just wrote a huge amount of C++ code using a lot of the libraries in the mid 1990s. It was really painful, you know, it would take like 30 minutes or an hour to compile, I did lots of templating. And I wrote a little interpreter so I could create modular forms, mathematical objects, and compute with them, and I was doing this sort of all in isolation without using much beyond some C++ libraries. And then I found out about a computer algebra system called Magma, which was extremely powerful for computing in abstract mathematical domains. It's kind of, in a way, a competitor to Mathematica or Matlab but aimed much more at abstract mathematics, so it has heavy support for group theory, abstract algebra, graph theory etc.

04:14 And very little support for symbolic calculus. But it's what I needed for my PhD thesis work so I worked really heavily on it, it's an extremely powerful system and I wrote maybe 30 or 40 thousand lines of code that are included with that software. I went down and visited them in Sydney, Australia a few times. But one frustrating thing about Magma is that it's closed source. I saw a talk by Manjul Bhargava, who recently won the Fields Medal, but back in 2003 he was talking in Paris about using Magma for some research on quadratic forms, and he was running into some brick walls due to it being closed source. And I also kind of took stock of my own computing environment at the time: I used Linux, Emacs, and everything on my computer was open source except Magma, which was the one program that I cared about and which really mattered for my research, so I thought that was ridiculous.

05:08 Michael: Right, that's the most important thing, the most important thing that you actually were working with was the one that you had the least control over or visibility into, right?

05:18 William: Right. Exactly. And at that time I was also using Python a lot, not for mathematics but for kind of scripting mathematics, so if I wanted to do a bunch of computations on a cluster I would write little scripts in Python to start up the computations. I wrote websites using Python, so I was using Python a lot kind of as a complementary tool to mathematical computation. And so I was at least familiar with Python because of this, and it was back in 2002/2003 when Python was more primitive than it is now.

05:51 Michael: Right, that's just a little before it started to get its scientific legs under it, with things like NumPy and SciPy, and it wasn't quite there but it was on the verge of that breakout, right.

06:02 William: Yeah, but it still was a pleasant interpreter environment and it had modern features like multiple inheritance, you could define your own data types, it had good exception handling etc. And so I really appreciated some of the functionality it had over say Magma. Magma, though it has massive libraries and is incredibly powerful, misses a lot of basic features of a modern programming language, where by modern I mean, say, at least as good as C++ in 1993. I was used to at least having exception handling and being able to define my own classes and so on.

06:36 Michael: Right, that's one of the challenges of not just Magma, I would say but things like R to some degree, Matlab, Mathematica, all these various highly specialized tools that are great at solving some type of computation or some kind of problem, but when you have to go and build the full application out of it or something, it kind of hits the wall, right?

07:00 William: Yeah, exactly, there are a whole bunch of special purpose math languages designed by mathematicians, and though they are pretty good for mathematics, they don't really compete with what you get when you have computer science people systematically put full time effort into designing a language and then a whole ecosystem around that language. And so the difference between trying to write a non-trivial program in Python versus Magma is, in Python you have this massive ecosystem of tools, plus the language is very general purpose, whereas with Magma or Mathematica or Matlab it's really good maybe for math things, but for the other parts, just manipulating strings or whatever, it can be pretty painful.

07:41 Michael: Yeah, absolutely. You were working with Magma, and you couldn't really use it in the way that you wanted to, sort of like as the back end for a web front end, and that's one of the problems you run into a lot with these types of software. I am specifically thinking of Matlab because I have the most experience with it, but there are all sorts of little, you know, pay for this bit, pay for that bit, it's not registered on that machine, like the rules are just crazy, right, and so you got frustrated with this, yeah?

08:11 William: Yes, so in particular with Magma, the project director sent me a message saying that I wasn't allowed to use it as a general purpose compute engine on the back end for my websites, and that was very frustrating. It wasn't just a matter of pay them or not pay them, or get the right license, since they are a little more informal than that, they don't really have kind of official licenses. So that was like a huge wakeup call that I really needed to do something, and so I decided I was going to survey the open source landscape again for math software and see what's out there and then switch. But when I looked around it was really kind of frustrating back then: there were a number of programs that kind of competed with Magma but were much more specialized, and despite Magma's language being fairly primitive compared to say Python, the other systems had languages that were much more primitive yet. I mean, it's kind of hard to emphasize how primitive they are, it's kind of like using assembly programming, I mean, things like different namespaces, that sort of thing is an advanced feature compared to what's available in some of those languages.

09:21 I couldn't really bring myself to go back to that, and also there is a huge amount of functionality that got implemented in Magma over about a decade and it just wasn't implemented anywhere in open source, which was a serious problem as well. So there was just a massive lack of functionality. So what I decided naively to do first was just choose a language, and I spent about six months to a year evaluating options like OCaml, directly writing something myself in C++, and using Python, and at the end of the day I chose Python because the language was most similar to Magma, and so I thought it would be easier to get users over from Magma, and also the Python C API was kind of like what Magma had but much cleaner. One thing about Python is that if you want to write a function that you can use from the interpreter and that is really fast, you can write it in C if you need to and it's just as fast as if you wrote it in C, and that's kind of a basic requirement if you are going to write code that will be used for research, that you can write very fast code.

10:27 So Python was good enough, and then I decided I would just sit down, Python and C and me, and I would look at maybe code from other projects and look up algorithms and implement everything I needed. And this lasted about three hours, and I realized how insanely hard that would be to do, and I completely switched course and decided that purely out of laziness I would try to wrap existing code in whatever way I had to do so, so using Pexpect pseudo-tty type stuff, using C libraries, and adding C library interfaces to existing computer algebra systems, etc.

11:05 Michael: Maybe not everyone knows, I expect actually a lot of people don't know, they know C of course, but what is Pexpect?

11:12 William: If you have, say, a Python program and you'd like it to interact with some other kind of, let's say there is a program called Symmetrica or something, some researcher wrote it with just a command line interface, and it doesn't have any C library interface or anything, but you really want to call functions in there and run some code there and get the results. What Pexpect will do is let you basically simulate the terminal and use Python to programmatically feed code into the other program, look at what output it produces, and then parse that output and return it. So it abstracts things away and makes it look like you are calling some functions in a Python library, but really behind the scenes things are more or less being copy pasted into a terminal. It is nice because you can interface with a closed source black box this way if you need to, and the drawback is that it's brittle and it's potentially slow. There is a lot of latency, well, there is a several millisecond overhead to every single thing you do, which is painful. You really want the overhead to be a microsecond, not milliseconds, so there is like a factor on the order of 1000 in latency, which can be really annoying.
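A minimal sketch of the pattern William describes here: wrapping a text-only program so that it looks like an ordinary Python call. To stay dependency-free it uses the standard library's `subprocess` pipes rather than Pexpect's pseudo-terminal, and the child program is a hypothetical stand-in (a few lines of Python that evaluate one expression per line), not a real math package. The class and method names are illustrative assumptions.

```python
import subprocess
import sys

# Hypothetical stand-in for a researcher's command-line tool: it reads one
# expression per line on stdin and prints the result on stdout.
CHILD_SOURCE = r"""
import sys
for line in sys.stdin:
    line = line.strip()
    if line == "quit":
        break
    print(eval(line, {"__builtins__": {}}), flush=True)
"""


class CommandLineTool:
    """Make a text-only program look like a Python object with methods."""

    def __init__(self):
        self._proc = subprocess.Popen(
            [sys.executable, "-u", "-c", CHILD_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def evaluate(self, expression):
        # "Type" the command into the other program...
        self._proc.stdin.write(expression + "\n")
        self._proc.stdin.flush()
        # ...then read back its textual output and parse it.
        return int(self._proc.stdout.readline().strip())

    def close(self):
        self._proc.stdin.write("quit\n")
        self._proc.stdin.close()
        self._proc.wait()
        self._proc.stdout.close()


tool = CommandLineTool()
result = tool.evaluate("2**10 + 1")
tool.close()
print(result)
```

Every call crosses a process boundary and goes through text formatting and parsing, which gives a feel for why this style of interface is both brittle and much slower than calling a C library directly.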

12:25 Michael: Interesting, so yeah, so you were naive and hopeful, and just I want to go write this, and then you basically decided all right, forget that let's try to find all the good building blocks that are already built and build upon them and that's a very open source way of problem solving.

12:42 William: I think about it this way: when I worked with Magma, my team was all the other people working on that project, and when I switched to trying to do something open source, the team was all the other people in the entire open source sort of world, everybody that contributed to open source. And definitely the easy way to go would be to choose the best libraries for number theory and algebra and graph theory and so on and put them all together. They weren't kind of built to be put together, or to be called from Python, but no matter how hard that is, it would be way easier to provide a library interface to one of those other systems that you can then use from Python very efficiently than to write one of those systems from scratch. Like, each of those systems took decades and decades of work to write.

13:30 Michael: Yeah, and to debug them and fine tune them and you know, that kind of stuff is very hard to know even if it's right some of the time.

13:37 William: Yeah, and a lot of the algorithms you really kind of have to be in the throes of a PhD, focused for years just to understand what the algorithms are and how they work and like, there is no way one person can replicate that, despite wanting to.

13:50 Michael: That's interesting, so in the end you decided all right, Magma is cool but it's got all these restrictions and I am going to go put this thing together, and what you built was called SageMath, right?

14:02 William: Yes. I built SageMath, which is, well, basically it's a Python library, but many of the dependencies are these tricky to compile C programs or Fortran programs or whatever, which were written by mathematicians only to run on Linux maybe, just for their research. And so in addition to just being a Python library it's also a build system, kind of like Anaconda but obviously a little bit before that, and it targets different types of packages. So it's a build system, it's a Python library, and then to make it more friendly for mathematicians it has a customized IPython command line, and also around 2006 we wrote a web based graphical interface which is like the IPython notebook, but we wrote it a few years earlier. And it was before things like WebSockets and a lot of nice modern JavaScript functionality existed so-

15:05 Michael: Back in the dark days of Javascript, when it was earning its bad reputation.

15:09 William: Yes, we started writing it when I think Google Maps and Gmail appeared, and there was this thing called Ajax which let you update a web page without refreshing, and so we had that little bit of functionality, but the thing we didn't get was a persistent connection, so instead we had to do polling techniques and all kinds of tricks that were ugly.

15:30 Michael: Yes, very interesting.

15:32 William: In order to make it like when you do a for loop in Python you want to see the output as it appears and in order to do that we had to do a lot of hacky stuff.

15:38 Michael: Yeah, so I think the way maybe to conceptualize it is this is an open source alternative to something like Mathematica, or Matlab maybe, but it leverages a lot of the good data science, scientific tooling of Python like IPython notebooks for example.

15:55 William: Exactly. So initially I just wanted it to be good enough for my number theory research, number theory being my research area, but the second developer wasn't a number theorist. The second person I could get onboard did research in coding theory, and so he wanted a bunch of functionality that had little to do with number theory, and then he also wanted to teach undergraduate courses like calculus and differential equations. And so I started adding a lot of functionality related to those classes just so the range of applicability would get bigger and I could get a few more developers, and things started growing that way, and so the system quickly went from being just for number theory to being much more general. And in order to kind of focus development and decide what it should be, we chose the motto or the mission statement to create a viable open source free alternative to Magma, Mathematica, Matlab and Maple. None of the developers really has a focus on numerical computation, so in Sage itself and the Sage library we focus more on algebraic aspects of things: arbitrary precision arithmetic, working very quickly with large integers and rational numbers, doing linear algebra where we have no round-off error, that sort of thing. And for the more numerical parts, the kind of viable alternative to Matlab, we just incorporate libraries like NumPy, SciPy etc., which the Python community has done an amazing job of developing over the years.
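To make "linear algebra with no round-off error" concrete, here is a small sketch in plain Python using the standard library's `fractions.Fraction`. It is not Sage code and makes no claim about how Sage implements this internally; the function name `exact_determinant` is an assumption made for illustration. It computes the determinant of the 3x3 Hilbert matrix, a classic example where floating point loses accuracy quickly, entirely over exact rationals.

```python
from fractions import Fraction


def exact_determinant(rows):
    """Determinant by Gaussian elimination over exact rational numbers."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        # Eliminate the entries below the pivot; every step stays exact.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


# 3x3 Hilbert matrix: entry (i, j) is 1 / (i + j + 1) with 0-based indices.
hilbert = [[Fraction(1, i + j + 1) for j in range(3)] for i in range(3)]
det = exact_determinant(hilbert)
print(det)  # prints 1/2160
```

The trade-off is speed: exact rationals grow in size as the computation proceeds, which is why a system aimed at this kind of work puts so much effort into fast big-integer and rational arithmetic.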

17:26 Michael: Right, that's definitely gaining speed now, right?

17:28 William: Yeah, it's absolutely incredible now. I mean, we wanted all along to be a viable alternative to Matlab, and I remember in 2006 trying to write statistics functionality and all this kind of stuff for Sage directly and just being like, I don't want to do this at all, but it's part of what we are trying to do to get more users, so here I am doing it. And I am so happy the community has come along and just made Python absolutely first rate in statistics and numerical computing so that we can just incorporate that functionality.

17:56 Michael: Yeah. That's really great. One thing I think would be interesting for the listeners before we get too far into the details is just the scale of this project: the number of developers, number of lines of code, the technologies involved. Could you just give us like a quick sense of that?

18:11 William: Sure. So the project itself incorporates about a hundred packages. We have done a lot of development at what we call Sage Days workshops; we often have about 10 to 15 per year, and we've had I think 80 Sage Days workshops so far. They were kind of inspired by PyPy workshops that happened about a couple of years ago, but basically we come together for a week and do lots of development. The number of contributors to Sage is around 500, so those are people who have contributed directly to the Sage library; of course, Sage itself incorporates all these other open source packages which have their own contributor groups. The amount of source code, I don't know the exact number, but it's on the order of several hundred thousand lines of actual code, and the code is about maybe 60% Python and maybe 40% Cython, so there is a very large amount of Cython code that we have written. And functionality wise we do cover a very wide range of areas of mathematics, so almost any time somebody comes to use it they find that there is something for the area of math they are interested in.

18:11 [music]

18:11 This episode is brought to you by Hired. Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.

18:11 Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company.

18:11 Typically, candidates receive 5 or more offers in just the first week and there are no obligations ever.

18:11 Sounds awesome, doesn't it? Well did I mention the signing bonus? Everyone who accepts a job from Hired gets a $1,000 signing bonus. And, as Talk Python listeners, it gets way sweeter! Use the link hired.com/talkpythontome and Hired will double the signing bonus to $2,000!

Opportunity is knocking, visit hired.com/talkpythontome and answer the call.

18:11 [music]

20:25 Michael: That's awesome, that's a really large scale project and it's a cool success story. So, it's written in Python and Cython, which is kind of a flavor of Python, but you also chose Python as basically the programming language of the system itself, right? Matlab has its own language, Mathematica has its own language, but you sort of said hey, Python is a good match for the mathematicians.

20:52 William: Yes, so this is a decision that I made, I would say, purely out of laziness originally. As a CS undergrad I had written an interpreter, I took a course in writing interpreters, and I tried to do that again for my research, and I've also just seen how hard it is to really develop a programming language and I really didn't want to do it. And moreover, with the Magma group, I had often made requests for them to improve their language and they responded that they could get grant funding for the mathematical algorithms but had a huge amount of trouble getting any funding for the actual language, because their funding sources were in mathematics. So I was kind of scared off, for a number of reasons, from writing my own language, and basically, I mean, it just follows the same principle as reusing third party libraries and computer algebra systems to reuse a third party language as well. I decided to use Python, and it was criticized a lot by a lot of people, because literally every other project like Sage that had ever happened up until that point had made up their own language. It was kind of like the thing you had to do.

22:02 Michael: How has that criticism changed over time? Do they still feel that way or has it become more accepted?

22:07 William: Oh, way more accepted, and it's now considered a huge advantage. Often when I hear people convince other people to switch from Mathematica to Sage, at all levels, both research and undergrad teaching, one of the biggest, maybe the strongest argument is that Python the language is just a better language, because the syntax is easier for the students and it prepares them for other possible employment opportunities down the line, so...

22:36 Michael: That's pretty interesting, because when I was studying math at the university I did study C++ and Scheme and other things because I took some computer science classes, but my math education really taught me just the Matlab language, whatever you call that, I am sure there is a name for it. But you know, that's a very limited skill set to take away if I go on to the job market, right? But if people, as part of learning their math and doing the research, the statistics, or whatever, become semi proficient in Python, just sort of silently in the background, that's a really cool side effect.

23:18 William: Yeah, and I have had many PhD students who learned Python really well as a result of working with me and they really do use it at their jobs today. Like, one of them, Robert Miller, who wrote a huge amount of our graph theory functionality, now works as a data scientist at Google, and another one, Simon, works as a data scientist but at Facebook, and both of them use Python really heavily in their jobs.

23:45 Michael: That's cool, do you know if they use SageMath there?

23:46 William: I don't think either of them use SageMath for their jobs, but I do know that SageMath is used at Google by some people, but not my students.

23:53 Michael: Ok, yeah, that's very cool. What would you say to people who are maybe just straight up using Numpy or Matplotlib today, maybe they are using an IPython notebook, like should they consider using SageMath and what would it offer them?

24:09 William: It depends on what you are doing, but SageMath does symbolic things and it can be very efficient at doing them. Like, you have a large matrix and you want to do something with it and you want to have no rounding errors at all, or you want to do something with linear algebra over finite fields or coding theory. Basically, Sage provides a lot of functionality that is complementary to the numerical things that are offered by NumPy and SciPy. It will overlap some with SymPy, though it has a lot of functionality that isn't in SymPy and in many cases it can be a lot faster. Another remark is that Sage now uses Jupyter notebooks as the default graphical user interface, so when you download and install Sage you get Jupyter notebooks. One big drawback, which I hope to address in the future, is that currently by far the easiest way to use Sage is to install our self contained distribution, and it's completely separate really from PyPI, the Python packaging repository, and it's not something that you can just install in your own Python environment.

25:14 Michael: Right, there is no pip install SageMath.

25:17 William: Not yet. I really hope there will be, in the future, but right now this isn't the case. And so that interferes with easy adoption by existing Python users.

25:27 Michael: That's true, but you do have a pretty decent way of getting it, you can go like on the Mac, you can go download just a disk image, or on Linux you can get a tarball. And it's kind of all self contained there, so it's not too hard to get started with it, right?

25:43 William: Yeah. A lot of our users just want to do math rather than deal with an installation, and so we do put a huge amount of effort into making it easy to just install. So we have Windows, Linux and OS X binaries ready to go, though the Windows one currently is a virtual machine that you run, and then it provides a server and then you get a Jupyter notebook that you can use Sage through.

26:07 Michael: I see, so maybe you run Internet Explorer but then your request goes back into like a Linux thing on VirtualBox or something-

26:14 William: Exactly, that's our main supported environment right now on Windows.

26:18 Michael: Ok, yeah, that's an easier way to get some of those libraries to compile. That is a seriously hard problem to get some of those libraries that were not built for Windows, the C libraries, or Fortran, to compile there, right?

26:31 William: It's really, really hard, we tried for years and are still trying to natively port Sage to Windows and I don't know if or when it will ever succeed.

26:42 Michael: Yes, interesting.

26:44 William: It's like, in the Sage project there are a lot of sort of tall mountains that are very difficult to climb. And four or five people will try and fail and then the sixth person will succeed. Like, writing a really good implementation of working with what are called finite abelian groups was one thing where it just seemed like one person after the other tried and failed, and then eventually somebody got it, and I hope porting Sage to Windows will be a similar thing, some combination of the right approach and Microsoft Windows getting more friendly to [inaudible 27:15] style programs.

27:16 Michael: Yeah, well, you know, did you, I don't know if you've heard, you are in Seattle right, so maybe you heard, Microsoft in the summer, they are shipping a new version of Windows 10 and it comes with the ability to run native Linux binaries on the command line.

27:33 William: Yeah, so that's just awesome, and Microsoft is going in the right direction. We should have something that works natively by maybe using that, it sounds really likely actually.

27:43 Michael: It's got to be a huge step in the right direction for you guys. So, I don't know if it'll fix it but it's definitely not a negative. So, one thing that's interesting, well, there is a bunch of interesting things I want to ask you about this, let's start with this one, so doing math is obviously computational, right, computationally intensive and having more computational power is key on modern hardware that means parallelism, right?

28:15 William: Yep, it can.

28:17 Michael: How does Sage leverage multi core machines, does it support, like, computational parallelism and things like that?

28:23 William: So, one thing that has surprised me a lot over the years is that, at least in pure mathematics, the sort of parallelism that we have to do is usually what is sometimes called embarrassingly parallel, so somehow it's a lot easier than what people do in numerical computing with earthquake simulation or whatever, where they have some big distributed computation and each step depends on previous steps and there is lots of data flying all over the place, and you have to use MPI or something to do it. The parallelization that typically comes up for us is that you want to evaluate some function f at 10,000 values of n, and so you just do it pretty easily and there are no real dependencies between them, so basically using fork, just using stuff built around fork, you can do pretty good parallelization, and that works pretty well for us, I have found.

29:18 Michael: Yeah, ok, so in the embarrassingly parallel case, you just basically split it across processes or something like that and the problem is solved, right, they've got their own GIL but who cares, because it's theirs.

29:29 William: And you can also use a database, so you can have a process start up, query your database for some things that haven't been done, do them and put the results back in. Simple techniques like that have kept, I don't know, the community pretty happy. There are also much more subtle, interesting, low level parallelization techniques that go into Sage, where there is some C library or C++ library that some Sage developer writes. Like, David Harvey is a guy who wrote some super fast code for polynomial multiplication and it uses parallelization techniques, and so Sage via Python just calls out to this library, and the library happens to do things in parallel and it gives back the result. And so there are lots of little places like that where we have libraries that just implicitly do things in parallel, they are highly multithreaded, and you don't even know they are doing it unless you explicitly look. So there is that level, and there is also the level of forking. I wrote a decorator for Python called parallel, which is included in Sage, so you put @parallel before a function, and then if you call it with a list of inputs it will fork the process, call the function with each of those inputs, up to the number of processors you have, and then get the results and give them back to you as a generator. So there are some little things like that that are built into Sage, and it takes care of some possibly tricky surprises that would happen behind the scenes involving subprocesses and so on.
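A rough sketch of what a fork-based decorator along these lines could look like. This is not Sage's actual implementation, and its interface differs from Sage's: the name `parallel` is reused only for illustration, `max_workers` is an assumed parameter, and results are yielded as plain `(input, result)` pairs. It relies on `os.fork`, so it only runs on Unix-like systems, and error handling is deliberately minimal.

```python
import os
import pickle
from collections import deque


def parallel(func):
    """Hypothetical decorator: run func on each input in a forked child."""

    def run_all(inputs, max_workers=None):
        limit = max_workers or os.cpu_count() or 1
        pending = deque(inputs)
        running = deque()  # entries are (input value, child pid, pipe read end)
        while pending or running:
            # Start children until the process limit is reached.
            while pending and len(running) < limit:
                value = pending.popleft()
                read_fd, write_fd = os.pipe()
                pid = os.fork()
                if pid == 0:  # child: compute, send the result, exit
                    status = 1
                    try:
                        os.close(read_fd)
                        with os.fdopen(write_fd, "wb") as out:
                            pickle.dump(func(value), out)
                        status = 0
                    finally:
                        os._exit(status)  # never fall back into parent code
                os.close(write_fd)
                running.append((value, pid, read_fd))
            # Collect the oldest child's result (raises EOFError if it failed).
            value, pid, read_fd = running.popleft()
            with os.fdopen(read_fd, "rb") as src:
                result = pickle.load(src)
            os.waitpid(pid, 0)
            yield value, result

    return run_all


@parallel
def square(n):
    return n * n


results = dict(square([1, 2, 3, 4]))
print(results)
```

Because each input runs in its own process, there is no shared interpreter lock to contend with, which suits the embarrassingly parallel case where the inputs are independent of each other.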

31:05 Michael: That's really awesome, that's a very Pythonic little statement you made there: I'm going to put a decorator on it, it's going to fork this off and return it as a generator. I mean, that is really great.

31:16 William: Yeah, it works really well. We have code, like nested functions written in Cython, that just happen to have an @parallel decorator on them, and then they'll fork and regroup properly, and the code itself is just probably two pages of code, maybe three pages, so it's not a lot of code.

31:32 Michael: Yeah, it's just getting the right concept, or the zen of the style to make it really expressive or whatever, right.

31:40 William: Yes. There are a ton of little things like that all over Sage that I would like to separate out as Python libraries, like individual standalone Python libraries, and then have Sage pip install them as part of its build process. That would make them much more widely available to the community, and then when they are running in the context of Sage they would know to do the extra things that Sage needs, and they wouldn't do those things otherwise. We also have a pre parser for mathematicians. Just as an example, if you type 2/3 in Python 2, which is what Sage uses, you get 0, in Python 3 you get 0.6666666, and neither of those is what a pure mathematician expects. A pure mathematician expects to get 2/3, which is an exact rational number. So in Sage we have a little pre parser, so when you use Sage directly through its own command line, which is this modified IPython interactive command line, it will take each line and then do a little bit of parsing on it. For example, when you type 2/3 it replaces it by Integer(2)/Integer(3), and then that allows you to make your own custom integer type. And then there is a bunch of other things similar to that. Like, for example, the caret symbol, which in Python means exclusive or: a lot of mathematicians are used to that meaning exponentiation, because that's what it means in LaTeX.
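The integer-literal rewrite can be sketched in a few lines of Python 3. This is only an illustration of the idea, not Sage's pre parser: the `Integer` class below is a hypothetical stand-in whose division returns an exact `fractions.Fraction`, and the regular expression is deliberately naive (it would also rewrite digits inside floats and string literals, which a real pre parser has to avoid).

```python
import re
from fractions import Fraction


class Integer(int):
    """Hypothetical custom integer type: true division stays exact."""

    def __truediv__(self, other):
        return Fraction(int(self), int(other))


def preparse(line):
    # Wrap every bare integer literal: "2/3" -> "Integer(2)/Integer(3)".
    return re.sub(r"\b\d+\b", r"Integer(\g<0>)", line)


source = "2/3"
rewritten = preparse(source)
value = eval(rewritten, {"Integer": Integer})
print(rewritten, "=", value)
```

Rewriting the source text before Python sees it is what lets the custom type take over: once both operands are `Integer` objects, ordinary operator dispatch decides what `/` means.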

33:03 Michael: Right, as opposed to double star or something like this, yeah.

33:05 William: Yeah, so in Sage when you type the caret symbol it gets converted to double star before it gets sent to Python. Mathematicians almost never do exclusive or, but they do exponentiation constantly when typing in polynomials and all kinds of things. So it's an optional thing that makes the Sage interface more [inaudible 33:27] mathematicians. It gives us our own kind of new language, but in a very well-defined, minimal way which is just built on top of Python, and in most cases the things that get preparsed would be invalid Python code. And, to connect this with Python packages, it would be nice if we could take this preparser and make it a standalone Python package which Sage just happens to use. We have various interesting functionality in there: square brackets [1..10] gives you the list of numbers from 1 to 10. It's kind of like the notation you'd have in Matlab or Maple instead of using the range function. There was a Python PEP for it, but it didn't get accepted, and so many Sage people wanted it that Robert Bradshaw just implemented it anyway and added it to the preparser. So it would be kind of neat if this whole preparser were more generally available and Sage just happened to use it.
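
The caret and range rewrites can be sketched the same way. This is a deliberately naive illustration, not Sage's preparser: it assumes the line has no string literals and that range endpoints are plain non-negative integers. The function name preparse_line is hypothetical.

```python
import re


def preparse_line(line):
    """Rewrite two bits of math-style notation into plain Python."""
    # Caret means exponentiation here, so turn it into Python's ** operator.
    line = line.replace("^", "**")
    # [a..b] means the inclusive list of integers from a to b.
    line = re.sub(
        r"\[\s*(\d+)\s*\.\.\s*(\d+)\s*\]",
        r"list(range(\1, \2 + 1))",
        line,
    )
    return line
```

The output is ordinary Python source text, which is the point made above: everything gets converted to straight Python before it is executed.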

33:05 [music]

33:05 Continuous delivery isn't just a buzzword, it's a shift in productivity that will help your whole team become more efficient. With Snap CI's continuous delivery tool, you can test, debug and deploy your code quickly and reliably.

33:05 Get your product in the hands of your users, faster and deploy from just about anywhere at any time. And did you know that Thoughtworks literally wrote the book on continuous integration and continuous delivery? Connect Snap to your GitHub repo and they will build and run your first pipeline automagically.

33:05 Thank Snap CI for sponsoring this show by trying them for free at snap.ci/talkpython.

33:05 [music]

35:18 Michael: Maybe it could even have different grammars for different areas like maybe geologists do something different than mathematicians, who knows.

35:26 William: Yeah, and so maybe somebody is proposing some great matrix operations to include in Python eventually, and maybe they make it into Python 3 but they are not in Python 2; you can use the preparser and still have those. And everything just gets converted to straight Python, so you could convert everything and then stick it into a Python library if you need to.

35:43 Michael: That's cool, you said that in your documentation, you say you have a UI kit, for adding user interfaces to calculations and the app. That sounds pretty cool, what's the UI technology there and how does that work?

35:55 William: I am not quite sure what that refers to.

36:01 Michael: Yeah, ok, no worries. That was written long ago, ok.

36:04 William: And there are a lot of people. The other problem is there are 500 contributors, so I have very limited knowledge.

36:10 Michael: Sure. No worries, I'll ask you another sort of math question. One of the things you do when you are writing scientific and especially mathematical papers for academics and so on is use LaTeX, so you can very precisely express symbolic mathematics like integral signs [inaudible 36:27] and things like that. And this is built right into it, right?

36:33 William: So the connection between Sage and LaTeX: of course mathematicians all use LaTeX. We have a dunder method, well, it's really just a single-underscore method, _latex_, and it's defined on most objects that you create in Sage, and it gives back the LaTeX representation of that object. It works a lot like the repr method in Python, but it's to give you a LaTeX representation rather than-

36:58 Michael: Right, right, that's awesome.

37:00 William: Yeah, and it kind of recursively goes down the steps: if you make a matrix whose entries are rational functions over some other thing, then there is automatically a way to turn that entire matrix into a LaTeX representation. And that gets used in the graphical user interface fairly automatically if you want, like in Jupyter notebooks or in Sage worksheets. Also, we have a LaTeX package called SageTeX, which lets you write a LaTeX document and then use commands like \sage right inside the LaTeX document. So you can write something like "consider" and then $\sage and then in braces you can put some expression, and it will get evaluated using Sage whenever you LaTeX the document. Then the result is cached. You can also put in blocks of code, and the output gets automatically put into your LaTeX document. So basically it makes it so your LaTeX document can have executable Python code. There is a similar thing like this for just Python, but for Sage it also gives functionality for graphics, and it knows that Sage objects can have a LaTeX representation and it uses that automatically when typesetting the results.
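
The _latex_ convention and its recursive behavior can be illustrated in plain Python. This is only a sketch of the idea: the latex() helper and the Rational and Matrix classes below are hypothetical stand-ins written for this example, not Sage's classes.

```python
def latex(obj):
    """Return obj._latex_() when the object defines it, else str(obj)."""
    method = getattr(obj, "_latex_", None)
    return method() if callable(method) else str(obj)


class Rational:
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    def _latex_(self):
        # Recurse so nested objects can supply their own representation.
        return r"\frac{%s}{%s}" % (latex(self.numerator), latex(self.denominator))


class Matrix:
    def __init__(self, rows):
        self.rows = rows

    def _latex_(self):
        # Each entry is typeset through latex(), whatever its type is.
        body = r" \\ ".join(
            " & ".join(latex(entry) for entry in row) for row in self.rows
        )
        return r"\begin{pmatrix} " + body + r" \end{pmatrix}"
```

Because Matrix defers to latex() for each entry, a matrix of Rational objects is typeset without the matrix code knowing anything about fractions, which mirrors how _latex_ parallels repr.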

38:11 Michael: Yeah, that's really cool.

38:14 William: It's something that was written at the first Sage Days way back in 2006.

38:18 Michael: Yeah, yeah, really nice.

38:20 William: Long, long ago and it's been developed ever since.

38:22 Michael: So, another part of this system that I think is really interesting is Cython. And, Cython actually, this I had no idea about, I just wanted to talk to you about SageMath but Cython actually came out of this project, right?

38:35 William: Yes, in the following sense. When I started Sage, I mentioned before that Python plus the Python C API was kind of the killer combination that meant that I could implement what I wanted in Sage on top of Python. I had these little benchmark programs, and I would try to implement them in other languages like OCaml, these really fast functional compiled languages, and I would get something that was fast but wasn't quite as fast as what I could write in C. Basically, overall, C lets you write really fast code, and Python lets you use C code to write new functions, but the Python C API is pretty challenging and potentially error prone. You have to do manual reference counting all over the place, and the ways in which the Python C API functions work can take a while to learn. I really envisioned a lot of people, including me when I am tired or just really want to get work done, writing code, and I wanted to write a huge amount of code, and I didn't think it was really viable to write directly against the Python C API. I would mess up some reference counting and get segfaults, and that just wasn't what I wanted to do. So I started planning to write some Python program that would generate code against the Python C API, and I got excited about the possibility of doing that. Then I started searching around and found a program called Pyrex, like the glassware, P-Y-R-E-X, by Greg Ewing, who is a Python contributor. He wrote this amazing program where you could write code that looked almost like Python, but it would get converted into C extensions and those would get linked into Python. That was really amazing, but it lacked a lot of functionality: it didn't have list comprehensions, and there were just a lot of little things, like you couldn't do nested functions. Also, he didn't use revision control; he just kind of made a zip archive every once in a while. It seemed like the work was done mainly during the Christmas break each year.

40:33 Michael: Those were the early days, long ago.

40:36 William: Right. So this is like 2004, 2005, and it was an absolutely awesome project, but it kind of wasn't really going anywhere, and so I started adding functionality I needed to Pyrex. I mean, I don't really like writing compilers very much, it's not my thing, I am not very good at it, but I had this PhD student, Robert Bradshaw, who I mentioned before and who now works at Google, and he started diving in and writing a lot of really tricky stuff and doing all these optimizations to Pyrex, and I thought that was pretty cool. Then another guy, Stefan Behnel, who is the developer of lxml, which is a Python library, was forking Pyrex to add all kinds of things as well. At the same time I made up a project called SageX for making Sage faster, and it was just a combination of some of these Pyrex works. The name was really bad, though, SageX, and we were just distributing it inside of Sage, so that was kind of stupid. After about a year of that, I looked at this book by Karl Fogel called Producing Open Source Software, it's on his website producingoss.com, and I think it's inspired by the Subversion project. It's a great book. When I read a book, I kind of think about a lot of stuff while I am reading, and so I started thinking about SageX and Pyrex. I was brainstorming for a name for this thing and thinking that I should make this into a proper open source project, and then I came up with this name Cython, which is like Python but with C. I thought it was a better name than SageX, which is hard to say and doesn't have anything to do with Python or C. So I Googled it, and there was only one hit which I could find, and it was a picture of this punk rocker in England who called himself Cython, flipping you off. And so I thought I'm safe.

42:31 Michael: [laugh] He probably had not laid claim to this, really.

42:34 William: Exactly, so I thought I'm safe, I could choose this as a name. Then I talked to my PhD student Robert and to Stefan Behnel, and I'm like, "So I am going to start this project and make you the lead developer," and each of them said no. I was a little worried about that, so I made them both the lead developers, and kind of stepped back from it, made the website, bought the domain name, and they both sort of decided not to be the lead developer. Together they did an amazing amount of work, and then other people started contributing, and I mean it really has blown up a huge amount since then. I did try to push their direction to make it so Cython can compile all of Python, like you could take the standard Python test suite and just build it and see how fast it is compared to Python. And the system has really gotten deep, because part of the test cases for Cython is to compile all of Sage, which is huge, and then run the Sage test suite. Also, there are certain things we need: when you are running blocking C code, you want to be able to hit Ctrl-C and have that interrupt the running code, and so in Sage we put a lot of work into writing something that would let you interrupt blocking C code that was written using Cython. Martin Albrecht recently separated that out as a separate project that you can pip install, so you can use it in your Cython code.

43:56 Michael: Right, ok. That's really cool.

43:58 William: So that's kind of where Cython came from, and I think it's been steadily progressing over the years. Very often there are things that come along which kind of sound like they do something similar, but Cython is just so battle tested. No matter what, if you want to write some fast code you can use from Python and you are willing to understand the basics of C data types, you can use Cython and get it done. And if you want to make a C or C++ library available to Python with no overhead, then you can do that via Cython. So it's very different from SWIG, the Simplified Wrapper and Interface Generator, where SWIG will automatically take a C++ library and make it usable from Python, but there is some Python code between the C++ library and you, and that slows things down a lot. With Cython we do things like make it so you can use a third party library that provides basic arithmetic, addition and multiplication, say in a finite field, and we can make that very fast, so it just takes on the order of 100 nanoseconds to do additions and multiplications. So it really eliminates overhead and makes using very low level C code possible.

45:09 Michael: Yeah, that's really awesome, how applicable is that in general, is this something you would only use on math or you've got like a web service back end and there is a section that needs to go faster? Would Cython be something you might consider?

45:21 William: I think it's really aimed towards very compute-heavy loads, so it doesn't help much if your problem is IO bound; then the new Python async stuff is probably way more useful, or something like Twisted. So it's really all about compute-heavy, non-asynchronous work: you have a synchronous thing, like a for loop, or you have a doubly nested for loop that is doing some calculation on the inside, and you want to make that one part really fast, as fast as you could possibly write it in C. That's what Cython lets you do.

45:56 Michael: Ok, very cool.
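
For a sense of what such a hot spot looks like, here is a purely illustrative example in plain Python: a doubly nested loop doing integer arithmetic with no I/O. The function is made up for this illustration. In Cython one would typically give the loop variables C integer types so the inner loop compiles down to C; that step is not shown here.

```python
def count_lattice_points(radius):
    """Count integer points (x, y) with x*x + y*y <= radius*radius.

    Every iteration is pure arithmetic, so the running time is dominated
    by interpreter overhead rather than by waiting on I/O.
    """
    limit = radius * radius
    count = 0
    for x in range(-radius, radius + 1):
        for y in range(-radius, radius + 1):
            if x * x + y * y <= limit:
                count += 1
    return count
```

The work grows with the square of the radius, which is why this kind of loop is a natural candidate for compilation rather than for async techniques.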

45:58 William: And that's the sort of thing that comes up a lot, in math programming.

46:01 Michael: Yeah, of course it does. Matrix multiplies and all sorts of things, right. So the sort of next step that you might take with this whole project is, rather than having people download 800 MB of all those packages and run it locally, just fire up a web browser, and so you started cloud.sagemath.com, right?

46:21 William: Yes, I started that in 2013. My initial motivation was that I had been teaching courses repeatedly every year for about a decade to undergraduates on how to use open source math software, so I get 40 or 50 math, CS, stats, and econ undergrads, and the whole course is about how to program in Python, how to program in Cython, use lots of functionality from Sage, use NumPy, use LaTeX a little bit. The installation challenges for the students were pretty bad, and so for quite a while they would often just use the Sage notebook server we had running, but that could only handle maybe a dozen people at once, and it really wasn't very scalable. And all it did was give you Sage, so you could type in Sage code and see the output, but it didn't have a terminal, it didn't have LaTeX. So I decided to write something that was much more all-inclusive, where you could teach a whole course about anything related to basically open source scientific software and do it all in one unified place where the students are up and running in about 15 seconds. That was the motivation, and it's something I've refined over the last few years. I've taught using it a couple of times now and written some research papers and a book all within SageMath Cloud, and there have been several hundred other people who have been on courses using it now.

47:48 Michael: Yeah, that's really great. I can just imagine the first day of these sort of computational math classes: "So what we are going to do is we are all going to download this." "My download canceled." "Mine won't install." "Mine installed but it won't launch." So it's like, this is not what this class is about, right? But you've got to plough through that if it's on the machines. So this is really cool, and it's a really sleek web app that you guys have built.

48:17 William: Thanks. So I've been working on it an enormous amount for the last couple of years, with some input from Jonathan Lee and Nicholas Ruhland, who are two UW undergraduates, and also Harald Schilly, who has been working on Sage since about 2007 and who lives in Vienna, Austria. So we've written this application, and it has, I don't know, around 4000 users each day, and it will often have about 400 active users at once, so it's pretty heavily used. And the usage is heavy because people are really running code: they are running lots of Jupyter notebooks and Sage worksheets and LaTeX documents, maybe running long-term calculations. It all runs on top of Google Compute Engine, but it's also entirely top-to-bottom open source, and some people download it and install it on their own computers.

49:10 Michael: Ok, that's a really interesting component of it that you can get the local server version.

49:17 William: Yeah, and what it gives you is real-time collaborative editing of terminals, Jupyter notebooks, and LaTeX documents. It also has a course management system built in for assigning homework to students and collecting it, and everything is collaborative in that you can see other people editing, kind of like Google Docs, and there is also chat and stuff.

49:37 Michael: Yeah, that's fantastic. And, you've got a lot of features there, like it's not just for say running Python code, but you can run C or Go or there is ability to open terminals, just Linux bash shell or something like this, right?

49:51 William: Yes, absolutely. So it's kind of like those online IDEs, you know, like cloud9.io or one of those, but our target audience isn't programmers at all, it's just people, especially students, who want to get up and running quickly with open source math-related software. And we also have really good support for R and Julia and Octave and so on, especially because Jupyter notebooks have kernels for all these languages.

50:18 Michael: Right, just pass it right along to Jupyter and let it deal with it, right?

50:21 William: Yes. Exactly.

50:23 Michael: Nice. So you've got a lot of cool stuff in the SageMath GitHub, and I'll put a link to that in the show notes. One thing I saw in there was stuff to do with Docker; how does Docker fit into this?

50:35 William: First, there is a SageMath Docker image by Erik Bray, who is a guy in Europe, in France I think, who I think has the distinction of being the first full-time person to work on Sage. NSF grants for Sage have kind of dried up recently, but in Europe they just got this huge 8 million dollar grant called OpenDreamKit. It's across the European Union and it supports open source math software, including Sage, so the Europeans are really supporting open source math software. And Erik created some really nice Docker images for running Sage, so you just do something like docker run - something sagemath and you are up and running with Sage, so it's yet another way of installing Sage. And there is a Jupyter version of it. So that's one thing. SageMath Cloud doesn't currently exactly use Docker; we use cgroups and a lot of the stuff that Docker is built on, but I think it will use Docker extensively underneath in the next few months. You won't know the difference, but it will just make things run a little smoother and be more scalable.

51:41 Michael: Yeah, and let the Docker folks manage that kind of stuff, and not you, right?

51:44 William: Yeah. Exactly.

51:46 Michael: Yeah, very cool. All right, we are getting kind of to the end of the show, one thing I wanted to recommend for people is if this is all interesting to you guys go to cloud.sagemath.com, create an account there and check it out, but there is also a video like right on the landing page there that gives the two minute story of this, if you want to share it that's cool, it features you and a bunch of other people in there, which is cool and you've got some pretty sweet skating in there.

52:13 William: Yep, I own the biggest skateboard ramp in the North West and I skate it a lot, so I am really into skating.

52:21 Michael: Wow, that's really cool, this is at your house?

52:23 William: It's at a friend of mine's house, thankfully. It's not at my house. [laugh]

52:28 Michael: Nice. That's really cool. When I was growing up my brother built a quarter pipe so we would periodically hurt ourselves, that was cool.

52:35 William: I got into skateboarding with my brother, we skateboard together a lot still.

52:41 Michael: That's great. All right, I think it's probably a good time to wrap up our conversation, so let me ask you two questions I always ask my guests. First, even though you can't pip install SageMath, there are still a lot of packages that you guys must use, so do you have a favorite PyPI package that is really helpful to you that maybe other people don't know about?

53:02 William: Well, ok, let's see. First, again, I really wish the answer was SageMath. I posted my first ever package to PyPI- how do you say it, P-Y-P-I?

53:15 Michael: Guido says PyPI [pai pi ai] and some of the core developers say [pai pi ai], and [pai pai] conflicts with PyPy the runtime, so I am with [pai pi ai] as well.

53:25 William: Ok, so I posted a PyGSL package to PyPI, so I'll say that one, a brand new one that wasn't there before I did it. A user, a physicist who wanted to use it in their class in SageMath Cloud, wanted this. GSL is the GNU Scientific Library, and they wanted Cython bindings to the GNU Scientific Library to be available, and the only way to do it was to download some zip file off SourceForge, and that sucks.

53:51 Michael: Yeah, anything on SourceForge these days is a little suspect.

53:54 William: So I set up a PyPI account and figured out how to do it: I took the zip file and then I pushed it, so I myself became the kind of manager of the PyGSL package on PyPI, and it took me about 2 to 3 minutes to learn how to do so. So my answer is that package, or push your own package, especially some old thing on SourceForge that should be pip installable. Make it pip installable; it's easy for anybody to make things pip installable.

54:23 Michael: Yeah, it's surprisingly easy to register a package and upload it. You just more or less create an account and use the built-in tools, setuptools and stuff, so yeah, it's really nice. Awesome. And then, what editor do you use when you write Python code?

54:40 William: So, I dogfood everything, so I do absolutely all of my development of SageMath Cloud and Sage from within SageMath Cloud. And the editor of SageMath Cloud is built on top of CodeMirror, which is a JavaScript-based code editor for the web. I added things like multiple points, so you can split the view and see two points in a document at once, and also, because the editing is synchronized, we can just open multiple browser tabs to see multiple points in the same document at once. And then it also has various plugins for the kind of functionality I felt like I really needed, like deleting trailing white space and so on. So my answer is CodeMirror, which is probably not the most popular answer, but it's my answer. Oh, one other thing-

55:25 Michael: I think that's really cool.

55:26 William: One other thing is in SageMath Cloud there is a history, so whenever you edit, almost every keystroke is recorded, and there is a slider that lets you slide back and forth and see all past versions of your document. So when I am editing code I'll be like, oh, I wanted to look at what it was like 3 minutes ago, or 10 minutes ago, and copy something from that. And you can very easily do that with the little slider, so you get kind of a whole third dimension of time when editing. So it does have some benefits.

55:53 Michael: Yeah, that's a really nice feature. I feel like a system has reached some level of maturity and well-roundedness and goodness, basically, when it can create itself: when you compile C with the C compiler, or with PyPy, you know, you run Python with Python, these types of things. It's pretty cool, so it's nice to hear you are doing that for your editor. Awesome. So, before we say goodbye, what should people know about getting started with this? If they wanted to check it out, what should they do?

56:30 William: So just type Sage or SageMath into Google and you'll get to the SageMath.org website, and then there is all kinds of documentation, there are like 10 to 15 thousand pages of documentation there, links to books and so on, and there is a big link which says SageMath online, and if you click that it takes you to the SageMath Cloud site that we have been talking about. And there is also something called SageMath Cell, which gives you a single input box where you can type in a block of Sage code and see output. You can also embed those blocks inside of your own static website easily, and there is an API for doing that. So go to the website SageMath.org; you can either download it or go to the cloud site and use it online. And it's 100% open source, top to bottom.

57:16 Michael: William, this is quite an achievement. I really love looking inside of it because you solved a lot of cool problems and a lot of excellent technologies have come out of it, so congratulations.
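
One simple way to picture a history slider like this is a list of timestamped snapshots with a lookup by time. The sketch below is an assumption made for illustration, not how SageMath Cloud actually stores its history: the class name EditHistory and its methods are hypothetical, and it assumes snapshots are recorded in non-decreasing time order.

```python
import bisect


class EditHistory:
    """Hypothetical snapshot store: record document text over time and
    look up what the document looked like at any earlier moment."""

    def __init__(self):
        self._times = []
        self._versions = []

    def record(self, timestamp, text):
        # Assumes timestamps arrive in non-decreasing order.
        self._times.append(timestamp)
        self._versions.append(text)

    def version_at(self, timestamp):
        # Find the last snapshot taken at or before the requested time.
        index = bisect.bisect_right(self._times, timestamp) - 1
        return self._versions[index] if index >= 0 else ""
```

Moving the slider then amounts to calling version_at with a different timestamp, for example the current time minus 180 seconds to see the document as it was 3 minutes ago.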

57:29 William: Thank you.

57:29 Michael: Yeah, it's been good to talk to you, take care.

57:32 William: Ok, thanks a lot, great talking to you.

57:33 Michael: Yeah, bye.

57:33 This has been another episode of Talk Python To Me.

57:33 Today's guest was William Stein and this episode has been sponsored by Hired and Snap CI. Thank you guys for supporting the show!

57:33 This episode has been sponsored by Hired. It's time to find your next big thing. Visit hired.com/talkpythontome to get 5 or more offers with salary and equity presented right up front and a special listener signing bonus of $2,000 USD.

57:33 Snap CI is modern continuous integration and delivery. Build, test, and deploy your code directly from github, all in your browser with debugging, docker, and parallelism included. Try them for free at snap.ci/talkpython

57:33 Are you or a colleague trying to learn Python? Have you tried books and videos that left you bored by just covering topics point-by-point? Check out my online course Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python.

57:33 You can find the links from the show at talkpython.fm/episodes/show/59

57:33 Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play and direct RSS feed at /rss on talkpython.fm.

57:33 Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. You can hear the entire song at talkpython.fm/music.

57:33 This is your host, Michael Kennedy. Thanks for listening!
