00:00 Michael Kennedy: Listeners often tell me one of the really valuable aspects of this podcast is the packages and libraries that they hear about and start using in their projects. On this episode, I've invited Brian Okken, my co-host over on Python Bytes, to take this idea to 11. We're going to cover the top 30 Python packages from the past year, a metric to be determined later in the show. This is Talk Python To Me, Episode 181, recorded October 3rd, 2018. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy, keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython. This episode is brought to you by Linode and CloudBolt. Please check out what they're offering during their segments, really helps support the show. Brian welcome back to Talk Python.
01:06 Brian Okken: Oh yeah, thanks.
01:07 Michael Kennedy: Yeah, you've been here a bunch of times on this show and of course, we meet up quite often on Python Bytes, right?
01:14 Brian Okken: Yeah, every week.
01:15 Michael Kennedy: Every week, that's such a fun show, and so we get to talk. You know, this is like the sort of, extended edition of Python Bytes in some sense for our topic today because on Python Bytes, we talk about a couple of cool projects we found this week. This one is, hey, here's a whole bunch of cool projects found throughout the year but this time, we didn't pick it, did we?
01:36 Brian Okken: No, it came from, where did it come from?
01:38 Michael Kennedy: Mybridge AI.
01:40 Brian Okken: Wasn't even from a person then?
01:42 Michael Kennedy: No, so there's this app, this company, I guess, called Mybridge and they kind of like Zite or Flipboard, curate a list of articles and other interesting reading but instead of just trying to do that for a general population, they try to do that for professionals in various industries. So one of the things they came up with was 30 Amazing Python Projects, the 2018 Edition and we're just going to go through that list and just talk about these projects and sort of shine a light on some cool stuff that well, this AI picked out for us.
02:15 Brian Okken: Yeah, and one of the interesting things is that individuals can go in and rank just by the popularity of a project, like the number of stars or maybe the number of downloads on PyPI or something, but this also uses engagement, I'm guessing, how often things get added to, possibly, and recency, so if things haven't been edited for a long time, they're probably not going to be on the list.
02:44 Michael Kennedy: Right, or stuff that maybe is new but has the same number of stars as something that's old, right, that would make it more valuable. So before we get into our main topic though, let's just do a little background on you since it's been a while since you've been on this show, although I'm sure many people who listen to Talk Python also listen to Python Bytes. A couple of things that you're notable for; one is the pytest book, what's the title of your book?
03:06 Brian Okken: Python Testing with pytest. Not very creative, but it's exactly what it is.
03:11 Michael Kennedy: Yeah, it's quite descriptive.
03:12 Brian Okken: It starts out with an introduction just to get people's feet wet with pytest. I've had a lot of people say that they didn't get lost with it, but a lot of the stuff there is stuff that they'll use down the road, like we get into building plugins and things like that.
03:28 Michael Kennedy: Yeah, yeah, very nice. And what do you do day to day? Do something testing-related, right?
03:32 Brian Okken: Yeah, I make test equipment. So I work for Rohde & Schwarz and make electronic test equipment, and me, specifically, I'm a manager for some C++ people that do embedded and a test team that does testing of the hardware, so.
03:48 Michael Kennedy: Nice, yeah, that's very cool. All right, so are you ready to get into our topics?
03:52 Brian Okken: Yeah, definitely.
03:53 Michael Kennedy: What's our first one?
03:54 Brian Okken: The first one is Home Assistant, at home-assistant.io, and it is open source home automation that puts local control and privacy first. So I haven't actually played with this but it looks super awesome. So there's a lot of different home automation things out there like Nest and others, and then some do-it-yourself stuff like with Raspberry Pi and Arduino projects, but this is an overarching one that interacts with all of these things, and you can control multiple devices with one interface, track multiple devices. And then have some cross-device automation and notifications. Like for instance, you could monitor the soil moisture outside and then have it alert you, send you an email or something, if it's too dry or something like that.
04:49 Michael Kennedy: Even better, how 'about have it automatically turn on your sprinklers 'cause you have smart sprinklers, right?
04:55 Brian Okken: You're right, exactly.
04:56 Michael Kennedy: Only when it's needed, 'cause not when it rained.
04:58 Brian Okken: Yeah, one of the things I'm going to play with is the location stuff, where you can hook up through an app on your cellphone or something to know where you are, and you can say hey, when I leave my work, go ahead and turn the air conditioning up or the heat up in the house so that the temperature's right when I get home, regardless of when I left work, so that'd be fun.
05:19 Michael Kennedy: Yeah, there's all sorts of cool stuff. Another thing I want to give a shout out to here is Hass, I guess the way you'd say it short name would be Hass.io, H-A-S-S.io?
05:29 Brian Okken: Or maybe Hass like the avocado.
05:31 Michael Kennedy: Yeah, I guess so. So this is a little thing that you can run on a Raspberry Pi that is like an embedded Home Assistant. So you can just have a Raspberry Pi in your house and you have a private server that controls all of your devices, and there's an iPhone app you can install that then talks to this private server and all sorts of cool stuff.
05:52 Brian Okken: That's cool, so you're not dependent on a cloud service or something for these sorts of things.
05:58 Michael Kennedy: Right, exactly. Isn't that cool?
06:01 Brian Okken: Nice.
06:02 Michael Kennedy: Yeah, so, Home Assistant, if you're doing anything with a smart home and you care at all about Python, there's tons of integration and it's all in Python which is amazing.
06:08 Brian Okken: Well, what's next?
06:09 Michael Kennedy: Next up we have PyTorch which is a deep learning library, which is pretty awesome, right?
06:15 Brian Okken: Yeah, is that from Microsoft or who's that from?
06:18 Michael Kennedy: I think it's its own project.
06:19 Brian Okken: Okay, it's even coming out with a new version pretty soon, or maybe already.
06:23 Michael Kennedy: Yes, so what's really cool about, you know, PyTorch, is it basically lets you do a lot of GPU-accelerated tensor, deep-learning type of stuff, deep neural networks and so on. But the new thing that's cool is in PyTorch 1.0, coming along, you can accelerate your Python machine learning code with native code. So they've actually created their own little mini-language, called TorchScript. TorchScript is a subset of Python that can be JIT compiled into either C++ or other high speed languages.
06:57 Brian Okken: Nice, this is very cool.
06:58 Michael Kennedy: Yeah, it's pretty cool, so it's already had support for Cython to extend it but now it even has support for its own special language coming soon, so I think that's an RC-1. So if you're doing any machine learning with PyTorch you probably already know about it, but quite cool, and this little TorchScript subset thing is quite interesting as well. Normally I'm pretty happy about these new libraries but sometimes you know, not too much right?
07:22 Brian Okken: Yeah, kind of get a little grumpy.
07:23 Michael Kennedy: Wasn't nice. You can definitely get grumpy.
07:25 Brian Okken: Well, Grumpy is next on our list, and Grumpy is a Python-to-Go compiler, or transcompiler, maybe it's the same thing, and a runtime, so you can write Python code and have it run on a Go runtime. Sure, why not?
07:41 Michael Kennedy: That's pretty interesting yeah, why not? I mean one of the things you get is pretty good support for concurrency 'cause Go is all about that right? So that's kind of cool. One of the things that kind of makes me grumpy about Grumpy is that it's a legacy Python not modern Python.
07:54 Brian Okken: Oh it's 2.7.
07:56 Michael Kennedy: Yeah the reason is this is a super focused project from the YouTube team and the YouTube team is running YouTube on Python 2.7 and they're trying to make it go faster. Because they have requirements that I suspect most people don't have. I mean our podcasts Python Bytes for example gets a lot of traffic right? But yeah if you look at YouTube they get a million page views per second. That's a lot of traffic.
08:25 Brian Okken: Yeah, that's a lot.
08:26 Michael Kennedy: So actually I had yeah the Grumpy, I had the Grumpy guys on well, the guy behind Grumpy, on Talk Python, Episode 95. And a lot of these topics we've covered in one place or another so I'll try to give a shout out to like more deep coverage on them. If that's interesting to you, but yeah Grumpy's interesting. I hope I would love to see this go somewhere actually. I would love to see alternate core implementations like I would love to see a Rust re-implementation of CPython the runtime as well. So pretty cool.
08:54 Brian Okken: Yeah so Go has asynchronous stuff and so does Python.
08:57 Michael Kennedy: And as of Python 3.5 we have async and await, right. And that kind of unlocks a certain type of really good concurrency, which is the kind that is based on IO. That's waiting on databases, waiting on microservices, waiting on the file system. You know what websites do a lot of?
09:18 Brian Okken: Wait on stuff.
09:18 Michael Kennedy: They wait on stuff, right. Wait on databases, wait on web services, et cetera. And so Sanic is one of the big frameworks that came out pretty recently and is one of the really fast async and await based Python web frameworks. And it seems like all the new web frameworks that are coming out, they're like, yeah, we're basically the Flask API with a different implementation, and this is no different.
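A minimal sketch of the I/O-bound concurrency described above, using only the standard library's asyncio rather than Sanic itself. The `fake_io` and `handle_request` names are made up for illustration, and `asyncio.sleep` stands in for a real database or web service call. Note that `asyncio.run` needs Python 3.7 or newer, even though async and await arrived in 3.5.

```python
import asyncio
import time


async def fake_io(name: str, seconds: float) -> str:
    # asyncio.sleep stands in for a database query or web service call.
    await asyncio.sleep(seconds)
    return f"{name} done"


async def handle_request() -> list:
    # The three waits overlap instead of running back to back.
    return await asyncio.gather(
        fake_io("database", 0.2),
        fake_io("web service", 0.2),
        fake_io("file system", 0.2),
    )


def run_demo():
    # Time the whole request so the overlap is visible.
    start = time.perf_counter()
    results = asyncio.run(handle_request())
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_demo()
    print(results, f"{elapsed:.2f}s")
```

Run one after another, the three waits would take about 0.6 seconds; awaited together they should finish in roughly the time of the longest single wait.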
09:40 Brian Okken: Yeah one of the things that's fun with Sanic is that there's been a lot of like proof-of-concept things coming up but this one does look like it's gaining traction.
09:50 Michael Kennedy: Yeah, it's pretty cool, and it's based on this thing called uvloop, which is an alternate implementation of the asyncio event loop in Python that's about two to four times as fast, so that's also pretty cool that it's based on that. And also Sanic just announced, we'll probably cover this over on Python Bytes at some point. But they just announced that they're a community driven organization. It used to be a project by a guy, now it's its own thing, a little bit like Pallets is to Flask, there's that organization.
10:15 Brian Okken: Oh nice.
10:16 Michael Kennedy: So if you're looking to contribute to projects this is a good chance to do that. They even, I don't know how many projects do this but I think this is a great idea. They say in there that you can go to GitHub and look through the various items and they'll put a Help Wanted tag on things they think would be a good thing for somebody to grab and contribute.
10:34 Brian Okken: That's cool.
10:35 Michael Kennedy: Yeah, it's pretty nice. Yeah, they're on fire, aren't they?
10:39 Brian Okken: Yeah and they are on fire. And next up is python-fire. So python-fire it's an interesting little project, in well I don't know how little it is. I haven't looked at the source code. But it's a different take on command line.
10:51 Michael Kennedy: It's little in scope though?
10:53 Brian Okken: Yeah, little in scope, a different take on command line interface generation. I'm familiar with Click and there's quite a few others. But python-fire does it a little differently. You put some boilerplate, not a whole bunch but a little bit, in one of your files. And then all the functions within that module get exposed as commands on the command line, and it's kind of magical how it works. And it's fun, if you're confused by generating command line interfaces check this out. They also kind of push it for people who want something quick and dirty, just for yourself, for your small team, this might be a good thing.
11:32 Michael Kennedy: Yeah it's pretty cool, their little subtitle or whatever is automatically generating command line interfaces from absolutely any Python object, that's pretty cool.
11:41 Brian Okken: Yeah, and their documentation has been beefed up recently, so they've built a guide and there's a lot easier ways to get into it now. So that's cool.
11:49 Michael Kennedy: Yeah, that's cool, and this is from Google so you can pretty much trust it's probably pretty good and nicely put together. So going back to machine learning, our next one is spaCy. At least when I think of machine learning and text understanding, natural language understanding, I always thought of NLTK, that seemed like the way to go. But spaCy is the new fancy way to do that apparently, and it's written in Python and Cython, so it makes it super fast.
12:19 Brian Okken: Yeah, this looks very interesting, and natural language processing is actually getting used quite a bit.
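To make the "functions become commands" idea concrete, here is a small standard-library sketch. It is not how python-fire is implemented; the real library's entry point is `fire.Fire()`. The `run_cli` helper and the two sample functions are hypothetical names used only for illustration.

```python
import inspect
import sys


def greet(name, greeting="Hello"):
    """Return a greeting for name."""
    return f"{greeting}, {name}!"


def add(a, b):
    """Add two integers given as strings on the command line."""
    return int(a) + int(b)


def run_cli(namespace, argv):
    """Dispatch argv[0] to a public function in namespace; pass the rest as arguments."""
    commands = {
        name: obj
        for name, obj in namespace.items()
        if inspect.isfunction(obj) and not name.startswith("_") and name != "run_cli"
    }
    if not argv or argv[0] not in commands:
        return "usage: <command> [args...]; commands: " + ", ".join(sorted(commands))
    func = commands[argv[0]]
    positional, keyword = [], {}
    for token in argv[1:]:
        # Treat --key=value as a keyword argument, everything else as positional.
        if token.startswith("--") and "=" in token:
            key, value = token[2:].split("=", 1)
            keyword[key] = value
        else:
            positional.append(token)
    return func(*positional, **keyword)


if __name__ == "__main__":
    print(run_cli(globals(), sys.argv[1:]))
```

Saved as a script, `python script.py add 2 3` would dispatch to `add`, and `python script.py greet Brian --greeting=Hi` would dispatch to `greet`. The only boilerplate is the last two lines, which is the appeal described in the conversation.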
12:25 Michael Kennedy: Yeah any time you want to take text and understand what the words mean this is it right? If you say wanted to maybe you were an algorithmic trader and you were studying the live flow of data on Twitter and trying to look for sentiment analysis around a particular stock and then having automatic trading happen on that, you probably would put spaCy in there right?
12:47 Brian Okken: Yeah, sure right. Or if you want to...
12:49 Michael Kennedy: Just as grabbing an example out there.
12:51 Brian Okken: Or maybe tracking the emotions of a certain chief of a country on Twitter, you can maybe use this.
12:59 Michael Kennedy: I'm not sure I'll understand that. But you can try. All right, so speaking of something I don't understand and I want to understand. Help me understand Pipenv?
13:08 Brian Okken: Pipenv?
13:09 Michael Kennedy: Pipenv. There's so many ways to create virtual environments and work with them and dependencies and so on. And then there's this other way with Pipenv, which is really awesome that applies some of the time, it's done partly by Kenneth Reitz. So very cool but I don't know where it fits in my world still?
13:27 Brian Okken: It's for applications, and it's really good for coordinating all of the dependencies for application developers. So if you've got a team of people working on an application I think this is a good way to go. Some of what's built in works to separate deployment environments from development environments, because you might need extra things around, like pytest and documentation generation and stuff like that, during development that you don't need when you're just deploying. Having those separated out is handled by Pipenv rather beautifully, so.
14:04 Michael Kennedy: Yeah, nice, so it creates and manages virtual environments. It replaces requirements.txt and friends with the Pipfile and the Pipfile.lock, and it sort of pins the versions and all those kinds of things, so pretty interesting. I got to say I'm still pip install -r. Sounds tough these days, but.
14:22 Brian Okken: Well, it can also generate your requirements file so it can play nice. You can have a project play in both worlds at the same time.
14:29 Michael Kennedy: Yeah, that's cool. Yeah, it's a good project and I just need to, you know, shake myself out of my rut and learn this as well and then I can decide better where it fits for me. So one thing I always wanted to do is create a product that I could sell out of little tiny IoT smart things, and I'm still trying to understand what that might be, but if I were to do that I know what project I'd use.
14:53 Brian Okken: Yeah, MicroPython.
14:54 Michael Kennedy: Right on, MicroPython. Tell us what that is?
14:56 Brian Okken: No it's your turn.
14:58 Michael Kennedy: All right, so MicroPython is basically a Python implementation of the operating system that runs microcontrollers. So it's not like I have a Raspberry Pi which is running some variation of Linux and then I can install CPython on there. It's like Python itself is the operating system, and you can do insane things like you can hook up a lambda expression to a hardware interrupt. Like that low-level. Isn't that amazing? So you could put this on like a $5 little embedded chip.
15:31 Brian Okken: And it's got pretty good support, it's Python 3, and there's the Mu editor and other editors that talk directly to it. So developing in this world sounds fun.
15:42 Michael Kennedy: It's been going for a few years and it's just it's still going strong so I'm super excited for that.
15:46 Brian Okken: Yeah a MicroPython evangelist. Or maybe a prophet.
15:51 Michael Kennedy: Yes, that's if, a prophet for it. Indeed Prophet is the next one, what's up?
15:57 Brian Okken: Yeah, I actually don't know. I should have taken the last one. It's a tool for producing high quality forecasts for time series data that has multiple seasonality with linear and nonlinear growth. I don't even know what that means, I just read it.
16:11 Michael Kennedy: So yes, the idea is so this is a project from Facebook. So it's for predicting trends and time series data that might not be like...
16:20 Brian Okken: Oh, like stock prices.
16:21 Michael Kennedy: Yeah like stock prices or things that are not just completely linear, completely the same right? Like the seasonality and the nonlinear growth and things like that. So it's just a really advanced library for predicting given a time series of data what's going to happen in the future.
16:36 Brian Okken: Oh there's a lot of uses for that, that sounds neat.
16:39 Michael Kennedy: Yeah yeah it's pretty cool and from Facebook, I bet they do a little bit of predicting and analyzing and stuff like that with like your feed and whatnot.
16:46 Brian Okken: Yeah. And maybe for like predicting ad prices and things like that.
16:50 Michael Kennedy: Yeah exactly. This portion of Talk Python to Me is brought to you by Linode. Are you looking for bulletproof hosting that's fast, simple and incredibly affordable? Look past that bookstore and check out Linode at talkpython.fm/linode. That's L-I-N-O-D-E. Plans start at just $5 a month for a dedicated server with a gig of RAM. They have 10 data centers across the globe so no matter where you are there's a data center near you. Whether you want to run your Python web app, host a private Git server or file server, you'll get native SSDs on all the machines, a newly upgraded 200 gigabit network, 24/7 friendly support, even on holidays and a seven day money-back guarantee. Do you need a little help with your infrastructure? They even offer professional services to help you get started with architecture, migrations and more. Get a dedicated server for free for the next four months, just visit talkpython.fm/linode. The next one is something called SerpentAI. And we actually talked about this back on Python Bytes Number 50, but this is really cool.
17:54 Brian Okken: It is cool, it's like AI with game development or actually, you better take this one too. I'm not sure what this is.
18:04 Michael Kennedy: So the idea is if you have an AI and you want to train it, you could construct a fake world for it, right. You could tell it, all right, you're in a box, you can move around, or you're in a maze. And you'd have to build that maze, right, you basically have to build a miniature game and then try to teach a character in the game a thing, if you want to develop AI. But we have all these real games that are super, super interesting, super nuanced, right. Like games you might play. Racing games, first-person shooter games, whatever. So SerpentAI attempts to let you control that game with Python, whatever that game is. They claim you can turn any video game into a sandbox environment for AI and bot programming. So imagine I'm trying to create a robot that can interact with the world. You could try to create your own world, or you could stick that thing in Grand Theft Auto or another game and make it run around and go, okay, well, it's really good in this world. So it lets you sidestep creating the environment for testing AI.
19:06 Brian Okken: That's so cool.
19:07 Michael Kennedy: Yeah, it says it even works with games based on Steam or stuff like that. So it's pretty cool, the guy who creates it also does a Twitch programming channel. It's kind of interesting, right? So he'll fire it up and say, all right, I'm going to go and take this game and create an AI that's going to do this, and he'll just start from scratch. So if you've got a Friday night and you don't have anything going on, you could watch that.
19:32 Brian Okken: Oh yeah that sounds great, 'cause I don't really want to do it myself, but I'd like to watch somebody else do it.
19:36 Michael Kennedy: Exactly. All right so that's a good one. If you're into like creating AIs and bots and stuff like that, pretty awesome. The next one is just beautiful right?
19:45 Brian Okken: Yeah, it is nice and this one I actually can understand, so it's good. Next one is Dash.
19:51 Michael Kennedy: All right, tell me about it.
19:52 Brian Okken: It's a framework for building analytical web applications, so it's kind of like you need plots and, you know, Python code interactions, but you need some real-time stuff. So like you want to zoom in and look at new data and have data go back and forth, and usually you'd do JavaScript or something like that, and maybe React or something, and this is built on top of Plotly and React and Flask, and it's just really a beautiful interface for data and for interacting with stuff on the web. So I actually tried to use this and I think it would be fine for most applications, but you do need to either use their server or set up your own server; there needs to be a server to do a lot of the crunching, so.
20:41 Michael Kennedy: Right, so basically you need a server that runs Flask? It's a thing that's built on top of Flask. Right. So if you go and open up the link, go to their main website, there's a part that says Dash is productive and it shows the amount of code to create this cool graph that's totally interactable, zoomable, with a drop-down to choose different data sources, and you basically have to say, I would like to graph these things and my data source is Google and my date range is this year. And somehow it can just go and get stock data and other stuff straight from Google. You don't even need to provide it the data.
21:24 Brian Okken: Wow, that's neat.
21:25 Michael Kennedy: So there's a lot of, if you're trying to create visualizations on the internet and you're mostly happy writing in Python this is a really pretty sweet thing.
21:33 Brian Okken: Yeah it is and it's really pretty.
21:35 Michael Kennedy: Yeah it's definitely pretty. So you know another place where a lot of pretty stuff gets put is on Instagram.
21:40 Brian Okken: Yeah. This is actually an amusing little project called InstaPy. It's for trying to make your own Instagram bots and it's built just as if you were interacting with Instagram yourself through a web page because it's built on Selenium and it's Python powered Selenium. It's kind of a little pet project that might be used for like automatically farming out likes and commenting on certain pictures that have something in it or something like that, or comments and followers. There's a whole bunch of stuff it can do and I'm not sure, it's one of those, is this a good thing or a bad thing?
22:18 Michael Kennedy: Yes, I know, it is a little bit like that.
22:21 Brian Okken: But it's definitely interesting.
22:22 Michael Kennedy: Yeah, it's definitely interesting. Like I can see if you were, say, a social media company, like you did social media as a consulting project for large companies, you could write some cool automation that maybe automatically analyzes all the ways people interacted with you, stores them, generates reports, stuff like that. Yeah, so there's some good uses but there's also, you know, bots. In general I feel the same way about Twitter bots, they kind of bug me, but what are you going to do? It's the internet. Speaking of the internet, the internet's full of APIs. And the next project is API Star, which is a project that fully embraces Python 3. So this is from Tom Christie, the guy who wrote Django REST Framework. But this is sort of his reimagining of what a REST framework would look like in Python with Python 3.5 and above. And it's really cool because it does things like the arguments to your API methods use type annotations to get their values, so you can have a variable or an argument, colon, and say this comes from the header, and it's automatically extracted out of the header because of the type hints and things like that, it's pretty crazy. So yeah, this is a good one.
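The annotation-driven argument idea can be sketched with the standard library's `inspect` module. This is an illustration of the technique only, not API Star's actual API: `Header`, `QueryParam`, and `call_handler` are hypothetical names, and the request is modelled as two plain dictionaries.

```python
import inspect


class Header(str):
    """Marker type: read this argument from the request headers."""


class QueryParam(str):
    """Marker type: read this argument from the query string."""


def call_handler(handler, headers, query):
    """Build keyword arguments for handler by inspecting its type annotations."""
    kwargs = {}
    for name, param in inspect.signature(handler).parameters.items():
        if param.annotation is Header:
            # Map a Python name like user_agent to the header User-Agent.
            wanted = name.replace("_", "-").lower()
            kwargs[name] = next(
                (value for key, value in headers.items() if key.lower() == wanted),
                None,
            )
        elif param.annotation is QueryParam:
            kwargs[name] = query.get(name)
    return handler(**kwargs)


def show_agent(user_agent: Header, page: QueryParam):
    # The handler never touches the request; it only declares what it needs.
    return {"agent": user_agent, "page": page}


if __name__ == "__main__":
    print(call_handler(show_agent, {"User-Agent": "curl/7.0"}, {"page": "2"}))
```

The design point is that the handler's signature doubles as its input specification, so the framework rather than the handler decides where each value comes from.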
23:36 Brian Okken: Yep very good and you said you talked with him about it on one of your episodes right?
23:39 Michael Kennedy: I did back in Episode 125, I had Tom on there to talk about that. So definitely a link to that as well. All right this next one I'm going to let you do this because this one, this one sounds simple.
23:50 Brian Okken: Yeah sure, it's very simple. It's a, I don't even know how to pronounce it. It's spelled, F-A-I-S-S.
23:57 Michael Kennedy: Faiss? Face, Fass? I'm not sure. Faiss, because it has to do with similarity and stuff, I'm going with Faiss.
24:04 Brian Okken: Okay. It's a library for efficient similarity search and clustering of dense vectors. Of course I have to do that all the time.
24:14 Michael Kennedy: Of course.
24:16 Brian Okken: It's a data science-y thing for people that understand that, you probably understand. It contains algorithms that search in sets of vectors of any size, even ones that don't fit in your RAM, so that is big. It is written in C++ and so it's going to be fast and it's written by Facebook.
24:31 Michael Kennedy: Yes, another one of the Facebook AI projects and it also runs on GPUs, so it's another specialized AI system. I feel kind of, I feel a little bit like I'm not entirely keeping up with the whole world of AI, it's going so fast.
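For anyone wondering what "similarity search over dense vectors" means in practice, here is the naive exhaustive version in plain Python. It does not use Faiss and says nothing about how Faiss works internally; libraries like Faiss exist because this brute-force scan stops being practical for very large collections. The `l2_distance` and `nearest` helpers and the sample vectors are made up for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(vectors, query, k=2):
    """Return the indices of the k stored vectors closest to query."""
    order = sorted(range(len(vectors)), key=lambda i: l2_distance(vectors[i], query))
    return order[:k]


vectors = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.9, 1.1, 1.0],
    [5.0, 5.0, 5.0],
]

if __name__ == "__main__":
    # Indices of the two stored vectors most similar to the query.
    print(nearest(vectors, [1.0, 1.0, 0.9]))
```

Every query here compares against every stored vector, so the cost grows linearly with the collection size.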
24:46 Brian Okken: No, but now I get the name, like it's Facebook AI Similarity Search.
24:51 Michael Kennedy: Yeah there you go. Right on. Yeah and who knows maybe even when it auto tags you, who knows maybe it's using this?
24:58 Brian Okken: Maybe?
24:59 Michael Kennedy: All right so another one that's interesting, sticking with the web for a little bit is MechanicalSoup.
25:04 Brian Okken: Yeah actually I hadn't heard of this before I ran across this list, so this is neat.
25:07 Michael Kennedy: I haven't either, yeah, that's why I like this little article that we're covering, because I would say a good five or six of them, I'm like whoa, I've never even heard of this. MechanicalSoup is an automation library for interacting with websites. So a little like Selenium, a little like Requests, it'll keep cookies and send them around. It'll follow redirects and interact with links and forms and all kinds of stuff. Which is pretty awesome. I want to give sort of a notable mention as well to another one from Kenneth Reitz here, called Requests-HTML. Which is somewhat similar, so this is MechanicalSoup, like Beautiful Soup and Requests. You know, everyone knows what Requests is. Well, Kenneth Reitz created another project called Requests-HTML which kind of merges Beautiful Soup and Requests as well. So they're in the same space. And you know, if something goes wrong you can get an exception, but at least you can make them better, right?
26:01 Brian Okken: Yes, with better-exceptions.
26:03 Michael Kennedy: What's that?
26:04 Brian Okken: And better-exceptions, you kind of have to modify your code a little bit to make use of it, but it intercepts the exception process so that the output, the tracebacks from your exceptions, are easier to read. And prettier, there's colors in there and it tells you what values are, and it kind of points the value to the variable. And actually it's kind of fun, it's a neat thing to check out.
26:28 Michael Kennedy: Yeah, it basically creates little lines mapping parts of the code, and one of the things that's actually super cool, if you look at the GitHub screenshot there, when it shows you the error, do you see, Brian, how it actually shows you the values that were passed to the function? That is so incredibly cool. So you say this function was called and it crashed, and not only does it just say, too bad, it was on this line, and color it and indent it, but if it has, say, a function called deep and it takes a plus b, you have little visual lines going a is 2 and b is 15 in this crash. Like that is awesome.
27:07 Brian Okken: Yeah, and it'll even go back a little bit and interpret things beforehand. So I'm not sure how this is doing it but it's pretty cool.
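A rough standard-library approximation of "show me the values at the point of the crash" is `traceback.TracebackException` with `capture_locals=True`. This sketch is not how better-exceptions works and has none of its colors or inline annotations. The `shallow` and `deep` functions echo the example discussed above, and `describe_failure` is a hypothetical helper.

```python
import traceback


def shallow(a, b):
    return deep(a + b)


def deep(val):
    # Fails with ZeroDivisionError when val is 17.
    return 1 / (val - 17)


def describe_failure(func, *args):
    """Call func and return a traceback string that includes local variable values."""
    try:
        func(*args)
    except Exception as exc:
        detailed = traceback.TracebackException.from_exception(exc, capture_locals=True)
        return "".join(detailed.format())
    return ""


if __name__ == "__main__":
    print(describe_failure(shallow, 2, 15))
```

Each frame in the printed traceback is followed by its local variables, so the report shows that `a` was 2, `b` was 15, and `val` was 17 when the division failed.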
27:14 Michael Kennedy: It's definitely cool, it's definitely better isn't it?
27:17 Brian Okken: Sure, yeah. For me it's similar to, I mean, a lot of my time is in the testing world, and pytest does a lot of this stuff for you also. Isn't this pretty with the colors and everything.
27:27 Michael Kennedy: Yeah, quite cool. So have you heard the saying that if you have a problem and you decide to solve it with regular expressions, now you have two problems? This next one is meant to kind of make that go from one to zero, not one to two. Tell us about it?
27:47 Brian Okken: One to zero? I don't get that?
27:49 Michael Kennedy: You have a problem and it's solved, rather than I had a problem, now I have two problems.
27:53 Brian Okken: Now you have two problems. So it's flashtext and it's a very focused solution for searching and search and replace and things that you would often use regex for, and finding words within sentences and grouping them. You can have a set of phrases that all kind of mean the same thing. Like if you want New York and Big Apple to mean the same thing you can search them together. And it's really, really fast, like if you look at the link that we'll include, it has some graphs on search size compared to regex and it's kind of incredible.
28:28 Michael Kennedy: Yeah, it's really nice, and so it'll take a sentence like, I love Big Apple and Bay Area, or something like that. And if you've told it Big Apple is a stand-in for New York and you ask, hey, what keywords are here, it says New York and Bay Area. Or you could also put San Francisco, right? So quite cool, and a way to sort of normalize text data for various things people might say. I like it, super simple, and you don't have to write regular expressions, so that's good.
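The keyword normalization described here can be sketched without regular expressions using a dictionary of phrases and a single pass over the words. This is not the flashtext library and does not reproduce its algorithm or performance; `keyword_map` and `extract_keywords` are hypothetical names for illustration.

```python
# Lowercased phrases mapped to the canonical name they stand for.
keyword_map = {
    "big apple": "New York",
    "new york": "New York",
    "bay area": "Bay Area",
}


def extract_keywords(sentence, mapping):
    """Return canonical names for known phrases, in order of appearance."""
    words = [word.strip(".,!?").lower() for word in sentence.split()]
    longest = max(len(phrase.split()) for phrase in mapping)
    found, i = [], 0
    while i < len(words):
        # Prefer the longest known phrase that starts at position i.
        for size in range(min(longest, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            if candidate in mapping:
                found.append(mapping[candidate])
                i += size
                break
        else:
            # No phrase starts here, so move on to the next word.
            i += 1
    return found


if __name__ == "__main__":
    print(extract_keywords("I love Big Apple and Bay Area.", keyword_map))
```

Adding more synonyms only grows the dictionary; the scan over the sentence stays the same, which is the property that makes this style of lookup attractive when there are many keywords.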
29:00 Brian Okken: Yeah, I'm not really afraid of regular expressions but yeah anyway.
29:04 Michael Kennedy: Yeah, I'm not terribly afraid of them but they're not the first thing I jump for. Okay, so I find dates in Python to be surprisingly not obvious. How about you?
29:13 Brian Okken: Well they're really easy until they're not easy.
29:15 Michael Kennedy: Exactly you're like wait a minute that one has a time zone in it, that one doesn't, you're done, two values.
29:22 Brian Okken: Yeah. There's a lot of projects in Python to deal with datetime. And Maya is one that is from Kenneth Reitz. So it's a for humans application.
29:36 Michael Kennedy: Yeah, people feel better when whatever project they're using is, like, Requests, HTTP for Humans, right? So this is datetimes for humans, quite nice. So if you're working with datetimes it has a lot of cool features and simplifies things around datetimes.
29:53 Brian Okken: Some of the slang features are kind of fun, so you can say do things like give it a date and say a week ago, or and have it generate things like those sorts of slang versions of words and so interacting with putting out text/datetime information on a webpage where especially when you don't know where somebody is and their time zone and whatever and you don't want to use JavaScript, you want to keep it on the Python side, Maya is a good choice. There's but anyway.
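The slang output being described can be approximated with the standard library alone. This is a rough sketch of the idea, not Maya's code: the function name `slang_time`, the unit thresholds, and the exact wording of the phrases are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def slang_time(when, now=None):
    """Describe `when` relative to `now` in loose, human terms."""
    now = now or datetime.now(timezone.utc)
    delta_seconds = (when - now).total_seconds()
    seconds = abs(delta_seconds)
    if seconds < 60:
        return "just now"
    # Pick the largest unit that fits at least once.
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds >= size:
            count = int(seconds // size)
            label = f"{count} {unit}{'s' if count != 1 else ''}"
            break
    return f"{label} from now" if delta_seconds > 0 else f"{label} ago"


reference = datetime(2018, 10, 3, 12, 0, tzinfo=timezone.utc)
print(slang_time(reference + timedelta(hours=23), reference))
print(slang_time(reference - timedelta(days=7), reference))
```

Keeping both datetimes timezone-aware (UTC here) sidesteps the "that one has a time zone in it, that one doesn't" problem mentioned earlier.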
30:22 Michael Kennedy: Yeah cool so you can say like slang time and it'll say that's 23 hours from now or that's tomorrow or it's yesterday. Which is a real nice way to not be overly precise if you want to just put that on like a forum or something to that effect. This portion of Talk Python To Me is brought to you by CloudBolt. Friends don't let friends violate PEP 8. Nor do they let them spend their days in an unfulfilled work environment. Good news, your friends at CloudBolt want your help developing their state-of-the-art cloud management software. Built with Django and ranked as the number one product of its kind, CloudBolt is looking for talented engineers of all types. Located in beautiful Portland, Oregon, CloudBolt is an hour from the Pacific Ocean and Mount Hood. If you're not in Portland, not a problem, CloudBolt offers a relocation stipend to the Pacific Northwest and is also hiring solution engineers from everywhere around the globe. Whether you're interested in containers, hypervisors or just writing clean, performant Python code, CloudBolt would love to hear from you. Visit talkpython.fm/cloudbolt for more information. That's talkpython.fm/cloudbolt. So I think this next one is right up your alley Brian.
31:33 Brian Okken: Really?
31:34 Michael Kennedy: I do, I do. I'll take it for you but I think it's up your alley. So I don't know how to say it? Mimesis? I'm going to go with Mimesis?
31:41 Brian Okken: I don't know.
31:42 Michael Kennedy: So it's another one of these fake, it's another one of these faker libraries. So the idea is you can go and say I would like to create a personal object and you can ask what's their occupation? How old are they, what's their name? And you can tell it things like, please answer me in English or German or whatever. And it will basically generate fake test data so you could create a fake database and put it in a SQLite file then use that for testing or things like that. So pretty cool.
32:11 Brian Okken: Oh yeah, this does look neat and it looks like they have a page on like well why am I using this versus something else? So they have a comparison to other libraries so it's good when people are trying to decide.
32:23 Michael Kennedy: Yeah, it has a cool function called identifier and you can give it a mask just like, number, number, number, dash, dash, number, number, slash, number and it'll like create one. So if you need say, like Social Security numbers or something else like that it's really easy to generate those, quite nice.
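The mask idea is easy to picture in plain Python. The sketch below is not Mimesis itself: the function is a hypothetical stand-in, and the choice of `#` for a digit and `@` for an uppercase letter is an assumption made for this example.

```python
import random
import string


def identifier(mask="##-##/##", rng=None):
    """Fill a mask with random characters: '#' -> digit, '@' -> uppercase letter."""
    rng = rng or random.Random()
    out = []
    for ch in mask:
        if ch == "#":
            out.append(rng.choice(string.digits))
        elif ch == "@":
            out.append(rng.choice(string.ascii_uppercase))
        else:
            out.append(ch)  # literal characters such as '-' or '/' pass through
    return "".join(out)


# A seeded generator keeps fake test data reproducible between runs.
fake_ssn = identifier("###-##-####", random.Random(0))
print(fake_ssn)
```

Passing an explicit `random.Random` instance is a design choice for testing: the same seed yields the same fake records, which makes failures repeatable.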
32:39 Brian Okken: Yeah, I could see where that'd be useful. Neat.
32:42 Michael Kennedy: Yep. So this next one is called open-paperless. For scanning documents? I don't know anything about this. Tell me about that one?
32:49 Brian Okken: Well this is a, I didn't know about this before reading this either. So apparently there's a document management system called Mayan EDMS. So I'm guessing Mayan Electronic Document Management System, probably. For some reason open-paperless is a, even though Mayan EDMS is an open project also, apparently at least some people don't like the interface. So open-paperless is a Python interface to this system and a simplified interface to it. So probably only of concern for people that there's not, but maybe I should be?
33:28 Michael Kennedy: Yeah, well it's got a lot of possibility for the automate the boring stuff type of things. Like right I'm so tired of being in this office and doing this, like can I write Python and make my job better?
33:37 Brian Okken: Yeah, definitely.
33:38 Michael Kennedy: Right, like I don't use paper for anything. I'm not sure it'd be any value to me but I can see how it would be for others.
33:44 Brian Okken: Yeah, and things like places where you have to keep copies of signed contracts and receipts and there's a lot of jobs where you have to keep at least a scan of the physical version. So that'd be good.
33:58 Michael Kennedy: Alright this next one is a shout-out to one of my absolutely favorite TV series, and it scares me to death.
34:05 Brian Okken: Really?
34:06 Michael Kennedy: So fsociety. That's the hacker crew from Mr. Robot. That means a couple things. It's a play on like that little arcade where they all hung out, where the U and the N fell off but also you know the more obscene version of it as well. So this one is a set of tools for hackers if you're doing penetration testing or things like that. So you can check out, this is on GitHub and you can go and grab it. So if you say wanted to turn this loose against your code or your infrastructure to see what would happen if somebody else did that? You might want to give that a look.
34:41 Brian Okken: Yeah and hopefully these people that you're trying to protect from aren't using the same tools, but you know that's one of those good thing, bad thing sort of things, but.
34:49 Michael Kennedy: Yeah, for sure. And it turns out that Python is a big hit with hackers. Actually, an interesting article about like the growth of Python adoption amongst hackers is like 77% over the last year or something crazy like that.
35:03 Brian Okken: Well they're a segment of the population and Python's growing in all segments.
35:07 Michael Kennedy: Yeah, that's right.
35:08 Brian Okken: Maybe it's because of all the opportunities to do live-coding?
35:13 Michael Kennedy: It definitely could be, so you can take your program and you can hook up a debugger to it and sort of step through, that could be like PDB, or that could be PyCharm or something like that. But this livepython one is a little bit more of a I'd like to sit back and watch how my program works, right?
35:34 Brian Okken: Yeah, and I'm still kind of a little confused by this one, but it's an application that, it sits on your desktop and yeah you kind of watch as your program's running, the tracing sort of thing and then you can see lots of stuff going on inside.
35:50 Michael Kennedy: Yeah, so it says, it lets you basically watch your program run like a movie. So you can say, go and sit back and it'll show like the execution through your program, as well as like a stack of variables, and you get a watch window of variables just on the side there, which is pretty cool.
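The "watch it run" idea rests on Python's tracing hook. The following is a minimal sketch of that mechanism using `sys.settrace`, not livepython's code; the helper name `trace_run` and the recorded format (line offset plus a snapshot of locals) are assumptions for illustration.

```python
import sys


def trace_run(func, *args, **kwargs):
    """Run func and record (line offset, local variables) before each line executes."""
    events = []
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # do not trace into other functions
        if event == "line":
            offset = frame.f_lineno - target.co_firstlineno
            events.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return result, events


def total(values):
    acc = 0
    for v in values:
        acc += v
    return acc


result, events = trace_run(total, [1, 2, 3])
for offset, local_vars in events:
    print(f"line +{offset}: {local_vars}")
```

Replaying `events` one entry at a time gives the movie-like view: which line is about to run, and what the variables looked like at that moment.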
36:08 Brian Okken: Yeah. You could have that just running while you're at lunch and people will think you're working while you're...
36:16 Michael Kennedy: Sorry, I'm not coming back, I'm compiling.
36:18 Brian Okken: Yeah. I don't know, it might be fun.
36:21 Michael Kennedy: But I think it's a good idea. I saw there's some coffee cup that you can get that it'll like heat itself up, at like 6:30 in the morning and like start to steam before you get into the office so it looks like you're in early, you just stepped out. So this could go along with that. So I told you how I was confused about Pipenv, right? Yeah, still confused and hatch is also pretty awesome but it also makes me confused because there's all these different ways in which I might work with stuff. So hatch is another package manager, virtual environment manager for Python which is pretty cool and on our list.
36:57 Brian Okken: Yeah, and it also like they're taking over lots of other stuff too. So it does things like start a project, like Cookiecutter kind of does. And it also does things like push up to PyPI and a whole bunch of other things.
37:10 Michael Kennedy: Right, it has pytest support.
37:11 Brian Okken: Yeah, I don't have any issues with all the tools by themselves. So I'm still yeah, I don't think I need this, but a lot of people use it and like it over things.
37:22 Michael Kennedy: Yeah, it's cool. I think what this really means is that there's all these different ways of doing this stuff. Like it's just not really, totally nailed down. Like I recently interviewed the folks around the Python Language Summit which is where the core developers get together and one of the topics at the core developers meeting in 2018 was virtual environments. Do we need them, why are they so hard to teach? Can we make them simpler? Right? I'm telling you it's just like the fact that there's all these different solutions means it's just not quite a solved problem. So I guess that it's good people are trying to solve a problem.
37:59 Brian Okken: Yeah, definitely.
38:00 Michael Kennedy: This next one that came up is I guess if you're in the right space it'll be useful. I'm not totally, entirely sure, but Number 24 is Tangent. Which it does source-to-source, debuggable derivatives in Python.
38:14 Brian Okken: Sure, why not?
38:15 Michael Kennedy: As in, you know, dy/dx. Yeah, the first derivative of a thing. So anyway if you have to do numerical differentiation in Python,
38:27 Brian Okken: No, yeah.
38:27 Michael Kennedy: Sounds pretty cool, I probably don't want to say more than that about this one.
38:31 Brian Okken: Sure.
38:31 Michael Kennedy: So previously we had Prophet, did you foresee that Clairvoyant was coming?
38:35 Brian Okken: Yeah, I knew that. 'Cause I wrote it in here, in the list. So Clairvoyant, which is like kind of an awesome name for a project. It's software designed to identify and monitor social and historical cues for short-term stock movement. So hey it's kind of like Prophet but focused on stocks, I'm guessing. Yeah, nice.
38:58 Michael Kennedy: Yeah, for all the algorithmic traders out there, here you go. One that is pretty promising is called MonkeyType and that one came from Instagram hopefully, right? Yeah, that's from Instagram. So we have mypy, we have MonkeyType and a couple of these other systems that are out there trying to take code annotations and help us either evolve or better understand our code or lint them to make sure they're correct based on their typing right?
39:24 Brian Okken: Yeah, and so this one's kind of a runtime thing. So if you're not sure how your code is being used it can take a look at how it's being used at runtime and tell you what the types are.
39:36 Michael Kennedy: What's really interesting is it'll generate a stub. You know the stubs, like it might say def add(a: int, b: int) -> int, then just ... instead of an implementation. And then Python itself and mypy and PyCharm and what not can use that to actually validate your code. So it doesn't even have to modify your code, it can generate these sort of parallel side, type definition files and stub files and put them up in typeshed, it'll be pretty cool.
40:07 Brian Okken: Yeah, I think it might be kind of fun to use just to go over some code to see if what you think you're doing is really actually helping you.
40:15 Michael Kennedy: Right and it's based on how it works at runtime, so it looks at your code and says, well every time you called this function, you pass an int, so that's getting a :int put on it right there.
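The runtime approach can be sketched in a few lines. This is not how MonkeyType is implemented or invoked: the decorator `record_types` and the helper `stub_for` are hypothetical names, and recording through a decorator is a simplification chosen to keep the example self-contained.

```python
import functools
import inspect
from collections import defaultdict

# Observed type names per function: {"args": {param: {type names}}, "returns": {type names}}
_observed = defaultdict(lambda: {"args": defaultdict(set), "returns": set()})


def record_types(func):
    """Wrap func so every call records the types of its arguments and result."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        record = _observed[func.__qualname__]
        for name, value in bound.arguments.items():
            record["args"][name].add(type(value).__name__)
        result = func(*args, **kwargs)
        record["returns"].add(type(result).__name__)
        return result

    return wrapper


def _render(names):
    names = sorted(names)
    if not names:
        return "Any"  # never observed at runtime
    return names[0] if len(names) == 1 else f"Union[{', '.join(names)}]"


def stub_for(func):
    """Build a one-line stub from whatever types were seen at runtime."""
    record = _observed[func.__qualname__]
    params = ", ".join(f"{name}: {_render(types)}" for name, types in record["args"].items())
    return f"def {func.__name__}({params}) -> {_render(record['returns'])}: ..."


@record_types
def add(a, b):
    return a + b


add(1, 2)
add(3, 4)
print(stub_for(add))
```

The stub only reflects the calls that actually happened, so the quality of the guessed annotations depends on how representative the observed run is.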
40:24 Brian Okken: Yeah, you thought it was a string, it's not, it's an int. That's why it's never equal to the string. So and you can even say, okay do that and apply those back to my code and it'll put Python 3...
40:38 Michael Kennedy: Oh you can? Yeah, annotations. It'll rewrite your code that has no annotations with annotations it guessed from runtime behavior, that's pretty awesome.
40:45 Brian Okken: Yeah, that's nice and then you can just diff it and see what it does and whatever.
40:49 Michael Kennedy: Yeah, and then we have, there's so much stuff happening in this space. These next two are not really on the list, but we have mypy, which we talked about right. And there's mypyc, which will take this code here that we're talking about and compile it to C, standard Python with annotations compiled to C, automatically. That's a new project coming from Dropbox actually.
41:11 Brian Okken: Okay, because compiled Python needs annotations?
41:15 Michael Kennedy: No, if you want to, let's say you want to write a C extension, one thing you could do is write the C code, compile it and then import it. The other thing you could do is just say this module, I would like to run through mypyc and now it's a C extension.
41:29 Brian Okken: Oh, okay. Now I get it. Yeah, I'm slow.
41:34 Michael Kennedy: I know you work at a place where they do C and C++, but not everyone necessarily wants to write it, right. So this'll let you skip that or maintain all Python but still get the equivalent of what would have happened if you'd done a C compilation.
41:45 Brian Okken: Why are we even writing C anymore if we can just have this?
41:48 Michael Kennedy: I know and the other thing I want to throw in there is Cython will also take this code and compile it to C as well. Now it just does the Python 3 type annotations and it would make that work as well. So you could take MonkeyType and get yourself into a place where either of those things would work on it. It'd be pretty cool.
42:07 Brian Okken: Nice, they all fit together.
42:08 Michael Kennedy: Yes. You know I don't know how many people listen to Python Bytes but if they did they'd have to know we've been on a GUI trip, right?
42:15 Brian Okken: Yeah, with the help of a lot of people telling us that.
42:17 Michael Kennedy: Exactly. We're like oh there's four, they're like no, no, no. Let me tell you, let's count the ways.
42:24 Brian Okken: Yeah, there's lots of ways to do user interfaces in Python.
42:27 Michael Kennedy: One of the interesting ones is Eel.
42:28 Brian Okken: Yeah, which I'm amused by the name because I got to make the obvious pun, or just state the obvious. It's an electric python. Like electron, but with Python, get it? Eel?
42:40 Michael Kennedy: I love it, I love it. The thing that's actually really awesome. So for those who don't know, Electron JS, a lot of apps, probably one of the more notable ones is actually Slack I believe. I'm pretty sure Slack is, certainly Visual Studio Code is an Electron JS app. Atom is an Electron JS app. Basically you write in web stuff, HTML, JavaScript, CSS and then it's running on top of Node and embedded Chrome. So this lets you take Python and replace the Node side with it somewhat. And it says it's for simple, Electron-like apps, which is pretty cool. So another one, speaking of listeners helping us out, that's pretty awesome that I think is more full-featured is called Python Electron. I'm going to put a link to that in the show notes as well.
43:25 Brian Okken: Oh, okay.
43:27 Michael Kennedy: And there's actually some pretty cool interaction of here's how you write your Python code. And use ZeroMQ to communicate back and forth with the browser. So pretty cool, you can write just like standard Python and then you have an object and you just say, this object is, I would like to take this object and make it the API that my app can communicate with, it's cool.
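The "make this object the API" pattern can be shown without any messaging library. This sketch uses JSON strings in place of a ZeroMQ socket; the `Calculator` class, the `handle_request` function, and the message shape (`method` plus `args`) are all assumptions for illustration rather than the project's real protocol.

```python
import json


class Calculator:
    """Plain Python object whose public methods become the app's API."""

    def add(self, a, b):
        return a + b

    def echo(self, text):
        return text


def handle_request(api, raw_message):
    """Decode a JSON request, call the matching public method, encode the reply."""
    request = json.loads(raw_message)
    name = request.get("method", "")
    method = getattr(api, name, None)
    # Refuse private names and anything that is not callable.
    if name.startswith("_") or not callable(method):
        return json.dumps({"error": f"unknown method: {name}"})
    return json.dumps({"result": method(*request.get("args", []))})


api = Calculator()
reply = handle_request(api, '{"method": "add", "args": [2, 3]}')
print(reply)
```

In a real app the front end would send these messages over a socket and the Python side would loop on incoming requests; the dispatch step itself stays this small.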
43:50 Brian Okken: Yeah, neat.
43:51 Michael Kennedy: I would say that was even surprising.
43:52 Brian Okken: Cool, it was surprising. And next up is Surprise. So Surprise stands for simple Python recommendation system engine. The U and the R are silent. A Python scikit, is scikit a noun? A Python scikit for building and analyzing recommender systems.
44:12 Michael Kennedy: Yeah, that's pretty cool. So if you have any sort of ecommerce system or if you read this article you might also like that article. Here you go, surpriselib.com, right?
44:21 Brian Okken: Yeah, nice.
44:22 Michael Kennedy: Nice, so we talked about MechanicalSoup. The other one that we have previously talked about on Python Bytes not long ago is something called gain, which is a web crawler much like the other ones we talked about but this one is special because it's based on asyncio. Which is really cool and uvloop and aiohttp, and things like that.
44:42 Brian Okken: Yeah, it's a web crawler framework thing.
44:45 Michael Kennedy: Yeah, so you basically give it, you know here is let's say a URL, start here and then go actually find all of the other URLs and crawl them. So basically if you want to create DuckDuckGo, start here.
44:59 Brian Okken: Yeah, we need another one of those.
45:02 Michael Kennedy: That's right, if you want to go to a website and sort of explore all the links and start downloading and processing that stuff, gain is pretty awesome because it does it super, super low latency and in parallel with asyncio.
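The crawl-in-parallel idea can be sketched with asyncio alone. This is not gain's API: the in-memory `SITE` dictionary and the simulated `fetch_links` coroutine stand in for real HTTP requests (gain itself builds on aiohttp), so the example runs without a network.

```python
import asyncio

# Hypothetical site: each page maps to the links found on it.
SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/missing"],
    "/blog/post-1": ["/"],
}


async def fetch_links(url):
    """Pretend to download a page and return its links."""
    await asyncio.sleep(0.01)  # simulated network latency
    if url not in SITE:
        raise LookupError(url)  # stands in for a 404
    return SITE[url]


async def crawl(start):
    """Breadth-first crawl; every page in a level is fetched concurrently."""
    seen, broken = {start}, set()
    frontier = [start]
    while frontier:
        results = await asyncio.gather(
            *(fetch_links(url) for url in frontier), return_exceptions=True
        )
        next_frontier = []
        for url, links in zip(frontier, results):
            if isinstance(links, Exception):
                broken.add(url)
                continue
            for link in links:
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen - broken, broken


pages, broken = asyncio.run(crawl("/"))
print(sorted(pages), sorted(broken))
```

While one request is waiting on the network, the event loop moves on to the others, which is why a single thread on a modest CPU can still crawl quickly. The `broken` set is the piece you would check in an automated build.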
45:15 Brian Okken: And actually I like projects like this, not necessarily to make commercial products out of them, but for your own things. So if you've got a large website that you're maintaining or a company website or your own personal one, or whatever and you want to make sure things are working right, being able to crawl it and then interact with, instead of having hooks you've got a tool for web crawling that you can hook other Python up to. There's a lot of stuff you can do with that. And having it do it quickly on maybe your limited little CPU that you've got playing around, I think it's neat.
45:48 Michael Kennedy: Yeah, that's awesome. So yeah, that's a really good example as well, is if I've got a large website maybe it's even like a CMS or something where I don't necessarily control all the stuff that goes into it, you want to make sure there's no broken links and you can easily do that with gain. That'd be cool, make that part of your automated build. All right, the last one here is one of the automate the boring stuff type things. It's not interesting to many people but if you need it, you really need it. So if you would like to interact with PDFs and get data out of them then pdftabextract is for you, right?
46:23 Brian Okken: Yeah, and I mean the article and even the Read Me on this shows some kind of amazingly horrible scans of documents. They're not straight.
46:34 Michael Kennedy: Oh, they are horrible, oh my goodness, they're so bad. I could hardly read them.
46:39 Brian Okken: Yeah, you can hardly read 'em and yet this has a way to OCR and extract data and then create data sets out of scans of things with tables in them. So one of those things of, if this is your job, yeah, automating this is a good idea.
46:57 Michael Kennedy: That is so awesome because normally what you could do with some of the tools for PDFs is get the text represented in the file, but these examples are not that, these are here's you know tabular data just scanned from an image, now get the text out of it. Cool, so I'm sure that's going to make someone's day. Right, so that's our Top 30 from 2018, which is really based a little bit on slightly older data, but it was fun to cover with you, Brian.
47:25 Brian Okken: It definitely was fun. And similar feel as Python Bytes, so if you like this sort of a thing, we get to spend a little bit more time per topic on Python Bytes so head over to there and check us out there.
47:38 Michael Kennedy: Yeah, absolutely. It's a fun podcast for sure, so do check that one out. All right, before you go though there's the two questions I always get to ask. If you're going to write some code these days what editor do you use, for Python code specifically?
47:52 Brian Okken: PyCharm, always now, has incredible pytest support. Then I also got, I'm going to get this wrong, but I got, Oliver Bestwalter keyed me onto a project called like Power Mode, or something. And when you type it's like sparks and fire comes out and it's just a blast, so I'm using that all the time now.
48:14 Michael Kennedy: You want to feel powerful, like I'm kind of like just low energy this morning, all right Power Mode baby, it's on.
48:22 Brian Okken: Yeah, when you copy-and-paste a chunk of code also or cut a big chunk of code, it pops up like a bam, like on Comics, it's neat.
48:32 Michael Kennedy: Oh my gosh, I have to find this. You're going to change the way my coding works. This is going to be bad.
48:40 Brian Okken: I had to turn off the flames and the shaking of the screen because I can't work like that, but the sparks I can just work with those sparks, so.
48:48 Michael Kennedy: That's pretty excellent. All right, normally I would ask you for a notable PyPI package, but we literally just covered like 35 PyPI packages.
48:56 Brian Okken: Definitely so I think we're good.
48:58 Michael Kennedy: Yeah, I think we're good on that one. But let me swap that one out, just let people, why don't you tell people about Test and Code, your other podcast which you didn't give a shout out to yet.
49:05 Brian Okken: Oh yeah, I also do another podcast called Test and Code. That's at testandcode.com, I can't spell today. T-E-S-T-A-N-D-C-O-D-E, DOT FM, it's not .fm, why am I doing that? It's testandcode.com, that's it. Yeah, we cover a whole bunch of stuff, we don't just do testing. We cover a lot of other things too. I know that we already have Talk Python, but I do a little bit different take on things. And it's fun and I'm ramping things up, we're doing weekly podcasts for the rest of the year.
49:35 Michael Kennedy: That's awesome, you've had some pretty notable folks on there, so that's good, keep it up.
49:38 Brian Okken: Thanks.
49:39 Michael Kennedy: All right, well hopefully people have found a couple of things that really apply to what they're doing that they maybe hadn't heard of. I know I had when I went through those.
49:46 Brian Okken: Yeah, I did too, it was surprising.
49:49 Michael Kennedy: It was very surprising, my Clairvoyance wasn't good, my prophecies were not, no.
49:53 Brian Okken: A lot to gain.
49:54 Michael Kennedy: We're going to stop now.
49:55 Brian Okken: Oh, sorry.
49:56 Michael Kennedy: Oh my gosh, there was a lot to gain.
49:59 Brian Okken: All right.
50:00 Michael Kennedy: All right, let's just leave it there, Brian. Thank you so much for coming, bye.
50:03 Brian Okken: Thank you, bye.
50:04 Michael Kennedy: This has been another episode of Talk Python to Me. Our guest on this episode was Brian Okken and it's been brought to you by Linode and CloudBolt. Linode is bullet-proof hosting for whatever you're building with Python. Get four months free at talkpython.fm/linode. That's L-I-N-O-D-E. Spend your work time fulfilled. Write Python and Django code at CloudBolt developing their state-of-the-art, cloud management software in beautiful Portland, Oregon. Visit talkpython.fm/cloudbolt to join the team. Want to level up your Python? If you're just getting started try my Python Jumpstart by Building 10 Apps or our brand new, 100 Days of Code in Python. And if you're interested in more than one course be sure to check out the everything bundle, it's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite pod catcher and search for Python, we should be right at the top. You can also find iTunes feed at /itunes, Google Play feed at /play. And direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy, thanks so much for listening. I really appreciate it. Now get out there and write some Python code.