Top 10 machine learning libraries

Episode #131, published Tue, Sep 26, 2017, recorded Thu, Jul 20, 2017


Data science has been one of the major driving forces behind the explosion of Python in recent years. It's now used for AI research, controls some of the most powerful telescopes in the world, tracks crop growth and prediction, and so much more.

But with all this growth, there is an explosion of data science and machine learning libraries. That's why I invited Pete Garcin onto the show. He's going to share his top 10 machine learning libraries. After this episode, you should be able to pick the right one for the job.

Episode Deep Dive

Guest introduction and background

Pete Garcin
Pete started programming at an early age, building BASIC and Pascal programs for BBS games and later working in the gaming industry. He gravitated toward Python for automating build pipelines, rigging, and other workflows in game development. Today, Pete is a developer evangelist at ActiveState, focusing on language distributions like ActivePython. He joined the show to share his insights into Python’s vibrant ecosystem of machine learning libraries and how developers can best choose among them.

What to Know If You're New to Python

Here are a few concepts and resources that will help you follow this discussion on machine learning libraries in Python:

Python.org: Make sure you’re using at least Python 3. If you’re completely new, you can grab an installer and go through the official tutorial.
pip User Guide: You’ll need to install libraries like numpy, pandas, and others. Pip is the standard package manager for Python.
Virtual Environments: Using virtual environments (via venv or conda) makes it easier to manage different library versions and keep projects isolated.
Jupyter: This interactive notebook environment is popular for data science and will help you experiment with the libraries covered in this episode.

Key points and takeaways

Choosing the Right Machine Learning Library
One of the main themes of the episode is how to pick the correct library among the many excellent Python options. Each project has different goals, such as quick experimentation, large-scale data processing, or specialized deep learning tasks. Pete highlights how the ecosystem offers both beginner-friendly solutions (like scikit-learn or Keras) as well as production-grade powerhouses (like TensorFlow or Caffe).
- Links and tools:
  - ActivePython from ActiveState (curated Python distribution)
  - Consider CPU vs. GPU requirements and your deployment environment before deciding
NumPy & SciPy: The Foundational Libraries
NumPy and SciPy are core dependencies for almost every other machine learning library in Python. NumPy offers efficient multidimensional arrays while SciPy builds upon NumPy’s capabilities with specialized math, scientific, and engineering functions. Many ML libraries rely on these for linear algebra, Fourier transforms, and other calculations.
- Links and tools:
  - NumPy
  - SciPy
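A minimal sketch of how the two fit together, assuming NumPy and SciPy are installed: NumPy stores the data as n-dimensional arrays and applies whole-array arithmetic, while `scipy.linalg` does the linear algebra on those same arrays. The matrix and vector are made up for illustration.

```python
import numpy as np
from scipy import linalg

# A 3x3 matrix and a right-hand-side vector stored as NumPy arrays.
A = np.array([[3.0, 1.0, 2.0],
              [1.0, 4.0, 1.0],
              [2.0, 1.0, 5.0]])
b = np.array([9.0, 11.0, 14.0])

# Whole-array operations run in compiled code, with no explicit Python loop.
scaled = A * 2.0          # element-wise multiplication
row_sums = A.sum(axis=1)  # reduce along each row

# SciPy's linear algebra routines accept the same arrays directly.
x = linalg.solve(A, b)    # solve A @ x == b for x

print(A.shape, row_sums)
print(x, np.allclose(A @ x, b))
```

The same `np.ndarray` objects can be handed to scikit-learn, pandas, and most of the other libraries below without conversion.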
scikit-learn: A Gateway to Machine Learning
scikit-learn is one of the best-known ML libraries for Python. It’s designed for quick access to common algorithms such as classification, regression, clustering, and more, all without requiring deep knowledge of GPU hardware or large-scale backends. This makes it ideal for beginners or projects that don’t need complex deep learning architectures.
- Links and tools:
  - scikit-learn
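A minimal sketch of that workflow, assuming scikit-learn is installed. The iris dataset ships with the library, and logistic regression is just one of many interchangeable classifiers; the split size and random seed are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A small built-in dataset: 150 iris flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to check the model on examples it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Estimators share the same fit / predict / score pattern.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Because every estimator exposes the same methods, trying a different algorithm usually means changing only the line that creates `clf`.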
Keras: Accessible Deep Learning
Keras aims to take the complexity out of defining neural networks. It wraps powerful deep learning engines like TensorFlow, Theano, or CNTK, allowing you to switch the “backend” with minimal code changes. With a dozen lines of Keras, you can define complex neural network layers, activation functions, and training loops.
- Links and tools:
  - Keras
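Keras itself needs one of those backends installed, so rather than pin a particular Keras release, here is a framework-free sketch in NumPy of what a short layer definition stands for: a list of (units, activation) pairs is turned into weight matrices, and a forward pass applies each dense layer in turn. The names `build_network`, `forward`, and `ACTIVATIONS` are hypothetical helpers, not Keras API, and training is left out to keep it short.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract each row's max before exponentiating for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

ACTIVATIONS = {"relu": relu, "softmax": softmax}

def build_network(n_inputs, layers, seed=0):
    """Hypothetical helper: turn (units, activation) pairs into layer weights."""
    rng = np.random.default_rng(seed)
    params = []
    fan_in = n_inputs
    for units, activation in layers:
        # He-style scaling, a common initialization for ReLU layers.
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, units))
        b = np.zeros(units)
        params.append((W, b, ACTIVATIONS[activation]))
        fan_in = units
    return params

def forward(params, X):
    """Hypothetical helper: apply each dense layer (X @ W + b, then activation)."""
    for W, b, act in params:
        X = act(X @ W + b)
    return X

# The "configuration": 4 inputs -> 16 ReLU -> 8 ReLU -> 3-way softmax output.
net = build_network(4, [(16, "relu"), (8, "relu"), (3, "softmax")])
probs = forward(net, np.ones((5, 4)))
print(probs.shape)        # (5, 3): one row of class probabilities per example
print(probs.sum(axis=1))  # each row sums to 1
```

In Keras the same stack would be declared as a `Sequential` model of `Dense` layers, and the library builds the backend graph, the optimizer, and the training loop from that declaration.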
TensorFlow: Google’s Machine Learning Powerhouse
Created at Google, TensorFlow can scale from CPU to multiple GPUs or even specialized TPUs (Tensor Processing Units). It represents ML models as computation graphs, making them efficient and portable across environments such as desktop, mobile, and cloud. The community is massive, and Python is the recommended language interface.
- Links and tools:
  - TensorFlow
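To make the "computation graph" idea concrete without tying it to a specific TensorFlow release, here is a framework-free sketch in plain Python of the define-then-run style TensorFlow 1.x used at the time of this episode: the expression is first recorded as nodes and edges, then evaluated later with different inputs. The `Node`, `Input`, `Const`, `Op` classes and the `run` function are hypothetical illustrations, not TensorFlow's API.

```python
import operator

class Node:
    """Base class for a node in a tiny, hypothetical computation graph."""
    def __add__(self, other):
        return Op(operator.add, self, other)

    def __mul__(self, other):
        return Op(operator.mul, self, other)

class Input(Node):
    def __init__(self, name):
        self.name = name  # value is supplied only when the graph is run

class Const(Node):
    def __init__(self, value):
        self.value = value

class Op(Node):
    def __init__(self, fn, *inputs):
        self.fn = fn          # the operation this node performs
        self.inputs = inputs  # edges: the nodes whose outputs feed this one

def run(node, feed):
    """Evaluate a node by recursively evaluating the nodes it depends on."""
    if isinstance(node, Input):
        return feed[node.name]
    if isinstance(node, Const):
        return node.value
    return node.fn(*(run(n, feed) for n in node.inputs))

# Building the graph only records structure; nothing is computed yet.
x = Input("x")
w = Const(3.0)
b = Const(1.0)
y = w * x + b

# The same graph can then be executed many times with different inputs.
print(run(y, {"x": 2.0}))   # 7.0
print(run(y, {"x": -1.0}))  # -2.0
```

Because the whole structure exists before anything runs, a real framework can inspect and optimize it and decide which device executes each node, which is roughly what makes the CPU/GPU/TPU and mobile portability possible.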
Theano: A Symbolic Graph Pioneer
Theano was one of the first libraries to introduce symbolic expression graphs for ML and remains a respected project for deep learning research. Although TensorFlow has become more dominant, Theano can still serve as a backend for Keras and excels at tasks needing symbolic differentiation.
- Links and tools:
  - Theano
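Symbolic differentiation produces a new expression for the derivative rather than a single number. A minimal, framework-free sketch in plain Python, assuming expressions built only from one variable, constants, sums, and products; the classes and the `grad` and `evaluate` functions are hypothetical illustrations, not Theano's API (Theano exposes this idea as `theano.tensor.grad`).

```python
class Expr:
    def __add__(self, other):
        return Add(self, other)

    def __mul__(self, other):
        return Mul(self, other)

class Const(Expr):
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return repr(self.value)

class Var(Expr):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

class Add(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b
    def __repr__(self):
        return f"({self.a!r} + {self.b!r})"

class Mul(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b
    def __repr__(self):
        return f"({self.a!r} * {self.b!r})"

def grad(e, var):
    """Return a new expression graph for d(e)/d(var)."""
    if isinstance(e, Const):
        return Const(0.0)
    if isinstance(e, Var):
        return Const(1.0 if e is var else 0.0)
    if isinstance(e, Add):  # sum rule
        return grad(e.a, var) + grad(e.b, var)
    if isinstance(e, Mul):  # product rule
        return grad(e.a, var) * e.b + e.a * grad(e.b, var)
    raise TypeError(f"unsupported node: {e!r}")

def evaluate(e, env):
    """Compute the numeric value of an expression for the given variables."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Add):
        return evaluate(e.a, env) + evaluate(e.b, env)
    if isinstance(e, Mul):
        return evaluate(e.a, env) * evaluate(e.b, env)
    raise TypeError(f"unsupported node: {e!r}")

x = Var("x")
f = x * x + Const(3.0) * x       # f(x) = x^2 + 3x
df = grad(f, x)                  # a new graph, not yet a number
print(df)
print(evaluate(df, {"x": 2.0}))  # 7.0, i.e. 2*2 + 3
```

Libraries such as Theano also simplify the derivative graph (the `0.0 * x` term here would be pruned) and compile it to fast code before running it.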
Pandas: Data Wrangling for Machine Learning
Real-world data is messy, and machine learning success often depends on data cleaning and preprocessing. Pandas is the go-to library for reshaping, filtering, and aggregating datasets in Python. Before feeding data into an algorithm, many developers rely on Pandas to handle missing values, normalize scales, and transform formats.
- Links and tools:
  - Pandas
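A minimal sketch of those three preprocessing steps, assuming pandas and NumPy are installed. The table is invented for illustration, and mean imputation plus min-max scaling are just one simple combination among many.

```python
import numpy as np
import pandas as pd

# A small, made-up table with a missing value in each numeric column.
df = pd.DataFrame({
    "crop": ["wheat", "corn", "wheat", "soy"],
    "rainfall_mm": [320.0, np.nan, 410.0, 290.0],
    "yield_t": [3.1, 9.8, np.nan, 2.9],
})
numeric = ["rainfall_mm", "yield_t"]

# 1. Handle missing values: fill each gap with its column's mean.
df[numeric] = df[numeric].fillna(df[numeric].mean())

# 2. Normalize scales: min-max rescale each numeric column to 0..1.
col_min, col_max = df[numeric].min(), df[numeric].max()
df[numeric] = (df[numeric] - col_min) / (col_max - col_min)

# 3. Transform/aggregate: average normalized values per crop.
summary = df.groupby("crop")[numeric].mean()
print(summary)
```

The resulting columns can be passed to most ML libraries as-is or converted with `df[numeric].to_numpy()`.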
Caffe & Caffe2: Vision-Focused Frameworks
Initially developed at Berkeley, Caffe was designed for image processing tasks, offering speed and efficiency for training on visual datasets. Facebook’s Caffe2 extends this focus to broader mobile and web deployments. With GPU acceleration, Caffe can process tens of millions of images daily.
- Links and tools:
  - Caffe
  - Caffe2 on GitHub
Jupyter: The Interactive Data Science Environment
Jupyter notebooks are central to the Python data science workflow, enabling live code, math equations, and visualizations all in one environment. Sharing a Jupyter notebook allows other researchers or team members to replicate experiments, see results, and run code cells interactively.
- Links and tools:
  - Jupyter
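What actually gets shared is a plain JSON file. A minimal sketch using only the standard library, assuming the nbformat 4 layout: it writes a two-cell notebook (one Markdown cell, one code cell) that Jupyter can open. The file name `example.ipynb` and the cell contents are arbitrary; in practice the notebook UI or the `nbformat` package writes these files for you.

```python
import json
from pathlib import Path

notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "# Row sums\nA short, shareable experiment.",
        },
        {
            "cell_type": "code",
            "execution_count": None,  # not executed yet
            "metadata": {},
            "outputs": [],            # results are stored here once the cell runs
            "source": "import numpy as np\nnp.ones((2, 3)).sum(axis=1)",
        },
    ],
}

path = Path("example.ipynb")
path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
print(f"wrote {path} with {len(notebook['cells'])} cells")
```

Because code, prose, and saved outputs live in one file, a colleague who opens it sees the same results and can re-run each cell.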
CNTK: Microsoft’s Cognitive Toolkit
CNTK (Cognitive Toolkit) is Microsoft’s answer to large-scale deep learning, featuring efficient graph-based computation like TensorFlow and Theano. It integrates seamlessly with Keras as a backend. CNTK’s design focuses on performance for speech recognition, image tasks, and robust training pipelines on Azure.
- Links and tools:
  - CNTK on GitHub

NLTK: Natural Language Processing in Python
NLTK (Natural Language Toolkit) is a stalwart library for tokenizing text, tagging parts of speech, and running a variety of advanced NLP tasks. It provides large text corpora for academic and industry research and serves as a foundation for chatbots, language models, and more advanced systems such as translation services.
- Links and tools:
  - NLTK
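Tokenization is the first step in most of those pipelines. A minimal sketch using only the standard library, so it runs without NLTK installed: the regular expression is the one NLTK's `WordPunctTokenizer` (behind `wordpunct_tokenize`) is built on, while `tokenize` is a hypothetical helper and `collections.Counter` stands in for NLTK's `FreqDist`. The sample sentence is made up.

```python
import re
from collections import Counter

# Runs of word characters, or runs of punctuation; whitespace is dropped.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]+")

def tokenize(text):
    """Split text into word and punctuation tokens (hypothetical helper)."""
    return TOKEN_PATTERN.findall(text)

text = "Keras is simple. Keras is fast, and Keras is fun!"
tokens = tokenize(text)
print(tokens)

# Frequency count of lower-cased word tokens, similar in spirit to nltk.FreqDist.
counts = Counter(t.lower() for t in tokens if t.isalnum())
print(counts.most_common(2))  # [('keras', 3), ('is', 3)]
```

Real NLP work usually needs smarter rules (contractions, abbreviations, sentence boundaries), which is where NLTK's trained tokenizers come in.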

Interesting quotes and stories

"I started programming at a pretty young age... building games for my BBS back before Stack Overflow existed." -- Pete Garcin

"Keras is so simple to use it’s almost shocking. You can define a multi-layer network in just a dozen lines of code." -- Pete Garcin

"A GPU is a highly parallel computer. We’re talking thousands of cores, not just four or eight." -- Michael Kennedy

"All of these libraries are built by super smart people pushing the boundaries of what’s possible in open source." -- Pete Garcin

Key definitions and terms

GPU (Graphics Processing Unit): A chip originally designed for rapid image rendering, now heavily used in parallel computing for machine learning tasks.
Deep Learning: A subset of machine learning based on neural networks with many layers (“deep” architectures) for tasks like image recognition and language processing.
Computation Graph: A structured representation of operations (nodes) and data flows (edges) widely used by libraries like TensorFlow, Theano, and CNTK.
Tokenization (NLP): The process of splitting text into meaningful units such as words or sentences before analyzing them.

Learning resources

Python for Absolute Beginners: If you are completely new to Python, start here to gain a solid foundation in coding with Python 3.
Data Science Jumpstart with 10 Projects: Looking to get hands-on with practical, real-world data science projects? This course takes you through multiple guided challenges.
Getting Started with NLP and spaCy: If you find NLTK interesting or want to dive deeper into modern NLP techniques, this course covers spaCy and other related tools for text processing.

Overall takeaway

Python’s machine learning ecosystem has become extremely rich and approachable. Whether you are starting with scikit-learn for straightforward models, using Keras to build deep neural networks with minimal boilerplate, or scaling up to TensorFlow and CNTK for enterprise-level computations, there is a library perfectly aligned with your goals. These tools, together with foundational packages like NumPy, SciPy, Pandas, and Jupyter, enable both beginners and experts to perform cutting-edge ML research and production-grade deployments. Above all, the community-driven, open source nature of Python’s ML stack continues to accelerate innovation and make machine learning more widely accessible than ever.

Links from the show

Pete on Twitter: @rawktron
Pete on GitHub: github.com/rawktron
ActivePython: activestate.com/activepython
NeuroBlast AI Game: github.com/ActiveState/neuroblast

The 10 Machine Learning Libraries
NumPy/SciPy: numpy.org
Scikit-Learn: scikit-learn.org
Keras: keras.io
TensorFlow: tensorflow.org
Theano: deeplearning.net/software/theano
Pandas: pandas.pydata.org
Caffe/Caffe2: caffe.berkeleyvision.org
Jupyter: jupyter.org
CNTK: microsoft.com/en-us/cognitive-toolkit
NLTK: nltk.org
Episode #131 deep-dive: talkpython.fm/131
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript



00:00 Data science has been one of the major driving forces behind the explosion of Python in recent

00:04 years. It's now used for AI research, it controls some of the most powerful telescopes in the world,

00:09 it tracks crop growth and prediction, and so much more. But with all this growth,

00:14 there's an explosion of data science machine learning libraries. That's why I invited Pete

00:18 Garcin onto the show. He's going to share his top 10 machine learning libraries for Python.

00:23 After this episode, you should be able to pick the right one for the job.

00:27 This is Talk Python To Me, recorded July 20th, 2017.

00:31 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem,

00:50 and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy.

00:55 Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter

01:00 via @talkpython. Talk Python To Me is partially supported by our training courses. Here's an

01:06 unexpected question for you. Are you a C-sharp or .NET developer getting into Python? Do you work at a

01:12 company that used to be a Microsoft shop, but is now finding their way over to the Python space?

01:18 We built a Python course tailor-made for you and your team. It's called Python for the .NET

01:23 developer. This 10-hour course takes all the features of C-sharp and .NET that you think you

01:29 couldn't live without. Entity Framework, lambda expressions, ASP.NET, and so on. And it teaches

01:34 you the Python equivalent for each and every one of those. This is definitely the fastest and clearest

01:39 path from C-sharp to Python. Learn more at talkpython.fm/dotnet. That's talkpython.fm slash

01:46 D-O-T-N-E-T.

01:48 Pete, welcome to Talk Python.

01:50 Thanks. I'm happy to be here.

01:52 It's great to have you here. And I've done a few shows on machine learning and data science,

01:58 but I'm really happy to do this one because I think it's really accessible to everyone. We're

02:02 going to bring all these different libraries together and kind of just make people aware of

02:06 all the cool things that are out there for data science, machine learning.

02:09 Yeah, it's really crazy actually how many libraries are out there and how active the development is on

02:15 all of them. There's new contributions, new developments all the time. And it seems like

02:20 there's new projects popping up like almost daily.

02:22 Yeah, it's definitely tough to keep up with, but hopefully this adds a little bit of help

02:27 for the reference there. But before we get into all these libraries, let's start with your story.

02:30 How did you get into programming in Python?

02:32 I started programming at a pretty young age, like sort of back before Stack Overflow and

00:37 things like that existed. And I sort of mostly made games. I started with BASIC, like most people

02:43 probably from a certain age, and then worked into working on Pascal and was making games for my BBS

02:50 back in the day, making online games, utilities and stuff like that. And then for Python, later when I

02:56 worked in games for a long time. And when I worked in games, we were doing like tool automations, like build

03:04 automation, certain workflow automation, build pipelines, all that kind of stuff. So Python was something, a tool that we

03:10 used quite a lot there. So that was where I got my start with Python.

03:15 Oh, yeah, that's really cool. Python is huge in the workflow for games and movies, way more than people on the outside

03:21 realize, I think. Yeah, especially for artists. So like a lot of the tools have Python built into them.

03:26 And so artists will use it for like automating model exports or rigging and that kind of stuff. So it's pretty

03:35 popular in that sense. And then also still even just for like building assets for games.

03:40 Okay, I'm intrigued by your BBS stuff. It occurs to me and it's kind of crazy. There may be younger

03:46 people listening that don't actually know what a BBS is. Okay, so a BBS is short for bulletin board system.

03:54 And it was sort of like in a way the precursor to the internet where you used to host what is

04:02 effectively sort of a website on your home computer. And people would like call your phone number.

04:08 And you'd have it hooked up to your modem. And they would like call your phone number,

04:12 connect to your home computer. So in my case, it was like my computer that I played games on and

04:18 did my homework on and that kind of stuff. And they could connect and send messages to each other

04:24 and download files and play games, very simple games, that kind of thing. So it was like a...

04:30 Yeah, it was so fun.

04:31 Yeah, it was awesome. And like, I really, really enjoyed it. And you had a thing called like echo

04:36 mail back then, which was like this sort of way of like transferring messages all over the world. So,

04:41 you know, somebody would send a message on your BBS, and then it would like call a whole bunch of

04:45 others like in this network. And then somebody in like Australia might answer it. And it would take

04:50 like days to get back because it would be like this chain of people's BBSes calling the next one. So

04:56 yeah, there was no internet. It was the craziest thing. Like, we at our house, my brother and I

05:01 had talked my dad into getting us multiple phone lines so we could work with BBSes like in parallel.

05:08 And you would send these mails and like at night, there would be like a, like a coordination of the

05:14 emails across the globe as these things would like sync up the emails they got queued up. It was the

05:19 weirdest thing. But I loved it. I don't know, what's it? Trade Wars or Planet Wars? One of those games. I

05:23 really loved it.

05:24 For sure. I'm like a huge Trade Wars fan. You can actually play it now. Like there are people who have

05:29 it set up on like websites that have like simulated Telnet stuff. And you can, you can play versions of Trade

05:35 Wars, which I have done recently just to like, don't tell me that you're going to ruin my productivity

05:40 for like the whole day. Yeah. You'll be after this. You'll be like, Oh, Trade Wars 2002. Is it still,

05:46 it's still around? People still play it, but it was such a good game. It's fantastic. Yeah, it was

05:51 fantastic. It's awesome. All right. So that's how you got into this whole thing. Like what do you do

05:56 today? You work at ActiveState, right?

05:58 I do. Yeah. I'm a dev evangelist at ActiveState. So generally that means working with developers,

06:03 language communities, trying to make our distributions better. So at ActiveState,

06:06 we do language distributions. Probably a lot of people in the Python space know us that we do

06:11 ActivePython and it's been around for a long time. We were a founding member of the Python Software

06:16 Foundation. And so ActiveState has a pretty long history in the Python community. And before that,

06:22 we were, people probably know us from Perl and now we have a Go distribution and Ruby beta coming out

06:28 soon. So we're sort of expanding to all these different dynamic language ecosystems.

06:34 Sure. That's awesome. So I know that maybe people are a little familiar with some of the advantages of

06:40 these higher order distributions for Python, but maybe give us a sense of like, what is the value of

06:46 these distributions over like going and grabbing Go or grabbing Python and just running with that?

06:50 I think that you've got obviously this sense of curated packages. So there are, you know, in the

06:58 Python distribution, there's like over 300 packages. And so you know that they're going to build, know they're

07:02 going to play nice with each other, know that they have current stable versions, all that kind of stuff.

07:06 And then additionally, you can buy commercial support. So for a lot of our customers, so we have a lot of like

07:13 large enterprise customers, they can't actually adopt a language distribution or a tool like that

07:19 without commercial support. They need to know that somebody has their back. And so that's something

07:24 that we offer on these language distributions for those large customers. But for the community and for

07:29 individual developers, then that is something that having that curated set of packages that you know

07:36 is going to work, that you know is going to play nice. And that also is a, maybe a development team

07:40 lead, you might want a unified install base, so that all your developers have the same development

07:47 environment and they and you know, it's all going to play nice. And so that's something that's one of

07:51 the advantages of those.

07:51 That's really cool. Certainly the ability to install things that have weird compilation stuff. Do you guys

07:57 ship the binaries like pre built for that? So I don't have to have like a Fortran compiler or

08:02 something weird?

08:03 Exactly. Yes. So they're all pre built, all pre compiled. So I mean, a lot of people depending

08:08 on what platform you're on, like on Windows, you might not even have a C compiler installed and a

08:12 lot of packages are C based. And so they're pre built, you don't and like you said, you don't need a

08:17 Fortran compiler or some, some exotic build tool to actually make it work. It just works out of the box.

08:23 Yeah. Okay, that's really awesome. And ActivePython is free. If I'm like a random person, not a huge

08:29 corporate that wants support.

08:31 Exactly. Yeah. If you're just a, you know, a developer, it's free to download and free to use.

08:37 And it even if you are, you know, a large corporation, it's free to use in non production settings. So on your

08:43 own. So it's, you know, you can go and just download it, try it out, see if it works for you.

08:47 Okay, yeah, that sounds sounds awesome.

08:49 How many of the 10 libraries we're going to talk about would come built in? Do you know off the top

08:54 of your head?

08:54 I think that actually almost all of them, but maybe I think Caffe is on the list. It's not in the current

09:03 one, but it is on the list to be included. So I think actually like pretty much all of the other

09:08 ones, maybe CNTK as well is still new as well. That's really new. So but you know, we are targeting

09:15 to have as many of these as we possibly can. And so pretty much most of them are included.

09:20 That's awesome. So all the libraries that we're talking about, like one really nice way to just

09:23 get up to speed with them would be to grab ActivePython, and you'd be ready to roll.

09:28 Exactly. Yeah, awesome.

09:29 Grab them, install them, you're ready to roll right out of the gate.

09:33 Cool. All right. So let's start at what I would consider the foundation of them. The first library

09:40 that you picked, which is NumPy and SciPy.

09:42 Absolutely. And they are foundational in the sense that a lot of other libraries either

09:47 depend on them or are in fact built like on top of them. Right. So they're, they are sort

09:54 of the base of a lot of these other libraries. And most people might have worked with, with

09:59 NumPy sort of the, its main feature is that sort of n dimensional array structure that it

10:05 includes. And a lot of the data that is shipped to a lot of the other libraries is either supported

10:11 that you can send it a NumPy array, or it requires that you, that you format it that way. So especially

10:17 when you're doing machine learning, you're doing a lot of matrices and a lot of like higher dimensional

10:22 data, depending on how many features you have. It's a really, really useful data structure to have

10:28 in place.

10:29 Yeah. So NumPy is this sort of array like multi-dimensional array like thing that stores and

10:37 does a lot of its processing down at the C level, but of course has its programming API in Python,

10:44 right?

10:44 Yes. Yeah, exactly. And a lot of these machine learning libraries do tend to have C level,

10:51 like lowest level implementations with a Python API. And that's predominantly for speed. So when

10:59 you're doing tons and tons and tons of calculations, and you need them to be really, really lightning fast,

11:05 that's the primary reason that they do these things, you know, sort of at the C level.

11:09 All right, absolutely. And so related to this is SciPy. They're kind of grouped under the same

11:15 organization, but they're not the same library exactly, are they?

11:18 No. So SciPy is like a more scientific mathematical computing thing. And it has the more advanced like

11:26 linear algebra and like Fourier transforms, image processing, it has like a physics calculation

11:32 stuff built in. So most like scientific numerical computing functionality is built into SciPy.

11:38 I know that NumPy does have like linear algebra and stuff in it. But I think that the preferred is

11:43 that you use SciPy for all that kind of linear algebra crunching.

11:47 Okay, yeah. So a lot of these things that we're going to talk about will somewhere in them have as

11:52 a dependency or an internal implementation of some variation, or even in maybe in its API,

11:57 like the ability to pass between them, these NumPy arrays and things like that.

12:02 Absolutely.

12:02 Yeah. One other thing that's worth noting, that's pretty interesting. And I think this is a trend

12:09 that's growing. Maybe you guys have more visibility into it than I do. But NumPy, on June 13th, 2017,

12:16 so about a month ago at the time of the recording, received a $645,000 grant for the next two years to

12:24 grow it and evolve it and keep it going strong. That's pretty cool.

12:28 It is very cool. And I think that you're starting to see that these open source projects are really

12:34 forming the backbone of most of the machine learning research and actually implementation that you're

12:40 seeing out there in the world. There's not a lot of sort of more closed source behind trade secret

12:45 stuff. A lot of the most bleeding edge development and active development is happening in these open

12:50 source projects. So I think it's great to see them receiving funding and sponsorship like that.

12:55 Yeah, I totally agree. And it's just going to mean more good things for the community and all these

12:59 projects. It's really great to see. One thing I want to touch on for every one of these is to give

13:04 you a sense of how popular they are. And for each one, we'll say the number of GitHub stars and forks.

13:11 And that's not necessarily the exact right measure for the popularity because maybe this is you like

13:17 obviously NumPy is used across many of these other things which have more stars, but people don't

13:22 necessarily contribute directly to NumPy. So on. But for NumPy, NumPy has about 5,000 stars and 2,000

13:30 forks to give you a sense of how popular it is. The next one up, scikit-learn has 20,000 stars and 10,000

13:37 forks. So tell us about scikit-learn. scikit-learn is, again, like we mentioned before, is a thing

13:44 that's built on top of scipy and NumPy and is a very popular library for machine learning in Python.

13:50 And I think it was one of the first, if not the first, I'm not 100% sure, but it's been around for

13:56 quite a long time. And it supports a lot of the sort of most common algorithms for machine learning.

14:03 So that's like classification, regression tools, all that kind of stuff. I actually just saw like a blog post

14:08 come up in my feed today where Airbnb was using scikit-learn to do some kind of like property value

14:16 estimation or something using machine learning. So it's being used very, very widely in a lot of different

14:22 scenarios.

14:22 Oh yeah, that sounds really cool. It definitely is one of the early ones. And it's kind of simpler in the

14:30 sense that it doesn't deal with all the GPUs and parallelization and all that kind of stuff. It just,

14:36 it talks about classification, regression, clustering, dimensionality reduction, and model selection, things like that,

14:42 right?

14:42 Yes, that's right. It doesn't have GPU support. And that can make it a little bit easier to install if

14:48 you, you know, sometimes the GPU stuff can have a lot more dependencies that you need to install to make

14:52 it work. Although that's getting better in the other libraries. And it's like you say,

14:58 it is made and sort of designed to be pretty accessible and pretty easy, you know, because

15:02 it has the sort of baked in algorithms that you can just say, oh, I want to do this and it will

15:07 crunch out your results for you. So I think that that's sort of the sort of ease of use and the sort

15:13 of cleanliness of its API has contributed to its sort of longevity as a, one of the most popular

15:19 machine learning libraries.

15:21 Yeah, absolutely. And it's obviously scikit-learn being part of the scipy whole family. It's built

15:27 on numpy, scipy, and matplotlib.

15:30 Yes. Yes. So yeah, it includes interfaces for all that stuff and for like graphing the output and

15:36 using matplotlib and yeah, using numpy for inputting your data and for getting your data results,

15:43 all that kind of stuff.

15:44 Yeah. Very cool. All right. Next up is Keras at 17.7 thousand stars and 6,000 forks.

15:52 So this one is for deep learning specifically, right?

15:56 Yeah. And so this is for doing rapid development of neural networks in Python. It's one of the

16:06 newest ones, but it's really, really popular. I've had some experience working with it directly

16:12 myself and I was sort of really, really blown away by how simple and straightforward it is.

16:18 So there's like, it creates a layer on top of lower level libs like TensorFlow and Theano and lets you

16:25 just sort of define, I want my network to look like this. So I want it to have this many layers and this

16:32 many nodes per layer. And here are the activation functions. And, you know, here's the optimization

16:37 method that I want to use. And you sort of just define this effectively a configuration,

16:42 and then it will build all of the graph for you, depending on what backend you used.

16:48 And so it's very, very easy to experiment with the like shape of your network and with the different

16:56 activation functions. So it lets you kind of really quickly reach and test, you know, different models

17:04 to see which one works better and to sort of see what one works at all. So it's really easy to use

17:10 and really very effective. I used it to build a little game demo where we like had an AI where

17:18 I trained an AI to play against you to determine when it could shoot at you.

17:22 Was this the demo you had at PyCon?

17:25 It is. Yeah. Yeah. And so we had that demo at PyCon. I since did a blog post about it a little bit.

17:31 And then I actually just recently rewrote it in Go for GopherCon too. So eventually it will be open sourced

17:38 so that people can see. But one of the things that you really notice is that the actual like code for

17:44 Keras to basically define the network and do the sort of machine learning heavy lifting part is very,

17:52 very minimal, like a dozen lines of code or something like that. It's really surprising because you think

17:57 it's like a ton of work, but it makes it super easy. Yeah, that's really cool. And it sounds like

18:02 its goal is to be very easy to get started with. I like the idea of the ability to switch out the

18:09 backend from say TensorFlow to CNTK to Theano. How easy is it to do that? Like if I'm, could I run some

18:18 machine learning algorithms and say, let's try it in TensorFlow and say, do some performance benchmarks

18:24 and stuff? No, no, let's switch it over to Theano and try it here and kind of experiment rather than

18:29 completely rewriting in those various APIs. Exactly. You literally, it's just a configuration

18:34 thing. You just, it's almost like a tick box essentially, you know, like it's so easy.

18:40 And so that is absolutely one of the, I think the driving key features of that library that you can

18:47 just pick whichever one suits your purpose or your platform, you know, depending on what's available

18:52 on the platform that you're building for. Because currently there's not TensorFlow versions for

18:57 every platform on every version of Python and all that kind of stuff. Right. Okay. Well, that's,

19:01 that's pretty cool. So there's two interesting things about this library. One is the fact that it does

19:07 deep learning. So maybe tell people about what deep learning is. How does that relate to like

19:13 standard neural networks or other types of machine learning stuff?

19:17 Well, I think the sort of the simplest way to put it is the idea of like adding these additional layers

19:24 to your network to create a more sophisticated model. So that allows you to create things that can take

19:35 more sophisticated feature domains and then map those to an output more reliably. So, and that's where

19:44 you've seen a lot of advances, for instance, like in like a lot of the image recognition stuff that

19:48 leverages deep learning to be really, really good at identifying images or even doing things like

19:55 style transfer on images where you have a photograph of some scene and then you have some other photograph

20:03 and you're like, I want to transfer the style of the evening to my daytime photograph. And it will just

20:09 do it and it looks like pretty normal. And those are like the most, I guess, popular, common,

20:15 deep learning examples that you see cited.

20:18 Yeah, it makes a lot of sense. And, you know, it's easy to think of these as being

20:22 Snapchat-style, superfluous kinds of examples: machine learning

20:29 putting the little cat face on you, or switching faces, or whatever. But

20:35 there are real, meaningful things that can come out of this, for example

20:40 the detection of tumors in radiology scans. These deep learning models

20:48 can do the image recognition on that and go, yep, that's cancer, maybe better than even

20:53 radiologists can already. And in the future, it's going to get crazy.

20:57 Exactly. And it's funny you mention that: Stanford Medical, about a month

21:01 or a month and a half ago, released something like 500,000 radiology scans

21:07 that are annotated and ready for training machine learning models. So that exact use case is intended

21:14 to be tackled as a deep learning problem. And all kinds of additional

21:21 datasets like these are coming out. I just saw a post this week about a deep learning model

21:27 that was analyzing heart monitor data and being more effective than cardiologists.

21:33 It's really crazy. You think of AI and automation disrupting low-end jobs, right? Like,

21:39 at McDonald's we might have robots making our hamburgers, or something silly like that. But if it starts

21:46 cutting into radiology and cardiology, that's going to be a big deal.

21:52 It absolutely is going to be a big deal. I think people probably need to start thinking about it. I don't think

21:57 it's necessarily a complete replacement thing. The radiologist AI can't talk to you

22:04 yet, I guess, at least until we get to NLTK. But it can definitely augment and lighten the load

22:12 on professions like medicine that are perpetually overworked, and allow human doctors to be more

22:18 effective. So I think as tools, these things are going to be absolutely

22:22 revolutionary. Yeah, it's going to be amazing. You know, do you want a second opinion?

22:27 Let's ask the super machine.

22:29 Exactly. I mean, one of the strengths of all these machine learning models

22:35 is that they're able to make sense of higher-dimensional, complex data sets

22:43 in ways that humans can't really do. And these models have intense focus, I guess,

22:50 right? Whereas it's pretty hard for a doctor to read every single paper ever

22:56 written on subject X, or to look at 500,000 radiology images, even across the course of their career.

23:03 So I'm pretty optimistic about where this goes. It's going to be interesting to join all this stuff together.

23:08 The other thing that we're just starting to touch on here, and it's going to appear in a bunch of

23:13 these others, so it's maybe worth spending a moment on as well, is that Keras lets you basically seamlessly switch

23:21 between CPU computation and GPU computation. Maybe not everyone knows the power of non-visual GPU

23:30 programming. Maybe talk about that a bit.

23:32 For sure. So your GPU is your graphics processing unit. If you have a gaming PC at home,

23:38 you have, you know, an NVIDIA graphics card or an ATI graphics card.

23:43 It can run the Unreal Engine like crazy or whatever, right?

23:46 Oh, exactly. So if you play games, you have a dedicated graphics card, or, well,

23:50 even without a dedicated graphics card you still have a GPU. And there's this thing called general-purpose GPU

23:56 programming. A GPU is a highly parallel computer; it has something like 1,000 cores in it,

24:03 or some huge number of cores. Yeah, anywhere from 1,000 to 4,000 or 5,000 cores per GPU, right?

24:09 Exactly. Yeah. The original intention was that it needs to process, in parallel,

24:15 every pixel or every polygon that's going on the screen and apply effects. That's why you can get

24:21 things like blur in real time, and real-time lighting and all that kind of stuff. It processes

24:28 all of that in parallel. But then people started to develop SDKs that said, well, in addition to

24:36 doing graphics programming, we can just run regular programs on these things, and they're really, really

24:41 fast at doing math. And so now a lot of these libraries

24:49 support GPU processing, and it's literally just a compile flag. It's getting a lot easier, although

24:54 you still have to make sure you have the drivers and a GPU that's reasonably

24:58 powerful, especially if you're doing a lot of computation. And then you can basically run

25:05 these giant ML models on your GPU. Again, it's work that's very well suited to

25:13 being parallelized, so it's a really great use of the GPU, and that's why you're seeing it take off:

25:19 these models are easily made parallel. Yeah, they're what are called embarrassingly parallel

25:25 algorithms, right? You just throw them at these things with 4,000 cores and let them go crazy.

25:29 Yeah, in the early days, and I guess still, when you're doing DirectX or OpenGL or these things,

25:35 it's really all about, I want to rotate the scene. So that's a matrix multiplication

25:39 against all of the vectors. And it's really similar, actually, the type of work it has to do.
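The rotation example shows why this work suits a GPU so well. The same small matrix is multiplied against every vertex, and no vertex's result depends on any other. The plain-Python sketch below (2D only, with made-up vertices) makes that independence explicit. It computes the rotation sequentially, computes it again through a thread pool, and checks that the results match. The thread pool only stands in for the idea of splitting the work. It is not how GPU code is written, and in CPython it will not make this arithmetic faster.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Point = Tuple[float, float]


def rotation_matrix(theta: float) -> List[List[float]]:
    """2x2 matrix that rotates a point counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def rotate(point: Point, m: List[List[float]]) -> Point:
    """Matrix-vector product for a single vertex; it reads no other vertex."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)


vertices = [(1.0, 0.0), (0.0, 1.0), (2.0, 2.0), (-1.0, 0.5)]
m = rotation_matrix(math.pi / 2)  # rotate everything by 90 degrees

# Sequential: one vertex after another.
sequential = [rotate(p, m) for p in vertices]

# Because each vertex is independent, the same work can be split up arbitrarily;
# a GPU spreads it across thousands of cores, a thread pool stands in here.
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(lambda p: rotate(p, m), vertices))

print(sequential == parallel)
```

The same "one small computation per element, no cross-talk" shape is what makes many ML workloads embarrassingly parallel.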

25:45 The other thing, which I don't see appearing anywhere in here, but I suspect

25:49 TensorFlow may have something to do with it, is the new stuff coming from Google, where they're

25:54 going beyond GPUs to AI-focused chips. Did you hear about this?

26:00 Yes. So Google has a thing called a TPU, which is a Tensor Processing Unit. And

26:07 that's a cloud-hosted, special piece of hardware that's optimized for running TensorFlow.

26:12 I don't know the exact benchmarks in terms of how that compares to some gigantic

26:20 GPU assembly, but obviously Google thinks it's a worthwhile investment to build these

26:27 hardware racks in the cloud and then give people access to run their models on them. So I think

26:32 you're probably going to see more and more specialized, ML-targeted hardware coming out. I

26:40 don't know whether it'll be consumer hardware that you can go and buy

26:43 for your home computer, but especially in the cloud, you definitely will.

26:48 Yeah, definitely in the cloud. It's very interesting. They were talking about real-time

26:52 training, not just real-time answers. That sounds pretty crazy.

26:55 This portion of Talk Python To Me has been brought to you by DataCamp. They're calling all data scientists

27:02 and data science educators. DataCamp is building the future of online data science education.

27:07 They have over 1.5 million learners from around the world who have completed 70 million DataCamp

27:13 exercises to date. Learners get real hands-on experience by completing self-paced, interactive

27:19 data science courses right in the browser. The best part is these courses are taught by top data science

27:23 experts from companies like Anaconda and Kaggle and universities like Caltech and NYU. If you're a data

27:29 science junkie with a passion for teaching, then you too can build data science courses for the masses and

27:33 supplement or even replace your income while you're at it. For more information on becoming

27:37 an instructor, just go to datacamp.com/create and let them know that Michael sent you.

27:42 So speaking of popular libraries and TPUs, next up is TensorFlow, which originally came from

27:50 Google, and it's crazy popular at 64,000 stars and 31,000 forks. So tell us about TensorFlow.

27:56 So TensorFlow, yes, is Google's machine learning library, and it sits

28:02 slightly lower level than something like Keras; obviously it's used as a Keras backend.

28:07 You can use it directly as well. What it does is represent your model as a computation graph,

28:14 effectively a graph where the nodes are operations. This is a way that they found

28:22 is really, really effective to represent these models. It's a little bit more intimidating to

28:28 get started with, mostly because you have to think about building this graph, but you can use it directly

28:34 in Python. Python is actually the recommended language and workflow from Google. For example,

28:40 when I rewrote the Go version of our little game, I still had to train and export my model

28:46 from Python. I used Python to build it and export it. So the recommended workflow

28:52 from Google currently, for many languages, is to use Python as the primary language binding.
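To illustrate "a graph where the nodes are operations" without tying it to any particular TensorFlow release, here is a toy dataflow graph in plain Python. Building the expression only records operations. Nothing is computed until a node is evaluated with concrete input values, which is loosely analogous to feeding placeholders into a TensorFlow 1.x session. `Node`, `placeholder`, and `run` are hypothetical names for this sketch, not TensorFlow's API.

```python
import operator
from typing import Callable, Dict, Optional, Tuple


class Node:
    """A graph node: either a named placeholder or an operation on other nodes."""

    def __init__(self, op: Optional[Callable] = None,
                 inputs: Tuple["Node", ...] = (), name: str = ""):
        self.op = op          # None means "placeholder"
        self.inputs = inputs  # upstream nodes this operation consumes
        self.name = name

    def __add__(self, other: "Node") -> "Node":
        return Node(operator.add, (self, other))

    def __mul__(self, other: "Node") -> "Node":
        return Node(operator.mul, (self, other))


def placeholder(name: str) -> Node:
    return Node(name=name)


def run(node: Node, feed: Dict[str, float], cache=None) -> float:
    """Evaluate `node`, computing each upstream node once and caching the result."""
    if cache is None:
        cache = {}
    if id(node) in cache:
        return cache[id(node)]
    if node.op is None:
        value = feed[node.name]  # placeholders take their value from the feed
    else:
        args = [run(n, feed, cache) for n in node.inputs]
        value = node.op(*args)
    cache[id(node)] = value
    return value


# Build the graph for y = w * x + b. Nothing is computed at this point.
x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
y = w * x + b

# Execute the graph with concrete values.
print(run(y, {"x": 3.0, "w": 2.0, "b": 1.0}))
```

Separating "build" from "run" is what lets a framework inspect, optimize, place on different devices, or export the whole graph, as in the Go workflow described above, before any numbers flow through it.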

28:56 Yeah, that's really interesting, and great to see Python. Python appears in so many of these

29:01 libraries as the primary way to use them. So there's some interesting stuff about this one.

29:08 Obviously it's super popular. Google has so many use cases for machine learning, up and down

29:15 everything that they're doing, so having this developed internally is really

29:20 cool. It has a flexible architecture that lets it run on CPUs, GPUs, or mobile devices.

29:28 It even lets it run on multiple GPUs and multiple CPUs. Do you have to do anything to make

29:35 that happen? Or do you know how it does that? As far as I can tell, especially for

29:40 switching between CPU and GPU, it's essentially a compile flag. When you build

29:45 the libraries, or download one of the nightly builds or whatever, you have to get

29:51 one of the versions that has GPU support built in. And I think there are

29:57 also increasingly CPU optimizations in there. For instance, Intel is doing hand-optimized

30:04 math kernel work that's integrated directly into TensorFlow to make it even faster. That's

30:12 something you can also get in the latest version. So I definitely think speed and

30:18 performance, and making that easily accessible depending on what your hardware is and where

30:23 you're going to deploy, is a big focus for them. Yeah, that's really cool. So do you think this is

30:30 running in the Waymo cars, you know, the Google self-driving cars?

30:33 Yeah, I mean, I don't know for sure, but I'd be almost positive of it, from everything that

30:37 I've read and people that I've talked to. Google built this as the platform

30:42 for all of their deep learning and machine learning

30:48 projects, so I would assume that TensorFlow is powering that and is running pretty

30:53 much all of their stuff. Very, very cool. It's probably in Google Photos and some other

30:58 things as well. Yeah, Google Translate, all those things. Pretty much

31:03 all of the projects that Google is running, when you start looking at them, are

31:08 effectively AI projects. And just recently,

31:16 Google Translate, which uses machine learning and statistical models to do the

31:22 translations, is approaching human-level accuracy for translation between a lot of the popular

31:27 languages where they have huge, huge data sets to pull from. Yeah, that's crazy, and very

31:32 cool. So up next, number five is Theano, at 6,000 stars and 2,000 forks. And this one is really kind

31:40 of similar to TensorFlow, but really low level, right? Yeah, so it is more low level,

31:46 and it is very similar to TensorFlow in the sense that it's also a very high-performance math

31:51 library. And I believe it was originally made by a couple of the guys who then

31:56 went on to Google to make TensorFlow. So it predates TensorFlow by a little bit. It also has

32:01 the things that we're talking about here: it has transparent GPU use, and you can do

32:08 things like symbolic differentiation and a lot of mathematical operations

32:13 that you want to be highly, highly performant. So it is actually pretty similar to what TensorFlow does,

32:21 and it serves a similar purpose. Depending on what you're comfortable with and what your

32:27 existing projects are, that is probably going to dictate which one you use. And if you're

32:32 using something like Keras, then you can just choose this as the backend. You just flip the switch,

32:36 and there you go. Yeah, it's cool. It also says it has extensive unit testing and

32:42 self-verification, where it'll detect and diagnose errors, maybe you've set up your model wrong or

32:47 something like that. That's pretty cool. Yeah, for sure. I mean, all of these

32:51 libraries are built by super, super smart, accomplished people who are creating things

32:58 that solve a real-world problem for them and really push things

33:04 forward. And I actually think it's great that there are so many libraries in this space,

33:09 because it really is just making things better for everybody. Yeah, the competition is really cool:

33:15 you get to see the different ways to do this, and probably some cross-pollination. Exactly. Yeah. So one of

33:22 the things you have to do for these models is feed them data, and getting data can be a super messy

33:27 thing. And the one library that stands out above all the others for transforming,

33:34 reworking, and cleaning up data is pandas, right? Absolutely. Yeah. Pandas is one of those

33:41 libraries that, if you're manipulating data, especially large sets of real-world data,

33:47 people repeatedly come back to. So pandas,

33:54 for those who might not know, is a data munging and data analysis library that lets you

34:00 transform your data. One of the hardest parts when you're doing machine learning is actually getting your data

34:06 into a format that can be used effectively by your model. A lot of times real-world data is

34:12 pretty messy: it might have gaps in it, or it might not be in the right units.

34:20 It might not be normalized so that it's within the right ranges. And if you feed the models

34:26 raw data that hasn't been cleaned up or formatted correctly,

34:32 what you might find is that the model doesn't converge, or you get what seem like random results

34:40 or things that don't really make sense. So it's worth spending this time, and having a library that

34:46 makes manipulating very large sets of data easy, like pandas, is super useful.

34:53 For instance, when I was doing that little demo that we talked

34:59 about originally, when I started I was feeding it raw pixel values for positions

35:05 and velocities and stuff, and it just wasn't working. It wasn't until I really normalized the data and

35:10 cleaned it up that I started getting good, consistent results. So when you're dealing with large-scale

35:15 data sets, being able to manipulate them effectively is super important.
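The two fixes described here, filling gaps and normalizing raw values into a common range, look roughly like this. In pandas they are commonly written as `df.fillna(df.mean())` followed by `(df - df.min()) / (df.max() - df.min())`. The sketch below spells out the same steps with the standard library on a made-up column of pixel positions, so the data and the helper names are illustrative assumptions rather than Pete's actual demo.

```python
from statistics import mean
from typing import List, Optional


def fill_missing(values: List[Optional[float]]) -> List[float]:
    """Replace None gaps with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw horizontal positions in pixels, with two dropped readings.
raw_x = [12.0, None, 640.0, 320.0, None, 1280.0]

clean_x = min_max_scale(fill_missing(raw_x))
print(clean_x)
```

Mean imputation and min-max scaling are only one reasonable choice. Depending on the data, dropping incomplete rows or standardizing to zero mean and unit variance can work better.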

35:20 Yeah. At the heart of all these successful AI things, these machine learning algorithms and

35:28 whatnot, is a tremendous amount of data. It's why the companies that we talk about doing well are

35:33 enormous data-sucking machines, like Google and Microsoft and some of these other ones. Right.

35:41 Exactly. And that's where their power comes from. Google has access to just

35:46 massive amounts of data that we regular people don't have access to. Or, like we were talking about earlier

35:54 with the radiology images, you need a fairly large set of annotated data. That's data

36:01 where, you know, these are case files or whatever that a doctor has already gone through and said,

36:06 this one was a cancer patient, this one wasn't. Without that kind of annotated data, the models

36:13 can't really learn. They need to know what the answer is. So that's really, really important.
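The role of annotated data is easiest to see in a tiny supervised example. The model is given (features, label) pairs, and the labels are the "answers" an expert supplied. The sketch below uses a 1-nearest-neighbor rule, a deliberately simple stand-in for the deep learning models discussed above, on invented two-feature examples. The numbers and the `positive`/`negative` labels are hypothetical and are not real radiology data.

```python
import math
from typing import List, Tuple

Example = Tuple[Tuple[float, float], str]

# Hypothetical annotated training set: each feature pair comes with the label
# an expert assigned. Without these labels there would be nothing to learn from.
training: List[Example] = [
    ((0.9, 0.8), "positive"),
    ((0.8, 0.9), "positive"),
    ((0.1, 0.2), "negative"),
    ((0.2, 0.1), "negative"),
]


def predict(features: Tuple[float, float], examples: List[Example]) -> str:
    """1-nearest-neighbor: return the label of the closest annotated example."""
    _, label = min(examples, key=lambda ex: math.dist(features, ex[0]))
    return label


print(predict((0.85, 0.75), training))
print(predict((0.15, 0.05), training))
```

Strip the labels out of `training` and `predict` has nothing to return, which is the point made above: the annotations are what the model learns from.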

36:20 Yeah. We have the whole 10,000 hours to become an expert for humans. That's kind of the equivalent

36:25 for machines.

36:26 Yeah, I guess. I don't know what the number is; the machines might need

36:31 more. One of the things that is really interesting about humans is that our neural networks

36:38 can learn remarkably quickly without having to walk into traffic 1,000 times or something like that.

36:46 So there's, I don't know, some magic going on there or something.

36:49 Yeah, there sure is. All right. Next up is Caffe and Caffe2. And this originally started out as a

36:57 vision project, right?

36:58 That's right. Yeah, at Berkeley. This was primarily a vision project. And then there's a successor

37:05 that is backed by Facebook, actually, which is more general purpose and is optimized for web and

37:13 mobile deployment. So obviously, if you want to have machine-learning-based apps on your

37:18 phone, then having a library that targets that is pretty important.

37:22 Yeah, I'm sure we're going to see more of that. I mean, there are even rumors, I don't know how

37:27 trustworthy they are, that the next iPhone will have a

37:34 built-in AI chip.

37:35 I remember that. Apple actually just announced a machine learning SDK, Core ML, at

37:42 WWDC in June. So Apple is already targeting these deployed ML models. In

37:51 that library's case, you are effectively choosing a pre-made model: I want image recognition, or I want

37:57 language parsing in my app. And then you can just feed it these pre-trained models.

38:01 But it wouldn't surprise me; you know, they've got, what is it, the motion chip in your iPhone now.

38:07 Yeah, they've got the motion chip. Yeah.

38:08 So it wouldn't surprise me at all to start seeing phones deploying AI chips

38:13 to assist with this, because a lot of things, like Siri, are machine-learning-based.

38:18 Right. Yeah. And it doesn't make sense to go to the cloud all the time. That's one of

38:24 the super annoying things about Siri: you ask it a question and the answer comes six seconds later. You

38:29 ask it something simple like what time is it, and 10 seconds later it'll tell you it's such and such.

38:34 Is it really that hard? Yeah. It's got to go all the way to the cloud, and you're in some

38:38 sketchy network area or something. Right. Exactly. So I wouldn't be surprised to start seeing

38:43 that stuff deployed onto mobile. Even at Build, Microsoft's conference, they started

38:49 talking about edge machine learning, where the machine learning is getting pushed out to all

38:55 these IoT devices that they're working on as well. So there are a lot of efforts in this area.

38:59 For sure. Yeah. And that's the next big thing, right? IoT-based machine learning

39:04 devices. Can your fridge learn your grocery consumption habits and

39:10 tell you, you're going to run out of milk in two days and you're going to the store today, maybe

39:13 you should pick some up? I mean, it sounds kind of crazy, but it totally will happen.

39:18 Yeah. I mean, it doesn't sound as crazy as, let's just let a car go drive in a busy

39:24 city on its own. That's true. And yet that's something that exists now, right? That's

39:31 a thing. Maybe it's not fully autonomous, but

39:36 you could go and buy one tomorrow: you could buy a car where you turn on autopilot and

39:42 it's crazy, it'll drive for you. So the future is now. The future is here; it's just not

39:49 evenly distributed. This portion of Talk Python To Me is brought to you by us. As many of you know,

39:56 I have a growing set of courses to help you go from Python beginner to novice to Python expert.

40:00 And there are many more courses in the works. So please consider Talk Python training for you and

40:05 your team's training needs. If you're just getting started, I've built a course to teach you Python

40:10 the way professional developers learn, by building applications. Check out my Python Jumpstart by

40:15 Building 10 Apps at talkpython.fm/course. Are you looking to start adding services to your app?

40:21 Try my brand-new Consuming HTTP Services in Python. You'll learn to work with RESTful HTTP services,

40:27 as well as SOAP, JSON, and XML data formats. Do you want to launch an online business? Well,

40:32 Matt McKay and I built an entrepreneur's playbook with Python for Entrepreneurs. This 16-hour course will

40:38 teach you everything you need to launch your web-based business with Python. And finally,

40:42 there are a couple of new course announcements coming really soon. So if you don't already have an

40:46 account, be sure to create one at training.talkpython.fm to get notified. And for all of you who have bought

40:52 my courses, thank you so much. It really, really helps support the show. One little fact, or a quote, from

40:59 the Caffe webpage that I want to throw out there, because I thought it was pretty cool, before we move

41:03 on. They say, speed makes Caffe perfect for research experiments and industry deployment. It can process 60 million

41:12 images per day on a single GPU. That's one millisecond per image for inference and four milliseconds per image for

41:20 learning. That's insane.
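That quote is easy to sanity-check with unit conversion. A day has 86,400 seconds, so 60 million images per day works out to about 1.44 ms per image, the same order of magnitude as the quoted 1 ms for inference. The inputs below are just the figures from the quote, and the calculation assumes the GPU runs around the clock.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
IMAGES_PER_DAY = 60_000_000      # figure from the Caffe quote

# Time budget per image if the GPU processes 60 million images in one day.
ms_per_image = SECONDS_PER_DAY * 1000 / IMAGES_PER_DAY
print(f"{ms_per_image:.2f} ms per image")

# The other direction: daily capacity at the quoted per-image costs.
for label, ms in [("inference", 1), ("learning", 4)]:
    per_day = SECONDS_PER_DAY * 1000 // ms
    print(f"{label}: {per_day:,} images per day at {ms} ms each")
```

At the quoted 1 ms per image, inference alone would allow roughly 86 million images a day, so the 60 million figure leaves some headroom. At 4 ms per image, training gets through about 21.6 million.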

41:21 So fast. And 60 million images per day is just crazy. That's why we were talking

41:30 about the data just a minute ago. The amount of data being poured into these models is just

41:36 staggering. I don't doubt that people are feeding these models

41:42 that much data every day. And I think they were saying 90% of the world's data that's ever been

41:48 created has been created in the last year. So it's one of these things where it

41:53 accelerates and accelerates and builds on all this stuff. I think these things are just going to

41:59 get faster until they're effectively real time.

42:01 Yeah, absolutely. All right, I don't think we said the stars for that one: 20,000 stars and 11,000 forks.

42:08 So up next is definitely one that data scientists in general just live on, and that's Jupyter.

42:15 For sure. This has just become the standard interchange format for sharing data science,

42:24 whether it's papers or data sets or models. It's become the standard,

42:32 I don't know what you'd call it, lingua franca for exchanging this data. It's effectively

42:37 a tool for working with Jupyter notebooks, which are kind of like web pages with

42:43 embedded programs and embedded data sets. I think that's probably a good way to describe it for those

42:48 who might not have used it before.
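Under the hood, a notebook (`.ipynb`) file is plain JSON: a list of Markdown and code cells plus some metadata. The sketch below writes a minimal notebook with the standard library, following the nbformat version 4 layout. In practice Jupyter itself or the `nbformat` package would create these files, and the file name `experiment.ipynb` is arbitrary.

```python
import json


def markdown_cell(text: str) -> dict:
    """A prose cell, rendered as Markdown by Jupyter."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source: str) -> dict:
    """A code cell; outputs start empty and are filled in when the cell runs."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


notebook = {
    "cells": [
        markdown_cell("# Quick experiment\nA sentence of description, then live code."),
        code_cell("total = sum(range(10))\ntotal"),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

text = json.dumps(notebook, indent=1)
with open("experiment.ipynb", "w", encoding="utf-8") as f:
    f.write(text)
```

Because the format is just JSON, the prose, the code, and (once executed) the outputs travel together in one file. That is what makes notebooks work as the interchange format described above.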

42:49 Right. It's like instead of writing a blog post or a paper that's got a little bit of code,

42:54 then a little bit of description, then a picture of a graph, it's live, and you can re-execute

42:59 it and tweak it. And it probably plugs into many of these other libraries, using them

43:03 behind the scenes to do that.

43:06 Exactly. Yeah. It's built on the IPython kernel, the interactive Python kernel. I'm

43:13 sure there are all kinds of specific uses that can run that notebook code and use

43:19 that stuff there.

43:20 Cool. Next up is maybe one of the newer kids on the block in this deep learning story, from Microsoft,

43:26 actually: their Cognitive Toolkit, CNTK.

43:29 Yeah. They just released, I think, the 2.0 version of it at the beginning of June or late May.

43:35 Now it's open source and it's got the Python bindings, and it's part of,

43:42 you know, Microsoft's been doing a lot of open source work lately, and they've been

43:46 really, really pushing a lot of their own projects.

43:48 And, like we said earlier, it's available as a backend for Keras. So it's similar, again, to

43:56 TensorFlow and Theano in that it's focused on that low-level

44:00 computation as a directed graph. It's a similar model, and I think this is obviously emerging as a

44:07 popular and efficient way to represent machine learning models: using that directed graph.
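One practical payoff of representing a model as a directed graph is scheduling. Once operations are nodes and data dependencies are edges, a runtime can see which operations don't depend on each other and could run side by side. The sketch below uses Python's standard `graphlib` module (Python 3.9+) on a small hypothetical graph. The node names are made up, and this illustrates the idea rather than how CNTK, TensorFlow, or Theano schedule work internally.

```python
from graphlib import TopologicalSorter

# Hypothetical computation graph: each operation maps to the operations
# whose outputs it consumes (its predecessors). "x", "w_a", "w_b" are inputs.
graph = {
    "matmul_a": {"x", "w_a"},
    "matmul_b": {"x", "w_b"},
    "add": {"matmul_a", "matmul_b"},
    "relu": {"add"},
}

sorter = TopologicalSorter(graph)
sorter.prepare()

stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # every op whose inputs are all available
    stages.append(ready)                # ops within one stage could run in parallel
    sorter.done(*ready)

for i, stage in enumerate(stages):
    print(i, stage)
```

Here the two matrix multiplications land in the same stage because neither needs the other's result. That kind of independence is what a graph-based framework can exploit on multiple cores or GPUs.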

44:11 So it's pretty popular too, right? It's got a decent number of stars and forks, and obviously,

44:17 as a Keras backend and a Microsoft-backed library, it's going to be pretty popular and pretty common

44:23 out there.

44:24 Yeah, absolutely. These days, with Satya Nadella and a lot of the changes at Microsoft,

44:29 I feel like this open source stuff is really taking a new direction, a positive one. And also I think

44:35 their philosophy is, if it's good for Azure, it's good for Microsoft. And so this plugs into their

44:41 hosted stuff in interesting ways. And they've got a lot of cognitive cloud services and things

44:47 like that.

44:47 Yeah. Azure is becoming pretty huge. It's starting to rival maybe even AWS for

44:57 a lot of cloud-hosted services, especially around machine learning. Azure has so many

45:01 different machine learning tools available, and it's clearly a pretty big focus for

45:08 Microsoft. And again, it's great to see more of the big guns being

45:14 more open about their development and sharing. I mean, it drives everybody forward and

45:19 just accelerates development across the whole ecosystem.

45:21 Yeah. And they have a number of the Python core developers there. They have Brett Cannon,

45:25 they have Steve Dower, they have Dino Viehland. There are some serious people back there working

45:30 on the Python part.

45:31 Exactly. Yeah. They've got a lot of the Python core team there. A bunch of the

45:37 folks from ActiveState were just at PyData Seattle, and a huge number of the core team

45:42 were there. It was just a really, really great little conference, talking about

45:48 Python and data science. Yeah. I think they have some really interesting language stuff as well.

45:52 So speaking of languages, certainly the longest-running one, probably, that's

45:58 still going strong is NLTK, with 5,000 stars and 1,500 forks.

46:03 Yeah. NLTK is the Natural Language Toolkit, and obviously this is a library

46:09 for doing natural language parsing, which is, I guess, one of the holy grails of machine

46:14 learning: getting it to the point where you can just speak to your computer

46:19 in completely natural language, and maybe even give it instructions in natural language,

46:23 and have it follow your directions and understand what you're

46:28 asking. This is a really popular one in academia for research. They link to and

46:35 include massive corpora, meaning gigantic bodies of text in different languages

46:43 and in different styles, to be able to train models. So there's also a pretty

46:48 large open data component to this project. And obviously the use

46:54 cases for natural language are huge: translation, like we mentioned earlier,

46:59 and chatbots, which are now a huge thing for support. Every website you go onto,

47:05 something pops up: hey, I'm Bob, can I help you today? And it's not really a

47:10 person, it's just a chatbot. There are just so many. And then, like we were saying, Siri

47:16 and Cortana and all those personal assistants, where you can ask a natural language question

47:22 and it comes back to you with an answer. So this is almost a foundational library, still going

47:27 strong, with tons of active development and research going on. Yeah. It's really

47:32 cool, especially with all the smart home speakers: Google Home, HomePod, all that stuff.

47:37 This is going faster, not slower, in terms of acceleration, right? We're

47:42 talking more and interacting with them way more. Definitely the chatbots. And anytime you have

47:49 text and you want a computer to understand it, this is a first step for tokenization,

47:55 stemming, tagging, parsing, semantic analysis, all that kind of stuff, right? Yeah. And that's

47:59 exactly what it outputs. It will generate parse trees and stem everything, and

48:05 then you use that tokenized version to train your model, rather than raw text

48:12 characters. And we really are getting there. These days, the

48:19 recognition part, the tokenization part, is very, very good. It's more the

48:24 semantic meaning: what do you mean when you ask Siri, what are the movie times for X, or

48:32 something like that? How specific do you have to be to get a reasonable answer from her? Yeah.
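To give a feel for the first two steps mentioned here, tokenization and stemming, this is a deliberately crude standard-library version: a regular-expression tokenizer and a suffix-stripping "stemmer". NLTK's real tools (for example `nltk.word_tokenize` and `nltk.stem.PorterStemmer`) handle far more cases. The three-suffix list below is a made-up simplification, not the Porter algorithm.

```python
import re
from typing import List

# A toy suffix list, checked in order; this is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens and standalone punctuation marks."""
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


tokens = tokenize("What are the movie times for Star Trek tonight?")
print(tokens)
print([crude_stem(t) for t in tokens])
```

Even this toy version turns "times" into "time", so "time" and "times" count as the same word for a downstream model. Real stemmers and lemmatizers exist because rules this simple break quickly ("parsing" becomes "pars", for instance).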

48:37 It's got to go speech-to-text, and then it probably hits something like this. Yeah, exactly.

48:41 That's going to hit a library like this, and we're getting there. It's not quite at the Star Trek

48:46 "Computer, do this for me" level, but it's way closer than I ever thought we would

48:51 be. It's really pretty impressive sometimes. Yeah, absolutely. It's fun to see this

48:57 stuff evolve. Absolutely. All right, Pete, that's the 10 libraries. And I think these are all really

49:02 great selections, and hopefully people got a lot of exposure and maybe learned about some

49:09 they didn't know about. I'd encourage everyone to go out there and try these

49:13 out, and if you've got an idea, play with it using one or more of them. For sure. They're also accessible

49:17 now. You don't necessarily have to be an ML researcher or a math wizard to actually create

49:27 something that's interesting, or to experiment or learn a little bit. These libraries all do a really,

49:32 really great job of abstracting away some of the more complicated mathematical parts and,

49:39 in the case of a lot of them, making it reasonably accessible. That's where I think

49:45 you're seeing this democratization trend in machine learning now, where this stuff is

49:51 becoming more accessible and easier. And I think you're going to see a lot of creativity and a

49:56 lot of innovation come out of people if they just give it a shot and try something out and

50:01 learn something new.

50:03 Yeah, that's awesome. I totally agree with the democratization of it. And that's also happening

50:07 from a computational perspective, right? These are easier to use, but also, with GPUs

50:12 and the cloud and things like that, it's a lot easier. You don't need a supercomputer. You need 500

50:18 bucks or something for a GPU.

50:20 Exactly. I think all of these things feed into that together, where you have a

50:25 democratization trend in the tools and the source code, so that now, A, you can have access to Google's

50:33 years and years of AI research via TensorFlow on GitHub. You can also, like you said, go and buy a

50:40 $500 GPU and have basically a supercomputer on your desktop. And there's also this open data component, where

50:48 you can get access to massive datasets like the Stanford image library and these huge

50:56 NLTK-style language corpora that you can then use to train your models, where previously that was probably

51:03 impossible to actually access.

51:05 Yeah, that's a really good point because even though you have the machines and you have the algorithms,

51:09 the data is what really makes it work. All right, I think let's leave it there for the libraries.

51:14 Those were great. Now I'll hit you with the final two questions. If you're going to write some

51:20 Python code, what editor do you open up?

51:22 Well, obviously ActiveState has Komodo, so I tend to use that a lot for writing Python code, but, to

51:30 be totally fair, I've also used VS Code, which is getting increasingly popular. I tend to like

51:36 to cycle between them all because we have an editor product, so it's great to keep up to

51:42 date on what all the other ones are doing. So I cycle around a little bit, but yeah,

51:48 Komodo is sort of my go-to.

51:50 Yeah, that's cool. Yeah. It's definitely important to look and see what the trends are, what other

51:54 people are doing, how can you bring this cool idea back into Komodo, things like that, right?

51:57 Yeah, for sure.

51:58 Yeah.

51:58 All right. And I think we've already hit 10, but do you have another notable PyPI package?

52:03 I don't know, there are so many. I would again probably give a little bit of a shout

52:08 out, since we're talking about machine learning, to Keras, because I do think, as an entry

52:14 point to machine learning, it's so accessible. It's so easy to at least get started and get a result with.

52:22 I would give a little shout-out to that. I think that if you're looking to get into this and

52:25 you're looking to try it out, that's a really great place to start.

52:28 Yeah, I totally agree with you. That's where I would start as well.

52:31 All right. Well, it's been very interesting to talk about all these libraries with you. I really

52:36 appreciate you coming on the show and sharing this with everyone. Thanks for being here.

52:39 Thank you for having me.

52:40 You bet. Bye.

52:41 This has been another episode of Talk Python To Me. Our guest has been Pete Garcin,

52:47 and this episode has been brought to you by DataCamp and us right here at Talk Python Training.

52:52 Want to share your data science experience and passion? Visit datacamp.com slash create

52:58 and write a course for a million budding data scientists.

53:01 Are you or a colleague trying to learn Python? Have you tried books and videos that just left

53:06 you bored by covering topics point by point? Well, check out my online course,

53:10 Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to

53:16 learn Python. And if you're looking for something a little more advanced, try my Write Pythonic Code

53:21 course at talkpython.fm/pythonic. Be sure to subscribe to the show. Open your favorite podcatcher

53:28 and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes,

53:33 Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.

53:39 This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it.

53:44 Now get out there and write some Python code.

53:46 Thank you.
