#384: Python Data Visualization - Where To Start? Transcript

Recorded on Wednesday, Sep 28, 2022.

00:00 Do you struggle to know where to start with the wide range of Python's data visualization frameworks? Not sure when to use Plotly versus Matplotlib versus Altair? Then this episode is for you. We have Chris Moffitt, a Talk Python course author and founder of Practical Business Python, back on the show to discuss getting started with Python's data visualization frameworks. This is Talk Python to Me, episode 384, recorded September 29, 2022. Welcome to Talk Python to Me, a weekly podcast on Python. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. We've started streaming most of our episodes live on YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode. This episode is sponsored by Microsoft for Startups Founders Hub. Check them out at talkpython.fm/foundershub to get early support for your startup. And it's brought to you by us over at Talk Python Training.

01:18 Did you know we have one of the largest course libraries for Python courses? They're all available without a subscription, so check it out over at talkpython.fm. Just click on courses.

01:30 Transcripts for this episode are sponsored by AssemblyAI, the API platform for state-of-the-art AI models that automatically transcribe and understand audio data at a large scale. To learn more, visit talkpython.fm/assemblyai. Chris, welcome back to Talk Python to Me.

01:47 Thank you. Glad to be here again.

01:49 I'm glad to have you back. We originally had you on to talk about the work that you're doing at Practical Business Python. And looking at the page now, your last article, Pandas Groupby Warning, is from the 26th, which as of recording is two days ago, so it looks like you're still really active on Practical Business Python.

02:08 I am. It's been a while, to be honest. I spent a lot of time working on the course that we'll talk about in a moment, and so some of this stuff fell by the wayside. And I think, like for everybody, the COVID time has been a weird time warp for us all. So I haven't spent as much time on it as I would like. But I am getting back into it and, as you mentioned, just put a short article up there that in some ways encapsulates a lot of what I want to do with Practical Business Python. In this specific article, and in general on the blog, I'm writing about problems that I encounter where I think I can help other people. And this was a short article about some kind of gotcha behavior with groupby that I've been bitten by a couple of times. And this most recent time I decided, you know what, I need to write about this and share it with people so that hopefully they're not going to fall into the same traps that I had.

03:00 I love when you write something as kind of a note for yourself or a roadmap for yourself, and then later you go back and you search for it. I got to remember how this went and hit number one is your result. You're like, okay, I guess I don't remember this, but I'm going back to it, and my future self thanks my old self for it.

03:18 Right, yes, exactly. And sometimes I even remember I wrote an article about it and will refer to it, and then a lot of times, you're right, I'll do a Google search and I'm like, oh, yeah, I did solve that before.

03:29 How interesting. Yeah, cool. What a great resource. People should definitely be checking this out if you want to do data science intersected with Python, of course. You were on the show a while ago, way back in 2019, in the before times: Escaping Excel Hell with Python and Pandas. Basically, you were making the case for using the Python data science stack instead of Excel, right?

03:53 Yes, absolutely.

03:55 The blog and a lot of my experience has been doing data analysis, data manipulation, data science, and trying to leverage the power of Python. And for most people in a business setting, their go-to data science tool or data analysis tool is Excel, and it has its place. I'm not advocating we get rid of Excel, but I think there are a lot of things that we can do with Python that are much quicker, much less error prone, and much more efficient than trying to do it in Excel. And Excel is one of those tools where there's such a wide range of usage. There are some people that are experts and can do really complex, very efficient things in Excel, but there's a lot of people that treat Excel like the proverbial hammer, where everything is a nail. So you try and use Excel to do everything from data cleaning to building out your financial statements to, I don't know, machine learning. And it's probably not really the best tool for all of those things. I think Python really fills a nice niche, and I've had the good fortune of using it for a lot of different types of activities in my business career and wanted to talk about that in the podcast.

05:06 One thing that occurred to me, just thinking about the code that I've seen for visualizing data and so on with Python compared to writing algorithms or web apps or something: it seems to me that, given the amount of Python that you have to know, being able to maybe import Pandas, load a CSV, and then graph it, you almost hardly need to know Python at all. And it's more about knowing the APIs of the various visualization frameworks, like Matplotlib or Seaborn. Right?
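
The "import Pandas, load a CSV, and then graph it" workflow mentioned here is roughly this short. This is a minimal sketch, not code from the episode: the column names and sales figures are made up, and an in-memory string stands in for a CSV file on disk.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical data standing in for a real CSV file.
csv_text = """month,sales
Jan,120
Feb,135
Mar,160
Apr,150
"""

# With a real file this would be pd.read_csv("some_file.csv").
df = pd.read_csv(io.StringIO(csv_text))

# One call produces the chart; Pandas hands back a Matplotlib Axes object.
ax = df.plot(x="month", y="sales", kind="bar", legend=False)
ax.set_ylabel("Sales")
```

Almost all of the code is library calls rather than Python logic, which is the point being made in the conversation.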

05:35 No, absolutely. I agree completely. I mean, you basically need to know, like you said, how to import. And even backing up, probably the biggest challenge is getting Python set up on your system, depending on the system you have, and getting environments all squared away. But once that's done, you're right. With the data visualization libraries, no matter which one you choose, it's essentially knowing how to call functions.

05:57 Yeah, exactly. And they all often seem to have their own little DSL domain specific language for what they've decided they're going to do, right?

06:05 Yes, exactly. And I think that's part of the challenge is everybody thinks a different way and sometimes, like, a library might make a lot of sense to you, but other people it doesn't. And so that's a lot of where I think some of the challenges in the visualization landscape are trying to find that right API that makes sense for the actual business problems or visualization problems that you have and how it fits in your brain.

06:29 So if people are coming from a business perspective and maybe Excel is where they or their colleagues have been working, I definitely recommend people go check out episode 200. And then there was also ten tips to move from Excel to Python. A lot of common themes here. Yes.

06:44 Normally at the beginning of the show, I ask people how they got into programming and Python. I've got you to answer that at least once, maybe twice, so a third time is probably not required. So just give us an update on what you've been up to.

06:57 Sure. I am still working in the medical device industry. My job doesn't require Python, but my job does involve a lot of data analysis and working with not necessarily large sets of data, but sometimes data big enough that it's a little bit painful to work with in Excel. And I just continue to find Python is the tool that I reach for when I need to do data analysis. And I continue to use it to build repeatable data analysis pipelines. I use it to clean data, maybe take external data that we buy from a third party and integrate it with internal data sets. Or, as we'll talk about, doing data visualization. I think there are a lot of things that Python can do from a data visualization perspective that are easier than trying to use Excel or some of the other tools that are available in a traditional environment. So I continue to live and breathe Python, and I would say every week I use it a little bit, some weeks a lot more than others, but I continue to enjoy that blend of Python and real-world problems, and not just for software development's sake.

08:07 Do the people you work with, since programmer is not officially your title, look at you as kind of like a wizard?

08:13 A little bit, yeah.

08:18 One of the things that I've worked on was a forecasting tool to help us forecast our business performance and anticipate what that's going to look like, which I think a lot of people did over the COVID times, when trying to forecast was extremely challenging. And they just call it, well, what does the Python tool say? So they don't really get what the underlying libraries are, what's going on, but they do associate money with Python, and I don't know if they really understand what it all means. I certainly try and explain it, but at the end of the day, they do kind of think there's a little bit of a superpower, as you frequently say, in knowing that.

08:55 Well, when you go and look at the basic things that a lot of the tools and frameworks we're going to talk about result in, you could easily look over at something like Excel and go, well, six of one, half a dozen of the other, kind of the same. Right. But then there's the amount of customization and specialization: if you go a little bit beyond just give me a histogram of this data and you dig into it a bit, it goes far beyond what things like Excel are able to do.

09:23 Yes, absolutely. And it's funny you mentioned histogram. Like, even a few years ago, there wasn't even really a histogram function in Excel. Kind of one of the basic functions I use pretty much for any new data set is the histogram. And Excel didn't have one out of the box. You could build one, but it wasn't there. And I think that just kind of speaks to Excel approaching visualization from a very different perspective than the Python tools. Excel is much more of a how can I quickly create something and kind of guide the user through it and then give them almost infinite options to customize the visualization? So you can go in and tweak any individual data point or axes or colors, which is useful for getting started, but I don't think it scales very well. And it also doesn't have some of the more sophisticated, complex visualizations that you can do with the Python libraries that are out there.

10:17 Like sunburst and other amazing things. Yeah, cool. All right, well, let's mention your course real quick, because what we're going to cover today is inspired by the course. It's not the same thing as the course, but you recently published a course over at Talk Python, Python Data Visualization. This is really nice because the visualization landscape is so diverse and varied, it's hard to pick, I think. What should I choose? How do I find something that, as you say, fits your brain? How do you find something that's modern, that maybe is interactive or is good for publications? So in this course, you kind of do a survey of many of the popular options. Do you want to give a quick elevator pitch on this and then we'll dive into the topics?

11:01 Sure. Like you said, I think the Python visualization landscape is so rich, there are so many options, and many people are discouraged and don't even know where to start. And I think when you try and marry up that landscape with the different types of problems you can solve with visualization, it really requires you to spend a little bit of time with each one of the main libraries, or several of the main libraries, to think about how they're going to solve your problem. So the course steps through many of the common libraries. Like we were talking about, the amount of Python that you need to know to do visualization is fairly minimal, so we don't spend a lot of time on Python. It's more about the API and how to interact with each visualization library, to use it in the way that it's intended. And then I also think it's important when people are thinking about visualization, it's not just about the library, it's also about thinking about visualization. And as I pointed out at the beginning of the course, visualization is a really broad topic. There are computer science classes that spend a whole semester on it. There are many, many books that are really strong.

12:11 Edward Tufte comes to mind.

12:13 Yeah, exactly.

12:15 And when you start thinking about visualization that way, it changes the way you approach visualization from the Excel approach of how do I just build a bar chart to what is the information I have and how am I trying to convey it to the end audience? And so I spent a lot of time talking about that. And then I also think there is a data manipulation component to this. Once you start to understand how to structure your data correctly, or maybe not correctly, but most efficiently for data visualization, once you start structuring your data in that tidy format, then it's very easy to iterate on your visualizations and zero in on what's going to work best for you and your end users. So that's what the course talks through. It talks through the concepts, real-world examples of developing visualizations using many of the libraries we'll talk about, and then how to customize those visualizations from the very basics all the way up to building custom dashboards that can be highly interactive and potentially deployed for others to use.

13:21 Cool. Yeah, I definitely learned a ton going through your course. It's over at talkpython.fm/dataviz.

13:27 People can check that out. Let's get maybe a high-level view of the landscape before we dive into these topics, because there are different branches of this, I guess you would say. So there's a GitHub repo I'll point out by Nicolas Rougier, I'm not sure how to say his name, sorry, Nicolas. This is an adaptation of Jake VanderPlas's graphic about the landscape here. And so how does this picture fit for you? Do you think this is pretty accurate?

13:58 Yeah, I do, because I think it points out a couple of different things when you start thinking about visualization. So there are some key things that you can glean from looking at this. For the people that are listening, it's a kind of starburst plot, and you've got a whole bunch of linkages between these different visualization libraries. And something that jumps out is Matplotlib is at the center of many of these libraries and it's the foundational tool that's used to build other libraries. And so I think that's a key concept. You can use Matplotlib on its own, or knowing Matplotlib makes it easier to use some of these other libraries that are built on top of it. And the other thing is not shown on this, but from a history perspective, Matplotlib is sort of the grandfather of all Python visualization libraries. It's been around a long time. And what you see, for people that can see the visualization, on the left hand side: JavaScript visualization is a little bit more modern, like you said, and what people expect when you think about an interactive visualization tool. And so that's a different approach for visualization that has some pluses and minuses. So Matplotlib and JavaScript, I think, are the key distinctions for how libraries are constructed for visualization in Python. There are some other libraries, but the two that I really focus on are either Matplotlib based or JavaScript based.

15:26 Sure. On the Matplotlib side, we have things like Matplotlib itself, but also Pandas and Seaborn and scikit-plot, ggplot, stuff that people may be familiar with from there. And then on the JavaScript side, we've got Bokeh and Plotly, some of the more, as you say, interactive ones. And D3.js is in there as a foundational item as well.

15:48 Exactly, Altair.

15:49 I don't even see Altair in this list. Maybe it's in there somewhere.

15:52 But it is.

15:53 It's over there hanging out close to Matplotlib.

15:56 Exactly.

15:57 More strongly related to JavaScript. Got it. The third one of the three main branches here is OpenGL. What do you know about the OpenGL one?

16:06 Yes, it's a question. I don't use them a whole lot. I don't have a whole lot of experience with them. I think there is certainly maybe some very high volume data analysis where performance is really important, and that's where some of those OpenGL libraries, I think, were originally founded. But I get the sense that most people are gravitating towards either the Matplotlib or the JavaScript ones that we've talked about. I don't have as much experience or see as much development there with those libraries.

16:38 Yeah, maybe they're about visualizing changing data that is flowing in real time and you can actually see a change because OpenGL is basically a graphics library for animation. Right?

16:48 Sure. Yeah. I think that real time component is a good distinction.

16:51 Yeah. Probably now before people run fleeing to the hills just because some of these projects are grouped under JavaScript doesn't mean you have to write JavaScript to use them, right?

17:02 Correct. Absolutely. Yeah. And all the ones that I cover, and the ones on here, actually have a really nice API on top. The JavaScript is abstracted away, and in some ways there are some benefits. Like, I'll call out Altair, because it leverages Vega-Lite, and anytime that underlying JavaScript library is updated, you kind of get all of those benefits for free through Altair. So it's in the spirit of open source, building on the shoulders of others, and there are a lot of benefits to having that JavaScript foundation, and you don't need to understand JavaScript to use any of them.

17:40 Yeah, and Altair really is like a transformation layer into a Vega-Lite definition, which is then processed by JavaScript to be rendered. So there's even this sort of separation layer. So it's not like you necessarily have to change your code to pick up the changes there.

17:55 No, I think at some point, depending on how deep down the rabbit hole you go, there could be points where, okay, if you're doing something highly custom or something really unique, understanding what's going on under the hood could be useful. But you can get pretty far without having to know that, right?

18:11 Maybe you're trying to make a book or an article and you need something. You just say, you know what, I'm just going to dump out the Vega-Lite definition and just add two things to it. But it was created through Altair.

18:21 Exactly.

18:25 This portion of Talk Python to Me is brought to you by Microsoft for Startups Founders Hub. Starting a business is hard. By some estimates, over 90% of startups will go out of business in just their first year. With that in mind, Microsoft for Startups set out to understand what startups need to be successful and to create a digital platform to help them overcome those challenges. Microsoft for Startups Founders Hub was born. Founders Hub provides all founders at any stage with free resources to solve their startup challenges. The platform provides technology benefits, access to expert guidance and skilled resources, mentorship and networking connections, and much more. Unlike others in the industry, Microsoft for Startups Founders Hub doesn't require startups to be investor backed or third party validated to participate. Founders Hub is truly open to all. So what do you get if you join them? You speed up your development with free access to GitHub and Microsoft cloud computing resources and the ability to unlock more credits over time. To help your startup innovate, Founders Hub is partnering with innovative companies like OpenAI, a global leader in AI research and development, to provide exclusive benefits and discounts. Through Microsoft for Startups Founders Hub, becoming a founder is no longer about who you know. You'll have access to their mentorship network, giving you a pool of hundreds of mentors across a range of disciplines and areas like idea validation, fundraising, management and coaching, sales and marketing, as well as specific technical stress points. You'll be able to book a one-on-one meeting with the mentors, many of whom are former founders themselves. Make your idea a reality today with the critical support you'll get from Founders Hub. To join the program, just visit talkpython.fm/foundershub, all one word. The link is in your show notes. Thank you to Microsoft for supporting the show.

20:15 Well, let's start with the granddaddy, as you put it, Matplotlib. So there are so many shiny new things, but you make the case that knowing Matplotlib is still really worthwhile and really important, right? Because it is the foundation of so many things.

20:31 Exactly. It is the foundation, and it's so important to know it. And also it is extremely customizable. And I was thinking about this. In some ways, I think if you stripped out a lot of the old from Matplotlib, took everything that they've done in the last three to five years and just focused on that new, and if you could erase all the old tutorials and all of the maybe ancient answers on Stack Overflow and just focus on the new stuff, people would have a much different perspective on Matplotlib. I think it's hard when you have something that's been around so long and has evolved that you do get some cases out there where, okay, an API or the way they approached a visualization in the past was clunky. But in the five or ten years since, it's improved. And if you use the new and improved, it's really streamlined and really easy. So I think it's important that people not get turned away right off the bat from Matplotlib. And there are certain types of visualizations where, if you need a high degree of customization, if you want to print it out or include it in a manuscript or a book, Matplotlib is really useful and powerful for it.

21:47 In that distinction about the new and the old. Originally, when Matplotlib came out, correct me if I'm wrong, it's my limited understanding is it was somewhat modeled on the MATLAB way of programming and it had this imperative API. The new one is more object oriented, isn't it?

22:03 Yes, that's exactly right. So there's this state-based interface that was based on MATLAB. And for people making that transition from MATLAB to Matplotlib, it was seamless, right, and they really understood it and it made sense. But that way of doing things is not really Pythonic. And so the object-oriented interface is newer, and it is clearly the path that the Matplotlib documentation wants to steer you down. And if you stay on that path, then it makes more sense, I think, from a Python perspective. And you do have a tremendous amount of power. And I would say the other thing that I think has turned people off with Matplotlib in the past is the visualizations are relatively unstyled.
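
The two interfaces described here can be put side by side. This is a minimal sketch, not code from the episode; the revenue numbers are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

years = [2018, 2019, 2020, 2021]
revenue = [1.2, 1.5, 1.1, 1.8]  # made-up values

# State-based (MATLAB-style): every call acts on an implicit "current" figure.
plt.figure()
plt.plot(years, revenue)
plt.title("State-based interface")
plt.xlabel("Year")

# Object-oriented: hold explicit Figure and Axes objects and call their methods.
fig, ax = plt.subplots()
ax.plot(years, revenue)
ax.set_title("Object-oriented interface")
ax.set_xlabel("Year")
```

Both produce the same kind of chart. The object-oriented form makes it explicit which plot each call modifies, which matters once a figure holds more than one set of axes.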

22:48 They're kind of plain, whereas some of these newer libraries just out of the box make something that looks really nice. Matplotlib allows you to customize it, but that's extra work. One of the things that Matplotlib has done is they have a new, or relatively new, theming API. So if you use that, then you do get visualizations that look a little nicer out of the box and are more visually appealing.
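
Assuming the theming API mentioned here refers to Matplotlib's style sheets (`matplotlib.style`), applying one looks roughly like this. The `"ggplot"` style ships with Matplotlib; the data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Names of the style sheets bundled with the installed Matplotlib.
available_styles = plt.style.available

# Apply a style only inside this block, leaving the global defaults alone.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3, 4], [3, 1, 4, 2], marker="o")
    ax.set_title("Same data, styled with a style sheet")

styled_title = ax.get_title()
```

Calling `plt.style.use("ggplot")` instead would apply the style to every plot created afterwards in the session.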

23:12 Yeah, it does have that kind of, it looks fine, but it just looks a little bland. Right. And it doesn't have that D3.js feel. I've got to give them some pretty mad props on the xkcd theme.

23:24 Exactly.

23:27 You've seen this, it sounds like.

23:28 Yes, I have.

23:29 So I'm sure most people out there listening know the xkcd comic, right? If you don't, go to your terminal or command prompt, run python3, and then just type import antigravity. Then you'll know. But it's been around forever and it has this sort of style of, like, handwritten but not handwritten. And one of the themes you can get is the xkcd theme.
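
The xkcd look is available through `plt.xkcd()`, which can be used as a context manager. A minimal sketch with made-up numbers follows; if the comic-style fonts are not installed, Matplotlib falls back to another font but keeps the sketchy lines.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4, 5]
signups = [3, 4, 8, 7, 12]  # made-up preliminary numbers

# Everything drawn inside the block gets the hand-drawn xkcd look.
with plt.xkcd():
    fig, ax = plt.subplots()
    ax.plot(weeks, signups)
    ax.set_title("Preliminary results")
    ax.set_xlabel("Week")
```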

23:53 Yes. It's really cool. And every once in a while you'll find an article that someone put together where they show this beautiful visualization, and you'd be surprised it's Matplotlib. And they show all the steps, and you can configure it and you can make something as nice as any of the JavaScript frameworks that are out there. But it does take some time, and there are more lines of code to get there.

24:16 Yeah, that's true. On this xkcd thing, it might sound like, that's funny, everybody loves xkcd. But I do think there is some value in presenting results, whether that be a user interface or a visualization of analysis, where you want to give it this preliminary look, this unfinished look. Right. And so if you're going to come into a meeting and you want to say, this is what the early data says, this is what our first pass analysis says, put it in the xkcd style. That might set that tone, versus if it's perfect and beautiful, it's like, oh, you're done. No, we're not done. We're really far from done. This is just the beginning. I just want to give you a hint of what we're finding out. Right. I think there's a way that you could actually use this that would be practical.

24:58 I agree. It's a good point. And one of the things that I think when you go into a business setting, everybody's used to standard Excel plots, and when you bring in something else like this, like an XKCD plot or some other plot that people aren't used to, it does get them to focus and look at it a little bit different and can steer the discussion in a little bit different way.

25:21 Yeah. One of the little areas that I thought was just really nice and really simple would be things like, if you go and plot something with Matplotlib and you have a lot of ticks along the bottom, it's very common that the words start to overlap each other and you're like, well, this isn't working.

25:40 And just little simple things like putting an angle on the values on the x axis can make it so much nicer. Right?
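
Angling the x-axis labels is a one-line change. A minimal sketch using `Axes.tick_params`, with hypothetical customer names chosen to be long enough that they would collide if left horizontal:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical labels that would overlap if drawn horizontally.
labels = [f"Customer {i:02d}" for i in range(1, 13)]
values = list(range(12, 0, -1))

fig, ax = plt.subplots()
ax.bar(labels, values)

# Rotate the x tick labels by 45 degrees so they no longer overlap.
ax.tick_params(axis="x", labelrotation=45)

# Make room for the rotated labels at the bottom of the figure.
fig.tight_layout()
```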

25:49 Yes.

25:50 And one of the things, I do have a soft spot in my heart for Matplotlib: in the official documentation, I think under the tutorials, they took one of the blog posts that I wrote on Matplotlib, and it has been incorporated into the official tutorial.

26:06 About how to get started.

26:07 So that's kind of cool. I'm proud of that.

26:11 Absolutely. Another one that I thought was nice to know about, that's not at all obvious, is formatters, using f-string formatter type things for when the data gets put up there. So maybe instead of having 2010 point some huge long decimal, you could format that as a number with no decimal point. No sense.

26:32 Right. I use that all the time for currency, so if you're showing, like, millions, you don't want to have all the zeros. Maybe just format it as dollar sign, 2M, or whatever. And Matplotlib makes it very easy to do. And dates here, as you're showing, are always one of those challenges any time you're plotting something, trying to figure out the right level of dates that conveys the information but doesn't crowd out the visualization.
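
The currency case can be sketched with `matplotlib.ticker.FuncFormatter`. The `millions` helper and the quarterly figures are hypothetical, written only to illustrate the idea of dropping the zeros.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def millions(value, pos):
    """Format a raw dollar amount as millions, e.g. 2500000 -> '$2.5M'."""
    return f"${value / 1_000_000:.1f}M"


quarters = ["Q1", "Q2", "Q3", "Q4"]
sales = [1_200_000, 2_500_000, 1_800_000, 3_100_000]  # made-up figures

fig, ax = plt.subplots()
ax.bar(quarters, sales)

# Every tick label on the y axis is now passed through millions().
ax.yaxis.set_major_formatter(FuncFormatter(millions))
```

For dates, the `matplotlib.dates` module provides formatter and locator classes that play the same role.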

26:58 Yeah, absolutely. So there's a lot to learn there, but I still think a lot of people will be doing Matplotlib, but also a lot of people will be choosing the newer ones. Now, for each of these, you came up with some pros and cons. So for the pros category of Matplotlib, you have robust options that can do almost anything, and lots of documentation and examples.

27:18 Yeah, I mean, anything you want to do, you can do it in Matplotlib, and you can find documentation and examples. What does get challenging is that the API has evolved and changed over time. The official documents are great, but when you can't find it there and you go searching on the web, you will find a tutorial from seven years ago. And the way they're telling you to do it may work, but it's not the most efficient way, or it may not work well with the way the rest of your code is structured.

27:50 Maybe it's not taking advantage of the themes or maybe it's using the stateful API instead. There's a lot of reasons it could be.

27:56 Exactly. The final challenge with Matplotlib is there is some degree of interactivity, but it's not at the high level that some of the other libraries have. So if you have a scatter plot and you want to zoom in on an individual point and look at the data, it's not as easy to do in Matplotlib as it is in some of the other libraries. The other pro that I'd mention with Matplotlib is if you are trying to get your visualization in another format. So you're trying to put it in a document, you're trying to put it in PDF, SVG, any kind of graphic format. Matplotlib supports that out of the box. It's very easy to save it in any format and move it into whatever other document you have, whereas with some of the other ones it's a little more challenging to do that, or they maybe don't have as high quality output as Matplotlib does.

28:46 Especially if it's an SVG. You can scale that almost infinitely. Right?

28:50 Yes, exactly.
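
Saving one figure in several formats is a matter of changing the file extension passed to `savefig`. A minimal sketch that writes into a temporary directory; the file name `chart` is arbitrary.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax.set_title("Export example")

out_dir = Path(tempfile.mkdtemp())

# The output format is inferred from the file extension.
# dpi mainly matters for raster output such as PNG; PDF and SVG are vector formats.
for extension in ("png", "pdf", "svg"):
    fig.savefig(out_dir / f"chart.{extension}", dpi=200)

saved_files = sorted(path.name for path in out_dir.iterdir())
```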

28:51 Yeah. Another thing that is going to be a repeated theme throughout many of these frameworks, but it's also here, is the ability not to just make one chart or one picture or one plot of some variation, but to build multiple plots. Right. So, for example, in Matplotlib, in the examples here, they have an MRI with EEG, and so they've got a picture of the brain, an MRI, and then also some other measurement, which I don't know enough brain science to know what it's for.

29:24 Neither do I. It looks cool. It looks cool.

29:27 Right. But you can create a single picture that puts like, two graphs and then a graph below it. Right. There's a way to compose these beyond just making one picture.
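
A layout with two graphs on top and one wide graph below can be sketched with a grid specification. This is an illustrative layout only, not the MRI/EEG example from the Matplotlib gallery, and the plotted data is made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 6))

# A 2x2 grid: the bottom plot spans both columns.
grid = fig.add_gridspec(2, 2)
ax_left = fig.add_subplot(grid[0, 0])
ax_right = fig.add_subplot(grid[0, 1])
ax_bottom = fig.add_subplot(grid[1, :])

ax_left.hist([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=5)
ax_left.set_title("Distribution")

ax_right.scatter([1, 2, 3, 4], [4, 3, 2, 1])
ax_right.set_title("Relationship")

ax_bottom.plot(range(10), [v * v for v in range(10)])
ax_bottom.set_title("Trend over time")

fig.tight_layout()
```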

29:36 Exactly. And you might think, well, I could do this in Excel, I could put two or three graphs next to each other. But here's where the power of Matplotlib comes into play. Let's say you were running experiments in a lab and you're doing hundreds of these. You wouldn't go into Excel and individually position all this. Now you write your Python script, you develop that layout for those different visualizations, and then you're done. Once you get the data, you just kind of run through it and you can iterate through it. And that's really the power of using a real programming language to do visualization versus just trying to do one-off visualizations in Excel or some of the other tools out there.

30:12 Sure. And instead of necessarily putting those pictures into a notebook output, you could write a loop. It says, go get me this experiment data and generate this graph and save it to a file, get the next one, find out even what's in the folder, pull them all in, loop over them, generate one picture, then the next, the next, and the next with the right name. And so it could just be all automatic, either in a script or just in a notebook that doesn't have as much output.
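
The loop described here, find what is in the folder, plot each file, and save each picture under the right name, might look like this. The sketch first creates three hypothetical `experiment_N.csv` files in a temporary directory so that it is self-contained; the file naming and column names are assumptions.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

work_dir = Path(tempfile.mkdtemp())

# Stand-in for real lab output: three small, made-up experiment files.
for number in range(1, 4):
    frame = pd.DataFrame({
        "time": range(5),
        "signal": [number * t for t in range(5)],
    })
    frame.to_csv(work_dir / f"experiment_{number}.csv", index=False)

generated = []
for csv_path in sorted(work_dir.glob("experiment_*.csv")):
    data = pd.read_csv(csv_path)

    fig, ax = plt.subplots()
    ax.plot(data["time"], data["signal"])
    ax.set_title(csv_path.stem)

    # Save next to the data, with the same name and a .png extension.
    png_path = csv_path.with_suffix(".png")
    fig.savefig(png_path)
    plt.close(fig)  # free memory when generating many figures

    generated.append(png_path.name)
```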

30:36 Right, exactly.

30:37 The full power of Python is at your fingertips.

30:40 Yeah, absolutely.

30:41 All right, so that's the kernel of one half of this branch, maybe the next one, this one's a little bit surprising to me because I just don't do enough Pandas. But Pandas, when I think about Pandas, it's about manipulating data and reading data and transforming data and doing that tidy data preparation that you spoke about, all those different things. But there's also graphing built into Pandas itself.

31:04 Yes, and that graphing is built on top of Matplotlib. So that's why, from a course perspective and thinking about visualization, I think having that Matplotlib foundation sets you up so that when you're in Pandas, where, like you said, you'll be doing the majority of your data input, manipulation, and analysis, you understand what visualization tools you have there. A lot of the standard visualizations that you want to do, line charts, scatter plots, bar charts, box plots, histograms, you can do with Pandas. And it's mostly a very thin wrapper around Matplotlib. So that's why it helps if you get this output and you look at it and say, well, I just want to customize it a little bit. Typically, if you know that Matplotlib API, then you can customize it in Pandas or go into Matplotlib and customize the Pandas output. So I think that is really useful. The other thing that's interesting about Pandas is it will allow you to plug in other back ends. So Matplotlib is the default, but you can enable back ends for Plotly and a few of the other visualization libraries. So it can be kind of this universal interface to visualization.
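
Because the Pandas plot call returns a Matplotlib object, the usual Matplotlib methods can be used to customize the result. A minimal sketch with made-up quarterly numbers:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

df = pd.DataFrame({
    "quarter": ["Q1", "Q2", "Q3", "Q4"],
    "sales": [120, 135, 160, 150],  # made-up values
})

# Pandas draws the chart with Matplotlib and returns the Axes it drew on.
ax = df.plot(kind="bar", x="quarter", y="sales", legend=False)

# From here on it is plain Matplotlib customization.
ax.set_title("Quarterly sales")
ax.set_ylabel("Units")
ax.tick_params(axis="x", labelrotation=0)

# The plotting back end is a Pandas option; "matplotlib" is the default.
# Setting it to "plotly" is possible when the plotly package is installed.
default_backend = pd.options.plotting.backend
```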

32:21 In my experience, I don't use the Pandas visualization a whole lot, but I do think it's important for people to understand it's out there because there are times where that's the quickest way to get something out there and it's sufficient for the quick and dirty needs.

32:37 Well, yeah, sometimes you open up Pandas, you read something like a CSV or whatever, and you just want to know, what is this? And you would take df, a DataFrame, and type df.head() or df.tail(), and it gives you just a little brief view into it. But there's the picture's worth a thousand words sort of thing. And if all you have to say is just .plot() and that's it, or .hist(), and now you have a picture instead of the tail or head equivalent, that's a really cool way to just sort of quickly explore the data.

33:06 Exactly. And there are some other more unique plots in Pandas. So there are some plots called the Andrews curves and parallel coordinates that are more advanced. And to be honest, it's more of a data science kind of machine learning plot. It's not something you use that often, but it is important to understand if you need it, that it is out there in Pandas for you.

33:35 Yeah, the Andrews curves remind me a lot of a Lorenz attractor from chaos theory, a bunch of lines looping over and over and over and over.

33:46 Exactly. It looks pretty cool. You'd probably blow some people's minds if you put that up. But it's specialized. It does speak to the fact that when you start getting into this visualization world, there are specialized libraries, and you might find something in the niche you're working in that is really useful and really powerful. Maybe it's hard to do in other tools and, boom, it's easy in Pandas or some of the other tools.

34:10 Yeah, so I normally try to avoid saying code on here because it's audio, but given a DataFrame, I can say plt.figure and then just call andrews_curves, give it the DataFrame and a name, and boom, you get this amazing visualization of your data in, like, three incredibly simple lines.
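
Roughly what those few lines look like, using `pandas.plotting.andrews_curves`. The tiny DataFrame and its `group` column are invented here so the example runs on its own; any DataFrame with numeric columns plus one class column would do.

```python
import matplotlib
matplotlib.use("Agg")  # keep the example headless
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import andrews_curves

# Invented data: three numeric measurements and a class label per row.
df = pd.DataFrame({
    "a": [1.0, 1.2, 3.1, 3.3],
    "b": [0.5, 0.7, 2.2, 2.0],
    "c": [2.0, 1.8, 0.4, 0.6],
    "group": ["x", "x", "y", "y"],
})

plt.figure()
# One curve is drawn per row, colored by the class column.
ax = andrews_curves(df, "group")
```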

34:29 Yes.

34:30 This is the kind of stuff that I was referring to at the beginning when I said you almost don't need to know Python. I mean, technically there's a little bit of Python, but barely, right?

34:37 Right. And it's just starting to understand how to think about how you could use these tools and getting familiar with them, so that the first time you're trying to build something, you're not also trying to learn the API and all these other visualization topics on top of trying to solve whatever problem you're trying to solve with visualization.

34:56 Well, yeah, there's definitely some neat stuff to do visually with Pandas and people should certainly be using it. Also, based on the matplotlib kernel, we have Seaborn. Where does this fit into the world?

35:08 So Seaborn builds on top of Matplotlib, like you said, but it really focuses more on doing statistical analysis of your data and taking that data and frequently transforming it in some ways and developing a visualization. So I use it a lot for histograms and box plots. But where Seaborn is really powerful is the ability to do facets, facet plotting, or small multiples. And that's where you take a plot and you change a variable over each row or column and then you get a grid of several plots. So you've got 9, 12, however many plots you need. And it gives you an opportunity to spot data anomalies and trends very easily and present a wealth of information in a very compact frame. And what I like about Seaborn is it is very easy to do this. So the code that's required to do this is typically one or two lines of code and you get this really nice plot that has different colors. It has data varying across the rows and columns. You can change the shape, the size, everything with those visualization concepts that we cover in the beginning to develop a visualization and really give you a lot of insight very quickly.

36:36 Yeah, absolutely. It's very statistically focused, isn't it?

36:39 It is.

36:41 It is very statistically focused. And some of the plots are more complex, and having a statistical background will help you understand them. But simple things like just doing bar plots or histograms or a count plot or heat map are relatively straightforward to explain to others and easy to create with Seaborn. And then you're right. There are some of the plots that are definitely much more kind of deep in the statistical toolbox, and you have to know how you would explain them to others.

37:12 So this faceting thing is pretty interesting.

37:16 If you go to the Seaborn examples, they've got one for, looks like, five different variables, or sorry, four variables, and they're looking at two different groups on one of them. And you could just say, basically, instead of giving me a picture of this data, take any two pieces of information and generate a graph to show me how those things correlate. One of the pieces of data that you did a lot of work with was the automobile gasoline efficiency data from the EPA, right?

37:45 Yes. What I find really fascinating about this is when you get the data set up properly and you want to look at all these different relationships, it's very easy to slice and dice and mix up those relationships to see where the trends are. So if we're looking at fuel efficiency, we could look at it by the year the cars were manufactured. We could look at it by are they front wheel drive, all wheel drive cars? How many cylinders do they have? Are they electric?

38:15 SUV versus SUV, all these things?

38:18 Yes, there could be a price component. There could be just a whole bunch of different ways to look at it. And when you have a big data set, those are the types of things that are very difficult to do just by looking at the numbers. And that's where visualization really shines and where Seaborn makes it tremendously easy to just quickly iterate through and say, okay, I want to look at these two variables together. Now let's layer in a third variable. Now there's a fourth variable. Well, I don't like this. Let's switch them around a different way. And I use Seaborn a lot because of that flexibility of just exploring the data quickly, figuring out what those trends are, what those insights are, and rapidly iterating through it for the exploratory analysis.

39:01 Another thing about Seaborn is it looks really nice, and it has this idea of themes.

39:07 Yes.

39:07 So easy to make it look good, right?

39:09 It is. I mean, Seaborn, out of the box, applies some themes and also does things behind the scenes with the visualization to make it cleaner and to try and format the data so that dates line up appropriately and colors look nice and there's appropriate spacing and things like that. And there are some other things that are pretty easy to control, to turn on and off, or change the color palettes with Seaborn. But generally out of the box, it strives to look good and it looks nice. And what's also beneficial about it is it is just Matplotlib under the hood. So if you get to the point where you've done your analysis and things look pretty good, but you want to do some tweaks, there are some convenience functions in Seaborn to do that. And if you know Matplotlib and you want to tweak some things, you can do that as well.

40:04 Yeah, for sure. One thing I want to maybe take a step back on here with Matplotlib. In a lot of these examples, not this one I've got up here, but lots of them, and I've seen you talk about this, there will be semicolons some of the time, and Python is famous for not requiring semicolons at the end of a line. What's the story there?

40:26 Yeah, the semicolons are just an artifact of when you're developing in a Jupyter notebook. When you show that plot, Matplotlib will show additional information about the plot. So it will have kind of like a string descriptor, and then it will show the plot. And if you use the semicolon, it will suppress that extra information. So all you see is the plot. So it's certainly not required by any means. And you're right.

40:55 For people that have played with Python for a while, you kind of wonder why there's a semicolon there, but it's just to suppress some of that extra information that gets shown.

41:04 And it's not on all the lines. It's just on certain plotting lines. The other lines don't need it. Right. So it's a little unclear if you're not sure why. All right, moving on from Seaborn, we start to bridge our way over into the JavaScript, D3.js side of things with Altair. And Altair, I think, is certainly very well known and very well respected. It's one of the newer ones, isn't it?

41:27 It is, yes. I'd have to look, I don't remember off the top of my head, but if Matplotlib was started in 2012, Altair is probably in the last five years or so. Yeah, I would imagine so. Definitely much newer, but there has been a tremendous amount of updates. 2015. Okay. And Jake, who started it and maintains it, did an awesome job of leveraging a lot of the best practices from Python libraries as well as R to build Altair. And like we said, it's built on top of Vega. So anytime that new work is done in that JavaScript library, it's easier to forward it to Python so that you can leverage that as well.

42:15 Yeah, quite popular. Almost 8,000 GitHub stars, and Jake VanderPlas added or changed 353,000 lines and removed 240,000 lines. And ellisonbg as well, something on a similar scale. That's a ton of work.

42:30 Yeah, it's a fabulous library.

42:34 One of the things I really like about the library is the gallery. Yeah, that's where I'm going now. And the documentation is really great for Altair. Once you start to get into it, you need to spend a little bit of time just to make sure you understand how the library works. But then probably 90% of the time, you're going to go to the gallery and try and find something. And you look at the code and you're like, oh, okay, that's how I do it.

42:55 This is the one I want.

42:57 Yes, exactly.

42:58 There's a lot of interesting aspects about creating graphs here. So one of the common ones is to create a scatter plot, which is called mark_circle in this world, right? Hopefully those are similar enough to be treated as the same thing. You get those great little dots that show, like, where all the data from these different categories sits. And one thing that's interesting is you can say, I want the X to be this value and the Y to be that value, but then I want the color of the dot to be based on another column and maybe the size to be based on a third one. So in this EPA car data, you could say, well, I want the color of the dot to be the type of vehicle like an SUV or a car or whatever, and then I want the size to be the number of cylinders in the engine. And that's just incredibly easy, but it really draws out the data.

43:49 It does. And Altair makes it easy to combine this in different ways. So if you want to have a scatter plot and bar chart or a histogram, you can combine these together. The facets that we talked about with Seaborn, you can also do with Altair, where you change the variable across the columns and rows to get different plots. And one of the other things Altair introduces is interactivity out of the box. So because it is JavaScript based, you then have that ability to go in. And as you're doing here, for the people who can see it, you can hover over a spot and then control what information is shown for that hover. So you can see what's going on with this individual dot. Well, here's a Volkswagen Rabbit from Europe, and it has a 71 HP engine and gets 31.9 miles per gallon. So it's really cool and really useful for doing that exploratory analysis where you kind of want to see the individual data points and maybe drill into it a little bit more.

44:55 Yeah, the interactivity is great. The ability to add custom tooltips and then have those tooltips have, like, f-string style formatting as well is pretty excellent. So if the data doesn't show up just the way you would like, you can have some medium large versus, like, this number or that number. So you can kind of think about it separately, differently.

45:16 Right, exactly. And one of the other things that's interesting about Altair is it does try to infer different aspects about your data. So it tries to understand, well, is this continuous data, is it date data? And you can specify the different types of data. So you can say that it's quantitative or it's an ordinal value, or a nominal value. And the actual visualization will change a little bit depending on that data type. That's a unique thing that Altair does. And I think that's one of those things. As you start going down this visualization path, you start to think about your data a little bit differently and think about how you want to present it. And Altair gives you that window into all the flexibility you have with presenting data.

46:03 One of the really nice aspects of the interactivity of these is the legend. The legend really looks nice. It matches, like, the color and the name, and it's in a font that is pretty readable. But if you set it up right, you can go and actually click on these and either just highlight one or you can have them sort of be toggle buttons. So if you want to just focus on, let me pull out, in this case it's got like agricultural, finance, government type of spending or something. You can just say I want to just see the educational and health and click that, and it highlights that sort of separate from the rest of them.

46:37 Yes, and it's really easy. There's like one line of code that you need to do to set it all up.

46:42 Add selection or something simple like that. Right?

46:45 Exactly.

46:46 One other thing I want to touch on with altair here and I also want to talk about an example, but you talked about data. Was it transformed or something? You got to either use the file or a little server or something to process the data. What's the story of that?

47:03 Yeah, so one of the things that can be a little tricky with Altair is behind the scenes it's translating whatever data you have into a JSON file. And so what that means is when you have ten or 20 data elements, it's not that big a deal, but when you have thousands of elements you can suddenly get to a point where that file is really huge. And so, rightly so, Altair makes sure that you don't inadvertently embed that in your notebook. Otherwise you could end up with your Jupyter notebook suddenly being 50 megs because you've got all these Altair visualizations in there. So there are some options for how you can manage that data so that it's not necessarily stored directly in the notebook file. Maybe you have that data stored separately, kind of like in a cache file in a directory, or potentially more of a real time streaming option where you have a back end service that's running behind the scenes that streams up the data to you. So there is a little bit more complexity sometimes with Altair to get it running, because of the JSON approach that it uses. So that's certainly one of the watch outs and things to keep in mind. The other thing that sometimes has been a challenge for me with Altair is saving visualizations. So if you want to create something as an SVG in Matplotlib or Seaborn, it's very straightforward. You just save it. With Altair, sometimes it can be a little challenging because of the way it's trying to render those visualizations and save them to a PNG or SVG file.

48:41 Yeah. So you've got to set that up and select the right one. It'll work if you don't have too much data without doing that, right? Correct. But then there's some limit where it's like, this is too much. You've got to push it outside of the notebook.

48:52 Yes. You'll get an error message and it'll kind of tell you what's going on. But I do mention in the course some of the options for getting around it, and the documentation is good about what those options are as well.

49:04 Sure. All right, so to wrap up Altair, I want to talk through a little example here, and I'll put this example in the show notes so people can check it out. Let's try to describe this picture here. You have this Amazon author reviews chart for the top 20 most reviewed books, most reviewed authors over the last ten years, or something like that. Right?

49:29 Yes.

49:29 Tell us what's going on in this picture, and we can maybe talk to just the API components that make it happen.

49:34 Sure. So behind the scenes, the data is, I guess this one's through ten years or so of Amazon reviews. So on the X axis, it's 2009 through 2020, and then on the Y axis, we have authors. So a lot of famous authors in this time frame. And some of these authors will have one book a year, or they'll have multiple books a year. And so it will have a circle for each author for each year they published a book. And the size of the circle represents how many reviews they had. So this is a really quick way to see how consistent some of these top authors are over the years.

50:18 It's sort of interesting. Like Dale Carnegie is at the top. I mean, he wrote this book, I don't know, probably 50 plus years ago, but it's still a best seller on Amazon. And then you have other people that are maybe a little more sporadic, but it's a very easy way to see the consistency and then the number of reviews that each author gets for a year. And I would say if an author has more than one book, obviously they'll have more reviews. So it's not broken out by book, it's just purely by author.

50:48 Right. And when you build up this picture, some of the things that happen is there's a legend that shows up. The legend is basically just a copy, slightly offset, of what's on the left. But what you really want to know is what the size of the circle means. You can add, like, an alternative legend, and you can put a grid behind them to make it really easy to follow the timeline. There's just a really cool bunch of features. And this is the kind of picture I was thinking about when I said if you just go and call plot or circle or whatever, mark_circle, you end up with something that's not all that impressive. But if you layer a few of these ideas on, then it's great, right?

51:24 We said we've got the author and the year and then the size. We configure the size based on the number of reviews. Then we configure the color based on the author so that it's a little easier visually to look at this and see the information. And then we also do this thing that is interesting with Altair. When you think about Seaborn and Matplotlib and Pandas and some of the other libraries we'll talk about, you typically do the data manipulation in Pandas. Altair has its own ability to do manipulation and transformation of data. And so there is this option. I could have filtered it down to the top 20 customers, or authors, excuse me, beforehand, but I use this transform filter to select only the top authors. And that's all in Altair, not using any Pandas. And then the final thing that we do is configure the width and the height and the title. So that's all kind of one long piece of code that looks intimidating.

52:27 Maybe if you haven't worked with Altair. But when you take a step back and break it down and think about what it is you're trying to do with your visualization and then with the basic understanding of the Altair API, it's pretty straightforward and extremely powerful.

52:42 Sure, you kind of got to do it in steps. It's a very fluent API: Chart, mark_circle, encode, configure. But if you take each one of those relatively simple function calls, and you try to understand that and see what you're doing, it turns out to be not too bad.

52:59 Yeah, exactly.

53:01 What people need to realize when they're thinking about this course is anytime I develop code and it looks like this and it has this many lines, you're right, I didn't start off there. I did first, let's just do author versus year. I don't like this. I need to tweak one more thing, then I need to tweak one more thing, and you iterate over it to get there. And once you kind of understand how all these libraries work, it's not too much work, but it does take a little bit of time. And that's why it's important to dive into the data and play with it and experiment with it and see what works for you.

53:32 Yeah, it almost is its own little mini language.

53:35 It is.

53:37 All right, before we move on to the next one question from the audience I'm all out there ask, can we do responsive and animated workflow diagrams with Matplotlib? For example, continuous builds, developments on different server deployment on different servers. Not entirely sure exactly what you're asking, but certainly you can automate these things, right? It doesn't have to be in a notebook. This could all be put into a script, right?

53:59 Yes. Matplotlib does support some animation. And I can't remember how much of this is out of the box Matplotlib versus third party libraries, but I've certainly seen visualizations people have done with Matplotlib where it's something changing over time or steps in a process. You can do that with Matplotlib.

54:19 So you could create one of those language battles. Have you ever seen those where, like, over 20 or 30 years, it's either a browser or popular. Then it goes up.

54:29 Yes, you could do that. I'd have to look and see what would be the best approach, but those sorts of options are out there.

54:36 Fun. All right. Sticking in the JavaScript side of things, the other really popular one over there is Plotly. What's special and unique about Plotly?

54:44 I think Plotly is special and unique because it is a newer plotting library, kind of on the order of Altair. It is supported by a company out of Canada. But the Plotly visualization library is completely open source. It is based on JavaScript. What I like about it is everything is interactive out of the box. So any plot you make, as soon as it renders, you can take your mouse and you can hover over it, you can zoom in, you can limit the range of data.

55:23 So that is really powerful. And then the second thing that I really like about it is the history of Plotly. There was a separate Plotly visualization API, and then there was something called Plotly Express, which was streamlined. And the Plotly Express API, in my opinion, is very similar to Seaborn, and I think it's Pythonic. It's very easy to understand and pick up. And over time, they've expanded it to where now that's kind of the default visualization. So it's very easy to get started with Plotly. It makes those interactive plots out of the gate. And then it does have some unique plots. So some of the kind of custom tree map plots, scatter matrix, you can do plotting on maps to show geographic plots. Those are all out of the box, work pretty well and are fairly simple to create.

56:19 Yeah, the tree map and the sunburst, those come from Plotly, right? Yes, these are in Plotly.

56:27 And one of the other things that is interesting about Plotly: like any of these visualization tools, you have to be able to go in and configure and customize and tweak things. And Plotly gives you that ability to generate a plot, but then you can update it over time. So you can change colors, you can change the way data is presented. You can change pretty much anything with the plot, using a fairly simple API as well.

56:55 Yeah, people should definitely go and check out the tree maps example. One of the really cool interactions is, for example, the one that I'm looking at. Here's the world, and then here's the different continents, like Asia, Africa, the Americas. Then within each one of those, they've got a little block that says, well, here's how large of an impact. Like, for example, Nigeria and Egypt have more people in them, I guess, than the other countries, and then they're colored by what their actual values are. But if you want to just focus on, say, Africa, you can just click on that section of this thing and it just zooms in to show you just that information. And you're not going to get this with Matplotlib.

57:35 No.

57:35 Right.

57:36 No, you're not.

57:38 The ability to just dive in and out of the data.

57:39 Yeah, absolutely. And it is very simple. I use plotly quite a bit, especially when I want to explore the data. So you want to have a scatter plot or tree map or any of the other visualizations, and you can easily filter or zoom in, zoom out.

57:58 So you're showing some of the other cool, kind of unique visualizations that are out there in Plotly that maybe aren't as available in some of the other libraries.

58:06 Yeah, the Sunburst is real similar. The Sunburst is like a pie chart, but as you interact with it, it zooms into those sections in pretty amazing ways.

58:15 Yeah, it is pretty cool. I haven't actually seen that one.

58:21 It just says, I want to explore this data. Right. Not only can I, but I'm going to.

58:26 Yes, exactly.

58:27 Yeah. Cool. Does Plotly have this idea of, like, a back end server, like Altair?

58:33 Plotly out of the box? No. I guess I'd have to think about what's behind the scenes. The actual architecture, I don't know. But I do know that you don't have to necessarily worry about your data as much and suddenly having a huge file that shows up in your notebook, like you have to deal with in Altair.

58:51 Yeah. Amazing. This is good looking stuff. And really top marks on the animation and interactivity, sort of diving into the data, right?

58:59 Yes.

58:59 Cool. All right. So that's the building block libraries for the different options.

59:04 I know there are many other plotting libraries, and we saw on our original graph that there's a bunch here that we didn't touch on. But these are the ones that you felt are most important right now.

59:15 Yeah, it's interesting. I struggled a little bit with where to draw the line. What else? Some of the other things I might want to bring in. After I posted about the course, I did get some feedback. How come we didn't talk about Bokeh and Panel and HoloViews? And I think the short and sweet answer was I had to draw the line somewhere. I didn't have as much experience with those libraries, so I didn't dive into them. They are also libraries that have undergone changes in the past, and they are working to clean up their documentation and get the examples cleaned up. So I think it's certainly worth considering those as well. The other one that I would point out that is really interesting to me, but I haven't used it, and I think a lot of your listeners might be interested in, is the library called plotnine. It is meant for people that have used ggplot from an R perspective, and it's essentially like a direct port of that to Python. So if you really like ggplot and miss it from R, then you can use plotnine to replicate that in Python, and it uses Matplotlib behind the scenes.

01:00:28 It looks fascinating to me, especially for people that come from that R background.

01:00:33 Sure. That's just something we haven't even touched on, is like the influence of R and like the parallels there.

01:00:39 Yes, we probably won't do much. So we got two more things to cover and these are where do I maybe run my code if I want to put it online and make it interactive and let other people interact with it. Right, so two of them, streamlit and Plotly dash, tell us about these.

01:00:55 Yes, so everything we've talked about now, especially with Plotly and Altair, there is some degree of interactivity. But when you want to build a dashboard or want to build more of an application where you can select and filter data, maybe have different visualizations, maybe have complex visualizations like maps, you need something more than just the out of the box Altair or Plotly. And Streamlit is a very simple way to wrap a little bit of extra Python code around your visualization, and you get this interactive application for free. The demo you're showing right now is a great, really powerful example that shows Uber ride sharing in New York City. It has sliders for you to choose what time the pickup happens, and then it has these real time visualizations for different parts of the city about how many pickups are happening. And what's so cool about Streamlit is there is very little additional Python code you need to do that. So the workflow that I will typically do is I'll do my visualizations in Seaborn or Plotly or Altair, and then once I realize I need that next level, I can then just easily plop them into a separate file with a couple of lines of Streamlit code and, boom, I've got an interactive application that I can run.

01:02:21 Yeah, it's a really interesting way of programming. You basically write a top to bottom procedural script. In this case, we're picking the hour of a pickup, the hour of data you want to slice and visualize. And you say, well, if I could write a function that would make a graph given the hour, then you just say, make the web app.

01:02:44 It gives you the interactive sliders for all the variables that go in. And then as the slider changes, it just re-calls your functions, and you don't have to know anything about web programming or Ajax or front end code. All of that just happens. It's pretty fantastic. Streamlit was recently acquired, I think so recently, but within a year or two, yes.
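
A minimal sketch of that top-to-bottom style, assuming Streamlit is installed and the file is started with `streamlit run app.py` (the file name is arbitrary). The pickup rows are made up and are not the Uber data from the demo; `pickups_in_hour` is a hypothetical helper. Streamlit reruns the script whenever the slider moves.

```python
import pandas as pd


def pickups_in_hour(pickups, hour):
    """Return only the rows whose pickup hour matches the chosen hour."""
    return pickups[pickups["hour"] == hour]


def main():
    import streamlit as st  # imported here so the helper above works without Streamlit

    # Made-up pickup points around New York City.
    pickups = pd.DataFrame({
        "hour": [8, 8, 9, 17, 17, 17],
        "lat": [40.75, 40.76, 40.73, 40.71, 40.74, 40.77],
        "lon": [-73.99, -73.98, -74.00, -74.01, -73.97, -73.96],
    })

    # The slider is the whole "web UI": its current value feeds the function below.
    hour = st.slider("Hour of pickup", 0, 23, 17)
    selected = pickups_in_hour(pickups, hour)

    st.write(f"{len(selected)} pickups at {hour}:00")
    st.map(selected)  # expects 'lat' and 'lon' columns


if __name__ == "__main__":
    main()
```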

01:03:04 They were acquired by Snowflake for a really large amount of money. I know a lot of people are kind of scratching their heads at that valuation.

01:03:11 Not to knock on Streamlit, I mean, congrats to them, but it's really interesting too, and it will be interesting to see what Snowflake does with them. But this tool right now is open source, and I do think it is a very powerful, easy way to get a real Web native interactive app with very little code.

01:03:30 Yeah, absolutely. Let's see. Demetrius out there has a question. He says, there are all these ways to make graphs quickly, but I can't find any information anywhere on how to make an interactive calendar with events quickly.

01:03:43 Yeah, that's a good question. I don't know.

01:03:46 Yeah, I don't know either.

01:03:47 Of any of these plots that have, like, a calendar function.

01:03:50 You know, possibly Streamlit, maybe.

01:03:53 It's interesting to bring up Streamlit because Streamlit does have, like, third party apps or plugins that you can incorporate. So the individual who posted that, I definitely encourage them to take a look at Streamlit and see if there's something out there.

01:04:06 Sure. So I interviewed, I believe it was Adrien. Let me double check. Yeah, Adrien Treuille, back in the early days, early days of Streamlit. We're talking two years ago, before it was acquired. So people can check that out. And I'm somewhat familiar with that. The other one, though, that's very fascinating, that I don't know about, is Dash. How does that compare to Streamlit? This comes from Plotly the company as well.

01:04:31 Yes. So Plotly, like we mentioned, as a company, they have Dash, which is a much more sophisticated and in depth platform for developing interactive applications or dashboards. So whereas Streamlit is a little bit of code, with Dash you're much more embracing HTML and CSS and you're doing callbacks, and you have just a ton of flexibility in how you structure your application. And like this demo you have here, you've got wind speed, histograms, and you've got a line chart that's fully interactive, with interactivity between the charts. As you choose one, it influences another. Dash gives you flexibility to kind of manage the back end as well, so you can run it, I think it's a Flask server, on your own system. You could do that, but if you wanted to do an enterprise grade deployment, you could do that as well.

01:05:37 If you can think of it, Dash will probably let you do it. And if you're a big enough company and it's mission critical, Dash does have that enterprise support where you can pay a company to host it and support it for you.

01:05:50 Well, people should go look at the gallery for Plotly Dash. There's a bunch of interesting things, and you've got many different types of visualizations and whatnot, but one of the things that stands out for me is the streaming data aspect. Create a dashboard where you hook it up to stock market data or IoT data, and it just goes, right?

01:06:14 Right. Exactly. It's designed to support a lot of data and low latency and all those kinds of things, so it's really powerful. But this is one of those areas. With everything we've talked about up until now, you don't really have to know a whole lot of Python. Once you start getting into building Dash, it gets a little more complicated, and I think that's where you want to make sure you've got a good, solid Python foundation before you go and build a dashboard to run your company.

01:06:45 Yeah, but it's very powerful.

01:06:47 Very powerful.

01:06:47 Yeah. The interactivity between the different elements is also quite interesting.

01:06:51 It is, yes.

01:06:54 You can do that to some degree with Streamlit, but Dash just makes sure you can have a ton of interactivity between the different widgets and the different visualizations.

01:07:05 Cool. Well, it looks great. If I had a dashboard that looked like this, I'd be proud of it. Not one of those things that you're like, I guess it works. It looks great.

01:07:14 Yeah, I would be, too.

01:07:15 Yes. Cool. All right. Chris, we've spent a lot of time on this. If you wish, I'll button it up.

01:07:21 But yeah, bring it home.

01:07:23 A lot of great stuff in the visualization space. I think that's one of the really powerful aspects of Python: just all of these tools. It's not the language, it's not the standard library, though those are all important.

01:07:37 People need to realize this is sort of what people are talking about when they say Python is awesome, it's great to use. It's not how it does a for loop. It's that I can say Andrews curves.

01:07:48 Yes.

01:07:49 And it all builds. Right. So if you're just starting on Python, then you start to build a little pandas knowledge. You don't have to throw that away. And then focus on visualization. It all builds on top of it, and you can leverage all that knowledge and then all the other wonderful libraries that are out there. Absolutely.

01:08:06 All right. Before you get out of here, though, final two questions. If you're going to write some Python code. What editor or editors do you use?

01:08:12 I'm pretty much 100% VS Code now.

01:08:15 Right on. Even over notebooks.

01:08:16 Yes.

01:08:18 I've gotten to where I use the native VS Code notebooks and I really like that.

01:08:25 Like the comment divider type of style?

01:08:29 They have continued to update it so much that it just seems like it's a superior approach for what I do and how I manage my environments right now.

01:08:38 Nice. And then a notable PyPI package?

01:08:42 I put two in here. I don't have a ton of experience with them, but I wanted to call them out because I do want to spend some time with them. So the first one is Splink. I wrote an article a couple of years back about doing data linkage or data deduplication. And so for those of you that aren't familiar, it could be a situation where, let's say, you have your customer database and it's got Chris Moffatt lives at 123 Main Street, and you have a third-party data set and it says Mr. Moffatt lives at the same address, but street is spelled St. How do you merge all that data together? How do you do fuzzy matching? And I played around with different options, and Splink is one that actually came out of the UK, from an individual who works at the Ministry of Justice, which I think is just a cool name. And he talks about using this to merge millions of records together. And I think it's a really interesting tool that does something you can't really do in Excel. It's a challenging problem, and anytime someone's spent some time on that kind of problem, I think it's really interesting, and I want to spend some more time looking at that.
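
To make the record-matching problem concrete, here is a small sketch using only the standard library's `difflib`. It is not Splink and does not reflect how Splink works internally; the abbreviation table, the helper names `normalize` and `similarity`, and the sample records are assumptions chosen to mirror the example in the conversation.

```python
# Toy fuzzy matching with the standard library (not Splink).
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real project would need a much larger one.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize(record: str) -> str:
    """Lowercase, drop simple punctuation, and expand known abbreviations."""
    words = record.lower().replace(".", "").replace(",", "").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def similarity(a: str, b: str) -> float:
    """Score two records between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


customers = [
    "Chris Moffatt, 123 Main Street",
    "Pat Jones, 9 Oak Avenue",
]
third_party = "Mr. Moffatt, 123 Main St."

# Pick the customer record that looks most like the third-party record.
best_match = max(customers, key=lambda record: similarity(record, third_party))
print(best_match, round(similarity(best_match, third_party), 2))
```

Comparing every record against every other record like this does not scale to millions of rows, which is part of why a dedicated linkage library is attractive for this problem.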

01:09:52 Yeah, and on the GitHub repo, they've got a couple of videos introducing it, which is great. Yeah.

01:09:57 And the other one is redframes. So this is another one that I've seen come across my Twitter feed a couple of times. I have not used it directly, but it is another library for manipulating data that's interoperable with pandas, but gives a little bit more of that fluent API where you can kind of string all these commands together to modify your data. Pandas supports a lot of this, but there are certainly some things in pandas that are a little bit clunky. And this looks like it's an attempt to try and bring some of that R goodness to Python.

01:10:33 Yeah. It also looks a little bit like bringing SQL to this: filter, group, sort. Change the names a little bit, sort to order by, and it looks a little bit like SQL.
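
The "string commands together" style mentioned here can be approximated in plain pandas with method chaining. The sketch below uses pandas only, not redframes (whose API is not shown in the episode), and the `sales` data and column names are invented for illustration.

```python
# Method chaining in plain pandas: filter, group, aggregate, sort, rename.
import pandas as pd

# Made-up sales data.
sales = pd.DataFrame(
    {
        "region": ["east", "west", "east", "west", "east"],
        "units": [10, 3, 7, 12, 1],
    }
)

summary = (
    sales
    .query("units >= 3")                          # filter rows (SQL: WHERE)
    .groupby("region", as_index=False)            # group (SQL: GROUP BY)
    .agg(total_units=("units", "sum"))            # aggregate (SQL: SUM)
    .sort_values("total_units", ascending=False)  # sort (SQL: ORDER BY)
    .rename(columns={"region": "sales_region"})   # rename (SQL: AS)
    .reset_index(drop=True)
)
print(summary)
```

Each step returns a new DataFrame, so the pipeline reads top to bottom much like a SQL query.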

01:10:45 It just looks really interesting. And like I said, I want to get some visibility to it and I haven't used it extensively, but certainly want to play around with it a little bit more. I thought your listeners might be interested.

01:10:56 Yeah, looks very cool. Thanks for sharing that. All right. People are interested in Python data visualization. They want to know more. What do you tell them?

01:11:04 Right, check out the course. So, really excited about the course. If you've liked what you've listened to here, the course on Talk Python Training has the examples and the notebooks for you to go through and learn and play with this on your own. And by the end of it, you should be at the point where you can start to apply it to your own data. So I encourage you to check it out. And if you do check it out, let me know. I'd be interested to see what you think.

01:11:26 Yeah, it definitely covers all of this in hands on detail, not just conceptually. So, yeah, thanks for being here, Chris. Thanks for sharing your experience and yeah, happy to have you back on the show.

01:11:37 Thank you.

01:11:38 Great. Yeah, you bet.

01:11:39 Bye.

01:11:39 Bye.

01:11:39 Bye.

01:11:41 This has been another episode of Talk Python to Me. Thank you to our sponsors. Be sure to check out what they're offering. It really helps support the show.

01:11:49 Starting a business is hard. Microsoft for Startups Founders Hub provides all founders at any stage with free resources and connections to solve startup challenges. Apply for free today at talkpython.fm/foundershub. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.FM. Be sure to subscribe to the show. Open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.FM.

01:12:37 We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at Talkpython.FM/YouTube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
