#288: 10 tips to move from Excel to Python Transcript

Recorded on Tuesday, Sep 15, 2020.

00:00 Excel is one of the most used and most empowering pieces of software out there,

00:04 but that doesn't make it a good fit for every data processing need. And when you outgrow Excel,

00:09 a really good option for that next step is Python and the data science tech stack,

00:14 Pandas, Jupyter, and Friends. Chris Moffitt is back on Talk Python to give us

00:18 concrete tips and tricks for moving from Excel to Python and Pandas.

00:23 This is Talk Python to Me, episode 288, recorded September 15th, 2020.

00:29 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem,

00:48 and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy.

00:53 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is sponsored by the Voyager video

01:02 game, which is built on Python and Linode. Check out what they're both offering during their

01:06 segments. It really helps support the show. Talk Python to Me is partially supported by our

01:12 training courses. I got a joke for you. What's the world's most popular IDE? Excel. Funny, right?

01:18 Except many companies really do run on Excel to the point where they would be much better off

01:23 using clean and simple programming tools. For many, Python's data science stack would be vastly better.

01:29 But moving from Excel to Python is a challenge. Most data science courses don't focus specifically on

01:35 the Excel use cases. That's why we've teamed up with Chris Moffitt from Practical Business Python to

01:40 create a course tailor-made for helping people learn just enough pandas in Jupyter to replace the

01:46 problematic Excel usage with clean and scalable Python code. If you or your co-workers are ready to move

01:52 beyond Excel, visit talkpython.fm/Excel, or just click the link in the show notes to learn more

01:58 about this online course at Talk Python Training. Chris, welcome back to Talk Python to Me.

02:03 Thanks, Michael. Really excited to be back.

02:05 Yeah, it's great to have you back. You know, back on episode 200, this is January of 2019. So coming up

02:14 on two years ago, we did an episode together called Escaping Excel Hell with Python and Pandas. And it was,

02:20 I got to tell you, out of upwards of 290 episodes now, well, at the time of the recording, and this

02:27 is the sixth most popular episode ever, there must be some sort of problem with Excel that people are

02:33 looking forward to dodge or get away from to escape.

02:36 Yeah, I think so. I wish I could say it was all me and people wanted to hear me. But no,

02:40 I think you're right. There's definitely some challenges with Excel that people are trying to

02:44 get around.

02:45 Yeah. And there are, and we're going to talk about them. I think there's some interesting

02:49 parallels. You do this in Excel. Here's how you do that in Python. But there's also some

02:54 interesting possibilities like, oh, I could turn this into an API, which is not too many people are

02:59 probably using Excel as an API. But I will bet you there's a Windows machine in the cloud

03:04 running Excel that somebody has an API focused on.

03:07 Oh, I'm sure you're right. And I agree. I mean, I think that's one of the real powerful aspects of

03:12 Python is your ceiling is a lot higher. You can do a lot more once you get into Python.

03:16 Yeah.

03:17 If you only want to use it for a little bit, that's great. But if you really want to

03:20 take it to the next level, those options are out there for you.

03:23 Yeah, absolutely. So let's just do a quick catch up. It's been a couple of years since you've been on

03:27 the show. What have you been up to since then?

03:29 Continuing to work on the blog, continuing to write articles.

03:33 I just saw that PB Python, Practical Business Python, got ranked as the 12th most bestest

03:39 Python blog on the internet. That's awesome.

03:41 I saw that. Yeah, I don't know what their methodology was. But I will say they ranked,

03:45 I think, on four or five different categories. And the one that they gave me a little bit of a lower

03:49 rank on was the frequency of writing new content. And I think that's spot on. I wish I could write

03:54 more. Yeah.

03:55 But it does take time to get good content out there. And I've really enjoyed continuing to do

04:00 that, mainly because it helps me learn and engage with the audience and learn what people want to

04:05 hear about with the Python ecosystem.

04:07 Yeah.

04:08 So I've been continuing to do that. And Awesome.

04:10 I actually have a new job. So that's always been exciting in a pandemic to start a new job and

04:18 work from home full time. Like I know you're very used to.

04:21 I am very used to it. I got lucky and got a mobile, remote friendly job about 12 years ago.

04:28 And I've just never looked back. You know, a lot of the stuff that people are discovering in this

04:33 pandemic has kind of been things I've embraced in life and really appreciate. I don't appreciate being

04:39 locked away from my friends, not being able to go to restaurants. But I do appreciate not having a

04:43 commute, not having all the expense and all those kinds of things. That's really nice.

04:47 I definitely enjoy getting up in the morning and my commute is a couple steps from the bedroom and

04:52 not 30 to 45 minutes. Definitely enjoy that.

04:56 That's right. My office is in my garage, which is a separated garage in my house. I got to go across

05:00 the sidewalk periodically. I'll scare a squirrel when I go out in the morning. So there's a chance of

05:04 excitement, but it's pretty low excitement.

05:06 Yes. Yes. Definitely have to find some benefits in all of this.

05:09 That's right. So you said you have a new job. What kind of work are you doing there?

05:14 I'm working, I'm continuing to work in the medical device industry. And right now I'm doing pricing

05:19 strategy and analytics. So working with the company to plan our new product launches and what our pricing

05:25 strategy is and how do we execute those effectively. So it's been really enjoyable so far.

05:30 Yeah. It sounds like you're able to take these ideas and just a hundred percent make that the core of

05:35 what you're doing.

05:35 Yeah, absolutely. I mean, a lot of, you know, I keep quoting your term that Python is a superpower,

05:41 and that's definitely the case for me is the knowledge that I have to take Python and apply

05:47 it to some of those real world business problems that I encounter on a day-to-day basis, clean some

05:52 data, manipulate some data, build some visualizations to tell a story. It's really powerful and I really

05:57 enjoy getting a chance to do that.

05:59 Yeah. I do think programming is a superpower and Python is a special kind of superpower for sure.

06:04 Exactly.

06:05 Now it's been, like I said, about two years since you were on the show. We did talk about some of the

06:10 benefits of choosing basically Python's data science stack over Excel. But let's, for those who don't

06:19 have an insanely good memory or haven't heard it before, let's just do a quick review of like some of the

06:24 reasons you might want to use Python instead of Excel.

06:27 Sure. So the first thing I think of is the repeatability aspect. So in Excel, once you create

06:34 something, you'll typically end up with a bunch of tabs and a bunch of data all over the place and a

06:41 bunch of formulas that are nested, maybe some VBA. And it's really hard to repeat that process. There's

06:47 no way to start from the beginning and do it again. So if you've done some analysis, say,

06:52 in January and you want to repeat it in October, you almost have to start from scratch. You can see

06:58 that file, but there's no just button to push to run it all again. And with Python, if you do it right

07:03 and build a script, then you do have that repeatability aspect.

07:06 One of the things that drives me crazy about Excel from a programmer mindset is you hear of the different

07:13 control flow structures and whatnot in programming languages like Python didn't have a switch statement,

07:19 but maybe Python is getting a switch statement. It has obviously ifs and loops. One of the things

07:24 very few people boast about their language having or saying a good thing is a go-to statement. You're

07:30 like, oh, our go-tos are really improved now. Like, no, no, they're not.

07:33 No, no. And like you said, Excel is just kind of like a whole bunch of go-to statements. And that

07:38 really is the way you need to troubleshoot Excel. And I think that's the other piece is that

07:44 repeatability kind of goes hand in hand with traceability. I mean, you can certainly write

07:48 challenging Python code that's hard to understand. But in general, Python code flows from the top to

07:55 the bottom and you can see what happens. Whereas Excel, it is really hard to figure out what's going

08:00 on. And even if you look at the formulas, there can be errors in the formulas or there can be

08:05 gaps in the formulas. So there's just a lot of subtle bugs that are difficult to trace with Excel

08:11 that I think Python makes a lot easier.

08:13 I think one of the challenges is when you look at an Excel spreadsheet or it could be Apple Numbers or

08:19 it could be Google Sheets, like the same basic thing, the data is hiding the logic, right? You

08:26 see a number in a cell and you don't know whether that number was typed there, or was imported from

08:31 something, or if it was computed in a really complicated way. And so you've got these go-tos on

08:37 top of like labels that hide the go-tos.

08:39 And you can hide the data and just see the formulas, but then you don't see the data.

08:44 So like you said, even when you try to get around it, it is not very easy at all.

08:49 Yeah. What about lots of like large files or lots of data, something like that? Excel excels at it,

08:56 right?

08:57 It's funny. I mean, we're certainly not talking about big data by any means, but when Excel gets

09:02 to a little over a million rows, it just can't read the data. And nowadays it's very easy to get data

09:07 that's well outside of what Excel can handle. And I even think if, even if you have a data set that's

09:14 relatively large and fits in Excel, the performance penalty is pretty large and it's difficult to

09:20 manipulate that data and analyze that data efficiently in Excel because it's just

09:25 kind of a resource hog, right? You've got a lot of things that Excel supports, which are really nice

09:29 from a visualization and the way it looks on screen. But in Python, for the most part, once that data is

09:35 read in, if it fits in memory, it's going to be pretty performant and you can manipulate gigabytes

09:40 of data on kind of your standard laptop pretty easily with Python without having to move to some

09:46 of the more advanced data science tools.
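
To make the row-limit point concrete, here is a minimal sketch, assuming pandas and NumPy are installed. The column names and the synthetic data are made up for illustration; the one outside fact it leans on is that a modern Excel worksheet tops out at 1,048,576 rows.

```python
import numpy as np
import pandas as pd

# Row limit of a single worksheet in modern .xlsx files.
EXCEL_MAX_ROWS = 1_048_576

# Synthetic data (hypothetical columns): two million "transactions".
rng = np.random.default_rng(seed=0)
n_rows = 2_000_000
df = pd.DataFrame({
    "region": rng.choice(["North", "South", "East", "West"], size=n_rows),
    "amount": rng.integers(1, 500, size=n_rows),
})

# One line of pandas does the kind of work a pivot table would do.
totals = df.groupby("region")["amount"].sum()

print(f"Rows in memory: {len(df):,} (Excel sheet limit: {EXCEL_MAX_ROWS:,})")
print(totals)
```

How fast this runs and how much memory it needs will depend on the machine, but nothing beyond pandas itself is required.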

09:48 Yeah. I'm always, it blows my mind how fast programming languages are. Even when you've got

09:54 slow Python, right? I think performance of Python is super interesting because you can talk about

10:00 this code runs more slowly than say compiled C++. Maybe it's 20 times slower if it's doing numerical

10:06 calculations, but that's still really fast. But there's also, of course, I could put that into

10:11 like a NumPy array. Now it's back to the C level speed and there's all these variations.

10:14 But I remember on this course, I just did the memory management course. There's a section where I did

10:20 performance around different types, structuring your classes in different ways. And so one of the metrics

10:26 I had was how many attributes of the classes can you access per second? I think it was like 18 million

10:33 reads per second. You're like, that's a lot of processing, right?

10:37 Right. Right. Absolutely. And one of the things that I found is even just reading in an Excel file,

10:42 it just takes time. Even in Python, it takes a lot of time because that Excel file is pretty bloated.

10:48 It carries a lot of information outside of just the data. But once it's in Python, then it almost

10:53 doesn't matter. It's really quick and a really, really big benefit.

10:58 Yeah. The other thing to keep in mind as we're talking about this to position it is much of what

11:03 you're proposing or you're going to propose shortly is to use Python's data science stack. So when we

11:09 talk about the performance of data analysis in Python, what you really mean is mostly the performance

11:15 of data analysis in C, right? Because when you talk about NumPy and pandas, a lot of that stuff is

11:21 orchestration layers in Python over top native code.
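
A rough way to see the "Python orchestrating native code" point for yourself, assuming NumPy is installed. The function names are hypothetical and the timings will differ from machine to machine, so treat this as a sketch rather than a benchmark.

```python
import timeit

import numpy as np

N = 1_000_000
values = list(range(N))                 # plain Python list of ints
array = np.arange(N, dtype=np.int64)    # the same numbers in a NumPy array


def python_sum_of_squares():
    """Loop in pure Python: every multiply and add goes through the interpreter."""
    total = 0
    for v in values:
        total += v * v
    return total


def numpy_sum_of_squares():
    """Same calculation, but the loop runs inside NumPy's compiled code."""
    return int(np.sum(array * array))


# Both approaches must agree on the answer.
assert python_sum_of_squares() == numpy_sum_of_squares()

loop_time = timeit.timeit(python_sum_of_squares, number=3)
numpy_time = timeit.timeit(numpy_sum_of_squares, number=3)
print(f"pure Python loop: {loop_time:.3f}s for 3 runs")
print(f"NumPy vectorized: {numpy_time:.3f}s for 3 runs")
```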

11:25 Sure. So people a lot smarter than me have figured out how to build these capabilities in C. You're right.

11:32 And it also runs on Windows. It runs on Mac. It runs on all the different operating systems.

11:37 And you're right. It runs very quickly, much more so than people would expect, especially people that think

11:43 that Python has this reputation as being a slow language or maybe not a performant language.

11:48 Yeah. Yeah. Meanwhile, Instagram is running on Python. YouTube is doing a million requests a second.

11:54 Like there's variations on what defines slow, right? I think another interesting thing that is not

11:59 mentioned here, but is probably worth mentioning is price.

12:04 Yes, absolutely. So there are commercial variations of Python and there are companies that support Python,

12:13 but Microsoft is making it available through the Windows store for free. All of the tools that we've

12:19 mentioned like pandas or using conda for managing the environment are free and open source.

12:25 So you can at least get started with it without having to buy some enterprise licensing. And when you

12:30 get in big companies, enterprise licensing is really expensive. And I think sometimes developers that maybe

12:36 haven't been in that environment don't realize how much companies pay for some of these licenses for

12:42 commercial software. It's just really, really crazy to think about.

12:46 Yeah. And I think most enterprises probably have maxed out Excel for everybody, right? They all have Word, they all have Excel, and you know, they all have Outlook.

12:54 Right.

12:56 But you want to do some computation, say in the cloud, right? I want to take some of this work and analyze it over there or turn it into an API endpoint or whatever.

13:05 All of a sudden, maybe we don't have licenses to like fire up Windows machines and put Excel, like that doesn't make a lot of sense, right?

13:12 Whereas you just take these same things as people know in Python and you just pip install them over there instead of over here.

13:18 Or conda, you take your pick.

13:20 Yes. And I think that speaks to this concept that there's just this higher ceiling with Python that you can get started and then, like you said, start to move to the cloud, start to move to some of these other services that are out there.

13:31 And you can take the same thing that you're doing on your local machine to analyze a few thousand rows of data and scale it in the web to millions and millions of rows of data.

13:42 Right. Docker, Kubernetes, all these things become an option.

13:45 Exactly. Yes.

13:47 Nice. A lot of what we're saying sounds like use the Python data science stack instead of Excel.

13:52 But I know for sure there are many people who must deliver an Excel product or maybe receive input starting from an Excel worksheet, a workbook.

14:02 And they've got to maybe they can make this step to the side where I say the rebels live, you know, but they've still got to either receive or deliver Excel, present stuff for other people in Excel and so on.

14:16 There's some cool integration on that level as well.

14:19 Right. Yes. And I think you're touching on a really key point.

14:22 I think people that come from a programming background and see Excel just see the challenges.

14:27 But businesses run on Excel and there is a lot of power in Excel and you can't just take Excel out of the organization.

14:35 You shouldn't march into your CFO's office and say, we're deleting Excel and everybody's going to learn to use Python.

14:43 That's just not practical in any world.

14:45 And I don't think that will really be in the future.

14:48 So thinking about the problem and thinking about where Python really solves the problems and where Excel can still be leveraged.

14:54 Excel is a great modeling tool.

14:56 Python using Pandas and some of the other tools we'll talk about allows you to export Excel files and even build interactive Excel files that you can share with others.

15:07 So that those that just want the final answer or that final report or the final model can still do it in Excel.

15:14 And Excel is a perfectly appropriate tool for a lot of those scenarios.

15:19 Yeah. And there's some nice libraries that you can read Excel files.

15:23 Obviously, Pandas just imports them directly, but you can also write to them like you can format the cells.

15:29 You can change their colors.

15:30 You can add formulas back into the Excel file you deliver.

15:34 Right. So there's a way to sort of do all your work in Pandas, but then generate something that's a live Excel.

15:40 Yes.

15:41 Artifact for people who need to pick it up and run with it.

15:44 I've done that quite a bit.

15:45 And I think it's like a lot of data analysis.

15:47 You start with a rough cut.

15:49 And so maybe you dump something to Excel and it doesn't look very pretty, but it gives you what you need.

15:53 And then once you get to the point where it's almost like a production ready report that you want to give to people every week or month or whenever,

16:02 then you can start to apply that formatting and you can put all the styles that you can really do in Excel by hand.

16:09 You can put those in.

16:10 You can put pivot tables.

16:11 You can put visualizations.

16:12 There's a whole bunch of things you can do there.

16:15 And then if you really want to take it to the next level, you can start to do some of the more sophisticated visualizations like with Streamlit or some of the other tools that are out there that really allow a high degree of interactivity in front of or on top of that Python-based capability.
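
Here is a minimal sketch of that "analyze in pandas, deliver in Excel" workflow, assuming pandas and the openpyxl engine are installed. The data, the sheet name, and the function names are hypothetical; it is one way to do it, not necessarily the approach used in the course.

```python
import pandas as pd


def build_summary(sales):
    """Total the amount column per region (the analysis step, done in pandas)."""
    return sales.groupby("region", as_index=False)["amount"].sum()


def write_report(summary, path):
    """Write the summary to an .xlsx file with a bold header and a live SUM formula."""
    # Imported here so the analysis step above needs only pandas.
    from openpyxl.styles import Font

    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        summary.to_excel(writer, sheet_name="Summary", index=False)
        sheet = writer.sheets["Summary"]
        for cell in sheet[1]:  # row 1 holds the column headers
            cell.font = Font(bold=True)
        total_row = len(summary) + 2  # header row, then data rows, then the total
        sheet.cell(row=total_row, column=1, value="Total")
        # A string starting with "=" is stored as a formula that Excel evaluates on open.
        sheet.cell(row=total_row, column=2, value=f"=SUM(B2:B{total_row - 1})")


# Made-up sample data for illustration.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "amount": [100, 200, 50, 25],
})
summary = build_summary(sales)
print(summary)
```

Calling `write_report(summary, "sales_report.xlsx")` would then produce a workbook that the recipient can open and keep working with in Excel.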

16:34 This portion of Talk Python to Me is sponsored by the PC game Voyager, available on Steam and published by Tollip Consulting.

16:40 Voyager is an edutainment game in which you travel around the world.

16:44 You visit famous cities and discover well-known landmarks and attractions.

16:47 Enjoy stylized graphics, beautiful photos, and the local music of each city.

16:52 You even apply for a variety of jobs to pay for your next plane ticket.

16:56 Play it yourself or suggest it for your children so they can learn about the world instead of playing battle royales all the time.

17:02 And for an added bonus, Voyager is written entirely in Python.

17:06 Visit talkpython.fm/voyager or just click the link in your podcast player.

17:12 You'll get a key for the game and instructions on how to redeem it on Steam.

17:16 Or you can, of course, also just search for Voyager on Steam.

17:20 That's V-O-Y-A-G-E-R directly on Steam.

17:23 With something like Streamlit, you could publish it.

17:28 People could consume it live.

17:29 Yes.

17:30 Maybe there's even a download as Excel button.

17:33 Yeah, absolutely.

17:33 I mean, I think there is a challenge and I think that'll be something that happens in the future is

17:37 how do you build up that infrastructure at a company?

17:41 When you're at a startup, maybe it's easy to spin up a service and get Streamlit going.

17:44 But as a company grows, it's not that easy for most people to just get access to a Linux server somewhere and run Streamlit and host it for the organization.

17:54 Even though it may be technically relatively simple, there's a lot of hoops to go through.

17:58 So I don't think we've quite made that as easy as we'd like.

18:01 But I think it's going to continue to improve in the future.

18:03 Yeah, I agree.

18:04 If you're just barely moving outside of Excel and you're just starting to pick up Python, like set up your own cloud VM with Linux and SSH probably is a bridge too far.

18:15 Yeah.

18:16 I mean, it's certainly gotten a lot easier than it was many, many years ago.

18:19 I mean, there's so many services that just almost make it push button.

18:22 But I think there's also this aspect of security as well.

18:26 I think there's a lot of flexibility in being able to bring Python and install it on your system.

18:31 And there may be some challenges with your IT organization to get that supported.

18:35 But the one thing that I do think is really challenging is when you deal with sensitive data, you want to be careful about putting it out in the cloud.

18:42 And those are the types of things you really get in trouble if you put all your confidential pricing data on a server somewhere and the company is not supportive of that.

18:50 That's kind of what they call a career limiting move.

18:53 That's right.

18:55 You go into the big board meeting like, look, I've done all this amazing work.

18:59 That's on the Internet.

19:00 You're fired.

19:00 Exactly.

19:01 Wait a minute.

19:02 What just happened?

19:03 I thought I was doing good.

19:04 I definitely don't want to promote that.

19:05 And I think most people understand that.

19:07 And there are certainly varying degrees.

19:09 For some companies, it probably wouldn't be a challenge.

19:12 But then you go to the banking industry or the government or other places.

19:16 And that is.

19:17 Or the medical industry, like what you are in.

19:19 Yeah.

19:19 Or the medical industry.

19:20 Yes.

19:21 A huge, huge challenge.

19:22 Yeah, absolutely.

19:23 So let's dive into the 10 tips.

19:25 Before we do, some of these tips are coming out of a course that you wrote at Talk Python.

19:31 And you want to just tell people really quickly about the course?

19:33 Sure.

19:34 Sure.

19:35 So we just went live with this course.

19:36 And what I'm really excited about is we start from the beginning.

19:40 And it is kind of a Windows focus.

19:42 Because I think a lot of people that are taking this course are in a Windows environment.

19:45 And all the code that we walk through can run on Windows or Mac or Linux.

19:51 But I start from the beginning.

19:53 How do you get your system set up and running in Windows?

19:56 And then one of the powerful aspects of Pandas is that it is a complex library.

20:02 But you really only have to learn a few concepts to get started.

20:06 So we start from the beginning, assuming you know a little bit of Python.

20:09 And then progress through these Pandas concepts with the idea that we want to get you up and running as quickly as possible.

20:16 So that you can apply the code to your own business problems and start to solve some problems.

20:21 And I use the model of talking about how you would do something in Excel and translate it to Python.

20:27 And talk about the Pandas library commands and functions that you would use to manipulate the data.

20:33 And then at the end, I have a kind of a real-world scenario where we talk about ingesting an Excel file or multiple Excel files, combining them, analyzing them, cleaning them, and then generating an Excel report.

20:46 And I think this course really is a nice summary of how you can quickly get up to speed with Python and Pandas, get that stack running, and build a stable foundation for you to progress and just take on more and more complex tasks as you grow your skills.

21:02 Yeah, it's super cool.

21:03 I really enjoyed going through the course as just somebody consuming it, even though I helped put the pieces together.

21:08 And there's some of my stuff I do, some accounting in Google Sheets, not actually Excel, but effectively the same experience.

21:16 And after that, I'm like, yeah, I really need to take some notes and rewrite some of the important parts of what I'm doing in Excel.

21:22 So yeah, people should definitely check that out.

21:24 I'll put a link in the show notes.

21:26 All right, without further ado, let's talk about 10 quick, easy things that we can do in Python and Pandas to move, sort of solve some of our problems or solve a series of problems in one analysis in Excel.

21:38 Sure.

21:38 So I think the first thing is, and it may sound obvious, but being very specific about defining the problem you want to solve.

21:46 And so the example I'd give is if you came to me and said, hey, Chris, I'd like to build a model to predict customer churn at the company.

21:56 And I'd say, well, what kind of model do you have today?

21:59 And they say, well, we don't have a model.

22:01 Well, that may be a little bit much to start out with.

22:04 I think something more appropriate might be, here's a report where I open up Salesforce every day.

22:11 Someone copies and pastes some data into an Excel spreadsheet.

22:15 Someone manually cleans it up and then emails it out.

22:19 And we do it every day.

22:20 And it takes us 30 minutes a day.

22:22 But we have, you know, an intern do it for us.

22:25 That's the crazy stuff is there's somebody usually at these companies that do that.

22:29 Yes.

22:30 Either because they were told every day, every Monday, beginning of the month, whatever, before the presentation, like they sit down and they do these things.

22:38 And it just, it blows my mind because, you know, you maybe spend two hours, but it's not just the two hours.

22:44 Like what if there's a mistake?

22:47 You know, it could be tremendous.

22:50 One of the things you pointed out in the course was there's all these, there's like this history of crazy Excel errors.

22:58 Right?

22:58 Yeah.

22:59 And I think I even mentioned one where one of the genome types that they use in scientific notation, you know, in scientific publications, they actually changed it because it was being confused with a date type in Excel.

23:12 So essentially, it was easier to change science than it was to change Excel to properly process this data.

23:19 That's right.

23:20 I read that article. The global organization that oversees the naming of genes.

23:24 Yes.

23:24 The sequence of DNA pairs that give us all, all the different features we have.

23:30 Some of them were named in ways that were being detected.

23:33 Like one was MARCH1, all one word, but that would become March 1st if you type that in.

23:38 So they literally renamed, I think, 27 different genes.

23:42 So they would be Excel friendly.

23:43 They wouldn't get renamed and/or reinterpreted.

23:47 And then also they set up guidelines to say you can't name stuff that might be understood in a non-obvious way by Excel.

23:55 It's crazy.

23:56 It's crazy.

23:56 I mean, as someone from a computer background, you would think, well, why would it matter what it's named?

24:01 But we've all been bitten by Excel trying to help you with things, which just makes it all the worse.
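
For contrast, a minimal sketch of how the same kind of text comes through pandas, assuming pandas is installed. The CSV content is made-up sample data. `read_csv` does not turn text into dates unless you ask it to (for example with `parse_dates`), and `dtype=str` additionally keeps things like leading zeros intact.

```python
import io

import pandas as pd

# Made-up sample data: gene-style symbols that a spreadsheet might read as dates,
# plus IDs with leading zeros that a spreadsheet might read as numbers.
raw_csv = io.StringIO(
    "sample_id,symbol\n"
    "00123,MARCH1\n"
    "00456,SEPT2\n"
    "00789,TP53\n"
)

# dtype=str tells pandas to keep every column exactly as written in the file.
genes = pd.read_csv(raw_csv, dtype=str)

print(genes)
print(genes.dtypes)
```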

24:07 Yeah, exactly.

24:08 So my disbelief or something along those lines is that this person sits down and does this every so often.

24:15 And you only ask that question weekly, probably, because it's so painful to create the answer.

24:21 And it's so error prone to create the answer.

24:23 What if you could instantaneously, you know, instant, just like on demand, say, give me the current state of the report for the last seven days now or whatever.

24:33 Right.

24:33 And that was a five second calculation rather than a person that could make errors.
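
As a minimal sketch of that "report on demand" idea, assuming pandas is installed: the function name, column names, and order data below are all hypothetical. The grouping column is a parameter, so switching from state to zip code is a one-word change rather than a week of work.

```python
import pandas as pd


def last_n_days_report(orders, as_of, days=7, by="state"):
    """Total order amounts per group for the `days` days ending at `as_of`."""
    as_of = pd.Timestamp(as_of)
    start = as_of - pd.Timedelta(days=days)
    in_window = (orders["order_date"] > start) & (orders["order_date"] <= as_of)
    return orders[in_window].groupby(by, as_index=False)["amount"].sum()


# Made-up sample data for illustration.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2020-09-01", "2020-09-10", "2020-09-12", "2020-09-14", "2020-09-15"]
    ),
    "state": ["OR", "OR", "MN", "MN", "OR"],
    "zip_code": ["97201", "97201", "55401", "55402", "97035"],
    "amount": [100, 250, 75, 125, 50],
})

by_state = last_n_days_report(orders, as_of="2020-09-15")
by_zip = last_n_days_report(orders, as_of="2020-09-15", by="zip_code")
print(by_state)
print(by_zip)
```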

24:40 And when it's a very manual process and it takes time, there may be things that you could do to tweak it to make it more useful.

24:47 Maybe you want the report daily or maybe you want to add some extra fields.

24:50 But people just don't want to take that on because it is painful.

24:54 But once you start to automate this process, then adding a new field is easy or increasing your distribution list or reformatting it.

25:02 Now I want a PDF version.

25:03 Right.

25:03 The granularity.

25:04 Right.

25:05 Like, well, we can't do it by zip code.

25:07 We've got to do it by state because it's already taken half an hour.

25:09 It would take a week.

25:10 Right.

25:11 If we did it by zip code.

25:11 So those kinds of things.

25:13 All right.

25:13 So step one or tip one is define the problem clearly.

25:16 Yes.

25:17 And it's not just defining it clearly, but it's making sure that you understand that problem.

25:22 So it's almost more taking an existing problem that is really well understood.

25:26 So you can just focus on learning Python and not learning the problem.

25:29 And as you grow your skills, then you can start to tackle more and more complex problems with that base kind of Python knowledge.

25:36 Yeah.

25:36 One of the problems with all computer models, computer software is it'll almost always give you an answer whether or not that answer makes any sense.

25:43 Exactly.

25:44 Yes.

25:45 Yeah.

25:45 100%.

25:46 All right.

25:48 Tip number two.

25:48 So the second one is get your Python environment set up.

25:52 And it's gotten a lot better over the years, especially in the Windows world.

25:57 As I mentioned, it is now officially supported in the Windows store and you can install it there.

26:03 But I recommend using some sort of environment.

26:06 I prefer using Miniconda to install it on Windows, install Python and the associated libraries on Windows and manage the environments.

26:16 I think that's kind of the key thing to take some time and get it set up.

26:21 Because one of the crazy things, just as I've taught new people how to use it, the concept of where are files on my file system is actually something you need to spend some time on.

26:32 When you're on a Windows environment and you've got OneDrive and you've got Teams and you've got SharePoint and you've got Box and you've got Network Drive and local files, most people aren't very familiar with where they all are.

26:45 And Windows makes it hard sometimes to even know exactly where on the disk something is.

26:50 And that's a pretty important skill to have as you start getting your Python environment set up.

26:56 And so it's really important to kind of get that base foundation in place so that you can start doing the actual Python development.

27:04 Yeah.

27:05 And there's a little bit of work to manage the concept of an environment.

27:08 Right.

27:09 Like that's not a trivial thing.

27:11 And the unfortunate aspect is that hits you like step one or two.

27:15 You know, the very next thing you basically start with is, OK, you need an environment then.

27:20 But what's I think trickier is paths.

27:24 Like, oh, when I type Python, it means this.

27:26 But actually, pip, if I type that, it means something else from a different environment.

27:30 Like I pip installed a thing, but then I run Python and it's not in there.

27:35 It can't import it.

27:36 Well, it's like because for some reason, the pip was from the Python 2.7 path.

27:40 And the Python, when you typed it, it runs the Python 3.1, which is not even the same thing.

27:45 You know, you don't even want to get into that conversation.

27:48 So I think just starting clearly with, like, conda environments or something like that makes it a much easier thing.

27:54 Once you get over that small step, then it's pretty straightforward.
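
One way to check the mismatch described above (a pip that belongs to a different Python than the one you run) is to ask the interpreter where it lives. This is a minimal sketch using only the standard library; the function name `describe_environment` is made up for illustration.

```python
import shutil
import sys


def describe_environment():
    """Report which interpreter is running and which pip is first on PATH."""
    return {
        # Full path of the Python that is executing this code.
        "python_executable": sys.executable,
        # Root folder of the active environment (conda env, venv, or base install).
        "environment_prefix": sys.prefix,
        "python_version": f"{sys.version_info.major}.{sys.version_info.minor}",
        # First "pip" found on PATH; may be None or belong to another install.
        "pip_on_path": shutil.which("pip"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the pip path and the Python path point at different installs, running `python -m pip install <package>` instead of plain `pip install <package>` ties the install to the interpreter you are actually using.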

27:56 Yes.

27:57 Yeah, absolutely.

27:58 And I think if you confront it head on and have that discussion and encourage people that if you follow this process, you're not going to break anything.

28:07 So if you're in a virtual environment, the worst case is your virtual environment is messed up and you can kind of restart that.

28:13 Throw it away.

28:13 Start over.

28:14 Exactly.

28:15 Yeah.

28:16 Yeah.

28:16 And especially on a work computer, I suspect.

28:17 Right.

28:18 Like I periodically will get frustrated and reformat my home computer.

28:22 But if it's like a corporate one and it's got all sorts of VPN stuff that they're going to ask you, why are you messing with it?

28:29 You just don't even want to mess that thing up.

28:31 Yes.

28:31 Maybe you don't have permissions to either.

28:33 Yeah.

28:34 And that is one of the things that has gotten better with Python is you can install it without administrative access, which in the past wasn't always the case.

28:42 Yeah.

28:43 Another thing that you advocate, and I guess this leads us into tip three, is really be super organized about your data because you can have source data, but then maybe you open up those Excel files and edit them and then they're no longer source data or they're all mixed in or they have random names.

28:58 So what's your tip on kind of setting the stage with data?

29:02 One of the things that I encourage people to do is get the data in as raw a format as possible.

29:09 So what that might mean is if it's, say, a sales summary data, can you actually get transaction level data that's maybe directly from a data warehouse or your ERP system or somewhere so that it has all the information you need and is in a format that you don't touch?

29:27 So maybe it's Excel, maybe it's CSV, but you store it somewhere and do not touch it at all.

29:33 Because I think what people tend to do with Excel is because you can open up the data, you go in and make changes.

29:41 So people will go and change a file name or change a cell or clean up some data by hand, which is fine, but it's not reproducible.

29:50 Right.

29:51 So I encourage people to set up a structure where the data is stored and you never touch it.

29:57 And then if you do need to manipulate it, you have an intermediate directory structure where you store that manipulated file and then another directory structure where you store the output file and start getting in that disciplined process of managing those input and output files.
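
The raw, intermediate, and output separation described here could be set up with a few lines of Python. This is only a sketch: the folder names below are assumptions chosen for illustration, not the layout of any particular template.

```python
from pathlib import Path

# Hypothetical folder names; the key idea is that raw data gets its own
# folder that scripts read from but never write to.
FOLDERS = ["data/raw", "data/interim", "data/processed", "reports", "notebooks"]


def create_project_layout(root):
    """Create the project folders under root and return their paths."""
    root = Path(root)
    created = []
    for name in FOLDERS:
        folder = root / name
        # parents=True builds "data" before "data/raw"; exist_ok makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_project_layout("sales_analysis")` would create the folders under a new `sales_analysis` directory, and running it again would not overwrite anything.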

30:11 Yeah.

30:12 And you take this as far as having a Cookiecutter template, so people can run cookiecutter with the name of the template and it'll generate the raw data file, the intermediate processing, the reports, the code, all that kind of stuff, so they always do it the same.

30:25 Right.

30:25 Yes.

30:26 It just makes it easier.

30:27 And when someone's starting, they don't have all the bad habits.

30:30 Right.

30:31 So this is just, you know, this is how you get started.

30:33 You create these directories and you move between the directories in this way.

30:37 And people are trying to learn so much.

30:40 It's just accepted that that's the best way to do it.

30:43 Yeah.

30:43 And if you hand it off to someone else, right.

30:45 Some people in the accounting department worked on it, then they handed it off to, I don't know, the marketing people.

30:51 If they all know that this is the way we do it, then they understand where to go.

30:55 And I also set it up and I encourage people to do this so that you're less likely to make a mistake.

31:02 And if you do make a mistake, you're not going to really break the whole pipeline.

31:06 It's kind of segmented out and firewalled almost, if you will.

31:10 Yeah, absolutely.

31:11 All right.

31:12 Number four gets us into some of the pandas commands.

31:15 Yes.

31:16 So I love pandas.

31:18 Pandas is an extremely powerful toolbox.

31:22 And I think that is a big challenge for new users, because if you start to approach pandas, it's almost like, well, where do I start?

31:30 How do I get a handle on all the options that are out there?

31:34 And so I think that the first thing to do is to understand how to select your data by rows and columns.

31:42 And pandas has the loc command, .loc, as well as the .iloc.

31:48 And all the basic pandas tutorials kind of cover this.

31:53 But when you're working with Excel, this is actually an extremely powerful command.

31:59 And it can be almost deceptive how much you can get done with it and how much you need to learn it to use it in the day-to-day analysis.

32:06 So I'd like to spend time just starting there with the basics of selecting data by rows and columns.

32:14 So if you have a data frame, got some columns and some rows, I can say df.loc, L-O-C, bracket.

32:20 And then you can give it column names, almost like slicing, right?

32:24 So I could say, like, state colon country.

32:28 And that might give me state, zip code, city, country.

32:31 Yes.

32:32 Yep.
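
The column slice described above might look like the following. The DataFrame and its values are invented for illustration; only `.loc` and `.iloc` come from the conversation.

```python
import pandas as pd

# Made-up customer data with the columns mentioned in the conversation.
customers = pd.DataFrame(
    {
        "name": ["Acme", "Globex", "Initech"],
        "state": ["OH", "TX", "CA"],
        "zip_code": ["43004", "73301", "90001"],
        "city": ["Columbus", "Austin", "Los Angeles"],
        "country": ["US", "US", "US"],
        "sales": [120_000, 85_000, 40_000],
    }
)

# Label-based slice: all rows, columns from "state" through "country".
# Unlike normal Python slicing, both endpoints are included.
location_columns = customers.loc[:, "state":"country"]

# Position-based slice: first two rows and first two columns (end excluded).
top_left = customers.iloc[0:2, 0:2]

print(location_columns.columns.tolist())
print(top_left)
```

Because `.loc` works on names, the slice keeps returning the same columns even if an unrelated column is later added at the end of the table.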

32:33 So it's kind of building up that mental model.

32:36 People have Excel.

32:38 They have A through ZZ columns.

32:41 And they have the number of rows.

32:42 And we're kind of used to manipulating data that way.

32:45 This starts to get people to think about it in the pandas way.

32:50 And I think it's different, but there's enough similarities that it kind of fits in that mental model.

32:55 Yeah, for sure.

32:56 One of the differences is now all of a sudden it's much more important what you name your column names.

33:01 Yes, you're absolutely right.

33:03 It matters what you name your columns, but the order doesn't matter now.

33:06 So that's one of the challenges with Excel is, let's say you're doing a VLOOKUP and then you add an extra column in there.

33:14 Suddenly your VLOOKUPs are broken or something else breaks down the line.

33:18 So this is a little bit more like kind of a SQL table approach where you care about the names more so than the order.

33:25 And I think that it's easier to troubleshoot naming changes than it is to troubleshoot order changes.

33:32 This portion of Talk Python to Me is brought to you by Linode.

33:37 Whether you're working on a personal project or managing your enterprise's infrastructure,

33:41 Linode has the pricing, support, and scale that you need to take your project to the next level.

33:46 With 11 data centers worldwide, including their newest data center in Sydney, Australia,

33:50 enterprise-grade hardware, S3-compatible storage, and the next-generation network,

33:56 Linode delivers the performance that you expect at a price that you don't.

34:00 Get started on Linode today with a $20 credit and you get access to native SSD storage,

34:06 a 40-gigabit network, industry-leading processors, their revamped cloud manager at cloud.linode.com,

34:12 root access to your server, along with their newest API and a Python CLI.

34:16 Just visit talkpython.fm/Linode when creating a new Linode account and you'll automatically get $20 credit for your next project.

34:24 Oh, and one last thing.

34:26 They're hiring.

34:27 Go to linode.com/careers to find out more.

34:30 Let them know that we sent you.

34:31 The order changes are a nightmare.

34:35 Yes.

34:36 So next tip is to use what are called accessors.

34:40 Yes.

34:40 Which are really cool.

34:41 I had no idea about accessors before I checked out your course.

34:44 Those people have much data frames I would actually work with.

34:47 Well, this is another kind of good analogy to Excel.

34:51 So we talk about selecting rows and columns of data.

34:53 But now what do you actually do to that data?

34:57 And the string accessor gives you a lot of powerful capability to clean data, to manipulate string data.

35:04 So if you have text, you can uppercase it.

35:07 You can lowercase it.

35:09 You can strip out characters.

35:10 You can use regular expressions.

35:12 You can pretty much do anything that you can do in Python on a string.

35:16 You can do in pandas.

35:17 And the creators of pandas have done all this work to make it really fast.

35:22 So you don't have to write loops.

35:24 You use these accessors for strings to get at that data and potentially filter your data or clean it.

35:31 And then the date time accessor is extremely powerful.

35:36 I mean, I think it's one of those things that it seems on the face that it just gives you access to the date.

35:42 But you can do so much with it.

35:44 And it is so much easier to work with dates and times in pandas than it is in Excel.

35:49 And so I think that's a really fundamental concept to grasp so you can start to build more manipulation analysis capabilities in Python and pandas.
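
As a small sketch of the string and datetime accessors mentioned here, assuming a made-up `orders` table with messy customer names and dates stored as text:

```python
import pandas as pd

# Invented sample data: untidy names and dates stored as plain strings.
orders = pd.DataFrame(
    {
        "customer": ["  acme corp ", "Globex", "initech  "],
        "order_date": ["2020-09-15", "2020-10-01", "2020-12-24"],
    }
)

# .str applies string methods to every value in the column, no loop needed.
orders["customer"] = orders["customer"].str.strip().str.title()

# Convert the text to real datetimes so the .dt accessor becomes available.
orders["order_date"] = pd.to_datetime(orders["order_date"])
orders["month"] = orders["order_date"].dt.month
orders["weekday"] = orders["order_date"].dt.day_name()

# String accessors also produce True/False columns that can be used as filters.
is_corp = orders["customer"].str.contains("Corp")

print(orders)
```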

36:00 Yeah.

36:00 And that sort of hints a little bit on the underlying types.

36:03 Yes.

36:04 Right.

36:05 Like Excel has some kind of idea of types a little bit.

36:08 Like you do have the date columns and it knows about numbers.

36:11 Put numbers on the right and strings on the left, for example, is a pretty good giveaway.

36:15 But it's more important here because you're doing these operations.

36:18 Like the .dt accessor probably doesn't make a lot of sense on a string and vice versa.

36:23 Right.

36:23 Yes.

36:24 And that is one of the things where I think as people are new to pandas, there's a lot of times where pandas will make it easy for you.

36:32 But there are certainly times where you have to go into it a little bit more and force the types and make sure that you're having good discussion or a good decision about what type you want.

36:41 And I think it also exposes maybe errors in your data.

36:45 So you think it's a datetime column, but then you go into it and realize you have some dummy values in there and you have to figure out at least what to do with it.

36:54 And Excel may not care.

36:55 But pandas is going to force you to really understand that data and make some decisions about it so you can process it appropriately.
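
One common way to surface the dummy values described above is to convert the column explicitly and look at what failed. This is a sketch with invented data; the placeholder "TBD" stands in for whatever dummy value a real file might contain.

```python
import pandas as pd

# Invented column that is supposed to hold dates but contains a placeholder.
raw_dates = pd.Series(["2020-01-15", "TBD", "2020-03-01"])

# errors="coerce" turns anything that cannot be parsed into NaT (a missing date)
# instead of raising an error, so the bad rows can be inspected afterwards.
parsed = pd.to_datetime(raw_dates, format="%Y-%m-%d", errors="coerce")

# The original values that failed to convert.
bad_values = raw_dates[parsed.isna()]

print(parsed)
print(bad_values.tolist())
```

From there the decision is yours: drop those rows, fix them at the source, or fill in a default.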

37:02 Yeah, that makes a lot of sense.

37:03 Another thing that you touched on or sort of implied there is that even in regular Python programming, you do a lot of loops and things like that.

37:12 Like I want to compare these two sets.

37:14 So I might do a loop and then zip them together and then check whether the thing I got back, you know, they compare in some way.

37:20 But just in pandas in general, you have to have a much more of a set based mindset, right?

37:26 Yes.

37:27 In general, like I don't even think in the course I even talk about loops.

37:33 And for the most part, you should.

37:34 Yeah, I don't think you do either at all.

37:36 Not have to do a loop in pandas.

37:39 Now, you may have to loop through files or something to get the data into pandas.

37:43 But for the most part, looping is not something you want to do in pandas because they have, like you said, these vectorized formulas.

37:51 So they're specialized and they kind of apply everything in parallel versus sequentially in a loop.

37:57 And it's a different way of thinking.

37:59 And it takes a little bit of time to kind of get that all there.

38:03 But I think it is really powerful because you think about the data as a whole versus a whole bunch of individual cells in an Excel file.
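
To make the "data as a whole" idea concrete, here is a minimal sketch comparing a column-wide calculation with the equivalent loop. The table and column names are invented for illustration.

```python
import pandas as pd

# Invented order lines.
lines = pd.DataFrame(
    {
        "quantity": [2, 5, 1],
        "unit_price": [9.99, 4.50, 120.00],
    }
)

# Vectorized: one expression computes the total for every row at once,
# much like filling a formula down an entire Excel column.
lines["total"] = lines["quantity"] * lines["unit_price"]

# The loop version gives the same numbers but is more verbose and,
# on large tables, much slower.
loop_totals = [row.quantity * row.unit_price for row in lines.itertuples()]

print(lines)
```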

38:10 Right.

38:11 And you could probably still just loop over everything, index back into them and update it.

38:15 But it would not just be more verbose, maybe more error prone.

38:19 It would also be slower.

38:20 Yes.

38:21 Much slower.

38:21 Much slower.

38:22 Yeah.

38:23 Yeah.

38:23 Speaking of working with sets and stuff, the next tip, Boolean indexing.

38:28 So Boolean indexing is really combining those accessors that we talked about with location to then filter your data.

38:36 And I think the best example I always use is like the auto filter in Excel.

38:40 That's probably one of the most common things that I use.

38:44 And I suspect most people use when you get a new data set.

38:47 You open up your data and you click that auto filter and you select maybe a date range and you select maybe certain customers you're looking at or sort by revenue, whatever you're doing.

38:58 And it's just all those kind of drop downs that you select.

39:01 Well, essentially, the Boolean indexing or masking allows you to do that in code so that you can select those data sets and maybe do additional summary analysis or you can actually update the data set.

39:14 So this is a powerful tool, not just for analysis, but also for cleaning your data.

39:18 Right.

39:19 So you could say things like if the total sales for this particular customer is greater than 100,000, I want to pull them out to a separate set and work with them special or maybe give them a discount.

39:30 Right.

39:31 Add to their discount range.

39:32 Things like that.

39:33 Absolutely.

39:34 It's almost like anything you would do with an if statement in an Excel formula you can do with this combination, but it's much more powerful and easier to manage because it's in Python.

39:47 It's not this kind of crazy nested Excel statement.

39:51 And so you can build it up and as complicated as you want, but at least it's all just clean Python and not just kind of gnarly nested Excel formulas.
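
The discount example from this exchange could be sketched as follows. The customer names, the 5% discount, and the column names are assumptions made up for the example; the 100,000 threshold comes from the conversation.

```python
import pandas as pd

# Invented customer totals.
customers = pd.DataFrame(
    {
        "customer": ["Acme", "Globex", "Initech"],
        "total_sales": [150_000, 80_000, 220_000],
        "discount": [0.0, 0.0, 0.0],
    }
)

# The mask is a True/False value per row, like ticking boxes in an auto filter.
big_accounts = customers["total_sales"] > 100_000

# Use the mask to pull the matching rows out for separate analysis...
vip = customers.loc[big_accounts]

# ...or to update only those rows in place.
customers.loc[big_accounts, "discount"] = 0.05

print(vip)
print(customers)
```

Conditions can be combined with `&` and `|` (each wrapped in parentheses), which takes the place of nested IF formulas.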

40:01 Yeah.

40:01 Whoever came up with the if statements and those kinds of things in Excel, what were they thinking?

40:06 I don't know.

40:07 I don't know.

40:07 It definitely doesn't scale very well to multiple if statements.

40:11 It's not the simplest sort of thing, right?

40:13 No, no, it's not at all.

40:15 No.

40:15 All right.

40:15 You talked about next step.

40:17 You talked about having the data in its most raw format and I concur.

40:22 I think that's a great idea, but that often means it might be too granular for the types of questions you're asking.

40:30 The granularity might be off, right?

40:31 Like maybe I want to know sales by state, but I have sales by city, every sale.

40:37 And I just want to know the number and the state, for example.

40:40 Right.

40:41 So I might maybe could do a group by or something.

40:43 Yes.

40:44 Group by is, I think, outside of .loc and the accessors.

40:49 I use group by a ton.

40:51 And group by is the way you can actually aggregate your data across multiple different columns.

40:58 And then not just sum the data, you can perform many mathematical functions on it.

41:03 You can do the average.

41:04 You can do a standard deviation.

41:06 You get the min and the max.

41:07 But you can almost have as many levels as you want.

41:12 And so it's very trivial to do a group by.

41:15 And like you said, look at it maybe by state.

41:18 And then you look at it by region.

41:20 And then maybe you look at it by product.

41:22 Who knows?

41:23 Yeah.

41:23 The salespeople or the product.

41:25 Yeah.

41:25 So it's just a lot of flexibility with very minimal code.

41:30 And I find that that's a really much simpler way to analyze the data and just kind of iterate

41:36 through that process using group by.

41:38 And it's another one of those functions.

41:41 It's almost deceptive in how powerful it is.
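
A short sketch of rolling city-level sales up to state and region with group by. The data is invented for illustration.

```python
import pandas as pd

# Invented city-level sales.
sales = pd.DataFrame(
    {
        "region": ["Midwest", "Midwest", "South", "South", "South"],
        "state": ["OH", "OH", "TX", "TX", "FL"],
        "city": ["Columbus", "Cleveland", "Austin", "Dallas", "Miami"],
        "sales": [100, 50, 70, 30, 20],
    }
)

# Total sales per state.
by_state = sales.groupby("state")["sales"].sum()

# Several summary statistics at once.
stats = sales.groupby("state")["sales"].agg(["sum", "mean", "min", "max"])

# More than one grouping level: region first, then state within region.
by_region_state = sales.groupby(["region", "state"])["sales"].sum()

print(by_state)
print(stats)
print(by_region_state)
```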

41:44 Yeah.

41:44 Yeah, absolutely.

41:45 It's another one of those set based mindsets as well.

41:48 Yes.

41:49 Yes.

41:50 Because everything is kind of done on a column basis.

41:54 And so it's just doing all the heavy lifting behind the scenes for you.

41:59 Yeah.

41:59 So next tip, when I hear people who are Excel power users, you know, they're doing serious

42:06 stuff because they talk about pivot tables.

42:08 I don't know.

42:09 What are pivot tables?

42:11 And then how do they manifest over in pandas?

42:13 Sure.

42:14 So you're right.

42:15 The pivot table is kind of the I think it's probably the number one tool that people use

42:20 in Excel.

42:21 And it's a really convenient way to group the data in multiple different levels across

42:27 rows and columns.

42:28 So you can adjust how many levels you want to group it.

42:31 And then you can summarize the numeric values in multiple different ways.

42:36 And what is really nice about it from an Excel perspective is it's all GUI driven.

42:40 So you can kind of drag and drop your columns and put them wherever you want so that you

42:44 can quickly adjust the way the data is being presented just using that GUI.

42:49 And what pandas has, the pivot_table command, is really kind of group by on steroids.

42:56 So everything that group by can do pivot table can do.

42:59 And it uses the similar kind of structure that the Excel pivot table uses as well.

43:06 And so what I find is group by is the first step.

43:10 And then as I start to group more and more levels or do something called unstacking or stacking

43:16 the data, then pivot table really makes that easier.

43:20 And I think that's a nice transition for people that have experience with Excel.

43:23 They have this pivot table concept.

43:25 And pandas does the same thing.

43:27 And if you start to master that, you've really got a very powerful tool to do a lot of quick

43:33 and easy analysis and reporting.
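
For readers who know Excel pivot tables, a minimal pandas equivalent might look like this. The data is made up; `index` plays the role of the Rows area, `columns` the Columns area, and `values` with `aggfunc` the Values area.

```python
import pandas as pd

# Invented sales records.
sales = pd.DataFrame(
    {
        "region": ["Midwest", "Midwest", "South", "South"],
        "product": ["Widget", "Gadget", "Widget", "Widget"],
        "sales": [100, 50, 70, 30],
    }
)

# Regions down the side, products across the top, summed sales in the cells.
# fill_value=0 replaces empty combinations (no Gadget sales in the South).
report = pd.pivot_table(
    sales,
    index="region",
    columns="product",
    values="sales",
    aggfunc="sum",
    fill_value=0,
)

print(report)
```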

43:35 Yeah, absolutely.

43:36 So another thing that you talked about is taking multiple sources of data, maybe multiple Excel

43:43 workbooks that have some column that means the same thing and sticking those together and

43:49 having it merge that data together automatically.

43:52 Yes.

43:53 So I think in Excel, probably the most common way to do that is the VLOOKUP.

43:58 So everyone has some VLOOKUP experience.

44:01 If they're more advanced,

44:03 maybe they use INDEX/MATCH, or I think there's the new Excel function called XLOOKUP, which is even more

44:09 powerful.

44:09 But the basic idea is it's kind of a very simple merge or join of the data.

44:15 And pandas supports that.

44:17 But pandas merging of data is much more like the SQL approach where you can do a left join or

44:23 right join and inner and outer joins and have a lot more sophisticated usage where you can

44:29 join maybe on multiple columns without having to concatenate them together like you would

44:34 in Excel.

44:34 So it's really powerful to do that.
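
A sketch of the VLOOKUP-style join described here, matching on two columns at once. Both tables and all their values are invented for illustration.

```python
import pandas as pd

# Invented order data and a lookup table of sales reps.
orders = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 3],
        "region": ["East", "West", "East", "East"],
        "amount": [250, 100, 75, 300],
    }
)
reps = pd.DataFrame(
    {
        "customer_id": [1, 1, 2],
        "region": ["East", "West", "East"],
        "rep": ["Ana", "Bo", "Cy"],
    }
)

# A left join keeps every order and looks up the rep by customer AND region,
# with no need to build a concatenated helper key as you would in Excel.
merged = orders.merge(reps, on=["customer_id", "region"], how="left")

print(merged)
```

Orders with no match (customer 3 here) stay in the result with a missing `rep`, which is roughly the equivalent of a VLOOKUP returning #N/A.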

44:36 And then also, I'm sure a lot of people in Excel will do just the cut and paste.

44:41 So you've got different tabs with data, and you can easily, if the data is all the same,

44:47 stack it on top of each other with the pandas concatenate command.

44:50 Right.

44:51 I've got January, February, and March's sales data.

44:54 I want it for the quarter.

44:55 So we're just going to jam it on the end.

44:57 Exactly.

44:57 One after another, right?

44:59 Yeah.

44:59 Yeah.

45:00 And I think people do that all the time.

45:01 And that's certainly a valid approach.

45:04 It's hard to repeat, and it's easy to make mistakes.

45:07 But once you master those concatenate and merge commands, then it's very easy.

45:12 And things that you can do in Excel where maybe cutting and pasting, you know, a thousand

45:17 rows is fine.

45:18 But if you have a million, you just can't really do that in Excel.

45:20 Whereas pandas, it's all the same command, whether you're doing 10 rows or 10 million rows.
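
The January, February, and March example could be sketched with `pd.concat`, the pandas function for stacking tables. The monthly figures are invented.

```python
import pandas as pd

# Invented monthly sales tables that all share the same columns.
january = pd.DataFrame({"month": ["Jan", "Jan"], "sales": [100, 200]})
february = pd.DataFrame({"month": ["Feb", "Feb"], "sales": [150, 50]})
march = pd.DataFrame({"month": ["Mar"], "sales": [300]})

# Stack the tables on top of each other; ignore_index renumbers the rows
# so the result has one clean 0..n-1 index.
quarter = pd.concat([january, february, march], ignore_index=True)

print(quarter)
print(quarter["sales"].sum())
```

The same call works whether the list holds three small tables or dozens of large ones read from files.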

45:24 Yeah.

45:25 Yeah.

45:25 Very cool.

45:26 All right.

45:27 So far, these nine tips have probably given us a nice result.

45:31 We've got the data.

45:32 But one of the cool things I can do in Excel is I can go over to that chart section.

45:36 They want this kind of chart or that kind of chart.

45:39 Yes.

45:39 Yeah.

45:40 And so the next tip is picking a plotting library or a data visualization library and starting

45:46 to learn it.

45:47 So one of the challenges in the Python ecosystem is there are a lot of options.

45:53 It's a good thing because there are a lot of options.

45:56 There's a lot of really good options.

45:58 But it is challenging for a new user to figure out where to start.

46:02 So I just want to do a bar chart.

46:03 I want to do a line chart, maybe a box plot.

46:06 Right.

46:06 Even just knowing what basic library I need to start researching.

46:10 Should I do Plotly?

46:11 Should I do Matplotlib?

46:13 I've heard there's other things, right?

46:15 Where do you even start, right?

46:16 Yeah, it's really hard.

46:17 And so what I think is important is if you're getting started and you're tackling this kind

46:23 of well-known problem and you've built up some core pandas knowledge, pick a library

46:29 and go with it.

46:30 And right now, the one that I recommend, it's certainly not the only one, but I've had some

46:35 good experience with it is Plotly.

46:36 And specifically, Plotly has a Plotly Express API, which is kind of a streamlined approach to

46:43 manipulate data.

46:45 And I think that that works really well.

46:48 One of the things I like about it is it integrates well with your Jupyter Notebooks.

46:52 It is interactive by default.

46:54 So once you build one, it does all the JavaScript behind the scenes.

46:58 So you can hover over your plots and see the individual data points.

47:02 And it works really well.

47:04 Nice.

47:04 And if you need to do more in-depth modifications of it, you can do that as well.

47:10 So I think it strikes a nice balance.

47:12 There's certainly other options out there.

47:14 But what I think is best for someone is to pick one and stick with it a little bit and get a

47:20 feel for the API.

47:21 And then if you decide that that's not the right one, maybe look at some alternatives as well.

47:25 Yeah, a lot of these have galleries, right?

47:28 You can go into the gallery and go, does it make pictures kind of like I need yes or no?

47:32 And then you can decide whether to pay attention to it.

47:34 Yes.

47:35 And I think the way these libraries plot is so different from Excel.

47:40 It takes some time for you to kind of figure out how to fit it in your brain so that you can

47:46 use them effectively.

47:47 And so that's why I encourage someone to stick with it, get it, you know, apply it to your problem.

47:53 And it even goes back to our earlier point about getting the data.

47:56 So you have to have the data in the right format in the sufficient level of detail so that you

48:01 can do this plotting.

48:02 But once you get your mind around it, it's much faster to iterate through your visualizations

48:07 and really get some meaningful insight with just very few lines of code.

48:12 Yeah, absolutely.

48:13 The one I've been using for this kind of stuff lately is Altair.

48:16 It's been nice as well.

48:17 Yeah, I like Altair as well.

48:18 I mean, it's a great one.

48:19 Bokeh is good as well.

48:21 It's kind of hard.

48:22 It's like picking your favorite child.

48:24 You certainly don't want to go on record with that.

48:27 But if you pick one of those three, I don't think you're going to go wrong.

48:30 Yeah, absolutely.

48:31 All right.

48:32 Well, those are some really great tips.

48:33 Like I said, after seeing all these things in action, I'm like, I've been doing it wrong.

48:37 I've got to take some time and go and put some of this in practice and what I'm doing because

48:42 I've got Excel type things like everyone else, you know?

48:45 Yes.

48:45 And you, if you're using Google Sheets or other things where you want to pull in data,

48:49 I think that's really where you'd experience a lot of benefits.

48:53 And those are some of the other, you know, next level benefits of using Pandas: you have the

48:58 ability to bring in data pretty seamlessly and easily through APIs or other sources.

49:04 Yeah.

49:05 And it's worth also pointing out that we said you can read and write Excel files with

49:10 Python and that's great, but you can also read and write Google Sheets, right?

49:15 So if you need to deliver a Google Sheet to somebody or update an existing one, there are

49:19 APIs to connect to a Google Sheet and do similar things to it.

49:23 Right.

49:24 And it's all pretty straightforward.

49:26 And especially once you're using Pandas, it will either support Pandas or if it doesn't

49:30 support Pandas, it's going to be pretty easy for you to figure out how to get that Pandas

49:34 data into that format because Pandas is such a common tool that everybody uses.

49:39 Yeah.

49:39 Awesome.

49:40 All right.

49:40 Well, let's wrap up this tip section with a really quick thought.

49:44 You know, you've been going through this journey at a couple of companies, probably to varying

49:49 degrees of willingness or unwillingness with various participants in the loop.

49:54 What's your advice to people out there?

49:57 What's your experience been?

49:58 I think the thing that you have going for you now is the data science kind of revolution and

50:05 so many people understanding that data science is the thing, artificial intelligence, machine

50:11 learning, whatever the term is like it is for better or worse, a little bit of a buzzword.

50:15 People in your organization are going to know about it.

50:18 And I would say leverage that to say this Python tool, which can help us with data science

50:23 tasks, can also help us with some of these manual processes that we have.

50:27 And you can point to maybe some of your bigger enterprise applications that actually support

50:34 Python.

50:35 So I'll give one example like Tableau is a visualization tool that I think a lot of companies

50:40 have and they actually have bindings for Python.

50:43 And if you go to a Tableau user group, they will talk about using Python.

50:47 So it's not as foreign a concept anymore as it used to be.

50:50 So I think in general, people are going to be more receptive to talking about Python.

50:55 So frame it in those terms as this is kind of a data science industry standard that you want

51:02 to start learning and applying to your specific use cases.

51:07 And then I think the other thing I'd recommend is it is really hard.

51:12 I think you talked about this before.

51:14 People look at Python and Python seems so easy.

51:17 It is kind of like pseudocode.

51:21 And people say, okay, I understand it.

51:23 But getting them to take the next step to actually write code for themselves to solve their own

51:28 problem.

51:28 So I think trying to find a way to have a cohort of people.

51:31 Maybe you need two or three people.

51:33 You're going down this journey together.

51:35 So it's not just you.

51:36 It's coworkers or maybe cross-functioning across different departments where you hold each

51:42 other accountable to actually doing some of the work.

51:44 So you're learning about it.

51:46 You're applying it.

51:47 You're using this group to reinforce it and keep the momentum versus just getting caught

51:54 up in all the day-to-day work that squeezes out all the fun stuff you'd like to do in Python.

51:59 Yeah, for sure.

52:00 And one other tip I would throw in there is nothing solves the debates about whether this

52:06 is possible, whether this is a good idea, as just doing it, right?

52:11 So your example of we're going to spend 30 minutes to get all this data together and then

52:15 we're going to fill it out in Excel.

52:16 Like if you say, maybe if you're one of the high-end programmers of the company, it's not

52:21 going to give you much credence.

52:23 But if you're somebody who's not typically the developer and you go and say, look, this used

52:28 to be super painful and error-prone and monthly, now I wrote this code and it took me four hours

52:33 instead of two.

52:34 But now I just push the button and it's instant, always, right?

52:37 Everyone else would be like, wait a minute, that was horrible.

52:39 We can avoid horrible things like that, right?

52:41 Like play, I think playing to that angle as well, it's got a lot of value.

52:44 And you have to be able to be comfortable selling those wins, letting people know that, yes,

52:50 this is something we accomplished.

52:52 Here's something maybe behind the scenes, some data we scrubbed or manipulated.

52:56 And then nothing breeds success like success.

52:59 So once you prove you could do it, people start to come to you and say, hey, you know,

53:04 like I had some SurveyMonkey data that I worked with to clean up and I didn't spend a lot of

53:11 time explaining how much work it was to clean it up.

53:13 I just did it.

53:14 But the next time a survey came out, you know, they came back to me and said, hey, could you

53:18 help with this?

53:19 And fortunately, I had those scripts.

53:20 So all that time and energy I spent there, I could just kind of repurpose and keep moving it

53:25 forward.

53:25 Yeah.

53:26 Cool, cool.

53:27 Two really quick things I just want to touch on while I've got you here.

53:30 You put together a really nice article on setting up a Python developer environment on Windows,

53:37 but using Linux.

53:38 I want to tell people real quick about that.

53:40 Sure.

53:41 So several years ago, Microsoft came out with something called Windows Subsystem for Linux,

53:47 WSL.

53:48 And it essentially gives you kind of a lightweight kernel in Windows that you can then install

53:57 Ubuntu or a couple other different versions of Linux.

54:00 And you've got a full Linux environment running on your Windows system.

54:05 And it opens essentially instantaneously.

54:07 And performance is good, but also it integrates with Windows.

54:12 So when you want to use your file explorer to look at your files on the WSL system and copy

54:19 between your Windows system, it's easy to do.

54:21 You can even do things like this is really cool.

54:24 Like you run a Jupyter Notebook in WSL and it will open up your Chrome browser in your Windows

54:30 environment.

54:31 So they are much more tightly integrated than just like a normal virtual machine or a dual

54:37 boot system.

54:38 And a lot of this is on the Windows Subsystem for Linux 2, right?

54:41 Correct.

54:42 If you do enough research and you look back, people might complain that it doesn't do some

54:46 of these things.

54:46 But that was version one before it was kind of more integrated.

54:50 Correct.

54:51 Yeah.

54:51 There were some changes under the hood, especially with the file systems and the way they were

54:55 set up and how the access is done and speed.

54:58 So definitely anybody that's thinking about going down this path, use WSL 2.

55:02 And the other thing that Windows has done is I don't have any specific insight, but it sounds

55:06 like it's been very successful.

55:08 And I know Microsoft has even backported WSL 2 to some of the prior builds so that you don't

55:14 even have to be on the bleeding edge quite as much as you did in the past.

55:18 So I think they've really hit on something with being able to use a full Linux system

55:25 in Windows and using the new Windows Terminal, which, you know, it's a little surprising to

55:31 get excited about a terminal given how long they've been around.

55:34 No, it's not.

55:35 It was horrible.

55:35 But the Windows Terminal is really nice.

55:37 And if you've worked on a Linux system, you're going to feel right at home.

55:42 And it's really powerful.

55:43 Yeah, absolutely.

55:44 Now, I guess it's worth pointing out really quickly that you can do everything that you've

55:48 been talking about up to this point without Windows subsystem for Linux.

55:51 But it's just if you want to sometimes certain things work better on Linux and you don't have

55:56 to go create virtual machines and stuff if you don't want to.

55:58 Yes.

55:58 And I will say I do think Windows subsystem for Linux, I think you need administrative access

56:05 to get that all set up and running.

56:07 So that may be a little more difficult for people to do on their work computers.

56:11 But if you have a computer at home or you do have admin access, it's pretty cool.

56:16 Yeah.

56:17 A lot of power.

56:18 Yeah, for sure.

56:19 And then you have this other project called sidetable, which sounds like it makes displaying

56:24 information exactly like an Excel user who found their way over to Pandas might want to

56:29 do better.

56:29 Tell us about that project real quick.

56:30 Sure.

56:31 It's a simple library I put together, but I've actually found it really useful.

56:35 I use it in a lot of my day-to-day analysis just to get quick summary tables of my data.

56:42 So I think probably the more correct term is it's a frequency table, but it gives you quick

56:48 summary information if you just want to know on a new data set, like how much, how are sales

56:55 distributed by state or by region or by salesperson.

56:59 It's just kind of a quick one line command that will give you a nicely formatted summary

57:04 of your data.

57:05 It doesn't require anything else outside of Pandas, so it's easy to install and really

57:10 streamlined and kind of integrated with your data workflow.

57:13 And so I've really enjoyed working on that and I've enjoyed using it and hope others will

57:18 as well.
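
To make the idea of a frequency table concrete, here is a minimal sketch in plain Python of the kind of summary described above: totals per group, sorted largest first, with percent and cumulative percent columns. This is not sidetable's implementation or its API. The function name `frequency_table`, the column names, and the sample sales figures are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def frequency_table(rows, group_key, value_key):
    """Total value_key per group_key, largest group first (illustrative helper)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]

    grand_total = sum(totals.values())
    table = []
    running = 0.0
    # Sort groups by their total, descending, and track a running sum
    # so each row can report its cumulative share of the grand total.
    for group, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        running += total
        table.append({
            group_key: group,
            value_key: total,
            "percent": round(100 * total / grand_total, 1),
            "cumulative_percent": round(100 * running / grand_total, 1),
        })
    return table


# Made-up sample data: sales by state.
sales = [
    {"state": "MN", "sales": 300.0},
    {"state": "WI", "sales": 100.0},
    {"state": "MN", "sales": 200.0},
    {"state": "IA", "sales": 400.0},
]

summary = frequency_table(sales, "state", "sales")
for line in summary:
    print(line)
```

With pandas and sidetable installed, the same kind of summary is meant to be a one-line call on a DataFrame, which is the convenience being described here.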

57:18 Yeah.

57:19 People can just check it out.

57:20 I'll put a link in the show notes and it's visually very clear why you'd want a better

57:24 presentation than some of the default Pandas output as it shows up.

57:28 And it's nice.

57:28 Yeah.

57:29 Cool.

57:29 Cool.

57:29 All right.

57:30 Well, before you get out of here, let me ask you the final two questions.

57:33 If you're going to write some Python code, maybe you're going to do this kind of work that you're

57:37 talking about.

57:37 What editor do you typically use?

57:39 So I'm changing my answer from the last time.

57:41 So last time I think I said I was using Sublime, but I am using Visual Studio Code now.

57:46 And back to the WSL point, it integrates really well with WSL as well.

57:51 Yeah, it does.

57:51 It runs everywhere.

57:53 So yeah, I really enjoy it.

57:55 I'd say my one knock on it is I don't spend enough time in it to actually memorize all the

58:00 shortcuts.

58:00 So I try and force myself to get more efficient with it every time I use it.

58:04 But I really enjoy using it.

58:06 Yeah.

58:06 There's a lot of stuff underneath the surface that you don't know that's there unless

58:09 you know to go look for it.

58:11 Yes.

58:11 Yeah, very cool.

58:12 And if you come from a Sublime background or Atom, Visual Studio Code seems like the very

58:18 natural like, well, here's the modern version of that thing.

58:20 Yes.

58:21 Yeah, it's great.

58:22 And I do keep up with, you know, every month they give updates and they're really doing a

58:27 lot of improvements on Jupyter Notebook support.

58:29 And there are some pretty cool things you can do with that integration right now.

58:32 So I'm excited to see how that evolves over time as well.

58:35 Yeah, for sure.

58:36 All right.

58:37 Also notable PyPI package or Conda package.

58:40 If you prefer to view it through that lens.

58:42 But, you know, something cool that people maybe don't know about.

58:45 We could pip install sidetable as one of them.

58:47 That's certainly one.

58:48 The other one I'll go back to is we talked about Plotly Express.

58:52 There's a library called Kaleido.

58:55 I think is how you pronounce it.

58:56 That actually makes it easy to export your images, the graphical images you create with

59:02 Plotly to a PNG or an SVG file.

59:06 And I use that a ton just to generate some graphics, save them out and, you know, send

59:11 them in an email or put them in a PowerPoint.

59:13 And that's really high quality.

59:16 So sometimes I've done the copy and paste.

59:18 So you use the screen snipping.

59:20 It's not just the screenshot or whatever.

59:22 Yeah.

59:23 Exactly.

59:24 So it doesn't sound that revolutionary, but it's just one of those things that's really,

59:28 really handy to have and really useful and it works well.

59:31 Yeah.

59:31 Well, people spend a lot of time looking at presentations, and if it has that

59:37 fuzzy, screenshotty feel, those kinds of things add up.

59:43 It's like having a good microphone when you're on Zoom or something like that, right?

59:46 Exactly.

59:47 You're never going to get criticized for not having it, but people just like to talk to

59:52 you more or it's a more pleasant experience if they don't have to strain to hear you.

59:55 Yeah.

59:55 That was a perfect analogy.

59:57 Yeah.

59:57 Thanks.

59:57 All right.

59:58 Well, final call to action.

59:59 People, Python's popularity has grown since the last time we spoke by quite a bit.

01:00:04 So the opportunities to use the Python data science tools have probably come even more into their own

01:00:10 now than they had then.

01:00:11 What do you say?

01:00:12 Yeah, absolutely.

01:00:13 I think you kind of wonder when that curve is going to flatten out.

01:00:17 And I still think we have a long way to go.

01:00:19 And Python just has so much to offer to automate and improve processes where people only know Excel.

01:00:26 So I think try and encourage people to start taking that journey.

01:00:31 And if they're interested in the course, certainly would love to have them take the course.

01:00:35 So I think that's a good place to get started.

01:00:37 And I'd say maybe if you're advanced enough with Python and don't think you need the course,

01:00:43 maybe it's a good place for your co-workers to get started as well.

01:00:46 Yeah, absolutely.

01:00:47 The course is great.

01:00:47 Like I said, it inspired me.

01:00:48 We're also doing a free webcast in a couple of weeks, which is actually a month ago.

01:00:54 Because of time travel of podcast recording.

01:00:56 But just another point of how much this kind of stuff is wanted, right?

01:01:01 I sent out one email and one social media message.

01:01:04 And almost 1,000 people, surely 1,000 by the time it's finished, have signed up for this webcast.

01:01:09 Like, sign me up.

01:01:10 I want out of this Excel thing.

01:01:11 Yeah.

01:01:11 No, I'm really looking forward to that.

01:01:13 I think it's going to be great.

01:01:14 I mean, it really was great.

01:01:15 Yes.

01:01:16 It was great.

01:01:17 It went really well.

01:01:18 I'm sure that it did.

01:01:19 But I want to point out that it's recorded and I'll put a link in the show notes.

01:01:22 So whatever it is that Chris and I come up with for that, you'll be able to check that out as well.

01:01:27 So that'll be in the show notes.

01:01:28 Yeah.

01:01:28 Great.

01:01:29 Well, thanks a lot for having me.

01:01:30 I really enjoyed the talk.

01:01:31 Yeah.

01:01:32 Thanks for coming back and helping us out with this Excel thing and getting more Jupyter and more pandas and more Python.

01:01:38 Absolutely.

01:01:39 Thank you.

01:01:39 You bet.

01:01:40 Bye.

01:01:40 Bye.

01:01:41 This has been another episode of Talk Python to Me.

01:01:44 Our guest on this episode was Chris Moffitt, and it's been brought to you by Voyager, the video game, and Linode Cloud Hosting.

01:01:50 Voyager is a game developed and published by Tolib Consulting.

01:01:54 Travel the world in front of your computer and learn a whole lot.

01:01:58 Visit talkpython.fm/Voyager or just search for Voyager on Steam.

01:02:03 That's V-O-Y-A-G-E-R.

01:02:06 Start your next Python project on Linode's state-of-the-art cloud service.

01:02:11 Just visit talkpython.fm/Linode, L-I-N-O-D-E.

01:02:15 You'll automatically get a $20 credit when you create a new account.

01:02:18 Want to level up your Python?

01:02:20 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

01:02:25 Or if you're looking for something more advanced, check out our new async course that digs into all the different types of async programming you can do in Python.

01:02:33 And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle.

01:02:38 It's like a subscription that never expires.

01:02:40 Be sure to subscribe to the show.

01:02:42 Open your favorite podcatcher and search for Python.

01:02:44 We should be right at the top.

01:02:46 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:02:55 This is your host, Michael Kennedy.

01:02:57 Thanks so much for listening.

01:02:58 I really appreciate it.

01:02:59 Now get out there and write some Python code.
