#200: Escaping Excel Hell with Python and Pandas Transcript

Recorded on Saturday, Jan 19, 2019.

00:00 Do you know or maybe work with people who abuse Excel?

00:03 Is it their hammer to pound all the computational problems that get in their way?

00:07 Well, join me to chat about this opportunity to bring Python deeper into their lives.

00:12 You'll meet Chris Moffitt, who runs Practical Business Python.

00:15 He works with lots of folks who could make better use of Python to solve their business problems

00:19 and has a ton of material on his website focusing on just that.

00:23 It's time to escape Excel hell with Python and Pandas.

00:27 This is Talk Python to Me, episode 200, recorded January 21st, 2019.

00:32 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

00:51 This is your host, Michael Kennedy.

00:53 Follow me on Twitter where I'm @mkennedy.

00:56 Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython.

01:02 This episode is brought to you by Linode and Stellares.

01:05 Please check out what they're offering during their segments.

01:07 It really helps support the show.

01:08 Chris, welcome to Talk Python.

01:11 Hey, great to be here.

01:12 Yeah, it's great to have you here.

01:13 You've got some really interesting and amusing presentations around Excel.

01:17 And not doing bad things with Excel, but maybe using it when they should use something else like Python or Pandas.

01:25 Absolutely.

01:26 Yeah, yeah.

01:27 I mean, Excel in and of itself probably isn't bad or good, but it's like many things, the only tool that people have.

01:34 And sometimes it can be used in ways that cause a lot of pain in the future.

01:38 Yeah, it definitely can.

01:40 You know, if you think of what is the most widely used and deployed databases within companies, like on install per install, it's probably Excel, right?

01:49 Absolutely, absolutely.

01:50 And I think it's funny.

01:52 I've talked to other people in the Python community, maybe where they do more like Django development or really deep data science or testing.

02:01 And I don't think a lot of people understand how pervasive Excel is and how it's used for everything.

02:07 And the idea that you just replace Excel is really a non-starter at most organizations.

02:12 Yeah, absolutely.

02:13 It's pretty crazy.

02:14 And it's going to be fun to talk about, you know, taking people on that journey, maybe saying like, look, you could dip your toe in this programming world.

02:22 Like, you don't have to want to be a developer, but do you know how much better your life would be if you could learn like an if statement and a while loop and a few function calls?

02:30 Like, it would be a lot better, right?

02:31 Absolutely, absolutely.

02:32 And especially some of the things that power users do in Excel, it is programming.

02:37 I mean, they don't maybe think about it that way, but it is.

02:40 And I kind of talk about if you can write a nested if statement in Excel, you can probably program Python.

02:47 Yeah, because if statements in Excel are painful.

02:50 Because it all fits on one line with no organization or structure.

02:55 Exactly.

02:56 I've read some myself.

02:57 So before we get to that, though, let's just start with your story.

03:00 How did you get into programming and Python?

03:01 Sure.

03:02 So I grew up in the early 90s.

03:05 So my first computer was an Apple IIc, and I learned BASIC and actually learned Pascal in high school.

03:12 And then in college, I studied computer science and electrical engineering.

03:16 So I, you know, learned C, C++, kind of the traditional computer science curriculum at the time.

03:23 Really enjoyed it.

03:24 But when I graduated, I took a job where I didn't actually use any of it.

03:29 So some of those skills kind of atrophied a little bit.

03:33 And fortunately, throughout my career, as I've moved to other roles, I've been in typically a little bit more business type roles.

03:39 But I've been able to use my technical skills to try and solve problems.

03:43 So that's been kind of my evolution over the past many years that I've been doing this.

03:49 Interesting.

03:49 So what are your impressions and experiences of being a little more on the business side of things, but having that technical skill and looking at other people trying to solve problems manually, copy paste, like no versioning?

04:03 I'm just like looking at them going, do you know how much easier this could be?

04:07 Like, what are you doing?

04:08 Right.

04:09 Is that sort of something that you encounter often?

04:11 Absolutely.

04:12 Yeah.

04:12 That's definitely one of the biggest challenges is you mentioned something like version control.

04:17 One of those things that is just once you start using it and get used to it, not having it is just extremely painful.

04:24 But most people who don't program really don't use version control.

04:28 They haven't used GitHub or Bitbucket or even SVN back in the day.

04:32 When you see how manual things are, how people try and rename files as the poor man's version control, it is painful.

04:42 It is very painful.

04:43 And I've had to think a lot about how do you approach that problem and not just tell people, oh, there's a better way to do it and just scrap everything you're doing and move to my way.

04:54 Right.

04:54 How do you help them make a few steps?

04:56 So if renaming the files, having like Excel report May version 3 is like version 3 Kennedy or something like this, right?

05:05 Exactly.

05:06 Like a weird naming scheme.

05:07 If that is the version control, then Outlook probably is GitHub, right?

05:12 Like so you attach it to an email and it's like it's kept live in like this thread of email with the different versions in there, right?

05:19 Absolutely.

05:20 It's either Outlook or it's in the file share buried somewhere in multiple, you know, nested file subfolders that are impossible for anyone to actually find or understand.

05:31 Absolutely.

05:31 Yeah.

05:32 And, you know, I think a lot of people either work independently or they work at like small companies.

05:36 But the larger the company, the more this like the more people copied on those email threads.

05:41 There's like 20 people and they all have their little tweaks in versions.

05:44 And it's crazy, right?

05:46 It is.

05:46 And it's so frustrating because you want to just say, hey, can't we just put the data set in one place and we're all working off of it and using version control just like you would in an open source project? But the concept just isn't there.

06:02 And I mean, there are some tools, some of the more recent Microsoft tools support it a little bit, but it's still not, at least in my experience, doesn't seem to be as common as you would expect.

06:12 Yeah.

06:12 A lot of times what the answer seems to be at these big organizations is like, well, this Outlook named thing as version control sucks.

06:20 So let's use SharePoint, which is like its own special type of hell.

06:25 Yes.

06:25 You know, I haven't used SharePoint a lot, mainly because I've heard, you know, it's its own special type of hell.

06:31 And I think now Microsoft Teams is the new thing, but I haven't used that enough to really figure it out because it is.

06:38 It's a lot of time and energy figuring out how to use it and then to get the people you're working with to use it as well.

06:45 Yeah.

06:45 Let me see if I can summarize my SharePoint experience.

06:47 Like, oh yeah, that's on the shared SharePoint project site.

06:50 Okay.

06:50 Where is that?

06:51 And eventually someone will send me a link and then I'll click on it and it'll say access denied.

06:57 And I'll say, oh, you never gave me access to that part of the project.

07:00 Oh, and they'll come back and then they'll give me access to it.

07:03 And then you can like check out the file, which downloads it and locks other people from checking it out.

07:08 But, you know, maybe someone's already checked it out and forgotten it.

07:11 So you can't get it.

07:12 It's just like, what are we doing?

07:14 Like, this is crazy, man.

07:15 Yeah.

07:16 So it's really frustrating those things.

07:18 Yeah.

07:18 Yeah.

07:18 I've had similar experiences as a user just trying to get it to work and it just didn't seem like it was worth the effort at the time.

07:26 Yeah, for sure.

07:27 So you've started Practical Business Python, this project to sort of help take people along, right?

07:34 Tell us a little bit about that.

07:35 Sure.

07:35 So I've always kind of, you know, had a little bit of that technical itch that I wanted to scratch.

07:42 And I did some Django development back in the day.

07:47 And after a while, for multiple reasons, I wasn't doing that as much.

07:50 And so I felt like, you know, I just didn't have that much going on.

07:55 And probably, I think in 2014, I decided, you know, I'd like to do something else to get that technical information out there, to keep my skills up, to share with others.

08:05 But I didn't want to embark on a huge open source project, something that I was going to have to maintain.

08:10 So I thought, you know what?

08:11 I'll do a blog, get this started.

08:13 And it's funny.

08:14 Originally, I thought I was going to spend more time talking about, like, Raspberry Pi, Internet of Things type work.

08:21 But as I started getting into it, that's when the whole data science thing, or I guess probably when I started learning more about data science, and then learned about Pandas and those tools and started applying them to the problems I had.

08:33 And it was just like, wow, this is really cool.

08:35 I got to learn a lot, got a lot of really positive feedback from folks.

08:38 So it was really exciting.

08:40 Yeah, that's awesome.

08:41 And Pandas is nice because it solves a lot of those problems automatically, right?

08:44 It'll read a CSV directly, and it'll kind of let you sort and things like that.

08:50 It has a lot of the features at the code level that other people might need at the, you know, Excel or GUI level, right?
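As a rough illustration of that point, here is a minimal sketch of reading a CSV and sorting it with pandas. The column names and values are invented for the example, and the CSV is held in memory so the snippet runs on its own; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# A small made-up CSV standing in for a real file on disk.
csv_text = io.StringIO(
    "region,rep,sales\n"
    "East,Ana,1200\n"
    "West,Bo,800\n"
    "East,Cy,450\n"
)

# Read the CSV into a DataFrame (the pandas equivalent of a worksheet).
df = pd.read_csv(csv_text)

# Sort by a column, largest first, like a sort in the Excel GUI.
sorted_df = df.sort_values("sales", ascending=False)
print(sorted_df)
```

The sort returns a new DataFrame and leaves `df` untouched, so the original data is still available for other steps.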

08:58 Absolutely, absolutely.

09:00 And I think one of the things that really sold me on Pandas, I had done in my role, I identified one process that was just extremely manual, a lot of work that it would take to build this report essentially on a monthly basis.

09:15 And going back to, you know, when you only know one tool, everything looks or when you only know how to use a hammer, everything looks like a nail.

09:22 So I decided I was going to build my data processing thing using Django.

09:26 So Django admin and Django ORM, and I put it together.

09:31 And it worked.

09:33 I mean, but it was slow.

09:34 So I would do this, I would kick this process off, and it would take maybe an hour to run.

09:39 But it was an hour I could go to lunch, get coffee, whatever.

09:41 So it was no big deal, and it worked well.

09:44 Eventually, once I learned Pandas and got over that learning curve, and then I solved the same exact problem in Pandas on the same exact hardware, and it took like two minutes to run.

09:55 That's incredible.

09:56 So it's, you know, just Pandas because it uses NumPy and all the lower level functions written in C.

10:02 And if you do things right, it's pretty quick.

10:05 And so once I kind of had that experience, I'm like, okay, there's no reason we can't use some of these tools to solve some of the problems we have right now.

10:13 Yeah, that makes a lot of sense.

10:14 And, you know, Pandas, its origins, a lot of people think of it around data science.

10:17 So maybe they think machine learning, they think science, but it really comes from the financial industry.

10:22 And it's probably got its roots right from Excel, more or less, right?

10:28 Like right near there.

10:29 Exactly.

10:30 And if you start, you know, kind of looking around at Pandas, there's a lot of things around time series and grouping of data that are just really, like you said, really kind of from the finance world and are very applicable to a lot of the kind of common problems that people use Excel for in many large organizations.

10:47 Yeah.

10:48 So I was looking over your blog and all the articles you've written.

10:51 There's a ton of resources there.

10:53 It looks like you've been at this for quite a while.

10:55 And you said you use the Feynman learning technique, as in Richard Feynman, the physicist, right?

11:01 Yes.

11:02 What's the story there?

11:02 Yeah.

11:02 Well, I certainly didn't start off doing this, but this is something I don't remember when I first learned about it.

11:08 But basically, the idea is you take some sort of concept, you learn it, and then figure out how to teach it to someone that really doesn't know anything.

11:16 Now, Feynman talks about teaching it to a toddler.

11:19 Certainly, my articles aren't at the toddler level.

11:22 But by going through and actually writing a blog post and understanding it at the level of detail that you need to to write a post and put it out there on the internet, it really does force you to kind of have this feedback loop that he talks about where you think you understand something.

11:37 But once you try and put it in a blog post, you realize, ooh, I didn't understand it as well as I thought I did.

11:43 And so then you kind of loop back.

11:44 And by the time you're done actually writing a good post, what I find is, you know, I really understand the concept pretty well.

11:50 And that's what I try and do with the blog post is get it out there so that others can learn.

11:54 Yeah, it's interesting.

11:55 You see these polished blog posts or conference presentations or even online courses, and you're like, that person so deeply knows this stuff that they're teaching me this, that it's great I got this article or this video or whatever, but how am I ever going to reach that level?

12:13 And the reality is probably that person who did that presentation went on that journey as part of creating this thing, right?

12:20 Like they may have not known very much at the beginning, but they kind of applied this technique knowingly or unknowingly to just go like, I am interested.

12:29 I know a little.

12:30 I'm going to just dig in and like really put it together.

12:32 Absolutely.

12:33 And I think the funny thing is I was actually working on a post this morning where I'd written it a couple of years ago and the API, Pandas API has changed, so I need to update it.

12:43 And I like had to really go back and figure out what I had done.

12:46 And I know people have talked about this before about there's got to be some name for when you have a problem and you Google it and it's a blog post that you wrote and forgot about.

12:55 I mean, it happens to me all the time.

12:57 I was just thinking of that.

12:58 Like I've had that problem happen to me as well.

13:01 Where I'm like, gosh, I really don't remember how this works.

13:04 I know I did something with it a while ago and I Google it and the best answer is something that I've done.

13:09 I'm like, oh, all hope is lost.

13:11 Yes, exactly.

13:13 If this is at the top and I don't remember it, we're kind of done here.

13:16 Yeah.

13:17 No, but it's a really good point.

13:18 I mean, I think it's very easy for new users to get discouraged because it is a lot to learn.

13:25 And I think it's really important.

13:27 You do a great job on your show with the other guests and letting people know that we're all human.

13:32 We're all trying to learn.

13:33 And some of this stuff seems to slip out of your brain over time and you have to continually relearn it.

13:40 Right.

13:40 Well, at the same time, like so much stuff is changing that you have to eject some stuff to make room for it.

13:47 Right.

13:47 And it's a matter of can I go back and Google that thing or do I really need to force myself to remember it?

13:53 Right.

13:53 Like a lot of times it's like I have it written down or I have an example.

13:57 Let's kind of just sort of forget that and really focus on this new library.

14:01 Right.

14:01 Like I want to learn Altair for graphing and I'm just going to forget the Matplotlib API for a while because, you know, this is what I want to focus on.

14:08 Right.

14:08 There's just so much new knowledge.

14:10 It's like this constant battle for cognition, basically.

14:13 Right.

14:13 Right.

14:13 Absolutely.

14:14 And I think at the end of the day, that's a lot of what I'm trying to do at the blog is not so much teach people to understand everything,

14:21 but it's almost teaching them what is possible.

14:23 So it's showing the tools that are out there and saying, here are the types of problems.

14:28 And here is a way to sort of present the problem, maybe in a way that will apply to what you're working on today.

14:35 Sure.

14:35 And so what are some of the more popular topics or things that have come across on the blog that people liked?

14:41 It's interesting.

14:42 So one of the things I've learned in writing the blog is it is very hard to gauge what is going to be popular when you publish it and what's going to live on through the magic of Google search engines.

14:53 And so some of the articles seem really popular in the beginning, but then hardly ever pop up in searches and don't get much traffic.

15:02 And then others that seem to maybe be a little less exciting are the ones that tend to pop up on Google.

15:10 So one of the most popular articles for a while was my pandas pivot table.

15:16 I really enjoyed that one.

15:17 I think that one was a good example where I felt like it was going to be popular.

15:21 People enjoyed it.

15:23 It was, I included, I think, a good summary of how to use the pandas pivot table.

15:26 And that was really popular.
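For readers who have not seen one, a pandas pivot table looks roughly like the sketch below. The data and column names are made up for illustration and are not taken from the article being discussed.

```python
import pandas as pd

# Invented sales records, one row per transaction.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East"],
    "product": ["Widget", "Gadget", "Widget", "Gadget", "Widget"],
    "amount": [100, 250, 75, 300, 50],
})

# Rows become regions, columns become products, cells hold summed amounts.
summary = pd.pivot_table(
    sales,
    index="region",
    columns="product",
    values="amount",
    aggfunc="sum",
    fill_value=0,
)
print(summary)
```

This mirrors dragging fields into the Rows, Columns, and Values areas of an Excel pivot table, except that the choices are written down and can be rerun.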

15:28 The one that surprised me is the pandas data type article.

15:33 So I explained how to use the different pandas data types.

15:36 And that is one of my most popular Google search items right now.
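A short sketch of the kind of data type conversion that topic covers, assuming a small invented table where everything arrives as text (a common situation after importing from a spreadsheet). The column names and values are hypothetical.

```python
import pandas as pd

# Everything starts out as text, including the numbers and dates.
raw = pd.DataFrame({
    "customer_id": ["1001", "1002", "1003"],
    "revenue": ["$1,200.50", "$980.00", "$15.25"],
    "signup": ["2018-01-15", "2018-03-02", "2018-11-30"],
})

clean = raw.assign(
    # Plain digit strings convert directly to integers.
    customer_id=raw["customer_id"].astype("int64"),
    # Strip the currency symbol and thousands separator before converting.
    revenue=raw["revenue"].str.replace("[$,]", "", regex=True).astype(float),
    # Parse ISO-style date strings into real datetimes.
    signup=pd.to_datetime(raw["signup"]),
)
print(clean.dtypes)
```

Once the columns have numeric and datetime types, sums, sorting, and date arithmetic behave as expected instead of operating on strings.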

15:40 So, you know, I just can't quite figure it out.

15:44 You know, sometimes I think about it and then other times I'm just like, well, you know, I'm going to write what I think people want.

15:49 And sometimes they're a hit and sometimes they're not.

15:51 Yeah, it's a really tricky thing to try to guess that, right?

15:54 You just kind of got to throw it out there and let the world decide what they're interested in.

15:58 Because I've done that with blogging where I've thought, I'll just throw this out, like whatever, I got to spare 20 minutes.

16:04 And other times I'll spend a whole day writing something really polished.

16:08 And I'm like, this is going to be the good one.

16:10 And, you know, it's like kind of like, meh, people don't care.

16:12 The one I threw together is like super popular.

16:14 I'm like, if I knew it was going to be popular, I would put more effort into it.

16:17 Same thing with videos on YouTube.

16:20 Same thing with even the Talk Python episodes, right?

16:24 Some are like, oh, this could be super popular.

16:26 I'm not sure people are going to love that one.

16:27 And it, you know, it's all over the map.

16:29 Yeah, absolutely.

16:30 And I think I've talked about this on some of your prior podcasts.

16:33 Like as technical people, we're not necessarily good marketers.

16:37 You know, we kind of expect all this content to stand on its own.

16:41 The world's a meritocracy.

16:42 If I make it well, it's just going to be fine.

16:43 And like, that's just, it would be cool if that were true.

16:46 It's not.

16:46 It's not.

16:47 It's not.

16:47 And so that's one of the things, you know, I don't do a whole lot of promoting old articles.

16:51 And I do wonder if, you know, should I do that?

16:54 Should I kind of remind people, hey, here's something I wrote two years ago that's still,

16:57 applicable.

16:58 It's not something I do, but I think it should be done to really kind of keep this stuff fresh

17:04 and remind people what's out there.

17:05 Right.

17:05 It's not a bad idea.

17:07 I mean, one of the ways to promote anything like a blog or courses or videos or whatever

17:12 is to just keep making new stuff that's exciting.

17:15 And some of it will like not be as exciting as you thought it would.

17:18 Some of it will be more and it'll draw people in.

17:20 And then they'll find like, just keeping your profile high will help your older stuff stay

17:26 relevant.

17:26 Right.

17:27 But there's also ways you could continue to promote them.

17:29 Right.

17:29 Like more explicitly for sure.

17:31 Sure.

17:32 And it is funny.

17:33 I mean, you kind of, I kind of forget, like once I've got an article out there and there's

17:37 kind of that initial bump for the first couple of weeks while people are discovering it.

17:41 And then I almost forget about it.

17:43 But it is amazing when someone will stumble upon an article that's three or four years

17:49 old and it solves the exact problem and they leave a comment or send me an email or reach

17:53 out and say, hey, Chris, that was awesome.

17:55 Thanks.

17:55 You saved me dozens of hours of time.

17:57 And so, you know, it feels good.

17:59 And it's good to remember that that's going on as well.

18:02 Yeah.

18:02 The long tail is definitely up.

18:04 Exactly.

18:04 Yeah.

18:05 So let's dig into a little bit of this Excel hell story, right?

18:10 Like you did a great presentation and we'll link to the slides, the PDF of the slides in

18:15 the show notes that you talk about, like basically what is Excel hell and what is Python and Pandas?

18:20 And it's a pretty good synopsis for folks who are presenting to a business audience.

18:25 Sure.

18:25 Excel hell in my mind is when you have a problem, it's a legitimate problem.

18:31 You're trying to solve at work in your business setting and you've solved it using Excel and

18:39 then you go back and try and figure out what you did.

18:42 And it's impossible to recreate what you've done or impossible to just like tweak it slightly

18:49 to use it for a new problem.

18:51 It's just the kind of like pain you feel once you open up something you worked on six months

18:57 ago in Excel and try and recreate it and understand what in the world did I do?

19:01 Where were my data sources?

19:04 What's the most recent version?

19:05 You know, what have I changed?

19:07 I mean, that in my mind is what Excel hell is.

19:10 Yeah.

19:10 Does it involve maybe like when you open up the worksheet that the app kind of goes like

19:16 gray or white?

19:16 Cause it's like not responding for a while cause something's happening in there.

19:20 Yeah.

19:20 I mean, there's certainly one where you're like, I think that's the challenge with Excel

19:25 is there are so many different ways you can interact with the data.

19:29 So you can have formulas, you can have VBA behind the scenes, you can have multiple tabs,

19:36 you can have linked workbooks.

19:38 And so when you open one, there's really no way to figure out what is going on.

19:42 How did the data get there?

19:43 What manipulations have been made?

19:45 How is this supposed to be used?

19:47 It's just it.

19:49 People underestimate how challenging it is to decipher the, you know, the complexities of

19:54 a big Excel workbook.

19:56 Yeah.

19:56 And the reason you get there is because these problems land in this like weird middle ground,

20:02 right?

20:02 Like they're not big enough to bring in like the software team to build a custom solution.

20:07 There's probably not a SaaS solution that's like immediately obvious out there, but it's,

20:13 it's beyond just data, right?

20:15 It's like, it has to make some decisions.

20:17 And as soon as the conditionals start to appear in there, it starts to seem like, well, you're

20:23 taking that first step down that slope is like, now here's where the if statements are in the

20:28 table.

20:28 And you're like, okay, here we go.

20:30 Exactly.

20:30 Exactly.

20:31 And then there are so many cases of like when you're doing it and you like have a VLOOKUP

20:38 formula and you forget to copy it down or, you know, you have the ranges wrong.

20:42 I mean, there are just so many types of formulas, so many types of interactions that seem seductively

20:48 easy, but just set you up where it's so easy to make a mistake and so hard to troubleshoot

20:54 it when you do.

20:54 Yeah.

20:55 I mean, if you miscopied a formula, right?

20:57 Like it's, how do you look at that thing and go, yeah, okay.

21:00 Especially if it's spanning multiple sheets, like it can be a beast.

21:03 Exactly.

21:03 Exactly.

21:04 And then, you know, like we were talking about earlier, the whole version control mess.

21:08 I mean, there, there is no way to just do or no easy way to do a diff of the code and the

21:14 spreadsheets to understand what's changed and why did it change and very painful.

21:19 Right.

21:19 Certainly direct version control won't help you.

21:21 Right.

21:22 Cause it's like, it's, I think it's an XML file that is the source file of Excel, but it's

21:26 like zipped.

21:27 So you can't actually diff the code or you can't, it's like if they just left it unzipped effectively,

21:32 like at least you could diff it.

21:34 Right.

21:34 Right.

21:34 But no, yeah.

21:35 All hope is lost.
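Because an .xlsx file is a zip archive of XML parts, a script can unzip it in memory and diff one part. The sketch below uses only the standard library. The helper names, the default part path, and the tiny stand-in archives are assumptions made for illustration; the stand-ins share the zip layout of a workbook but are not valid Excel files.

```python
import difflib
import io
import zipfile


def read_part(workbook, part="xl/worksheets/sheet1.xml"):
    """Return one XML part of a workbook (path or file-like) as text lines."""
    with zipfile.ZipFile(workbook) as archive:
        return archive.read(part).decode("utf-8").splitlines()


def diff_workbooks(old, new, part="xl/worksheets/sheet1.xml"):
    """Return a unified diff of the same XML part in two workbooks."""
    return list(difflib.unified_diff(
        read_part(old, part),
        read_part(new, part),
        fromfile="old/" + part,
        tofile="new/" + part,
        lineterm="",
    ))


def fake_workbook(cell_value):
    """Build a tiny stand-in archive with one sheet part (not a real workbook)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "xl/worksheets/sheet1.xml",
            f'<sheetData>\n<c r="A1"><v>{cell_value}</v></c>\n</sheetData>',
        )
    buffer.seek(0)
    return buffer


changes = diff_workbooks(fake_workbook(100), fake_workbook(125))
print("\n".join(changes))
```

In practice the sheet XML in a real workbook is often written as one very long line, so a raw diff like this tends to be noisy, which supports the point that spreadsheets are awkward to version.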

21:38 This portion of Talk Python To Me is brought to you by Linode.

21:41 Are you looking for hosting that's fast, simple, and incredibly affordable?

21:45 Well, look past that bookstore and check out Linode at talkpython.fm/Linode.

21:50 That's L I N O D E.

21:52 Plans start at just $5 a month for a dedicated server with a gig of RAM.

21:57 They have 10 data centers across the globe.

21:59 So no matter where you are or where your users are, there's a data center for you.

22:03 Whether you want to run a Python web app, host a private Git server, or just a file server,

22:07 you'll get native SSDs on all the machines, a newly upgraded 200 gigabit network, 24/7

22:13 friendly support, even on holidays, and a seven-day money-back guarantee.

22:17 Need a little help with your infrastructure?

22:19 They even offer professional services to help you with architecture, migrations, and more.

22:23 Do you want a dedicated server for free for the next four months?

22:27 Just visit talkpython.fm/Linode.

22:29 And then, you know, and we talked about like coming from a programming background.

22:36 I mean, even if you just had a little bit of experience, the idea of an if statement, an

22:42 if then else, or a case statement, or switch statement, you know, or defining functions, all

22:47 is fairly straightforward.

22:49 But when you try and cram all of that into a single Excel cell, you know, trying to, like

22:55 we said, trying to do a nested if statement is just impossible.

22:58 I mean, it is very, very hard to do and understand it.

23:02 And I think anyone that's worked with Excel has opened up these spreadsheets and you click

23:06 on the tab or on a cell.

23:08 And there's just this massive formula with all these parentheses and all this kind of error

23:15 catching.

23:15 And you're like, how, how can I understand if this is even right?

23:19 How can I troubleshoot it?

23:21 It's yeah.

23:22 So that's Excel hell in my mind.
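To make the contrast concrete, here is a minimal sketch. The Excel formula in the comment and the discount tiers are invented for illustration and do not come from the episode.

```python
# A hypothetical nested IF, crammed into a single Excel cell:
# =IF(A2>=10000, 0.15, IF(A2>=5000, 0.10, IF(A2>=1000, 0.05, 0)))


def discount_rate(order_total: float) -> float:
    """Return the discount rate for an order (the tiers are made up)."""
    if order_total >= 10_000:
        return 0.15
    elif order_total >= 5_000:
        return 0.10
    elif order_total >= 1_000:
        return 0.05
    else:
        return 0.0


for total in (250, 1_000, 7_500, 12_000):
    print(total, discount_rate(total))
```

The logic is the same in both versions, but the Python function has one condition per line, a name, and room for comments, so it can be read and tested on its own.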

23:24 Sounds about right.

23:27 I think for the purpose of this discussion, it sounds like it's more or less the same situation

23:33 if like you're actually using Google Sheets or some other technically another tool than Excel,

23:37 but something Excel like, right?

23:39 Yes, absolutely.

23:40 I mean, I think there are some things that Google Sheets makes a little bit better.

23:44 I think, and I'm not a huge Google Sheets user, but it does kind of the versioning automatically

23:50 and seeing the history, I think is a little bit better.

23:53 And one of the things I think Google Sheets does better is it's easier to share.

23:57 So if two people are working in it at one time, that works better than Excel, where it just

24:03 locks the file and you have to yell at your coworker to get out, that kind of thing.

24:06 Yeah, yeah, exactly.

24:07 It is a little better there for sure, but it's also simpler.

24:11 So maybe you can't push it as far, right?

24:13 You can't get as far down that hole before you have to like scream for mercy.

24:17 Right, right.

24:18 Yeah.

24:18 I haven't dived into Google Sheets to see like how complex can you do the formulas?

24:23 I know it doesn't have VBA, but I'm sure you can program it behind the scenes.

24:27 I'm sure someone's done something pretty crazy, but I think just Excel is just so pervasive

24:35 and so misused in organizations.

24:37 Yeah, yeah, yeah.

24:38 Well, thankfully, both Excel and Google Sheets have Python packages that'll let you work on

24:44 them, which we'll get to.

24:45 Yes, exactly.

24:45 Exactly.

24:46 Yeah.

24:47 Let's start with the question of like, why is Python better than Excel, right?

24:51 Especially from the perspective of somebody who's like a power user, but they don't consider

24:56 themselves a programmer.

24:58 Like they wouldn't pick up code and think that that's okay for them.

25:00 Like that's for someone else, you know?

25:02 Right, right.

25:03 There's a couple of things.

25:04 One of them is that Python,

25:06 and these tools, when you use them, are going to encourage a little bit better practice.

25:11 So it's more reproducible.

25:13 So you can look at a Python script and even if it's the newest of newbies and it's maybe

25:20 not the most Pythonic, you can at least step through it and understand what's going on.

25:25 And if you've maybe put some comments in there or used a Jupyter notebook to capture some of

25:31 this information, there is like some reproducibility built into that.

25:36 And you can step through it and see what's happening.

25:39 And you can see, oh, I'm taking data A and data B and data C and I'm joining them together

25:45 and I'm cleaning them and I'm doing all these types of things, and in a program it's going

25:49 to be fairly straightforward how this is happening in what order.

25:53 Whereas Excel just by its very nature doesn't do that.
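A small sketch of that "data A, data B, data C" flow in pandas. The three tables, their columns, and the cleaning rule are all invented for the example; in real work each would come from a file or a database query.

```python
import pandas as pd

# Step 1: three sources (stand-ins for data A, B, and C).
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 11, 10],
    "sku": ["A", "B", "A"],
    "qty": [2, 1, 5],
})
customers = pd.DataFrame({"customer_id": [10, 11], "name": [" acme ", "Globex"]})
prices = pd.DataFrame({"sku": ["A", "B"], "unit_price": [9.5, 20.0]})

# Step 2: join them, one merge per source.
report = (
    orders
    .merge(customers, on="customer_id", how="left")
    .merge(prices, on="sku", how="left")
)

# Step 3: clean up inconsistent customer names.
report["name"] = report["name"].str.strip().str.title()

# Step 4: compute the value the report actually needs.
report["total"] = report["qty"] * report["unit_price"]
print(report[["order_id", "name", "total"]])
```

Reading top to bottom gives the order of operations, which is the reproducibility being described: load, join, clean, compute.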

25:57 It's super hard to even look in an Excel workbook and go, these are the parts where code is.

26:02 Like you can't even see the code.

26:04 You got to go touch every little cell to see, is this a formula?

26:07 Is this a value?

26:08 What other, it's like not linear, it's circular, it's cross sheet.

26:12 It's, it's a little bit like VB with GOTO.

26:15 Yes.

26:15 You know?

26:16 Yeah, yeah, yeah, yeah, exactly.

26:18 It's a lot of GOTO statements.

26:20 That's, that's a very good way to put it.

26:23 Like a, like a two-dimensional GOTO.

26:25 Yeah, yeah, yeah, absolutely.

26:27 One of the other problems with Excel is I do think it lends itself to some bad practices.

26:33 So it is very easy in Excel to edit the data, right?

26:36 If you pull in some data and a cell has the wrong value, you can just go in there and type it and fix it.

26:42 Well, the problem is next time you pull the data, you don't know that you just fixed it.

26:47 It's all live.

26:48 So by having a program, you at least have to be more conscious in Python about changing a value.

26:56 You certainly can, but you have to make a very explicit choice to do that.

27:00 Right.

27:00 If you somehow pulled a data source and dropped it in there, but then you tweaked it or you, yeah, then that's right.

27:06 You're like, you never know.

27:07 There's no history or artifact that says that this happened.

27:09 Exactly, exactly.

27:10 And then I think, you know, outside of the just overall benefits of having a program that's a text file that you can version control and understand,

27:19 I think that the biggest benefit is you have access to all of the Python ecosystem.

27:25 So all of the libraries, everything that's out there, whether it's data science or a web scraping or, you know, any of the thousands of things you can do with Python, you actually have access to it.

27:37 Whereas even Excel, as you start to get into VBA, you can certainly do more.

27:44 But if you wanted to go parse an XML file or scrape a web page, you can probably do it.

27:51 But there's going to be a lot better ways to do it in Python.

27:54 Yeah, sure.

27:55 Just write a COM object in C++ and distribute that, you'll be fine.

27:59 Yeah, exactly.

28:00 Exactly.

28:01 Yeah.

28:02 That just sounds horrible.

28:03 I don't want to go anywhere near that.

28:04 Yeah.

28:05 Yeah.

28:05 And obviously that's out of reach of the people who actually want to do those things, right?

28:08 Right.

28:08 So I think you're right.

28:10 Like the ability to acquire and keep the data up to date, right?

28:14 Like I'm going to use Beautiful Soup and Requests to go grab this table off our internal intranet every day.

28:21 So when I make the report, it automatically starts with absolutely fresh data and things like that, right?
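
A rough sketch of that idea with Requests and Beautiful Soup. The function names are made up for this example, the URL would be whatever your intranet page is, and it assumes the page contains a plain HTML table.

```python
import requests
from bs4 import BeautifulSoup


def parse_table(html):
    """Return the first HTML table as a list of rows (lists of cell text)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> found in the page")
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def fetch_table(url):
    """Download a page and parse its first table (url is a placeholder)."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return parse_table(response.text)
```

Keeping the parsing separate from the download means the parsing can be tried out on a saved copy of the page before pointing it at the live site.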

28:27 Right.

28:27 Absolutely.

28:27 And then my experience has been no matter how clean the data is, someone wants the report a little bit different or you need to apply some sort of business logic to clean it up.

28:38 And if you have been doing that by hand and then you forget to do it, everybody gets upset.

28:44 So then you can put it in Python, you can code it there and say, this translation always needs to happen and it's there.
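
For example, a translation that "always needs to happen" could be one small function that every report runs through. The product codes and the `PRODUCT_NAMES` mapping are hypothetical.

```python
import pandas as pd

# Hypothetical rule: legacy product codes are always reported under current names.
PRODUCT_NAMES = {"WDG-OLD": "Widget", "GZM-OLD": "Gizmo"}


def apply_business_rules(df):
    """Return a copy of df with the standard product-name translation applied."""
    out = df.copy()
    out["product"] = out["product"].replace(PRODUCT_NAMES)
    return out
```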

28:50 And then people don't get upset when it's missing.

28:53 So there's just so many benefits to like having that repeatable record of what you've done with the data analysis.

29:02 Yeah.

29:02 We'll touch more on this later, but just having the ability to press a button and have the answer without going, what are the seven manual steps I got to do?

29:12 And I hope I don't forget them.

29:13 And, oh, I'm on vacation next week.

29:16 And Sarah, who's never done this, it's her job to do those seven steps and just even have access to like, oh, my goodness.

29:22 Right.

29:22 Like, so just having this repeatable alone is pretty valuable.

29:26 Absolutely.

29:27 Absolutely.

29:27 And then the other thing I'd say is I think it is interesting that Excel, as powerful as it is, it's pretty easy to get a data set that just doesn't work in Excel.

29:37 So if you get a couple hundred thousand rows in Excel starting to try and manipulate it, it starts to slow down.

29:44 It'll bog down your system.

29:45 Whereas if you read that data into Pandas and do the manipulations, it's pretty fast.

29:51 As long as you can fit it in memory, you can manipulate a lot of data in Pandas that would be difficult to manipulate in Excel.
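
To give a feel for the scale being described, this sketch builds a few hundred thousand made-up rows in memory and summarizes them in one step. The store and units columns are invented for illustration.

```python
import pandas as pd

n_rows = 300_000

# Synthetic data: 50 hypothetical stores, small unit counts.
df = pd.DataFrame(
    {
        "store": [f"S{i % 50:02d}" for i in range(n_rows)],
        "units": [i % 7 for i in range(n_rows)],
    }
)

# One grouped summary over the whole data set.
summary = df.groupby("store")["units"].agg(["count", "sum"])
print(summary.head())
```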

29:58 Yeah, that's a good point.

29:59 And there's limits, actual limits to how much data Excel will accept, like a million rows or something, right?

30:04 Yes, absolutely.

30:06 And, you know, it does keep going up and you can have more rows and more columns, but it doesn't mean you can actually work with it when it's in there.

30:13 Yeah, that's a different story also.

30:16 But, yeah, cool.

30:18 And then also, it's like you say, it's a proper language, right?

30:21 If you learn to work with Excel, you can make really complicated Excel sheets.

30:26 You learn to work with Python, all of a sudden, you start to see these other problems like, oh, I always copy these files from here to there and I always make these little edits to them.

30:34 Or I always, you know, have to grab this data and transform it that way.

30:37 And, like, it's sort of a good way to draw them into this automate your world space, right?
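
The "I always copy these files from here to there" chore is a good first automation. A minimal sketch using only the standard library; the function name and the `*.xlsx` default pattern are assumptions for the example.

```python
import shutil
from pathlib import Path


def copy_reports(source_dir, dest_dir, pattern="*.xlsx"):
    """Copy every file matching pattern from source_dir into dest_dir."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    copied = []
    for path in sorted(source.glob(pattern)):
        target = dest / path.name
        shutil.copy2(path, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied
```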

30:45 Absolutely, absolutely.

30:46 And then that's the thing that is nice about it is the internet loves to have language wars and we could fight about whether Python is really the best language for the task.

30:56 But in most places, it's going to be a pretty good choice.

30:59 It may not be the best.

31:00 So, if you have learned Python and then you expand and start doing something a little bit more than just the Excel manipulations you're doing, you're not going to have to relearn a bunch of things.

31:09 You're not going to be going down a path where you've got to make a huge course correction at some point in your career if you decide to continue developing that skill.

31:18 The hard part about learning programming is not learning the Python syntax, right?

31:23 Like, that's not so hard.

31:24 It's like, what is a loop?

31:25 When do I use loops?

31:27 What is a conditional?

31:27 What is a data type?

31:29 What data types are there?

31:30 Like, that's basically the same across all the languages, right?

31:33 Exactly.

31:33 It's just like, I need curly braces and parentheses and why are there semicolons everywhere, right?

31:37 But still, like, it's no big deal, right?

31:39 Yeah.

31:39 Yeah, absolutely.

31:40 Absolutely.

31:40 Yeah, interesting.

31:41 So, one of the things that I keep repeating is we don't need 100 million software developers, but we do need people who can do whatever they do better, right?

31:51 And I feel like software development, a little bit of it, like the kind you're talking about, can really amplify whatever you're good at anyway, right?

32:00 So, from this context, you know, maybe the people who are listening, they're probably developers, but they probably work at companies where they get asked to help with Excel sheets or they have these automation problems that, like, start from Excel.

32:13 Who should they go to and start saying, I know I could write this for you, but let me help you learn to automate your world a little bit?

32:20 Like, what type of folks should they be reaching out to?

32:23 So, I feel like there is, anytime you have a group of people that use Excel, and I don't know what the number is, but five to ten people, it seems like there's always someone that's like the Excel super user.

32:34 So, it may be...

32:36 Probably there's color, a lot of color in there, some formulas, right?

32:39 Exactly, exactly.

32:40 Now, definition of super user is going to vary.

32:42 I mean, sometimes it could be someone that is really good at some pretty complex VBA.

32:47 It could just be the person that understands how to put a pivot table together.

32:51 But I think in any group where you have people that are doing repeatable type processes, there is someone that is the de facto expert.

33:01 And that is the person that I would target and kind of say, okay, let's take a look at what you're doing here and see if there's an alternative to use this tool called Python to improve this process.

33:15 You raise a really good point.

33:17 And I think one of the big challenges here is these people aren't going to say, I want to learn programming.

33:25 To them, it's like, I can't learn programming.

33:27 I need to go to a university and get a four-year degree.

33:31 And I would encourage the people that are listening, if they do have that Python knowledge, to kind of help ease them into that and say, you know what, you don't have to be a computer scientist to do this.

33:42 Let me show you some of the simple things that you can do in Python to help with this process and get them started.

33:49 And so, yeah, focusing on those kind of natural experts in the group.

33:53 Yeah, that makes a lot of sense.

33:54 And the fact that Python is not a compiled to machine instructions with a compiler and linker type of language means you don't have to drag them through the sort of software, computer science-y tool chain.

34:07 You can just say you write this in a text file, preferably in a proper editor that helps you with spaces.

34:13 Exactly.

34:13 And autocomplete.

34:14 And then you're kind of good, right?

34:15 Yeah.

34:16 You raise a good point.

34:17 I think these, if you are a Python expert, you should be able to help get Python set up on their system.

34:24 And I know you've talked about this before.

34:26 It is much easier than it used to be back in the past, but it's still a non-trivial activity for someone that's not comfortable with this.

34:35 I mean, I've had experiences where I've deployed some Python scripts and told people, okay, open a command prompt on your Windows system and type python and the file name.

34:44 And they, you know, it just kind of blows their mind a little bit.

34:47 They're just not used to that.

34:49 Most people don't, you know, type commands on their laptop.

34:53 So that is a, exactly.

34:56 Like they maybe don't know what the command prompt is at all.

34:58 Exactly.

34:59 It's very possible.

34:59 Exactly.

35:00 And when you're talking Excel and you're talking business users and you're intersecting those two things, there's a good chance that that intersection lands on Windows and not Mac.

35:09 It's very unlikely it's on Linux, right?

35:11 Absolutely.

35:11 Yes.

35:12 This portion of Talk Python to Me is brought to you by Stellares, the AI-powered talent agent for top tech talent.

35:20 Hate your job or feeling just kind of meh about it?

35:23 Stellares will help you find a new job you'll actually be excited to go to.

35:27 Stellares knows that a job is much more than just how it sounds in a job description.

35:32 So they built their AI-powered talent agent to help you find the ideal job.

35:36 Stellares does all the work and screening for you, scouting out the best companies and roles and introducing you to opportunities outside your network that you wouldn't have otherwise found.

35:46 Combining deep AI matching with human support, Stellares pares things down to a maximum of five opportunities that tightly match your goals, like compensation, work-life balance, working on products you're passionate about, and team chemistry.

35:59 They then facilitate warm intros, and there's never any pressure, just opportunities to explore what's out there.

36:05 To get started and find a job that's just right for you, visit talkpython.fm/stellares.

36:11 That's talkpython.fm/S-T-E-L-L-A-R-E-S.

36:17 Or just click the link in your show notes in your podcast player.

36:19 What are the recommendations for making that work?

36:24 Like Anaconda, things like this, or what do you suggest there?

36:27 I'm a big fan of the Anaconda distribution.

36:29 I tend to use Miniconda, so that's the more stripped-down version that just doesn't have all of the libraries.

36:36 Getting that installed on the system, it doesn't take as much space.

36:39 And then adding in Pandas and all the other Jupyter notebooks and pieces that you need.

36:45 My experience has been that it works really well on Windows, and that's probably the easiest way to get started.

36:52 Python on Windows is getting better, especially if they have Windows 10, right?

36:56 There's some pretty good stuff that's happening.

36:57 Like you heard about Microsoft embracing Python more, right?

37:01 We had Steve Dower on the show recently.

37:03 He's one of the core devs at Microsoft.

37:05 And he talked about how he got Python 3 in the Windows Store, which I think is a big deal for business users.

37:12 Absolutely.

37:13 And then, you know, the other challenge, you've got this challenge of how do you get the users, but then there is kind of this internal corporate IT challenge some places where laptops could be pretty locked down, and you don't have administrative rights, and you can't actually install programs.

37:30 I really am excited to see Microsoft's embrace of Python.

37:35 And, you know, it's really funny to me.

37:38 I think of Microsoft, you know, 20 years ago, and it was the evil empire and embrace and extend and extinguish.

37:43 But now, I mean, they're really doing some great things.

37:45 And what they're doing with Python, I think, is really good.

37:48 And I think that really plays well for these types of roles where people are using Excel and now Python, I think, truly is a legitimate option for them.

37:57 Yeah, I do, too.

37:58 And I think a lot of it is the lack of requiring admin rights. Like, say, if you go in and you change your path, even though that feels techie, that's not even necessarily an admin operation, right?

38:09 You could do that for your user profile, but go to the Windows Store, type Python, click that, and it's already in your path.

38:15 You've already got Python, and python3 is a command, right?

38:17 Like, that's another hurdle that they erased there.

38:20 Exactly.

38:21 It's pretty excellent.

38:22 If the IT department starts to get upset about it, you can at least say, well, it is in the Windows Store.

38:26 It's not like something that is just wild and free on the internet.

38:30 Right.

38:31 It's like, I didn't get that from some weird, like, mirroring site of, like, an open source link.

38:37 And it looks super sketchy, but it is linked off of the main project site.

38:41 So I'm going to kind of trust, like, it's not that, right?

38:43 Exactly.

38:43 It's not like trying to download PuTTY from some weird place or SourceForge or something.

38:47 And also, it runs with low privileges, right?

38:50 It can't touch the registry.

38:51 It can't touch the shared file areas, like the one that comes from the store, the Python from the store.

38:56 So that's pretty good.

38:58 Yes, absolutely.

38:59 Yeah.

38:59 But that said, that's a CPython pip world.

39:02 That's not a Miniconda, Anaconda world.

39:04 Like, would you still recommend people go down the Miniconda side for now?

39:07 In my experience, you can install Miniconda without admin, right?

39:11 So you install just as the local user.

39:13 You know, I don't understand under the hood what the real difference is between that one and the Windows Store version.

39:19 But I do know, like, my laptop is fairly well locked down.

39:23 And I can install it and run.

39:25 And it does everything I need.

39:27 I haven't had to mess with the registry or any of the, you know, private Windows files.

39:31 But for doing data analysis, replacing Excel files, it seems to work just fine.

39:35 Yeah, for sure.

39:36 So you're talking to these power users, and probably a lot of them, if they're Excel power users, it's very likely that they're kind of higher up in the organization.

39:45 You know, how much of the conversation goes something like, oh, I should use Python?

39:50 How much is that going to cost us for some corporate licenses to Python?

39:53 And, oh, wait, it's open source.

39:55 Should we be using open source?

39:56 Like, is this a conversation that you have to have, or people just kind of brush it off?

39:59 So I think it is a conversation you have to have.

40:03 And I would say it depends on the organization and your role in the organization, how comfortable you are pushing the boundaries.

40:10 So if you feel like, you know, if I do this, people are going to be okay with it.

40:15 But if you're uncomfortable, you know, you definitely need to make sure you talk to the powers that be so that they can provide a little air cover if someone does get upset about bringing in this new Python tool into their ecosystem.

40:27 Yeah, interesting.

40:28 So I think it does help a lot that it's in the Windows Store.

40:31 And then we have Anaconda Inc.

40:32 It's like official backers of sort of their distribution.

40:35 And it gives it some legitimacy, I would say.

40:37 Exactly, exactly.

40:39 Yeah, all right.

40:39 So let's suppose the listeners out there are like, I have this Python knowledge.

40:43 I don't want to be asked to build another forms-over-data app.

40:46 I want to help these people do their world without me helping them out.

40:49 You know, I want to teach them to fish, basically.

40:51 How should they get started, right?

40:52 Like, they sit down on Tuesday and they say, hey, let me show you something.

40:56 Let me get started.

40:57 Like, what do you think?

40:58 Yeah.

40:58 Some steps there.

40:59 Sure.

41:00 It is a hard part because there's not just the one place to go to say, hey, here's how you learn Python and the appropriate tools to solve your problem.

41:09 So what I think works best is people might have the natural reaction.

41:14 I want to focus on this really painful problem I have.

41:17 It's a really big problem.

41:18 It takes a lot of my time.

41:19 It's very error prone, whatever.

41:21 I would say actually start with something that's maybe a little simpler, that's very well understood, and try and automate a simple process and just get the basics down before you move on to the more complex stuff.

41:36 So maybe there is a process where you're moving some files around or opening up Excel and copying and pasting a few rows or columns from one to another.

41:47 Start with that and just kind of get that going and show them how to set up on their system, how to read in a file, how to make a couple changes with the data frame, with a pandas DataFrame, save it back out, and just start there.

42:02 And try and do something in 10 to 20 lines of Python code to get the ball rolling.
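
A starter script in that 10 to 20 line range might look like the sketch below: read a file, make a couple of changes, save it back out. The column names (`quantity`, `unit_price`) are assumptions, and it uses CSV so it runs with pandas alone; `pd.read_excel` and `DataFrame.to_excel` are the Excel equivalents if an Excel engine such as openpyxl is installed.

```python
import pandas as pd


def clean_sales(in_path, out_path):
    """Read a sales file, tidy it up, add a total column, and save it."""
    df = pd.read_csv(in_path)

    # Normalize messy header text like "Quantity " to "quantity".
    df.columns = [col.strip().lower() for col in df.columns]

    # Add a calculated column, the kind of thing a formula column does in Excel.
    df["total"] = df["quantity"] * df["unit_price"]

    # Largest totals first.
    df = df.sort_values("total", ascending=False)

    df.to_csv(out_path, index=False)
    return df
```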

42:09 Yeah, that makes a lot of sense.

42:10 And I do think it's these little, almost death by a thousand paper cuts type of problems that if you can just solve some small to medium ones, but they become repeatable.

42:21 If they could be scheduled, if they could become automatic.

42:24 So you just had an email when you came in on Monday instead of you doing the report and sending it to the people who need it.

42:30 It just happens automatically and you get a copy on Monday.
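
The "email waiting on Monday" part can be sketched with the standard library. The addresses, subject, and SMTP host are placeholders you would replace, and the script would still need to be run by some scheduler on your side.

```python
import smtplib
from email.message import EmailMessage


def build_report_email(sender, recipients, subject, body, attachment_name, attachment_bytes):
    """Build an email with the report attached (nothing is sent here)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    msg.add_attachment(
        attachment_bytes,
        maintype="application",
        subtype="octet-stream",
        filename=attachment_name,
    )
    return msg


def send_email(msg, host, port=25):
    """Send the message through an SMTP server (host and port are placeholders)."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)
```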

42:33 Those kinds of things are pretty easy to attack.

42:36 But I think a couple of those and then the joy would start to spread.

42:40 You're like, you know what?

42:40 This is awesome.

42:41 What else can I do with this?

42:42 Right?

42:43 Exactly.

42:43 Exactly.

42:44 And then what I found too is sometimes maybe there's a report that you do once a week, but it's automated now so you could do it daily.

42:52 And suddenly people are like, oh, wow, daily.

42:55 Yeah, this is really useful.

42:56 So it's not just the benefits of time savings.

43:00 It's the benefit of like now it's easy for me to do, so I'll do it.

43:04 Whereas before, if it took me several hours to pull something together, I'm probably not going to be willing to do that every day just because I don't have the time.

43:12 So I think there are a lot of unintended benefits of doing automation.

43:17 For sure.

43:17 And like the next step from there is on demand.

43:20 Exactly.

43:21 Right.

43:21 Like you want to know what the weekly sales rate versus last week was, store by store?

43:28 You just click here and then you have it in a minute.
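
A sketch of that store-by-store, this-week-versus-last-week report in pandas. The stores, dates, and sales figures are made up.

```python
import pandas as pd

# Hypothetical weekly sales, one row per store per week.
sales = pd.DataFrame(
    {
        "store": ["North", "North", "South", "South"],
        "week": ["2019-01-07", "2019-01-14", "2019-01-07", "2019-01-14"],
        "sales": [1000.0, 1100.0, 800.0, 600.0],
    }
)

# One row per store, one column per week (ISO dates sort chronologically).
by_week = sales.pivot(index="store", columns="week", values="sales")
last_week, this_week = by_week.columns[-2], by_week.columns[-1]

report = pd.DataFrame(
    {"last_week": by_week[last_week], "this_week": by_week[this_week]}
)
report["pct_change"] = (
    (report["this_week"] - report["last_week"]) / report["last_week"] * 100
)
print(report)
```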

43:30 You know?

43:30 Exactly.

43:31 Exactly.

43:31 And you're right.

43:32 I mean, you kind of get this momentum.

43:34 It builds on itself.

43:35 So once you've taken something simple and gotten started, then, oh, I want to tweak it a little bit.

43:41 That's where you also see the value is.

43:43 I'm not starting from scratch.

43:44 I can take what's already there and maybe just add some additional code or use that as a basis for a new process.

43:51 It just starts to build and you kind of get this library of tools and ways to manipulate the data and report the data out that just really, you know, accelerate your speed to get things done.

44:03 Yeah.

44:03 It's obvious to us as programmers, but if you always have to get the data from one place and maybe you have to do different stuff with it, once you write that part to get the data once, like that is done, right?

44:12 That is a solved problem.

44:13 Now you build on top of that capability, right?

44:16 And I'm sure a lot of folks that we're addressing, you know, like talking about helping them here, they just don't have that mindset of if I solve it in one way, that solution is reproducible and reusable, right?

44:26 Right.

44:26 And that is a challenge, right?

44:28 So anyone that's learning this new tool, they're going to be slower in the beginning and they're going to have to invest time.

44:36 And, you know, in my experience, I've had to learn a lot of this off hours.

44:41 It's not something that you just can get done in an eight hour day.

44:45 And so there is a little bit of a targeting of that individual.

44:48 Is this individual at least open to trying to learn some on their own so we can point them in the right direction and they can learn about Python?

44:56 They can learn about these tools, but are they going to be able to follow up on their own or are they just going to say, I need you to do it for me?

45:03 If that's the case, they're probably not the best candidate for this.

45:06 Right.

45:06 Right.

45:07 I mean, it seems to me like if you're doing some sort of four hour task weekly and there's somebody who's going to help you, they could sit down and say,

45:14 instead of doing that manually, why don't we spend those four hours on Monday, build it together?

45:20 I mean, this is not me offering to be your private software developer, but this is me helping you down this path.

45:26 I'm sure that we could do this in four hours.

45:28 Let's sit down and do it together.

45:29 So you understand it and then you can take it on and sort of keep it going.

45:33 You know, I think things like that may work.

45:35 Yeah, absolutely.

45:36 And then, you know, one of the things that I've wanted to do, but I haven't really done a whole lot of is, you know, what kind of user groups can you set up in your company so that you have some of these resource, peer resources to help them work through the process?

45:51 Your podcast about the Apple Python training was really, really interesting and, I mean, certainly a much larger scale than what I'm talking about.

45:59 But I think that that would be another option is to try and get four or five, or a dozen, like-minded individuals together and over the lunch hour start to introduce these concepts and build a community where they can learn and share their learning.

46:13 That's a great idea to kind of have all these Excel power users who are escaping Excel through Python start to come together and give a little presentations like, yeah, last week I did this automation and let me just talk to you about how it worked and what I had to overcome.

46:27 And yeah, that'd be great.

46:28 Exactly.

46:28 Yeah.

46:29 And because what I think will happen is there are a lot of common challenges.

46:33 You know, how do I get access to this database or how do I massage the data in a certain way that is probably fairly common in an organization?

46:43 So if you can share that, you know, or like simple things like I've got to do a lot of PDF file manipulations.

46:50 Okay.

46:50 What library are you using?

46:52 What are some of the ways to do that?

46:54 I think there's a ton of those types of leverageable problems across an organization.

46:59 As soon as someone finds the solution to getting that table out of that PDF document, other people are like, I have to copy it out of the PDF document.

47:07 What are you doing?

47:07 Right?

47:07 This is cool.

47:08 Exactly.

47:09 Exactly.

47:10 Yeah.

47:10 I guess you probably got to set expectations as well, right?

47:14 Like you can't say like, look, I know you've got super cool apps on your phone and you've been to Airbnb.com.

47:20 This is not what we're talking about.

47:23 What we're talking about is a few steps reproducibly really quick that you don't like just setting the expectations of you can't start by building, you know, like some kind of super glamorous app.

47:34 Absolutely.

47:35 It's a really good point.

47:36 And that is one of the challenges is the, as much as we rail on Excel and maybe Access, which we haven't talked about, but similar sort of thing that, you know, it's still really intuitive for someone new to pick it up and do something.

47:52 And, you know, you can get results and it looks pretty, like you said, whereas are people going to be impressed that I ask you to open up the command prompt and type Python, this, this, and this, and then a file is output somewhere.

48:05 I mean, it doesn't have that wow factor that you would with more of a, you know, a GUI type application.

48:12 And so I agree a hundred percent that you need to be able to get people comfortable with what you really can and can't do.

48:18 Right.

48:18 Right.

48:19 So it feels like success, not failure when it's working.

48:22 Right.

48:22 Exactly.

48:23 Exactly.

48:24 I think there are a few simple things you could do to help.

48:27 Like right now, as our discussion has gone, like the solution is I go to the command prompt, I type python, maybe python3, space, the script that does the magic.

48:37 Maybe I set up a Windows Task Scheduler job that does that for, I don't know, some, some way that that thing is running.

48:43 Right.

48:43 But it's basically a command CLI type of experience.

48:46 You know, you could use something like Gooey, G-O-O-E-Y, which is a really simple way to turn a CLI interface into a GUI cross-platform interface, like a Gooey GUI.

48:57 Yep.

48:57 And the other one that you brought up that's pretty straightforward is Jupyter Notebooks.

49:02 Yes.

49:02 Yes.

49:03 Right.

49:03 Do you think that helps a little bit with the acceptance layer?

49:06 Like you can visualize a graph, right?

49:07 You could do a bokeh or matplotlib thing.

49:10 Absolutely.

49:11 And it's interesting you mentioned Gooey, which, you know, is hard to talk about, but when you see it, you understand what it is.

49:18 But I wrote an article about it and I've used it a lot.

49:22 And it is, I think it's a fabulous tool for people that have, you kind of have this command line and trying to teach people how to use a command line can be frustrating.

49:31 But with Gooey, you can just easily put that on top of it.

49:34 And especially where it's like selecting files, selecting dates, selecting a couple of values from a dropdown.

49:40 Got some combo box, exactly.

49:42 The dropdowns.

49:43 Yeah.

49:43 Exactly.

49:43 It's so easy, right?

49:44 It is.
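
To make the file, date, and dropdown example concrete, here is a sketch of the kind of command-line script Gooey is designed to wrap. The argument names are invented. It sticks to the standard library's `argparse` so it runs as-is; per Gooey's documentation, the GUI comes from decorating the function that parses arguments with `@Gooey`, shown here only as comments.

```python
import argparse

# With Gooey installed, the documented pattern is roughly:
#     from gooey import Gooey
#     @Gooey
#     def main(): ...
# That is left as a comment so this sketch has no extra dependencies.


def build_parser():
    parser = argparse.ArgumentParser(description="Build the weekly sales report.")
    parser.add_argument("input_file", help="Path to the raw data file")
    parser.add_argument("--start-date", help="First date to include (YYYY-MM-DD)")
    parser.add_argument(
        "--region",
        choices=["East", "West"],  # a fixed list of choices maps naturally to a dropdown
        default="East",
        help="Region to report on",
    )
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    print(f"Building report from {args.input_file} for region {args.region}")
    return args
```

Calling `main(["data.csv", "--region", "West"])` exercises the same parsing that the command line would.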

49:45 And the other thing is it's wxWindows, so it looks native.

49:49 You know, it's not an ugly GUI.

49:52 And I personally use that when I put some things together and people just know how to use it.

49:58 And they don't really care what you did, just that, oh, it looks like I would expect a Windows app to work.

50:03 So that's great.

50:04 Yeah.

50:04 The fact that they have a thing, they double click.

50:07 If you use PyInstaller or PyOxidizer or something, you get like an EXE or a .app file.

50:12 You just, you put it in your dock or your taskbar, you double click it.

50:15 Like all of a sudden, it's almost a first class citizen in your world.

50:18 Exactly.

50:19 Yeah.

50:19 Yeah.

50:20 Totally.

50:21 And then Jupyter Notebooks, I think, are certainly another option.

50:24 I don't have experience with like building a Jupyter Notebook hub that you would share out with everybody.

50:30 But I do agree that having that where you tell people, okay, run the notebook, go here in your browser, use some of the widgets to populate some information.

50:37 I think that's absolutely a valid option.

50:40 Yeah.

50:40 It seems like that would really help with the acceptance.

50:44 You're like, okay, this is not so scary.

50:45 It's not the terminal or command prompt, which I know in reality, it's like not that big of a deal.

50:52 But psychologically, it's a massive deal for some people, right?

50:56 Like they're just like, oh, forget that.

50:58 This is not the way we work here.

51:00 This is not the way.

51:01 Exactly.

51:01 And actually, where I see Jupyter Notebook, what I have been trying to do is almost like when I have a data challenge.

51:10 So I get an Excel file or I need to do a project.

51:12 And I wrote about it on my blog.

51:14 I have a Cookiecutter template that creates a standard directory with some Jupyter Notebook stubs, my in and out directories, and the raw data file.

51:24 And so it's a consistent methodology.

51:28 So I know if I go back six months later and open it up that I have kind of that Jupyter Notebook history of what I've done and all the files are where I expected.

51:37 And I put notes in there about where the files come from and what am I trying to accomplish and, you know, all those kinds of things.
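
This is not the Cookiecutter tool itself, just a standard-library stand-in to show the idea of a consistent project skeleton. The folder names are a hypothetical layout, not the one from the blog post.

```python
from pathlib import Path

# Hypothetical layout: raw inputs, processed outputs, and finished reports.
SUBFOLDERS = ("data/raw", "data/processed", "reports")


def create_project(root, name):
    """Create a standard analysis folder structure and a notes file."""
    project = Path(root) / name
    for sub in SUBFOLDERS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    readme = project / "README.txt"
    if not readme.exists():
        # A place to record where the files came from and what the goal is.
        readme.write_text(f"Project: {name}\nData source:\nGoal:\n")
    return project
```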

51:43 And I think Jupyter Notebook could really be useful for that type of analysis.

51:48 Instead of in my perfect world, instead of reaching for Excel every time you need to do analysis, hey, I'm just going to use a Jupyter Notebook to start looking at the data and treating it, you know, almost like a data science problem versus I'm just going to, you know, hack together something in Excel.

52:03 That's really cool.

52:04 I love the cookie cutter idea as well.

52:05 It's nice.

52:06 Yeah.

52:06 And it's really worked out well.

52:08 Once I've started to use it and now I've been using it for a few months and go back and look at projects where I've used the cookie cutter strategy.

52:16 I'm like, oh, yeah, this is really cool.

52:17 I kind of know where everything is and I know what to expect.

52:21 And I can then create that feedback loop where there are maybe snippets of code that I want to include in future ones.

52:29 So you can just kind of keep populating that cookie cutter with the most recent code.

52:34 That's right.

52:34 You take your learnings from the last one and you kind of make the project a little bit better, a little bit better, a little bit better.

52:39 Exactly.

52:40 You know, and the other thing we talked a lot about Excel, one of the things that I think is I've, you know, started to learn more about data science and use some of these tools.

52:51 And there is a different approach to looking at data when you look at it almost from this data science approach versus looking at it from Excel and the data science approach of you do exploratory data analysis on the data set.

53:04 And you start to learn things that are easy to do with the pandas data frame that are maybe hard to do with Excel.

53:09 So just simple things about like, where are my null values?

53:13 Where do I have duplicates?

53:15 Where, you know, where is maybe the data not as clean as I would expect?

53:18 You can kind of go through that as part of that process that I think is maybe a little more robust, a little more natural for data analysis than just trying to open up an Excel file and do a bunch of auto filters and pivot tables.
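
Those first questions (where are my nulls, where are my duplicates) are each about one line in pandas. A small sketch on made-up data:

```python
import pandas as pd

# Made-up data with one repeated row and two missing values.
df = pd.DataFrame(
    {
        "customer": ["Acme", "Acme", "Bolt", None],
        "amount": [100.0, 100.0, None, 40.0],
    }
)

# Where are my null values? Count of missing entries per column.
null_counts = df.isna().sum()

# Where do I have duplicates? Show every row that appears more than once.
duplicate_rows = df[df.duplicated(keep=False)]

print(null_counts)
print(duplicate_rows)
```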

53:32 Yeah, and it doesn't matter how big the data is or how wide it is.

53:35 Whereas in Excel, it's like, you kind of got to live visually in the data.

53:39 You know what I mean?

53:40 You're like, you got to have your charts and stuff swim there rather than it's fine.

53:45 It has 200 columns.

53:46 You know, we load it and it's just like you show it below where the data went, right?

53:49 It's all good.

53:49 Yeah, exactly.

53:50 And I think it also starts to, you start to learn the data science concepts.

53:54 So the concept of tidy data, you know, is certainly something I never really understood just as an Excel user.

54:01 Yeah, define that for us.

54:02 So essentially, I knew you were going to ask me that.

54:05 Now, I should have looked up a better definition.

54:07 But it's when each row of data is a single observation, versus spreading that observation across multiple columns of data.

54:17 So think about like if you're observing cars passing on the street, you would have a car and then maybe you'd have the make and the model versus multiple columns with all the different makes and models and yes, no's in each column.

54:31 And the reason it's important is a lot of these data science tools then make it very easy to manipulate and more importantly, visualize the data when it's in tidy format.

54:42 And once you get it in that format, then suddenly the opportunities for visualization that are available to you are really powerful that you just can't do in Excel very well.
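
As a rough illustration of the car example, the sketch below reshapes a "wide" table (one yes/no column per make) into tidy form with one row per observed car. The column names and values are assumptions invented for this example.

```python
import pandas as pd

# Wide layout: one yes/no column per make (hypothetical data).
wide = pd.DataFrame(
    {
        "car_id": [1, 2, 3],
        "Ford": ["yes", "no", "no"],
        "Toyota": ["no", "yes", "no"],
        "Honda": ["no", "no", "yes"],
    }
)

# melt() stacks the make columns into (car_id, make, seen) rows.
long = wide.melt(id_vars="car_id", var_name="make", value_name="seen")

# Keep only the "yes" rows so each car appears once, with its make as a value.
tidy = (
    long[long["seen"] == "yes"]
    .drop(columns="seen")
    .sort_values("car_id")
    .reset_index(drop=True)
)

print(tidy)
```

Once the make is a value in a single column rather than a column header, grouping, counting, and plotting libraries can treat it as one variable.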

54:52 Yeah, that's cool.

54:53 I see a lot of interesting things sort of coming from this, right?

54:56 Somebody moves a little bit out of Excel, they kind of get in with Jupyter, they kind of learn Pandas.

55:02 And all of a sudden, they start thinking, you know what, how hard would it be to train a machine learning model to predict rather than just a report, right?

55:11 Exactly.

55:11 There's just a few little steps.

55:13 Yeah.

55:13 Those things are not that hard, but you need tidy data and you need like good, got to get the right factors or aspects of your data you want to feed to the model and all that kind of stuff, right?

55:22 Absolutely.

55:22 And you know, one of the things that I put in the notes for us to talk about was Facebook's Prophet tool.

55:29 Yeah, tell us about that.

55:30 I think this is a really interesting example of a really powerful tool that lots of analysts need.

55:38 And if you just learn a little bit of Python, you can do some really cool things.

55:43 So the project for those not familiar with this, it's an open source project that Facebook put out, I don't know, a couple years ago.

55:50 And it's a tool for making predictions about time series data.

55:56 So think about if you wanted to predict, like how many hits are you going to have on your website in the future?

56:02 Or, you know, their example is about hits on the Wikipedia page for Peyton Manning; that's the thing that they walk through.

56:09 But I think any organization is going to have a lot of, can you predict the future?

56:14 Can you predict what we need to manufacture?

56:16 Can you predict what our sales are going to be like?

56:18 Can you predict what this marketing campaign is going to do?

56:21 Yeah.

56:21 Imagine you were the person who had that answer.

56:23 Yeah, exactly.

56:25 Right?

56:25 You would just be like, you're the magician.

56:27 Exactly.

56:28 And what I think, what I suspect happens is, you know, most big organizations, maybe in supply chain, you do have someone that is really smart on this type of prediction.

56:38 But you have to predict all across the organization.

56:42 And so what do most people do?

56:43 You know, you plot out your history and maybe you kind of do some sort of regression line and kind of make a guess.

56:49 But if you start to look at all the different mathematical options you have for predicting time series data, it's mind boggling, right?

56:57 Like, how could you figure all that out?
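
For comparison, the "plot your history and fit a regression line" baseline described here can be sketched in plain Python. This is only the naive approach, not what Prophet does internally; the sales figures and the `fit_line` helper are hypothetical.

```python
def fit_line(ys):
    """Ordinary least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up monthly sales history that grows by 10 units per month.
monthly_sales = [100, 110, 120, 130, 140, 150]

slope, intercept = fit_line(monthly_sales)

# Extend the fitted line three months past the end of the history.
next_three = [slope * x + intercept for x in range(6, 9)]
print(next_three)
```

A straight line like this ignores seasonality, holidays, and changes in trend, which is the gap that dedicated time series tools aim to fill.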

56:59 So Facebook essentially had this problem.

57:01 So they built this library where they put some fairly sophisticated math behind it, but a very simple API where you can only change a few variables and get some really nice predictions and some really nice visibility to the trends in your data.

57:19 So I've been really impressed with it and think it's a really cool tool because it strikes a really nice balance of power, but easy to use for someone that's not a, you know, PhD math professor.

57:33 Yeah, that's cool.

57:33 And it seems like it's a good fit for the people we're talking about in the show.

57:37 Yeah, absolutely.

57:37 Yeah.

57:38 Nice.

57:38 All right.

57:39 We're getting a little short on time, but I do want to ask you a few more things real quick before we go.

57:42 Let's close this out.

57:44 Like we talked about all the benefits.

57:45 We talked about all the good things here, but maybe let's just talk on a few areas on what Python needs to do better to better take advantage of this group and just go farther.

57:55 Sure.

57:56 So a couple of things.

57:57 So we did have a good discussion about should we use Miniconda?

58:01 Should we use Python from the Windows Store?

58:03 You know, I think that is still a good area for us to figure out, you know, what is the preferred way to get Python out and then install it on a system?

58:11 And then more importantly, how do you keep it updated?

58:14 There's still some kind of wonkiness between installing with Conda and pip, and I've been bitten by it before where suddenly I'm installing with pip and it's unwinding some of my Conda install.

58:27 And that is something that a new user is not going to figure out.

58:32 That's not going to be okay.

58:33 Yeah.

58:33 No.

58:34 So I think that's something that needs to be solved.

58:38 And I know folks are working on that.

58:40 I think the other one is this whole, like we talked a little bit about data visualization and it could probably be expanded more broadly is like once you start getting into Python, there are some areas where it's hard to make a decision about which direction I should go.

58:55 So data viz is a great example.

58:56 Like I want to plot something.

58:58 What do I do?

58:59 Do I do Matplotlib?

59:00 Do I do Bokeh?

59:01 Do I do Seaborn?

59:03 You know, all the various options.

59:05 You come from having one choice, the little insert-chart drop-down, to the paradox of choice of open source, right?

59:12 Exactly.

59:13 Exactly.

59:14 And then I think the other one, and I know you've talked about it a little bit, is the whole distribution problem.

59:19 So how do you get these Python scripts, programs, notebooks, whatever they are out there so that other people can use?

59:29 And do you have to go through some sort of EXE conversion?

59:33 Do you have to have some other way to distribute the files?

59:36 You know, in my mind, I envision, wouldn't it be cool if you had kind of a little launcher type thing that would go and check and download the most recent files and put a little GUI around it?

59:45 But something like that needs to be in place.

59:48 Otherwise, you're going to have a whole different type of maintenance nightmare.

59:52 Yeah.

59:52 I mean, we talked about Gooey and PyInstaller, but these are sort of not super perfect projects.

59:59 They're great.

01:00:00 They're really awesome that they exist, but they're not a general solution.

01:00:04 Right.

01:00:04 Right.

01:00:05 And that's the reason Excel works so well, is I can put an Excel file together.

01:00:10 And once I give you that Excel file, for the most part, it's going to work because it's got all that Microsoft Office guts behind it.

01:00:18 Yeah, absolutely.

01:00:19 All right.

01:00:19 One other thing.

01:00:20 Are you familiar with the Microsoft UserVoice where they ask for features, and they have one on Excel where somebody suggested that Python replace VBA as the Excel scripting language, or maybe be offered in addition to it?

01:00:32 Have you seen this?

01:00:32 I've seen that.

01:00:33 Yes.

01:00:34 Yeah.

01:00:34 And there's a lot of excitement.

01:00:35 Yeah.

01:00:35 It blew up.

01:00:37 I mean, there's like 5,800 votes and just people going, wow, please, please, please make it happen.

01:00:42 You know what I mean?

01:00:43 And I'm probably like most people.

01:00:45 When I first saw it, I thought, wow, this is awesome.

01:00:47 Wouldn't that be perfect?

01:00:48 But the more I thought about it, I don't know if that's really a good thing or not, because it probably depends on the implementation.

01:00:55 And I'm not smart enough to understand how they would do it.

01:00:58 But like, I think like we talked about the power of Python is the ecosystem.

01:01:03 And so that I can use conda or pip to install something like, am I going to have all that available to me in Excel or would it just be?

01:01:11 Yeah, that's really interesting.

01:01:12 I mean, you can, I think you can embed like DLLs into Excel through like .NET.

01:01:19 And then there's Python and SQL Server.

01:01:21 There may be a way.

01:01:22 But, you know, as you say that, I also kind of think some of the advantages we spoke about, version control, like not a whole bunch of GOTOs, but straightforward code.

01:01:30 You know, like all those things would still be absent, right?

01:01:32 So I'm kind of on the fence as well.

01:01:34 I mean, I'd sure rather see Python than VBA, but maybe VBA is needed there to chase them out of that area into a better place.

01:01:41 Yeah, I think so.

01:01:42 But we'll wait and see.

01:01:43 I don't know if it's, they're making progress with it or not.

01:01:46 Yeah, I don't know.

01:01:47 My feeling is there's not a lot of action.

01:01:49 There's kind of like, wow, people really want this more than we thought.

01:01:52 I don't know if we want to do this, but that's just my looking from the outside.

01:01:56 I agree.

01:01:57 I agree.

01:01:57 Cool.

01:01:58 All right.

01:01:58 Well, Chris, it's been great to talk about.

01:02:00 Before you get out of here, let me hit you with the final two questions.

01:02:03 If you're going to write some Python code, what editor do you use?

01:02:05 Mostly I'm on Sublime or Jupyter Notebook.

01:02:09 Okay.

01:02:09 Yeah, that's a good fit.

01:02:11 And then notable PyPI package.

01:02:12 Yeah.

01:02:13 We talk quite a bit about Pandas.

01:02:15 You know, I'm a huge, huge Pandas fan.

01:02:17 I assume most of your listeners are familiar with that.

01:02:20 But I'll put a plug in from a data viz perspective.

01:02:24 So I've been using Altair quite a bit recently.

01:02:27 And I know you've talked about it a little bit.

01:02:29 And I'm really starting to like that package.

01:02:32 It seems to fit well with the types of business analysis that I'm involved with.

01:02:37 Still enjoy Seaborn for some of the more complicated statistical analysis.

01:02:42 Altair is pretty cool.

01:02:44 And I'm hopeful that it will continue to develop and, you know, get more and more powerful and

01:02:50 more used.

01:02:50 Yeah, for sure.

01:02:51 It looks really cool.

01:02:52 And there's some nice stuff.

01:02:53 And also shout out to Altair recipes.

01:02:55 So like quick little things you can use.

01:02:57 That's a nice project to even make it simpler, right?

01:02:59 Yes, absolutely.

01:03:00 All right.

01:03:01 Well, final call to action.

01:03:02 People surely work in their company with folks who have this Excel hell problem.

01:03:07 What do you tell them?

01:03:08 Yeah, you know, reach out to if you know Python and you're in a big organization, there's probably

01:03:13 people out there that could really benefit from your knowledge.

01:03:16 And you don't necessarily need to quit your day job, but reach out to those folks, spread

01:03:21 the word a little bit about Python and be a good resource for them to get up to speed and

01:03:26 understand what Python can do for them.

01:03:27 Yeah, awesome.

01:03:28 I agree.

01:03:28 I think you can make a huge difference by writing software, but sort of empowering these

01:03:33 folks to go explore what they got to do.

01:03:35 Yeah.

01:03:36 And be patient as they learn.

01:03:38 Yeah, absolutely.

01:03:40 Be patient.

01:03:40 It's going to take some, but I'm sure it's worth it.

01:03:43 All right.

01:03:43 Thanks for being on the show.

01:03:44 It's great to chat with you.

01:03:45 Thanks a lot.

01:03:45 I appreciate it.

01:03:46 Yep.

01:03:46 Bye.

01:03:46 Bye.

01:03:46 This has been another episode of Talk Python to Me.

01:03:50 Our guest on this episode was Chris Moffitt, and it's been brought to you by Linode and Stellares.

01:03:56 Linode is your go-to hosting for whatever you're building with Python.

01:03:59 Get four months free at talkpython.fm/Linode.

01:04:03 That's L-I-N-O-D-E.

01:04:05 Find the right job for you with Stellares, the AI-powered talent agent for the top tech talent.

01:04:11 Visit talkpython.fm/Stellares to get started.

01:04:15 That's talkpython.fm/S-T-E-L-L-A-R-E-S.

01:04:20 Stellares.

01:04:20 Want to level up your Python?

01:04:23 If you're just getting started, try my Python Jumpstart by Building 10 Apps course.

01:04:28 Or if you're looking for something more advanced, check out our new async course that digs into

01:04:33 all the different types of async programming you can do in Python.

01:04:36 And of course, if you're interested in more than one of these, be sure to check out our

01:04:40 Everything Bundle.

01:04:41 It's like a subscription that never expires.

01:04:42 Be sure to subscribe to the show.

01:04:45 Open your favorite podcatcher and search for Python.

01:04:47 We should be right at the top.

01:04:48 You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the

01:04:54 direct RSS feed at /rss on talkpython.fm.

01:04:57 This is your host, Michael Kennedy.

01:04:59 Thanks so much for listening.

01:05:01 I really appreciate it.

01:05:02 Now get out there and write some Python code.

01:05:04 I'll see you next time.
