#200: Escaping Excel Hell with Python and Pandas Transcript
00:00 Michael Kennedy: Do you know or maybe you work with people who abuse Excel? Is it their hammer to pound all the computational problems that get in their way? Well join me to chat about this opportunity to bring Python deeper into their lives. You'll meet Chris Moffitt, who runs Practical Business Python. He works with lots of folks who could make better use of Python to solve their business problems and has a ton of material on his website focusing on just that. It's time to escape Excel hell with Python and Pandas. This is Talk Python To Me, Episode 200, recorded January 21st, 2019. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystems, and the personalities. This is your host, Michael Kennedy, follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by Linode and Stellares. Please check out what they're offering during their segments, it really helps support the show. Chris, welcome to Talk Python.
01:11 Chris Moffitt: Hey great to be here.
01:12 Michael Kennedy: Yeah it's great to have you here, you've got some really interesting and amusing presentations around Excel.
01:18 Chris Moffitt: Yes.
01:19 Michael Kennedy: Not doing bad things with Excel, but maybe using it when they should use something else like Python or Pandas.
01:25 Chris Moffitt: Absolutely, yeah, yeah, I mean, Excel in and of itself probably isn't bad or good, but it's like many things the only tool that people have and sometimes it can be used in ways that cause a lot of pain in the future.
01:39 Michael Kennedy: Yeah, it definitely can. You know, if you think of what is the most widely used and deployed database within companies, like on a per-install basis, it's probably Excel, right?
01:49 Chris Moffitt: Absolutely, absolutely. And I think it's funny. I talk to other people in the Python community, maybe where they do more like Django development or really deep data science, or testing, and I don't think a lot of people understand how pervasive Excel is and how it's used for everything. And the idea that you just replace Excel is really a non-starter in most organizations.
02:12 Michael Kennedy: Yeah absolutely, it's pretty crazy. It's going to be fun to talk about taking people on that journey, maybe saying like, "Look, you could dip your toe in this programming world. You don't have to want to be a developer, but do you know how much better your life would be if you could learn like an if statement and a while loop and a few function calls?" It would be a lot better, right?
02:31 Chris Moffitt: Absolutely, absolutely. Especially some of the things that power users do in Excel, it is programming. I mean, they don't maybe think about it that way but it is. And I kind of talk about it. If you can write a nested if statement in Excel, you can probably program Python.
02:47 Michael Kennedy: Yeah, because if statements in Excel are painful.
02:50 Chris Moffitt: They're awful.
02:50 Michael Kennedy: Before we get to that, 'cause it all fits on one line with no organizational structure.
02:55 Chris Moffitt: Exactly.
02:56 Michael Kennedy: I've written some myself. So before we get to that though, let's just start with your story. How do you get into programming and Python?
03:02 Chris Moffitt: Sure. So, I grew up in the early '90s. So my Apple IIc was my first computer. I learned BASIC and actually learned Pascal in high school. And then in college I studied Computer Science and Electrical Engineering. So I learned C, C++, kind of the traditional computer science curriculum at the time. Really enjoyed it, but when I graduated, I took a job where I didn't actually use any of it. So, some of those skills kind of atrophied a little bit. Fortunately throughout my career, as I've moved to other roles, I've been in typically a little bit more business type roles but I've been able to use my technical skills to try and solve problems. So that's been kind of my evolution over the past many years that I've been doing this.
03:49 Michael Kennedy: Interesting. So, what are your impressions and experiences of being this, a little more on the business side of things but having that technical skill and looking at other people, trying to solve problems manually, copy-paste, like no version control. Just looking at them and going, "Do you know how much easier this could be? What are you doing?" Is that sort of something that you encounter often?
04:11 Chris Moffitt: Absolutely, yeah. That's definitely one of the biggest challenges. You mentioned something like version control, one of those things that is just, once you start using it, and get used to it, not having it is just extremely painful. But most people who don't program really don't use version control. They haven't used GitHub or BitBucket, or even SVN back in the day. When you see how manual things are, how people try and rename files as the poor man's version control, it is painful. It is very painful and I've had to think a lot about how do you approach that problem and not just tell people, "There's a better way to do it. Just scrap everything you're doing and move to my way."
04:54 Michael Kennedy: Right, how do you help them take a few steps? So, if renaming the files, having "Excel Report May Version 3" or, like, "Version 3 Kennedy" or something like that.
05:05 Chris Moffitt: Exactly.
05:05 Michael Kennedy: A weird naming scheme. If that is the version control, then Outlook probably is GitHub. If you attach it to an e-mail, it's kept live in this thread of email but there are different versions in there, right?
05:20 Chris Moffitt: Absolutely. It's either Outlook or it's in the file share buried somewhere in multiple nested file subfolders that are impossible for anyone to actually find or understand. Absolutely.
05:32 Michael Kennedy: Yeah, yeah. I think a lot of people either work independently or they work at small companies, but the larger the company, the more people copied on those email threads. There's like 20 people, and they all have their little tweaks in versions. It's crazy, right?
05:46 Chris Moffitt: It is, and it's so frustrating because you want to just say, "Hey, can't we just put the dataset in one place, and we're all working off of it, and using version control just like you would in an open source project?" But the concept just isn't there. I mean, there are some tools, some of the more recent Microsoft tools, that support it a little bit but it's still not, at least in my experience, it doesn't seem to be as common as you would expect.
06:12 Michael Kennedy: Yeah. A lot of times what the answer seems to be, at these big organizations is like, "Well, this Outlook named thing as a version control sucks. So let's use SharePoint." which is like its own special type of hell.
06:25 Chris Moffitt: Yes. I haven't used SharePoint a lot mainly because I've heard it's its own special type of hell, and I think now Microsoft Teams is the new thing but I haven't used that enough to really figure it out because it's a lot of time and energy figuring out how to use it and then to get the people you're working with to use it as well.
06:45 Michael Kennedy: Yeah. Let me see if I can summarize my SharePoint experience. Like, "Oh yeah, that's on the shared SharePoint project site. Okay. Where's that?" And eventually someone'll send me a link, and then I'll click on it, and then it'll say access denied. I'll say, "You never gave me access to that part of the project." And they'll come back then they'll give me access to it. Then you can check out the file which downloads it and locks other people from checking it out, but you know maybe someone's already checked it out and forgotten it so you can't get it. It's just like, what are we doing? This is crazy, man.
07:16 Chris Moffitt: Yeah, it is.
07:16 Michael Kennedy: So, I know. It's really frustrating, those things. Yeah.
07:19 Chris Moffitt: Yeah. I've had similar experiences as a user, just trying to get it to work and it just didn't seem like it was worth the effort at the time.
07:26 Michael Kennedy: Yeah, for sure. So you've started Practical Business Python, this project, to sort of help take people along. Tell us a little bit about that.
07:35 Chris Moffitt: Sure. So, I've always kind of had a little bit of that technical itch that I wanted to scratch and I did some Django development back in the day. And after a while, for multiple reasons, I wasn't doing that as much and so I felt like I just didn't have that much going on. And probably, I think in 2014, I decided I'd like to do something else to get that technical information out there, to keep my skills up, to share with others, but I didn't want to embark on a huge open source project, something that I was going to have to maintain. So I thought, "You know what, I'll do a blog, get this started." And it's funny, originally, I thought I was going to spend more time talking about Raspberry Pi, Internet of Things type work, but as I started getting into it, that's when the whole data science thing where I guess probably when I started learning more about data science. And then learned about Pandas, and those tools, and started applying them to the problems I had and it was just like, "Wow, this is really cool." I got to learn a lot. Got a lot of really positive feedback from folks. So it was really exciting.
08:40 Michael Kennedy: Yeah that's awesome and Pandas is nice because it solves a lot of those problems automatically. It'll read a CSV directly and it'll kind of let you sort, and things like that. It has a lot of the features at the code level that other people might need at the Excel or GUI level. Right?
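To make Michael's point concrete, here is a minimal sketch of reading a CSV and sorting it with Pandas. The CSV text and the column names (`region`, `rep`, `sales`) are made up for illustration; in practice the data would more likely come from a file path passed to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for an exported report.
csv_text = """region,rep,sales
East,Ana,1200
West,Bo,950
East,Cy,1800
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Sort by sales, largest first, much like sorting a column in a spreadsheet.
top = df.sort_values("sales", ascending=False)
print(top)
```

Unlike sorting in a spreadsheet, the original `df` is left untouched here; `sort_values` returns a new, sorted table.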
08:58 Chris Moffitt: Absolutely, absolutely. And I think one of the things that really sold me on Pandas, I had done in my role. I identified one process that was just extremely manual. A lot of work they would take to build this report, essentially on a monthly basis, and going back to when you only know one tool, or when you only know how to use a hammer, everything looks like a nail. So, I decided I was going to build my data processing thing using Django. So Django Admin and Django ORM, and I put it together and it worked. But it was slow. So I would do this; I would kick this process off and it would take maybe an hour to run but it was an hour I could go to lunch, get coffee, whatever. So it was no big deal and it worked well. Eventually, once I learned Pandas and got over that learning curve, I solved the same exact problem in Pandas, on the same exact hardware and it took two minutes to run.
09:56 Michael Kennedy: That's incredible.
09:57 Chris Moffitt: It's just Pandas because it uses NumPy and all the lower level functions written in C, and if you do things right it's pretty quick. And so, once I kind of had that experience, I'm like, "Okay, there's no reason we can't use some of these tools to solve some of the problems we have right now."
10:13 Michael Kennedy: Yeah, that makes a lot of sense. And you know Pandas, its origins, a lot of people think of it around data science. So maybe they think machine learning, they think science, but it really comes from the financial industry and it's probably got its roots right from Excel more or less, right, like right near there.
10:29 Chris Moffitt: Exactly. If you start kind of looking around the Pandas, there's a lot of things around time series and grouping of data that are just really, like you said, really kind of from the finance world and are very applicable to a lot of the kind of common problems that people use Excel for in many large organizations.
10:48 Michael Kennedy: Yeah. I was looking over your blog and all the articles you've written. There's a ton of resources there. It looks like you've been at this for quite awhile. And you said you used the Feynman Learning Technique, as in Richard Feynman, the physicist, right?
11:01 Chris Moffitt: Yes.
11:01 Michael Kennedy: What's the story there?
11:02 Chris Moffitt: Well, I certainly didn't start off doing this but this is something I don't remember when I first learned about it but basically, the idea is, you take some sort of concept, you learn it, and then figure out how to teach it to someone that really doesn't know anything. Feynman talks about teaching it to a toddler. Certainly my articles aren't at the toddler level, but by going through and actually writing a blog post and understanding it at the level of detail that you need to to write a post and put it out there on the Internet, it really does force you to kind of have this feedback loop that he talks about where you think you understand something but once you try and put it in a blog post, you realize, "Oh, I didn't understand as well as I thought I did." And so then you kind of loop back and by the time you're done actually writing a good post, what I find is I really understand the concept pretty well and that's what I try and do with the blog post; is get it out there so that others can learn.
11:55 Michael Kennedy: Yeah, it's interesting. You see these polished blog posts or conference presentations, or even online courses, and you're like, "That person so deeply knows this stuff that they're teaching me this. It's great I got this article or this video or whatever, but how am I ever going to reach that level?" And the reality is, probably that person who did that presentation went on that journey as part of creating this thing. They may have not known very much at the beginning, but they kind of applied this technique knowingly or unknowingly to go like, "I am interested, I know a little, I'm going to just dig in and like really put it together."
12:32 Chris Moffitt: Absolutely. And I think the funny thing is I was actually working on a post this morning where I'd written it a couple of years ago and the API, Pandas' API has changed, so I need to update it. And I had to really go back and figure out what I'd done. I know people have talked about this before, about there's got to be some name for when you have a problem and you Google it and it's a blog post that you wrote and forgot about. I mean, it happens to me all the time.
12:57 Michael Kennedy: I was just thinking of that! I've had that problem happen to me as well. "Where am I? Gosh, I really don't remember how this works. I know I did something with it a while ago." And I google it, and the best answer is something that I've done, I'm like, "All hope is lost."
13:11 Chris Moffitt: Yes exactly.
13:13 Michael Kennedy: If this is at the top and I don't remember it, we're kind of done here.
13:16 Chris Moffitt: Yeah. But it's a really good point. I mean, I think it's very easy for new users to get discouraged because it is a lot to learn, and I think it's really important you do a great job on your show with the other guests and letting people know that we're all human, we're all trying to learn. And some of this stuff seems to slip out of your brain over time and you have to continually re-learn it.
13:40 Michael Kennedy: Right. Well, at the same time, so much stuff is changing that you have to eject some stuff to make room for it. It's a matter of, can I go back and Google that thing or do I really need to force myself to remember it? A lot of times it's like, I have it written down, or I have an example, let's kind of just sort of forget that and really focus on this new library. I want to learn Altair for graphing and I'm just going to forget the matplotlib, because this is what I want to focus on. There's just so much new knowledge, it's like this constant battle for cognition, basically.
14:13 Chris Moffitt: Right, absolutely. And I think at the end of the day, a lot of what I'm trying to do at the blog is not so much teach people to understand everything but it's almost teaching them what is possible. So, it's showing the tools that are out there and saying, "Here are the types of problems, and here is a way to sort of present the problem maybe in a way that will apply to what you're working on today."
14:36 Michael Kennedy: Sure. So, what are some of the more popular topics or things that have come across in the blog that people liked?
14:41 Chris Moffitt: It's interesting. So, one of the things I've learned in writing a blog is it is very hard to gauge what is going to be popular when you publish it and what's going to live on through the magic of Google search engines. Some of the articles seem really popular in the beginning, but then hardly ever pop up in searches and don't get much traffic. Then others that seem to maybe be a little less exciting are the ones that tend to pop up on Google. So one of the most popular articles for a while was my Pandas pivot table. I really enjoyed that one. I think that one was a good example where I felt like it was going to be popular. People enjoyed it. I included, I think, a good summary of how to use the Pandas pivot table, and that was really popular. The one that surprised me is the Pandas data type article. So, I explained how to use the different Pandas data types, and that is one of my most popular Google Search items right now. So I just can't quite figure it out. Sometimes I think about it and then other times I'm just like, "Well I'm going to write what I think people want." and sometimes they're a hit and sometimes they're not.
15:52 Michael Kennedy: Yeah, it's a really tricky thing to try to guess that. You just kind of got to throw it out there and let the world decide what they're interested in because I've done that with blogging where I've thought, "I'll just throw this out like whatever, I've got a spare 20 minutes." And other times I'll spend a whole day writing something really polished and I'm like, "This is going to be the good one." It's kind of like, "Meh." People don't care. The one I threw together is super popular. I'm like, "If I knew it was going to be popular, I'd have put more effort into it." Same thing with videos on YouTube, same thing with even the Talk Python episodes, right? Similarly, all this could be super popular. I'm not sure people are going to love that one and it's all over the map.
16:29 Chris Moffitt: Yeah, absolutely. And I think you've talked about this on some of your prior podcasts. Like, as technical people, we're not necessarily good marketers. We kind of expect all this content to stand on its own.
16:41 Michael Kennedy: The world's a meritocracy. If I make it well, it's going to be fine. It would be cool if that were true; it's not.
16:46 Chris Moffitt: It's not. It's not. And so, that's one of the things. I don't do a whole lot of promoting old articles. I do wonder, should I do that? Should I kind of remind people, "Hey, here's something I wrote two years ago that's still applicable." that's not something I do but I think it should be done to really kind of keep this stuff fresh and remind people what's out there.
17:05 Michael Kennedy: Right. It's not a bad idea. I mean, one of the ways to promote anything like a blog, or courses or videos or whatever it is, to just keep making new stuff that's exciting, some of it will not be as exciting as you thought it would, some of it will be more and it'll draw people in. And then they'll find just keeping your profile high will help your older stuff stay relevant, but there's also ways you could continue to promote them more explicitly for sure.
17:31 Chris Moffitt: Sure, sure. And it is funny. I kind of forget once I've got an article out there and there's kind of that initial bump for the first couple weeks while people are discovering it, and then I almost forget about it but it is amazing when someone will stumble upon an article, that's three or four years old and it solves the exact problem. And they leave a comment or send me an email or reach out and say, "Hey Chris, that was awesome. Thanks. You saved me dozens of hours of time." So, it feels good and it's good to remember that that's going on as well.
18:02 Michael Kennedy: Yeah, the long tail is definitely up.
18:04 Chris Moffitt: Exactly.
18:05 Michael Kennedy: Yeah. So let's dig into a little bit of this Excel hell story. You did a great presentation and we'll link to the slides, or a PDF of the slides, in the show notes. You talk about basically what is Excel hell and what is Python and Pandas. And it's a pretty good synopsis for folks who are presenting to a business audience.
18:25 Chris Moffitt: Sure. Excel hell, in my mind, is when you have a problem, it's a legitimate problem you're trying to solve at work, in your business setting, and you've solved it using Excel. And then you go back and try and figure out what you did and it's impossible to recreate what you've done or impossible to just like tweak it slightly to use it for a new problem, it's just the kind of pain you feel once you open up something you worked on six months ago in Excel and try and recreate it and understand, "What in the world did I do? Where were my data sources? What's the most recent version? What have I changed?" I mean, that in my mind is what Excel hell is.
19:10 Michael Kennedy: Yeah. Does it involve maybe like when you open up the worksheet, that the app kind of goes gray or white 'cause it's not responding for a while 'cause something is happening in there?
19:20 Chris Moffitt: Yeah. I mean, there's certainly one where you're like, I think that's the challenge with Excel is there are so many different ways you can interact with the data. So you can have formulas, you can have VBA behind the scenes, you can have multiple tabs, you can have linked workbooks. So when you open one, there's really no way to figure out what is going on? How did they get there? What manipulations have been made? How is this supposed to be used? People underestimate how challenging it is to decipher the complexities of a big Excel workbook.
19:56 Michael Kennedy: Yeah, and the reason you get there is because these problems land in this weird middle ground. They're not big enough to bring in the software team to build a custom solution. There's probably not a SaaS solution that's immediately obvious out there. But it's beyond just data. It's like it has to make some decisions, and as soon as the conditionals start to appear in there, it starts to seem like you're taking that first step down that slope, 'cause like, "Now here's where the if statements are in the table." and you're like, "Okay, here we go."
20:30 Chris Moffitt: Exactly, exactly. And then there are so many cases of when you're doing it and you have a vlookup formula and you forget to copy it down or you have the ranges wrong. I mean, there are just so many types of formulas, so many types of interactions that seem seductively easy, but just set you up where it's so easy to make a mistake and so hard to troubleshoot it when you do.
20:54 Michael Kennedy: Yeah, I mean, if you miscopy the formula, right? How do you look at that thing and go, "Yeah, okay." Especially if it's spanning multiple sheets, it can be a beast.
21:03 Chris Moffitt: Exactly, exactly. And then, like we were talking about earlier, the whole version control mess. I mean, there is no way to just do, or no easy way to do a diff of the code and the spreadsheets to understand what's changed and why did it change. Very painful.
21:19 Michael Kennedy: Right. Certainly direct version control won't help you 'cause I think it's an XML file that is the source file of Excel but it's zipped. So, you can't actually diff the code. Again it's like, if they had just left it unzipped, effectively, at least you could diff it.
21:34 Chris Moffitt: Right.
21:34 Michael Kennedy: But, no.
21:35 Chris Moffitt: Yeah, all hope is lost.
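The zipped-XML point can be sketched with the standard library alone: unzip the part you care about, then diff the text. The two "workbooks" below are tiny hand-built ZIP archives used as stand-ins (an assumption for illustration; they are not valid Excel files), and the helper names `make_workbook` and `sheet_xml` are hypothetical. The part name `xl/worksheets/sheet1.xml` mirrors where a real `.xlsx` keeps its first worksheet.

```python
import difflib
import io
import zipfile

SHEET_PART = "xl/worksheets/sheet1.xml"


def make_workbook(cell_value):
    """Build a tiny ZIP standing in for an .xlsx file (not a valid workbook)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        xml = f'<sheetData>\n<c r="A1"><v>{cell_value}</v></c>\n</sheetData>\n'
        zf.writestr(SHEET_PART, xml)
    return buf.getvalue()


def sheet_xml(workbook_bytes, part=SHEET_PART):
    """Unzip one XML part from the archive and return it as a list of lines."""
    with zipfile.ZipFile(io.BytesIO(workbook_bytes)) as zf:
        return zf.read(part).decode("utf-8").splitlines()


old = make_workbook("100")
new = make_workbook("125")

# Diffing the compressed bytes is useless; diffing the extracted XML is not.
diff = list(
    difflib.unified_diff(
        sheet_xml(old), sheet_xml(new), "v1/sheet1.xml", "v2/sheet1.xml", lineterm=""
    )
)
print("\n".join(diff))
```

Real worksheet XML is far noisier than this stand-in, so a diff of it is readable only in a limited sense, which is part of why a plain-text script is easier to keep under version control.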
21:39 Michael Kennedy: This portion of Talk Python to Me is brought to you by Linode. Are you looking for hosting that's fast, simple, and incredibly affordable? Well, look past that bookstore and check out Linode at talkpython.fm/linode. That's L-I-N-O-D-E. Plans start at just five dollars a month for a dedicated server with a gig of RAM. They have 10 data centers across the globe, so no matter where you are or where your users are, there's a data center for you. Whether you want to run a Python web app, host a private git server, or just a file server, you'll get native SSDs on all the machines, a newly upgraded 200 gigabit network, 24/7 friendly support even on holidays, and a seven day money back guarantee. Need a little help with your infrastructure? They even offer professional services to help you with architecture, migrations, and more. Do you want a dedicated server for free for the next four months? Just visit talkpython.fm/linode.
22:32 Chris Moffitt: We talked about, coming from a programming background, I mean, even if you've just had a little bit of experience, the idea of an if statement and if...then...else or a case statement or switch statement, or defining functions, is fairly straightforward, but when you try and cram all that into a single Excel cell, like we said, trying to do a nested if statement is just impossible. I mean, it is very, very hard to do and understand it. And I think anyone that's worked with Excel has opened up these spreadsheets and you click on the tab or on a cell and there's just this massive formula with all these parentheses and all this kind of error catching, and you're like, how can I understand if this is even right, how can I troubleshoot it? So that's Excel hell in my mind.
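As a rough illustration of the contrast Chris describes, here is a nested Excel `IF` and one way the same logic might look as a Python function. The discount tiers and the function name `discount_rate` are invented for this example.

```python
# The spreadsheet version, crammed into a single cell (shown only as a comment):
#   =IF(A1>=1000,0.15,IF(A1>=500,0.1,IF(A1>=100,0.05,0)))


def discount_rate(order_total):
    """Return the discount for an order, using hypothetical tiers."""
    if order_total >= 1000:
        return 0.15
    elif order_total >= 500:
        return 0.10
    elif order_total >= 100:
        return 0.05
    else:
        return 0.0


for total in (1200, 500, 99):
    print(total, discount_rate(total))
```

Each branch sits on its own line, so adding a tier or fixing a threshold is a one-line change rather than a hunt through nested parentheses.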
23:25 Michael Kennedy: Sounds about right. I think for the purpose of this discussion, it sounds like it's more or less the same situation if you're actually using Google Sheets or some other, technically another tool than Excel, but something Excel-like, right?
23:39 Chris Moffitt: Yes, absolutely. I mean, I think there are some things that Google Sheets makes a little bit better than I think and, I'm not a huge Google Sheet user but it does kind of the versioning automatically, and seeing the history,
23:50 Michael Kennedy: True.
23:51 Chris Moffitt: I think it's a little bit better. One of the things I think Google Sheets does better is it's easier to share. So if two people are working in it at one time that works better than Excel where it just locks the file and you have to yell at your co-worker to get out, that kind of thing.
24:06 Michael Kennedy: Yeah, yeah. Exactly. It is a little better there for sure, but it's also simpler. So maybe, you can't push it as far. You can't get as far down that hole before you have to scream for mercy.
24:17 Chris Moffitt: Right, yeah. I haven't dived into Google Sheets to see how complex can you do the formulas. I know it doesn't have VBA but I'm sure you can program it behind the scenes. I'm sure someone's done something pretty crazy. But I think just Excel is just so pervasive and so misused in the organizations.
24:37 Michael Kennedy: Yeah, yeah. Well thankfully, both Excel and Google Sheets have Python packages that will let you work with them. Which we'll get to.
24:44 Chris Moffitt: Yes, exactly.
24:46 Michael Kennedy: Let's start with the question, why is Python better than Excel? Especially from the perspective of somebody who's like a power user but they don't consider themselves a programmer, they wouldn't pick up code and think that that's okay for them. Like, that's for someone else.
25:02 Chris Moffitt: Right. There's a couple things. One of them is Python, and when you use these tools, are going to encourage a little bit better practice so it's more reproducible. So you can look at a Python script, and even if it's the newest of newbies and it's maybe not the most Pythonic, you can at least step through it and understand what's going on. And if you've maybe put some comments in there or used a Jupyter Notebook to capture some of this information, there is some reproducibility built into that and you can step through it and see what's happening and you can see, "I'm taking data A, and data B, and data C, and I'm joining them together, and I'm cleaning them. I'm doing all these types of things." In a program, it's going to be fairly straightforward how this is happening and in what order, whereas Excel, just by its very nature, doesn't do that.
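The "data A, data B, data C" workflow might look something like the following sketch. The three small tables, their column names, and the cleaning rules are all assumptions made up for illustration; the point is only that each step appears in order and can be read top to bottom.

```python
import pandas as pd

# Hypothetical inputs standing in for "data A", "data B", and "data C".
orders = pd.DataFrame(
    {"customer_id": [1, 2, 2, 3], "amount": [100.0, 250.0, None, 75.0]}
)
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": [" Acme ", "Globex", "initech"]}
)
regions = pd.DataFrame(
    {"customer_id": [1, 2, 3], "region": ["East", "West", "East"]}
)

# Step 1: join the three sources on the shared key.
report = orders.merge(customers, on="customer_id", how="left")
report = report.merge(regions, on="customer_id", how="left")

# Step 2: clean. Every fix is an explicit, visible line of code.
report["name"] = report["name"].str.strip().str.title()
report = report.dropna(subset=["amount"])

# Step 3: summarize total order amount per region.
summary = report.groupby("region", as_index=False)["amount"].sum()
print(summary)
```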
25:57 Michael Kennedy: It's super hard to even look at an Excel workbook and go, "These are the parts where code is." You can't even see the code. You've got to go touch every little cell to see, is this a formula? Is this a value? It's like not linear, it's circular, it's cross sheet. It's a little bit like VB with goto, you know?
26:17 Chris Moffitt: Yeah, yeah exactly. It's a lot of goto statements. That's a very good way to put it.
26:22 Michael Kennedy: I think it's like a two-dimensional goto.
26:25 Chris Moffitt: Yeah, yeah absolutely. One of the other problems with Excel is I do think it lends itself to some bad practices. So, it is very easy in Excel to edit the data. If you pull in some data and a cell has the wrong value, you can just go in there and type it and fix it. Well the problem is, next time you pull the data, you don't know that you just fixed it. It's all live. So by having a program, you at least have to be more conscious in Python about changing a value. You certainly can but you have to make a very explicit choice to do that.
27:00 Michael Kennedy: Right. And if you somehow pulled the data source and dropped it in there but then you tweaked it, that's right. There's no history or artifact that says that this happened.
27:09 Chris Moffitt: Exactly, exactly. And then I think, outside of the just overall benefits of having a program that's a text file that you can version control and understand, I think the biggest benefit is you have access to all of the Python ecosystem. So all of the libraries, everything that's out there, whether it's data science or a web scraping, or any of the thousands of things you can do with Python, you actually have access to it. Whereas, even Excel, as you start to get into VBA, you can certainly do more but if you want to go parse an XML file or scrape a web page, you can probably do it but there's going to be a lot better ways to do it in Python.
27:55 Michael Kennedy: Yeah, sure. To try to COM object in C++ and distribute that, you'll be fine.
27:59 Chris Moffitt: Yeah, exactly.
28:02 Michael Kennedy: It just sounds horrible, I don't want to go anywhere near that! And obviously that's out of reach of the people who actually want to do those things, right? I think you're right, the ability to acquire and keep the data up to date. I'm going to use beautifulsoup and requests to go grab this table off our internal intranet everyday, so when I make the report it automatically starts with absolutely fresh data and things like that, right?
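Here is a minimal sketch of the "grab this table" step. To keep it runnable offline, it deliberately swaps in the standard library's `html.parser` and a hard-coded HTML string in place of the `requests` download and beautifulsoup parsing Michael mentions; the page content and the `TableParser` class are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page body; in practice this text would come from an HTTP request.
PAGE = """
<table>
  <tr><th>Region</th><th>Sales</th></tr>
  <tr><td>East</td><td>100</td></tr>
  <tr><td>West</td><td>250</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = TableParser()
parser.feed(PAGE)
header, *records = parser.rows
print(header)
print(records)
```

Because the fetch and parse live in a script, rerunning the report always starts from the current contents of the page rather than from whatever was last pasted into a sheet.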
28:27 Chris Moffitt: Right, absolutely. And then, my experience has been, no matter how clean the data is, someone wants the report a little bit different or you need to apply some sort of business logic to clean it up. And if you have been doing that by hand and then you forget to do it, everybody gets upset. So, then you can put it in Python, you can code it there and say, "This translation always needs to happen." and it's there, and then people don't get upset when it's missing. So there's just so many benefits to having that repeatable record of what you've done with the data analysis.
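One way to code "this translation always needs to happen" is a small cleaning function that runs every time the report is built. The mapping `REGION_FIXES`, the function `clean_report`, and the sample data are assumptions invented for this sketch.

```python
import pandas as pd

# Hypothetical business rule: source systems spell regions inconsistently.
REGION_FIXES = {
    "N. America": "North America",
    "NA": "North America",
    "EMEA ": "EMEA",
}


def clean_report(raw):
    """Return a cleaned copy; the raw data is never edited in place."""
    cleaned = raw.copy()
    cleaned["region"] = cleaned["region"].replace(REGION_FIXES)
    return cleaned


raw = pd.DataFrame(
    {"region": ["NA", "EMEA ", "North America"], "sales": [10, 20, 30]}
)
cleaned = clean_report(raw)
print(cleaned)
```

The fix is recorded in the script rather than typed into a cell, so it is applied on every fresh pull and shows up in version control when the rule changes.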
29:02 Michael Kennedy: Yeah. We'll touch more on this later, but just having the ability to press a button and have the answer without going, "What are the seven manual steps I got to do and I hope I don't forget them, and, Oh, I'm on vacation next week. And Sarah, who's never done this, it's her job to do those seven steps. Does she even have access to it?" Oh my goodness. So, just having this repeatable alone is pretty valuable.
29:26 Chris Moffitt: Absolutely, absolutely. And then, the other thing I'd say is, I think it is interesting that Excel as powerful as it is, it's pretty easy to get a data set that just doesn't work in Excel. So if you get a couple hundred thousand rows in Excel, starting to try and manipulate it, it starts to slow down. It'll bog down your system. Whereas, if you read that data into Pandas and do the manipulations, it's pretty fast. As long as you can fit it in memory, you can manipulate a lot of data in Pandas that would be difficult to manipulate in Excel.
29:58 Michael Kennedy: Yeah, that's a good point. And there's limits, actual limits to how much data Excel will accept; like a million rows or something, right?
30:04 Chris Moffitt: Yes, absolutely. And it does keep going up and you can have more rows and more columns but it doesn't mean you can actually work with it when it's in there.
30:14 Michael Kennedy: That's a different story also, but yeah, cool. And then also it's, like you say, it's a proper language. You learn to work with Excel, you can make really complicated Excel sheets. You learn to work with Python, all of a sudden you start to see these other problems like, "Oh, I always copy these files from here to there and I always make these little edits to them. I always have to grab this data and transform it that way." It's sort of a good way to draw them in to this automate-your-world space, right?
30:45 Chris Moffitt: Absolutely, absolutely. And then, that's the thing that is nice about it; the internet loves to have language wars and we could fight about whether Python is really the best language for the task, but in most places it's going to be a pretty good choice. It may not be the best. So if you have learned Python and then you expand and start doing something a little bit more than just the Excel manipulations you're doing, you're not going to have to re-learn a bunch of things. You're not going to be going down a path where you've got to make a huge course correction at some point in your career if you decide to continue developing that skill.
31:18 Michael Kennedy: The hard part about learning programming is not learning the Python syntax. That's not so hard. It's like, what is a loop? When do I use loops? What is a conditional? What is a data type? What data types are there? That's basically the same across all the languages. It's just like, I need curly braces and parentheses. And why are there semicolons everywhere? But still, it's no big deal, right?
31:39 Chris Moffitt: Yeah, yeah absolutely.
31:40 Michael Kennedy: Yeah, interesting. So, one of the things that I keep repeating is we don't need 100 million software developers but we do need people who can do whatever they do better. And I feel like software development a little bit of it, like the kind you're talking about, can really amplify whatever you're good at anyway. So, from this context, maybe people are listening. They're probably developers but they probably work at companies where they get asked to help with Excel sheets or they have these automation problems that start from Excel. Who should they go start going, "I know I could write this for you but let me help you learn to automate your world a little bit." What type of folks should be reaching out to you?
32:24 Chris Moffitt: So I feel like anytime you have a group of people that use Excel, I don't know what the number is but five to 10 people, it seems like there's always someone that's like the Excel super user. So it may be...
32:34 Michael Kennedy: All this color, a lot of color, and there are some formulas, right?
32:39 Chris Moffitt: Exactly, exactly. Now, the definition of a super user is going to vary. I mean, sometimes it could be someone that is really good at some pretty complex VBA, it could just be the person that understands how to put a pivot table together. But I think in any group where you have people that are doing repeatable type processes, there is someone that is the de facto expert. And that is the person that I would target and kind of say, "Okay, let's take a look at what you're doing here and see if there is an alternative to use this tool called Python to improve this process." You raise a really good point. I think one of the big challenges here is these people aren't going to say, "I want to learn programming." To them it's like, "I can't learn programming. I need to go to a university and get a four-year degree." I would encourage the people that are listening, that if they do have that Python knowledge to kind of help ease them into that and say, "You know what, you don't have to be a computer scientist to do this. Let me show you some simple things that you can do in Python," to help with this process and get them started. And so yeah, focusing on those kind of natural experts in the group.
33:53 Michael Kennedy: Yeah, that makes a lot of sense. The fact that Python is not a compiled-to-machine-instructions, compiler-and-linker type of language means you don't have to drag them through the sort of software computer sciencey tool chain. You can just say you write this in a text file, preferably in a proper editor that helps you with spaces and autocomplete, and then you're kind of good, right?
34:15 Chris Moffitt: Yeah, you raise a good point. I think, if you are a Python expert, you should be able to help get Python set up on their system. And I know you've talked about this before, it is much easier than it used to be back in the past, but it's still a non-trivial activity for someone that's not comfortable with this. I mean, I've had experiences where I've deployed some Python scripts and told people, "Okay, open a command prompt on your Windows system and type python filename," and it just kind of blows their mind a little bit. They're just not used to that. Most people don't type commands on their laptop. So that is a...
34:53 Michael Kennedy: At all.
34:55 Chris Moffitt: Exactly.
34:56 Michael Kennedy: They maybe don't know what the command prompt is at all. It's very possible. When you're talking Excel and you're talking business users, and you're intersecting those two things, there's a good chance that that intersection lands on Windows and not Mac. It's very unlikely it's on Linux, right?
35:11 Chris Moffitt: Absolutely, yes.
35:14 Michael Kennedy: This portion of Talk Python to Me is brought to you by Stellares, the AI-powered talent agent for top tech talent. Hate your job or feeling just kind of meh about it? Stellares will help you find a new job you'll actually be excited to go to. Stellares knows that a job is much more than just how it sounds in a job description. So they built their AI-powered talent agent to help you find that ideal job. Stellares does all the work and screening for you, scouting out the best companies and roles and introducing you to opportunities outside your network that you wouldn't have otherwise found. Combining deep AI matching with human support, Stellares pares things down to a maximum of five opportunities that tightly match your goals, like compensation, work-life balance, working on products you're passionate about, and team chemistry. They then facilitate warm intros and there's never any pressure, just opportunities to explore what's out there. To get started and find a job that's just right for you, visit talkpython.fm/stellares. That's talkpython.fm/S-T-E-L-L-A-R-E-S, or just click the link in your show notes in your podcast player. What are the recommendations for making that work? Like Anaconda, things like this, or what do you suggest there?
36:27 Chris Moffitt: I'm a big fan of the Anaconda distribution. I tend to use Miniconda. That's the more stripped down version that has just not all of the libraries. Getting that installed on the system, it doesn't take as much space, and then adding in Pandas and all the other Jupyter Notebooks and pieces that you need. My experience has been that it works really well on Windows and that's probably the easiest way to get started.
36:52 Michael Kennedy: Python on Windows is getting better, especially if they have Windows 10. There's some pretty good stuff that's happening. Like, you've heard about Microsoft embracing Python more. We had Steve Dower on the show recently. He's one of the core devs at Microsoft. He talked about how he got Python 3 in the Windows Store, which I think is a big deal for business users.
37:12 Chris Moffitt: Absolutely. And then the other challenge, you've got this challenge of how do you get the users, but then there is kind of this internal corporate IT challenge some places where laptops could be pretty locked down and you don't have administrative rights and you can't actually install programs. I really am excited to see Microsoft's embrace of Python. It's really funny to me. I think of Microsoft 20 years ago, and it was the evil empire and embrace and extend and extinguish, but now, I mean, they're really doing some great things and what they're doing with Python I think is really good and I think that really plays well for these types of roles where people are using Excel and now Python, I think truly is a legitimate option for them.
37:57 Michael Kennedy: Yeah, I do too. I think a lot of it is the lack of requiring admin rights. Like say, if you go in and you change your path, that's already kind of techie. That's not even necessarily an admin operation, you could do that for your user profile. But go to the Windows Store, type Python, click that and it's already in your path. You've already got python and python3 as a command. That's another hurdle that they erased there.
38:20 Chris Moffitt: Exactly.
38:20 Michael Kennedy: That's pretty excellent.
38:22 Chris Moffitt: If the IT Department starts to get upset about it, you can say, "Well, it is in the Windows Store. It's not like something that is just wild or free on the Internet."
38:31 Michael Kennedy: Right. It's like, "I didn't get that from some weird mirroring site of an open source link. It looks super sketchy but it is linked off of the main project site. So I'm going to kind of trust it." Like, it's not that, right? It's not like trying to download PuTTY from some weird place that's SourceForge. Also it runs with low privileges. It can't touch the registry, it can't touch the shared file areas, the one that comes from the store, the Python from the store. So that's pretty good.
38:58 Chris Moffitt: Yes, absolutely.
38:59 Michael Kennedy: But that said, that's a CPython pip world, that's not a Miniconda, Anaconda world. Would you still recommend people go down the Miniconda side for now?
39:07 Chris Moffitt: In my experience, you can install Miniconda without admin rights. So, you install just as the local user. I don't understand under the hood what the real difference is between that one and the Windows Store version, but I do know my laptop is fairly well locked down and I can install it and run and it does everything I need. I haven't had to mess with the registry or any of the private Windows files, but for doing data analysis, replacing Excel files, it seems to work just fine.
39:36 Michael Kennedy: Yeah, for sure. So you're talking to these power users, and probably a lot of them, if they're Excel power users, it's very likely that they're kind of higher up in the organization. How much of the conversation goes something like, "Oh, I should use Python. How much is that going to cost us for some corporate licenses to Python?" And, "Oh wait, it's open source. Should we be using open source?" Is this a conversation that you have to have or people just kind of brush it off?
40:00 Chris Moffitt: I think it is a conversation you have to have, and I would say, it depends on the organization and your role in the organization, and how comfortable you are pushing the boundaries. So if you feel like, "If I do this, people are going to be okay with it.", but if you're uncomfortable, you definitely need to make sure you talk to the powers that be, so that they can provide a little air cover, if someone does get upset about bringing this new Python tool into their ecosystem.
40:28 Michael Kennedy: Yeah, interesting. So I think it does help a lot that it's in the Windows Store, and then we have Anaconda, Inc. as like official backers of sort of their distribution and it gives it some legitimacy, I would say.
40:38 Chris Moffitt: Exactly, exactly.
40:39 Michael Kennedy: All right. So let's suppose the listeners out there are like, "I have this Python knowledge. I don't want to be asked about another forms over data app. I want to help these people do their world without me helping them out. I want to teach them to fish basically.", how should they get started? They sit down on Tuesday and they say, "Hey, let me show you something. Let me get started." What do you think of some steps there?
40:59 Chris Moffitt: Sure. That is the hard part, because there's not just the one place to go to say, "Hey, here's how you learn Python and the appropriate tools to solve your problem." So, what I think works best is, people might have the natural reaction: "I want to focus on this really painful problem I have. It's a really big problem, it takes a lot of my time. It's very error prone," whatever. I would say actually start with something that's maybe a little simpler, that's very well understood, and try and automate a simple process and just get the basics down before you move on to the more complex stuff. So maybe there is a process where you're moving some files around or opening up Excel and copying and pasting a few rows or columns from one to another, start with that and just kind of get that going, and show them how to set up on their system, how to read in a file, how to make a couple changes with a Pandas DataFrame, save it back out, and just start there. And try and do something in 10 to 20 lines of Python code to get the ball rolling.
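A first script of the size Chris suggests might look like the sketch below: read a file in, make a couple of changes with a DataFrame, save it back out. The order data and column names are invented, and in-memory CSV text stands in for files on disk so the example is self-contained. With a real workbook the read and write calls would be `pd.read_excel` and `DataFrame.to_excel`, which need an Excel engine such as openpyxl installed.

```python
import io

import pandas as pd

# Stand-in for a file on disk; the contents are made up for illustration.
raw_csv = io.StringIO(
    "order_id,quantity,unit_price\n"
    "1001,3,9.50\n"
    "1002,10,4.00\n"
    "1003,1,120.00\n"
)

# Read the file in.
df = pd.read_csv(raw_csv)

# Make a couple of changes: add a computed column, then keep and sort
# only the orders worth 30 or more.
df["total"] = df["quantity"] * df["unit_price"]
big_orders = df[df["total"] >= 30].sort_values("total", ascending=False)

# Save it back out (here to an in-memory buffer instead of a file).
out = io.StringIO()
big_orders.to_csv(out, index=False)
```

The whole thing fits well inside the 10 to 20 line budget, and each step maps onto something the person already does by hand in a spreadsheet.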
42:09 Michael Kennedy: Yeah, that makes a lot of sense. And I do think it's these little almost death by a thousand paper cuts type of problems that if you can just solve some small to medium ones, but they become repeatable. If they could be scheduled, they could become automatic. You just had an email when you came in on Monday. Instead of you doing the report and sending it to the people who need it, it just happens automatically and you get a copy on Monday. Those kinds of things are pretty easy to attack. But I think a couple of those and then the joy would start to spread. You're like, "You know what, this is awesome. What else can I do with this?" Right?
42:42 Chris Moffitt: Exactly, exactly. And then what I found too, is sometimes maybe there's a report that you do once a week. But it's automated now so you could do it daily. And suddenly people are like, "Oh, wow! Daily, this is really useful." So it's not just the benefits of time savings, it's the benefit of like, "Now it's easy for me to do, so I'll do it. Whereas before, if it took me several hours to pull something together, I'm probably not going to be willing to do that every day just 'cause I don't have the time." So I think there are a lot of unintended benefits of doing the automation.
43:17 Michael Kennedy: Yeah, for sure. And the next step from there is on demand. You want to know what the weekly sales rate versus last week were, store by store, you just click here and then you have it in a minute.
43:30 Chris Moffitt: Exactly, exactly. And you're right. I mean, you kind of get this momentum; it builds on itself. So once you've taken something simple and gotten started, then, "Oh, I want to tweak it a little bit." That's where you also see the value. I'm not starting from scratch. I can take what's already there and maybe just add some additional code, or use it as a basis for a new process. It starts to build and you kind of get this library of tools and ways to manipulate the data and report the data out. That just really accelerates your speed to get things done.
44:03 Michael Kennedy: Yeah, it's obvious to us as programmers but if you always have to get the data from one place and maybe you have to do different stuff with it, once you write that part to get the data once, that is done. That is a solved problem. Now you build on top of that capability. And I'm sure a lot of folks that we're addressing, talking about helping them here, they just don't have that mindset of, "If I solve it in one way, that solution is reproducible and reusable.", right?
44:26 Chris Moffitt: Right. And that is a challenge. So anyone that's learning this new tool, they're going to be slower in the beginning and they're going to have to invest time. In my experience, I've had to learn a lot of this off hours. It's not something that you just can get done in an eight-hour day. And so, there is a little bit of a targeting of that individual. Is this individual at least open to trying to learn some on their own so we can point them in the right direction and they can learn about Python, they can learn about these tools, but are they going to be able to follow up on their own or are they just going to say, "I need you to do it for me." If that's the case, they're probably not the best candidate for this.
45:07 Michael Kennedy: Right. It seems to me like if you're doing some sort of four-hour task weekly, and there's somebody who's going to help you, they could sit down and say, "Instead of doing that manually, why don't we spend those four hours on Monday, build it together. I mean, this is not me offering to be your private software developer, but this is me helping you down this path. I'm sure that we could do this in four hours. Let's sit down and do it together. So you understand it and then you can take it on and sort of keep it going." I think things like that may work.
45:35 Chris Moffitt: Yeah, absolutely. And then, one of the things that I've wanted to do but I haven't really done a whole lot of is, what kind of user groups can you set up in your company so that you have some of these peer resources to help them work through the process? Your podcast about the Apple Python training was really, really interesting and, I mean, certainly a much larger scale than what I'm talking about but I think that that would be another option, is to try and get four or five, a dozen people, like-minded individuals together and over the lunch hour, start to introduce these concepts and build a community where they can learn and share their learning.
46:13 Michael Kennedy: That's a great idea. I kind of have all these Excel power users who are escaping Excel through Python, start to come together and give little presentations. "Last week I did this automation. Let me just talk to you about how it worked, and what I had to overcome.", and yeah that'd be great.
46:28 Chris Moffitt: Exactly. And because what I think will happen is there are a lot of common challenges. How do I get access to this database, or how do I massage the data in a certain way that is probably fairly common in an organization. So if you can share that or simple things like I've got to do a lot of PDF file manipulations. Okay, what library are you using? What are some of the ways to do that? I think there's a ton of those types of leverageable problems across an organization.
46:59 Michael Kennedy: Right. As soon as someone finds the solution to getting that table out of that PDF document, other people are like, "I have to copy it out of the PDF document. What are you doing? This is cool."
47:09 Chris Moffitt: Exactly, exactly.
47:10 Michael Kennedy: Yeah. I guess you probably got to set expectations as well. You can't say like, "Look, I know you've got super cool apps on your phone, and you've been to airbnb.com." This is not what we're talking about. What we're talking about is a few steps, reproducibility really quick that you don't, just setting expectations that you can't start by building some kind of super glamorous app.
47:34 Chris Moffitt: Absolutely. It's a really good point. That is one of the challenges. As much as we rail on Excel and maybe Access, which we haven't talked about but is a similar sort of thing, it's still really intuitive for someone new to pick it up and do something. You can get results and it looks pretty, like you said, whereas, are people going to be impressed that I ask you to open up the command prompt and type python, this, this and this, and then a file is output somewhere. I mean, it doesn't have that wow factor that you would with more of a GUI-type application. So I agree 100% that you need to be able to get people comfortable with what you really can and can't do.
48:18 Michael Kennedy: Right. So it feels like success, not failure, when it's working. Right?
48:23 Chris Moffitt: Exactly.
48:24 Michael Kennedy: I think there are a few simple things you could do to help. Like right now as our discussion's gone, the solution is I go to the command prompt and type python, maybe python3, space, the script that does the magic, maybe I set up a Windows task scheduler that does that, I don't know. In some way that thing is running, right, but it's basically a command CLI type of experience. You could use something like Gooey, G-O-O-E-Y, which is a really simple way to turn a CLI interface into a cross-platform GUI interface, like G-U-I, GUI. The other one that you brought up that's pretty straightforward is Jupyter Notebooks.
49:02 Chris Moffitt: Yes, yes.
49:03 Michael Kennedy: Alright do you think that helps a little bit with the acceptance layer, like you can visualize the graph? You could do a bokeh or matplotlib thing.
49:11 Chris Moffitt: Absolutely. And it's interesting you mentioned Gooey, which is hard to talk about but when you see it, you'll understand what it is. I wrote an article about it and I've used it a lot. I think it's a fabulous tool for when you have this command line, and trying to teach people how to use a command line can be frustrating. But Gooey, you can just easily put that on top of it, and especially where it's selecting files, selecting dates, selecting a couple values from a dropdown.
49:40 Michael Kennedy: I've got some combo box dropdown, exactly. The dropdowns, yeah.
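To make the starting point concrete, here is a sketch of the kind of command-line script that a tool like Gooey sits on top of, written with the standard library's `argparse`. The report name, arguments, and region choices are hypothetical. Going by Gooey's documentation, decorating an argparse-based entry point with its `Gooey` decorator is what produces the window; that step is left out so the sketch runs without the library installed.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Describe the inputs: a file, a date, and a value from a fixed list."""
    parser = argparse.ArgumentParser(description="Build the weekly sales report.")
    parser.add_argument("input_file", help="Path to the exported data file")
    parser.add_argument(
        "--start",
        type=date.fromisoformat,  # accepts dates written as YYYY-MM-DD
        default=date(2019, 1, 1),
        help="First day to include in the report",
    )
    parser.add_argument(
        "--region",
        choices=["North", "South", "East", "West"],
        default="North",
        help="Region to report on",
    )
    return parser


def main(argv=None) -> str:
    """Parse the arguments and return a summary of what would be built."""
    args = build_parser().parse_args(argv)
    return f"Report for {args.region} from {args.start} using {args.input_file}"


# Passing the arguments explicitly here mimics what a user would type.
message = main(["sales.csv", "--start", "2019-01-21", "--region", "South"])
```

The three argument kinds here (a file, a date, a fixed list of choices) are the ones Chris names as the natural fit for a file picker, a date picker, and a dropdown.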
49:43 Chris Moffitt: Exactly.
49:43 Michael Kennedy: It's so easy, right.
49:44 Chris Moffitt: It is. And the other thing is it's WX Windows, so it looks native. It's not an ugly GUI and I personally use that when I put some things together and people just know how to use it and they don't really care what you did, just that, "Oh, it looks like I would expect a Windows app to work." So, that's great.
50:04 Michael Kennedy: Yeah, the fact that they have a thing, they double click that. If you use pyinstaller or pyoxidizer or something, you get like an .exe or a .app file, you put it in your dock or your taskbar, you double click it, all of a sudden, it's almost a first class citizen in your world.
50:18 Chris Moffitt: Exactly, yeah. Totally. And then Jupyter Notebooks I think, are certainly another option. I don't have experience with building a Jupyter Notebook hub that you would share out with everybody, but I do agree that having that where you tell people, "Okay, run the notebook. Go here in your browser. Use some of the widgets to populate some information." I think that's absolutely a valid, valid option.
50:40 Michael Kennedy: Yeah. It seems like that would really help with the acceptance. You're like, "Okay, this is not so scary. It's not the terminal or a command prompt." I know in reality it's like not that big of a deal, but psychologically, it's a massive deal for some people. They're just like, "Forget that. This is not the way we work here. This is not okay."
51:01 Chris Moffitt: Exactly. And actually where I see Jupyter Notebook, what I have been trying to do is almost like when I have a data challenge, so I get an Excel file or I need to do a project, and I wrote about it on my blog, I have a cookiecutter that creates a standard directory with some Jupyter Notebook stubs, and my in and out directories, and the raw data file. So it's a consistent methodology. I know if I go back six months later and open it up that I have kind of that Jupyter Notebook history of what I've done and all the files are where I expected, and I put notes in there about where the files come from, and what am I trying to accomplish, all those kinds of things. And I think Jupyter Notebook could really be useful for that type of analysis instead of, in my perfect world, instead of reaching for Excel every time you need to do analysis, "Hey I'm just going to use a Jupyter Notebook to start looking at the data and treating it almost like a data science problem." Versus "I'm just going to hack together something in Excel."
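The idea of a consistent project skeleton can be sketched with nothing but the standard library. This is not Chris's cookiecutter template: the folder names, the notebook file name, and the function `create_project` are assumptions made up to show the shape of the approach (raw data, in and out directories, a notebook stub, and a notes file).

```python
import json
import tempfile
from pathlib import Path

# Hypothetical layout; the real template may use different names.
SUBDIRS = ("data/raw", "data/in", "data/out", "notebooks")

# Smallest notebook document: no cells, nbformat version 4.
EMPTY_NOTEBOOK = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}


def create_project(root, name):
    """Create the standard folders, a notebook stub, and a notes file."""
    project = Path(root) / name
    for sub in SUBDIRS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    stub = project / "notebooks" / "1-explore.ipynb"
    stub.write_text(json.dumps(EMPTY_NOTEBOOK, indent=1))
    notes = project / "README.md"
    notes.write_text(f"# {name}\n\nWhere the data came from:\n\nGoal of the analysis:\n")
    return project


# Build a throwaway project and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    project = create_project(tmp, "sales_review")
    created = sorted(p.relative_to(project).as_posix() for p in project.rglob("*"))
```

Every analysis then starts from the same layout, so six months later the raw file, the outputs, and the notes are where you expect them.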
52:03 Michael Kennedy: That's really cool. I love the cookiecutter idea as well, that's nice.
52:06 Chris Moffitt: Yeah, and it's really worked out well. Once I've started to use it, and now I've been using it for a few months and go back and look at projects where I've used the cookiecutter strategy. I'm like, "Oh yeah, this is really cool. I kind of know where everything is and I know what to expect," and I can then create that feedback loop where there are maybe snippets of code that I want to include in future ones so I can just kind of keep populating that cookiecutter with the most recent code.
52:34 Michael Kennedy: Right, you take care of learnings from the last one, and you kind of make the project a little bit better, a little bit better, a little bit better, and I love it.
52:39 Chris Moffitt: Exactly. The other thing, since we've talked a lot about Excel, is that as I've started to learn more about data science and use some of these tools, there is a different approach to looking at data when you look at it almost from this data science approach versus looking at it from Excel. With the data science approach, you do exploratory data analysis on the data set, and you start to learn things that are easy to do with a Pandas DataFrame that are maybe hard to do with Excel. So just simple things: Where are my null values? Where do I have duplicates? Is the data as clean as I would expect? You can kind of go through that as part of that process that I think is maybe a little more robust, a little more natural for data analysis than just trying to open up an Excel file and do a bunch of auto filters and pivot tables.
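The first-pass checks Chris lists (nulls, duplicates, overall cleanliness) are each a line or two with a DataFrame. A minimal sketch, using a small made-up data set:

```python
import pandas as pd

# Invented sample with one repeated row and two missing values.
df = pd.DataFrame(
    {
        "customer": ["Acme", "Globex", "Acme", "Initech", None],
        "amount": [120.0, None, 120.0, 75.5, 40.0],
    }
)

# Where are my null values? Count them per column.
null_counts = df.isna().sum()

# Where do I have duplicates? Count repeats, then show every copy.
n_duplicates = int(df.duplicated().sum())
duplicate_rows = df[df.duplicated(keep=False)]
```

`duplicated()` flags only the second and later copies of a row, while `keep=False` flags every copy, which is usually what you want when inspecting them by eye.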
53:33 Michael Kennedy: Yeah, and it doesn't matter how big the data is or how wide it is whereas in Excel, it's like you kind of got to live visually in the data. You know what I mean? You got to have your chart and stuff swim there rather than, it's fine it has 200 columns. We load it and it's just, like you show it below, where the data went, right? It's all good.
53:50 Chris Moffitt: Yeah, exactly. You start to learn the data science concept, so the concept of tidy data is certainly something I never really understood just as an Excel user.
54:01 Michael Kennedy: Define that for us.
54:03 Chris Moffitt: So essentially, I knew you were going to ask me that, and I just should have looked up a better definition. When you have a row of data, it's a single observation in that row, versus spreading it across multiple columns of data. So think about if you're observing cars passing on the street, you would have a car and then maybe you'd have the make and the model versus multiple columns with all the different makes and models and yes/no's in each column. And the reason it's important is a lot of these data science tools then make it very easy to manipulate and more importantly visualize the data when it's in tidy format, and once you get it in that format, then suddenly the opportunities for visualization that are available to you are really powerful that you just can't do in Excel very well.
54:52 Michael Kennedy: Yeah, that's cool. I see a lot of interesting things sort of coming from this. Somebody moves a little bit out of Excel, they kind of get in with Jupyter, they kind of learn Pandas, and all of a sudden they start thinking, "You know what, how hard would it be to train a machine learning model to predict rather than just to report?" There's just a few little steps. Those things are not that hard but you need tidy data and you need good, got to get the right factors or aspects of your data you want to feed to the model and all that kind of stuff, right?
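Chris's car example can be sketched with pandas: a wide sheet with one yes/no column per make is reshaped so each row is a single observation with a `make` column. The car IDs and makes are invented, and `DataFrame.melt` is one common way to do this reshaping, not necessarily the one he uses.

```python
import pandas as pd

# Wide layout: one column per make, with yes/no in each.
wide = pd.DataFrame(
    {
        "car_id": [1, 2, 3],
        "Ford": ["yes", "no", "no"],
        "Toyota": ["no", "yes", "no"],
        "Honda": ["no", "no", "yes"],
    }
)

# Unpivot the make columns into (make, seen) pairs, one row per pair.
long = wide.melt(id_vars="car_id", var_name="make", value_name="seen")

# Keep only the make that was actually observed for each car.
tidy = (
    long[long["seen"] == "yes"]
    .drop(columns="seen")
    .sort_values("car_id")
    .reset_index(drop=True)
)

# With one observation per row, summaries become one-liners.
counts_by_make = tidy["make"].value_counts()
```

Adding a new make to the tidy version means adding rows, not columns, which is why grouping and plotting tools work so much more smoothly on it.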
55:22 Chris Moffitt: Absolutely. One of the things that I put in the notes for us to talk about was Facebook's Prophet tool.
55:29 Michael Kennedy: Yeah, tell us about that.
55:30 Chris Moffitt: I think this is a really interesting example of a really powerful tool that lots of analysts need. And if you just learn a little bit of Python, you can do some really cool things. So the project, for those not familiar with this, it's an open source project that Facebook put out a couple years ago, and it's a tool for making predictions about time series data. So think about if you wanted to predict like how many hits are you going to have on your website in the future? Or, their example is about hits on Wikipedia, hits on Peyton Manning, is the thing that they walk through. But I think any organization is going to have a lot of "Can you predict the future? Can you predict what we need to manufacture? Can you predict what our sales are going to be like? Can you predict what this marketing campaign is going to do?"
56:21 Michael Kennedy: Yeah, imagine you are the person who had that answer?
56:24 Chris Moffitt: Yeah, exactly.
56:25 Michael Kennedy: You would just be like, you are the magician!
56:27 Chris Moffitt: Exactly. What I suspect happens is, most big organizations maybe in supply chain, you do have someone that is really smart on this type of prediction. But you have to predict all across the organization. So what do most people do? You plot out your history and maybe you kind of do some sort of regression line and kind of make a guess. But if you start to look at all the different mathematical options you have for predicting time series data, it's mind boggling. How could you figure all that out? So Facebook essentially had this problem, so they built this library where they put some fairly sophisticated math behind it, but a very simple API where you can only change a few variables and get some really nice predictions and some really nice visibility to the trends in your data. I've been really impressed with it and think it's a really cool tool because it strikes a really nice balance of power but easy to use for someone that's not a PhD Math professor.
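For contrast, the "plot your history, draw a regression line, and make a guess" baseline Chris mentions can be written in a few lines of plain Python. This is not Prophet and does none of what it does (no seasonality, no uncertainty intervals). It is an ordinary least-squares straight line over made-up monthly figures, and the function names are illustrative.

```python
def fit_line(ys):
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(ys, periods):
    """Extend the fitted line for the next `periods` time steps."""
    slope, intercept = fit_line(ys)
    n = len(ys)
    return [intercept + slope * (n + k) for k in range(periods)]


# Invented history: five periods of steadily growing sales.
history = [100, 110, 120, 130, 140]
next_three = forecast(history, 3)
```

A straight line like this ignores weekly or yearly patterns entirely, which is the gap a purpose-built time series library is meant to fill.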
57:33 Michael Kennedy: Yeah, that's cool and it seems like it's a good fit for the people we're talking about in the show.
57:36 Chris Moffitt: Yeah absolutely.
57:38 Michael Kennedy: Yeah, nice. Alright, we're getting a little short on time. I do want to ask you a few more things real quick before we go. Let's close this out. We talked about all the benefits, we talked about all the good things here, but maybe let's just talk on a few areas. And what Python needs to do better to better take advantage of this group and just go farther.
57:55 Chris Moffitt: Sure. So a couple things. We did have a good discussion about should we use Miniconda, should we use Python from the Windows Store? I think that is still a good area for us to figure out what is the preferred way to get Python out and then installed on a system? And then more importantly, how do you keep it updated? There's still some kind of wonkiness between installing with Conda and pip, and I've been bitten by it before, where suddenly I'm installing with pip and it's unwinding my Conda install, and that is something that a new user is not going to figure out.
58:33 Michael Kennedy: That's not going to be okay. It's not going to go so well.
58:35 Chris Moffitt: So, I think that's something that needs to be solved, and I know folks are working on that. I think the other one is this whole, we talked a little bit about data visualization, and it could probably be expanded more broadly as like once you start getting into Python, there are some areas where it's hard to make that decision about which direction I should go, so dataviz is a great example. Like, I want to plot something, what do I do? Do I do matplotlib? Do I do bokeh? Do I do seaborn? All the various options, and a lot of folks have talked about that.
59:06 Michael Kennedy: You come from having one choice, the little charts, insert chart dropdown to the paradox of choice of open source, right?
59:12 Chris Moffitt: Exactly, exactly. And then, I think the other one, and I know you've talked about a little bit, is the whole distribution problem. So, how do you get these Python scripts, programs, notebooks, whatever they are out there, so that other people can use? And do you have to go through some sort of .exe conversion? Do you have to have some other way to distribute the files? In my mind and vision, wouldn't it be cool that you had kind of a little launcher type thing that would go and check and download the most recent files and put a little GUI around it. But something like that needs to be in place, otherwise you're going to have a whole different type of maintenance nightmare.
59:52 Michael Kennedy: Yeah, I mean we talked about Gooey and pyinstaller, but these are sort of not super-perfect projects. They're great, they're really awesome, they exist, but they're not a general solution, right?
01:00:04 Chris Moffitt: Right. And that's the reason Excel works so well is I can put an Excel file together, and once I give you that Excel file, for the most part it's going to work because it's got all of that Microsoft Office guts behind it.
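The "launcher that checks for the most recent files" idea mentioned above could be sketched roughly as follows. This is only one possible reading of that idea, not an existing tool: it assumes a hypothetical manifest that maps file names to SHA-256 hashes, and it only works out which local files are missing or out of date. The download step and the GUI are left out, and `files_to_update` and the file names are invented for the example.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path):
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def files_to_update(local_dir, manifest):
    """Return manifest names that are missing locally or whose hash differs.

    `manifest` is a hypothetical {file name: expected sha256} mapping that a
    launcher would fetch from wherever the scripts are published.
    """
    stale = []
    for name, expected in sorted(manifest.items()):
        local = Path(local_dir) / name
        if not local.is_file() or sha256_of(local) != expected:
            stale.append(name)
    return stale

# Demo with a temporary folder standing in for the user's install directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    (tmp_path / "report.py").write_text("print('v1')\n")   # old local copy
    manifest = {
        "report.py": hashlib.sha256(b"print('v2')\n").hexdigest(),
        "helpers.py": hashlib.sha256(b"# helpers\n").hexdigest(),
    }
    stale = files_to_update(tmp_path, manifest)
    print(stale)   # names the launcher would need to download
```

Comparing content hashes rather than timestamps keeps the check independent of clock differences between machines.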
01:00:18 Michael Kennedy: Yeah, absolutely. All right, one other thing. Are you familiar with the Microsoft UserVoice, where they ask for features? They have one on Excel where somebody suggested that Python replace VBA as the Excel scripting language, or maybe be offered in addition to it. Have you seen this?
01:00:33 Chris Moffitt: I have seen that, yes. And there's a lot of excitement, yeah.
01:00:36 Michael Kennedy: It blew up. There's like 5800 votes and just people going wow please, please, please make it happen, you know what I mean?
01:00:43 Chris Moffitt: Yeah. And I'm probably like most people: when I first saw it I thought, wow, this is awesome, wouldn't that be perfect? But the more I thought about it, I don't know if that's really a good thing or not, because it probably depends on the implementation and I'm not smart enough to understand how they would do it. I think, like we talked about, the power of Python is the ecosystem, so that I can use Conda or pip to install something. Like, am I going to have all that available to me in Excel or would it just be?
01:01:11 Michael Kennedy: Yeah, that's really interesting. I mean, you can. I think you can embed DLLs into Excel through, like, .NET, and then there's Python in SQL Server. There may be a way, but, you know, as you say that, some of the advantages we spoke about, like version control, like not a whole bunch of gotos but straightforward code, all those things would still be absent.
01:01:32 Chris Moffitt: Right, yes.
01:01:33 Michael Kennedy: So I'm kind of on the fence as well. I mean, I'd sure rather see Python than VBA. Maybe VBA is needed there to chase them out of that area into a better place.
01:01:41 Chris Moffitt: Yeah, I think so. But we'll wait and see. I don't know if they're making progress with it or not.
01:01:46 Michael Kennedy: Yeah, I don't know. My feeling is there's not a lot of action. It's kind of like, "Wow, people really want this more than we thought. I don't know if we want to do this." But that's just me looking from the outside.
01:01:56 Chris Moffitt: I agree. I agree.
01:01:57 Michael Kennedy: Cool, alright. Chris, it's been great to talk about this. Before you get out of here, let me hit you with the final two questions. If you're going to write some Python code, what editor do you use?
01:02:07 Chris Moffitt: Mostly Sublime or Jupyter Notebook.
01:02:09 Michael Kennedy: Okay. That's a good fit. And then notable PyPI package?
01:02:13 Chris Moffitt: Yeah, we talked quite a bit about Pandas. You know I'm a huge Pandas fan. I assume most of your listeners are familiar with that, but I'll put a plug in from a data viz perspective. I've been using Altair quite a bit recently, and I know you've talked about it a little bit, and I'm really starting to like that package. It seems to fit well with the types of business analysis that I'm involved with. I still enjoy Seaborn for some of the more complicated statistical analysis, but Altair is pretty cool and I'm hopeful that it will continue to develop and get more and more powerful and more widely used.
01:02:51 Michael Kennedy: Yeah, for sure, it looks really cool and there's some nice stuff. And also a shout-out to Altair recipes.
01:02:56 Chris Moffitt: Yes!
01:02:56 Michael Kennedy: Yeah, it's quick little things you can use. That's a nice project to make it even simpler, right?
01:03:00 Chris Moffitt: Yes absolutely.
01:03:00 Michael Kennedy: Alright, well, final call to action. People surely work in their company with folks who have this Excel hell problem. What do you tell 'em?
01:03:08 Chris Moffitt: Yeah, you know, reach out. If you know Python and you're in a big organization, there are probably people out there who could really benefit from your knowledge. You don't necessarily need to quit your day job, but reach out to those folks, spread the word a little bit about Python, and be a good resource for them to get up to speed and understand what Python can do for them.
01:03:27 Michael Kennedy: Yeah, awesome. I agree. I think you can make a huge difference by writing software, but also by sort of empowering these folks to go explore what they've got to do.
01:03:36 Chris Moffitt: Yeah, and be patient, as they learn.
01:03:39 Michael Kennedy: Yeah, absolutely be patient. It's going to take some time, but I'm sure it's worth it. Alright, yeah, thanks for being on the show. It's great to chat with you.
01:03:45 Chris Moffitt: Thanks a lot, I appreciate it. Bye.
01:03:45 Michael Kennedy: This has been another episode of Talk Python to Me. Our guest on this episode was Chris Moffitt and it's been brought to you by Linode and Stellares. Linode is your go-to hosting for whatever you're building with Python. Get four months free at talkpython.fm/linode. That's L-I-N-O-D-E. Find the right job for you with Stellares, the AI-powered talent agent for the top tech talent. Visit talkpython.fm/stellares to get started. That's talkpython.fm/S-T-E-L-L-A-R-E-S, Stellares. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps course. Or if you're looking for something more advanced, check out our new Async course that digs into all the different types of async programming you can do in Python. And of course, if you're interested in more than one of these, be sure to check out our Everything Bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.