
#173: Coming into Python from another Industry (part 1) Transcript

Recorded on Wednesday, Jul 25, 2018.

00:00 Michael Kennedy: Not everyone comes to software development and Python through four-year Computer Science programs at universities. This episode highlights one alternative journey into Python. Over the course of two episodes, you will meet people who started in other industries and specializations and now make Python part of their daily experience. Some of them have used programming to power up their specialization. Others decided they'd rather just be doing programming full-time and made that switch over. This episode is part one of a two-part series. Our guests this time are Derrick Chambers, Jim Taysom, Arsh Soheili, and Rob Ward. This is Talk Python to Me, Episode 173 recorded July 25th, 2018. Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. This episode is brought to you by Linode and brilliant.org. Check out what they're offering during their segments. It really helps support the show. Derrick, Jim, Arsh, and Rob, welcome all of you to Talk Python.

01:24 Panelists: Thanks for having us. Thank you. Great to be here. Thank you.

01:27 Michael Kennedy: You're welcome. It's good to have you all here. I'm really excited about this short series of shows that I'm doing on people getting into programming and Python from other disciplines like Chemical Engineering or Accounting, things like that. So I guess we'll get started by kind of setting the stage and you know talking about where you're coming from because you're not really full-time developers, you're doing other stuff and using development as kind of a super power. So I guess the first thing I want to ask you guys is what is your industry, what is your specialization? What do you study? What are you doing day to day? And I guess we'll start with Derrick.

02:00 Panelists: Well I'm a mining engineer who studies earthquakes and works for the Centers for Disease Control. So I usually get a lot of blank stares when I introduce myself that way. Essentially the Centers for Disease Control has a group that does occupational safety and health research, and I work within that group studying the mining industry.

02:15 Michael Kennedy: That sounds really interesting. What kind of background do you have to work in this? Did you go and get like a Bachelors degree in Mining and Geology?

02:25 Panelists: Yeah so I have a Bachelors degree in Mining Engineering, and then I did my masters work in Geophysics.

02:29 Michael Kennedy: Very awesome. Jim, how about yourself?

02:31 Panelists: I work for Radiant Solutions. We're federal contractors in the DC area. My background is as a geographer. I got my Bachelors degree in Geography with a focus in Geographic Information Systems. And more recently, I've been transitioning into a full-time developer role because of picking up Python and all the things that's led me to.

02:52 Michael Kennedy: Yeah, that's pretty awesome. You know I had a similar path. I studied something else, and I just did more and more programming till I'm just like why am I doing that other stuff? But are you still doing programming around GIS? I know Python and GIS are like a really good fit. And then you know ArcGIS and the Esri folks do a lot with Python for example.

03:09 Panelists: Yeah so I'm still very involved in the GIS community. The project that I'm on is very GIS centric although I'm doing more of like the web backend for the project right now.

03:20 Michael Kennedy: Okay, cool. Arsh, how about you?

03:23 Panelists: Yeah, I'm working for Market Smart. And we do like software and services for nonprofit organizations exclusively. And I was traditionally trained as an organic chemist, but I've pretty much transitioned over now. So I'm not really involved in doing chemistry anymore. Maybe in the future I'll get back to a cross section of doing something with software and chemistry. But I'm pretty much doing software engineering now full-time and not really doing any chemistry anymore.

03:51 Michael Kennedy: That's pretty interesting. How do you feel that studying like a science, like a hard science background sort of helped you become a programmer, or did it not help at all?

04:01 Panelists: I think it helped. What I realized is that the skill I had as a scientist, basically the scientific method and the problem solving approach, is actually exactly the same as in software engineering. When I came over I realized that, the approach to solving problems, you know forming a hypothesis, researching what's out there, testing something, feedback loop. That's literally what I did as an organic chemist everyday. And so the same process is really applied to software engineering. It's really the toolset that's changed. Obviously chemistry, you're in the lab messing with chemicals trying to make stuff. And in software engineering you know you have a computer, and you're trying to build software. But the approach to how you're going about solving a problem was actually the same. And it actually really kind of helped me be able to make that transition.

04:46 Michael Kennedy: Yeah I had like I said a similar background. I feel like you know people will joke like Michael you almost got your PhD in Math, but does that seem like a waste, right? And I don't think it was. I think like those types of thought processes are really, really similar. And like you say the problem solving and stuff like that. I think it prepares you pretty well. So Rob, how about yourself?

05:05 Panelists: Yeah so I got my Bachelors in Accounting and had some accounting jobs early in my career. And I've always been technically savvy and kind of along the way got exposed to SQL, and that kind of led to hey this is kind of cool and more interesting than accounting. And then from there decided I wanted to learn Python. And I will say like some of the others I am not doing accounting anymore. I'm a data analyst. So working my way towards data science and working on a Masters in Analytics.

05:38 Michael Kennedy: That's cool. So you're still going to school for your master's degree right now?

05:40 Panelists: Yeah, I am.

05:40 Michael Kennedy: How far along are you?

05:42 Panelists: Right, I'm about a year and a half in, and I've got a year left. So I'm just taking one class at a time just to keep it simple.

05:48 Michael Kennedy: Yeah, no reason to go full-time.

05:50 Panelists: Right, yeah, kids and all that. But yeah from an accounting standpoint, I did actually use Python when I was still doing accounting work a few years ago which you know we can get into more a little later.

06:02 Michael Kennedy: Yeah, sounds good. So I guess maybe go back in reverse order with you, Rob. If you're in accounting, you know Excel's programmable, right? People do ridiculous stuff. I think I've seen a flight simulator created in Excel. Have you seen this?

06:16 Panelists: I have not seen that.

06:16 Michael Kennedy: There's some pretty intense stuff done in Excel. So that's probably like the natural place to do automation and stuff like that. So why did you learn Python and programming?

06:25 Panelists: Years ago, I had tried to learn C. And that failed miserably.

06:31 Michael Kennedy: Programming's hard, man.

06:33 Panelists: Yeah, I was like what is this stuff? And so then I kind of decided again you know data science was starting to become a thing, and I knew I wanted more of a business intelligence or data analyst type of role. I was doing accounting, and not to namedrop, but my brother works for GitHub. So he's like you should learn Python. I'm like, okay I'll learn Python. So you know I think it was about, it was five years ago, and I just started on I think it was that Learn Python the Hard Way which I know can be controversial with some people. But it was a good starter, and then also took some Coursera courses. They have a lot of Python courses. Got through about half of it until I got to object oriented programming and was a little dumbfounded at the time. And I was able to see the value and see that there were things that could be applied to things I was doing everyday in my accounting work which I don't know if you want to get into that right now or a little later?

07:30 Michael Kennedy: Yeah, let's get into it maybe a little bit later. But there's definitely some really cool like automation of sort of Excel and processing of those files. There's lots of cool stuff in there, yeah for sure. But we'll dig into that. So Arsh, why did you get into programming and Python? You were in the lab. Your hands have already got like gloves on. It's hard to type.

07:49 Panelists: Yeah, yeah, I know. I mean honestly I went down a path where I was trying to do chemistry, but I got into pharmaceuticals, and I started not liking the big companies and the pharmaceuticals. I thought about academics, and it was not looking good. There were very few jobs. Funding is kind of bad. And I've always been interested in technology. And I was doing some web development, not really much programming. But I was doing that stuff. I was getting more and more, and then eventually I kind of just, I don't know, I've always been someone that kind of takes chances. And so I kind of just was like well let me try to see if I can you know work my way into this field. And I got there, and I kind of pulled myself up and learned and made it through. So it's really kind of been that the industry of chemistry, at least for me, kind of failed me. But actually the Excel thing you were talking about. I will say something. We actually had an entire application here we built at Market Smart out of this Excel. We had this guy who was working for one of these nonprofits who had this crazy Excel file. Nonprofits would send him their data, and he would just have these basically these macros he had written. Some of it would take like a day to run, and he finally came to us, and we talked to him, he's like this is awesome. People are sending me this stuff. They love this thing, but it takes me so long to run. Can you make this software? We're like, yes. That's what it should be for. So we actually built an entire application. It's called the fundraisingreportcard.com that came born literally out of Excel files.

09:21 Michael Kennedy: That's awesome. Yeah, I think that so much programming happens in Excel honestly. So one thing I wanted to ask you coming on that path. I know a lot of people talk about imposter syndrome not coming with a computer science degree. Was that a problem for you, or were you okay with, do you feel like do I belong in this programming world? Like how do I sort of prove that I've earned my chops here?

09:43 Panelists: Yeah, yeah, no I think that's real. I mean I think people feel it. I've definitely felt it. I think the first year was really tough because there was a lot of nomenclature, a lot of words I just didn't understand. And literally there was a moment. Even though it was an accumulation, I literally remember a moment after a year where one day it just kind of like clicked, and I was like okay I understand. I don't know everything, but now I can understand. I can read an article. I can go through something and comprehend and work my way through it where before you know I would get lost sometimes. Just completely not understand concepts. So I think it's real, and you got to read, read, read. I read constantly, and you got to just always train yourself and always be training. And I think it's so important that you get over that. To a level where I think I got, and you got to work through it.

10:35 Michael Kennedy: Yeah I agree with that. I feel like once you get to the point where you've learned enough new things repeatedly, you're like there's a lot of stuff I don't know, but if I need to know it, I can learn it really quickly. Once you kind of get that skill and experience, I think the imposter syndrome really fades pretty quickly. If you don't like continuously learning, this is not the industry that's going to work out so well.

10:54 Panelists: Yeah, yeah for sure. I think learning, and things change so much in software engineering I mean almost on a weekly basis that you better be ready to be learning all the new stuff and at least keep up to date with it.

11:07 Michael Kennedy: Yeah, for sure. Jim, I think probably your introduction may be the simplest. If you're in GIS, you almost have to learn Python to do some of that stuff, right? But what was the motivation for you?

11:17 Panelists: So for me, the first couple years of doing GIS, I didn't really need to use Python at all. I ended up in some mandatory training that I didn't need.

11:26 Michael Kennedy: Interesting.

11:27 Panelists: And so I spent hours and hours and hours everyday just completely bored out of my mind. So I started going through all of the ArcGIS documentation on how to use ArcPy and started teaching myself Python enough that within my realm I was willing to put it on my resume. And then there was another position at work that needed someone who knew Python because they had this old VBA application that was written for ArcGIS that they needed to replace.

11:56 Michael Kennedy: Because it's VB 6.

11:59 Panelists: I don't even remember what it was. They needed someone that could take what they had and recreate it in Python because Esri had been telegraphing for awhile that they were going to be dropping VB support. And so it was more than I was ready for, but they couldn't find anyone else. And so I just told my boss, I was like I can do the project, you just have to give me at least a little bit of time every week, so I can do some at home Python training and figure this out. And I did it and loved it. And started doing more of it.

12:30 Michael Kennedy: That's really cool. It's cool how you were able to sort of laterally move in your company because of it too. It's not great to be stuck in mandatory training that's super boring, but you know at least you made good use of it.

12:39 Panelists: Yes, I used my time well.

12:41 Michael Kennedy: Awesome, Derrick, how about you? What was the motivation for getting into programming?

12:45 Panelists: Yeah, I first got into programming when I started graduate school. And I had an advisor that was really hands off. And so his charge was kind of just go figure out something new, and novel, and useful, and do it. So I ended up changing my research topic I think three or four times, but I was finding that it was very hard to use some of the existing tools that I'd known to build sort of a larger system that would do something interesting. And so I ran across this Python package called ObsPy which implements a lot of format parsers and data classes for handling seismology data. And so once I found that, I got into Python. And I was able to build this bigger system that actually looked at implementing a newer method for detecting very small seismic events.

13:31 Michael Kennedy: Like micro earthquake type things?

13:33 Panelists: Yeah, exactly. The typical seismicity associated with mining is pretty small. So you have to have some clever methods for detecting it in most cases.

13:41 Michael Kennedy: Nice. You said that you're now a contributor to that project, right?

13:44 Panelists: That's right, yeah. I'm a maintainer for ObsPy. I really enjoy when I can find some spare time you know making improvements or suggesting different things we can do in the future.

13:53 Michael Kennedy: Yeah, that's awesome. So one of my feelings about sort of people becoming programmers and having these skills is it's great to just have like a computer science degree, you can solve problems and program, and stuff. But the real value of many more people having programming skills and Python skills is like it's kind of a super power for the thing that you do do. So I wanted to ask you all sort of how you saw that in your industry. I know not all of you are still doing that day to day, but you know it sounds like all of you sort of got started in programming while you were still in that industry. So Rob, what do you think about this power up idea?

14:34 Panelists: Yes, I can attest to that first hand because in my last job that was truly an accounting job, I was the accounting manager for a small company in my hometown, and they were a property manager and had over 500 homes or you know mortgages that I had to manage. So every month, I was sending out these mortgage payments. And I have to put them in our accounting package in our software. I found out there was an easy way, or there was a way to import the data, but there was no easy way to create that in Excel. Or at least not that I could find. I didn't want to manually type these out. So my first project in Python, you could say, was I figured out how to do all the loan and amortization payments and the principal and interest and make 'em basically batch load this entire file for the month. And that sped up my workflow, you know made paying the bills so much easier on a daily basis.
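As an illustration of the kind of script Rob describes, here is a minimal sketch that splits a fixed monthly loan payment into interest and principal and writes one row per loan for a batch import. The column names, loan IDs, and CSV layout are assumptions made for the example; the transcript does not say what format his accounting package expected or how his script was structured.

```python
import csv
import io
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def monthly_payment(principal, annual_rate, months):
    """Fixed-rate payment: P * r / (1 - (1 + r) ** -n), rounded to cents."""
    r = annual_rate / 12
    if r == 0:
        return (principal / months).quantize(CENT, ROUND_HALF_UP)
    payment = principal * r / (1 - (1 + r) ** -months)
    return payment.quantize(CENT, ROUND_HALF_UP)


def split_payment(balance, annual_rate, payment):
    """Split one payment into (interest, principal) for the current balance."""
    interest = (balance * annual_rate / 12).quantize(CENT, ROUND_HALF_UP)
    # Never pay down more principal than is still owed.
    principal = min(payment - interest, balance)
    return interest, principal


def build_batch(loans):
    """Return CSV text with one row per loan for this month's payment run.

    `loans` is an iterable of (loan_id, balance, annual_rate, payment).
    The header names are hypothetical, not a real import format.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["loan_id", "interest", "principal", "total"])
    for loan_id, balance, rate, payment in loans:
        interest, principal = split_payment(balance, rate, payment)
        writer.writerow([loan_id, interest, principal, interest + principal])
    return out.getvalue()
```

Using `Decimal` rather than floats keeps every amount exact to the cent, which matters once hundreds of loans are summed in one file.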

15:38 Michael Kennedy: That's great. There's so many examples I'm sure that are out there like that. Like even in my stuff, you know obviously I have to write code to run websites and things. And I've done other types of programming for many years, but like one example was I have all of these videos for the courses, and I had to import them when I load a new course. And it doesn't sound like a big deal, but you've got to set the time and like the duration. There's just tons of stuff. And I was even thinking of hiring somebody to like do that. I'm like wait a minute. Why don't I just write a program? And now it's like a five second command line, type it, and boom, it's done. It's just so nice when you have those realizations like why am I doing this by hand? Surely this can be automated right?

16:21 Panelists: That was the impetus for me. Everything was doing it by hand, and I had played around with, you know maybe people have heard of AutoHotkey. You know I'd made macros with AutoHotkey before years ago in Windows. So it's kind of a similar idea like just automating simple tasks that you have to do anyway. But it might as well just speed it up.

16:44 Michael Kennedy: Yeah, I can see somebody actually deciding I'm going to go do that from the coffee shop and taking every other Friday off as their code runs and you know does the work they did for half a day. So Jim, how about you with this power up idea?

16:59 Panelists: So it was actually really, really amazing some of the things that I was able to do like really early on in learning Python. So a lot of the GIS workflows are really intensive, and you're doing a lot of steps over and over. There's been a couple of things I've done where you might be taking and doing like four things, but you do 'em 1,000 times in a row. And without something like Python, that's almost impossible to do. And my favorite example is there was a coworker who was working on trying to do some correlations between some various datasets, and his manual workflow that he was using would take him between three and four weeks. So he was spending 120 to 160 hours manually just going through the data.

17:44 Michael Kennedy: Not only is that inefficient, but that's got to be terribly ungratifying type of work.

17:48 Panelists: Oh, just terrible 'cause it was just like click, click, click, wait five minutes. Click, click, click, wait five minutes for weeks on end. And when I heard about that, I was like, fully explain what you're trying to accomplish, and I'll work something out. And a week later, I had just a little script written in Python that did his entire workflow, but instead of being able to correlate like the top three correlations, I was able to correlate all of the datasets, and the entire script would run in eight minutes. So I replaced a month's worth of work with an eight minute job, and it only took me a week to do it.
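To make the idea of correlating every dataset against every other one concrete, here is a minimal sketch using only the standard library. It assumes each dataset is already a plain list of numbers of equal length; Jim's actual script, data formats, and GIS tooling are not described in the transcript, so everything named here is illustrative.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlate_all(datasets):
    """Correlate every pair of named datasets, strongest relationship first.

    `datasets` maps a name to a list of numbers.
    """
    results = [
        (a, b, pearson(datasets[a], datasets[b]))
        for a, b in combinations(sorted(datasets), 2)
    ]
    results.sort(key=lambda item: abs(item[2]), reverse=True)
    return results
```

Because the loop covers every pair rather than a hand-picked few, adding another dataset costs one more dictionary entry instead of more weeks of clicking.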

18:32 Michael Kennedy: That's incredible. What was the reaction of this guy?

18:35 Panelists: He was just completely blown away because like he went from being able to do this type of analysis 12 times a year, to being able to do it 12 times before lunch.

18:46 Michael Kennedy: Yeah, that's amazing. I worked at companies before where there were a lot of real scientists doing scientific work, and it was this kind of thing. It was like really manual involving lots of different sort of software pieces like Excel, MATLAB, or stuff glued together, and they just did the tedious little steps thinking like this is how you do it. Just you have to do it this way. And me and a couple of other folks, we sort of systematically started automating like the worst of these things like you're describing here. And every time, something amazing like that came out. It was like you're going to automate us out of a job. It was like this was my job, and now I don't do it anymore. What am I going to do? But of course the next week, they were doing higher level stuff or doing it in more depth or whatever. Those people never got automated out of a job as far as I could tell.

19:32 Panelists: There's always more problems that could be solved once you free up some time.

19:38 Michael Kennedy: This portion of Talk Python to Me is brought to you by Linode. Are you looking for bulletproof hosting that's fast, simple, and incredibly affordable? Look past that bookstore and check out Linode at talkpython.fm/Linode. That's L, I, N, O, D, E. Plans start at just $5 a month for a dedicated server with a gig of RAM. They have 10 data centers across the globe. So no matter where you are, there's a data center near you. Whether you want to run your Python web app, host a private Git server or file server, you'll get native SSDs on all of the machines, a newly upgraded 200 gigabit network, 24/7 friendly support even on holidays, and a seven day money back guarantee. Do you need a little help with your infrastructure? They even offer professional services to help you get started with architecture, migrations, and more. Get a dedicated server for free for the next four months. Just visit talkpython.fm/linode. Derrick, power up idea?

20:35 Panelists: Yeah, I think there's kind of three components to this. The first one is scale. You can usually do and process a lot bigger datasets with Python or some other programming language than you would normally with Excel or some other tools you might be using. The other one is maintainability. You know you can put Python scripts into proper version control like Git, see what changes and when it changes. Unlike Excel, where you might have people that are working on a common workbook and change different things, and you don't know who changed what. And then also increasing the level of automation. Just like Jim had mentioned, I've had a few experiences. Nothing quite as dramatic, but you know coworkers have come to me with a really manual process and asked for help automating it. And we've been able to do that in Python and save a lot of time.

21:17 Michael Kennedy: That's cool. Are you known as like the guy that you can come with manual stuff, and you can fix it?

21:22 Panelists: I'm trying not to let that reputation spread, but yeah, a little bit.

21:26 Michael Kennedy: Oh this podcast won't help, I'm sure. Arsh, how about you?

21:31 Panelists: An example we have, and I think yeah it's very important to kind of get everybody to understand that a little bit of programming knowledge can really make you very powerful. We have account managers who, similar story, were doing some manual stuff with some campaign data that wasn't available via API. So they were literally going to every account and like copy and pasting into an Excel. And of course, that made it tedious. So eventually they came to us. And I, kind of with somebody, helped them write a web scraper with Selenium to just basically scrape that data. And so something that took her basically eight hours or a whole day, or you know would just bog her down. We were able to do it, like basically the same thing in five minutes, which they were very thankful for. And she is actually somebody who is starting to learn Python. And I think that's the thing like for people to get into using programming to make themselves super powerful, they do need somebody. Somebody to really kind of show them the benefits of it, and kind of get 'em in the direction 'cause I think that's not the first thing they're going to think about 'cause it's not natural for them, and also they don't really know how to get into it. And so I think to really kind of make that happen, you kind of do need somebody to kind of show them the way, and hopefully that kind of gets 'em excited enough and realize the benefits that they can start to go down that path and realize yeah I could be an account manager. I could be a product manager. I could be kind of anything, but with a little bit of programming, I could be a really powerful account manager or powerful product manager, or whatever field you're in.

22:59 Michael Kennedy: I think that's a really great point about sort of seeing the light and realizing that it's not all that complicated. Because to me before I got into software development, programming computers seemed really like a deep, complicated thing that took years and years and years of studying, and you had to be the right type of person to have that kind of thinking, just that knowledge, right? And while there is some types of software that probably is that way like writing a kernel for a new OS that you invented probably is like that. But a lot of the stuff that we're talking about. What most people need is actually not that bad, but you kind of have to see the light, right? You're like wait, you did that in 15 minutes, and now my horrible job I did for eight hours once a week is now five minutes and automatic. Like maybe I could do that.

23:44 Panelists: Exactly, and I think, and I try to tell everybody in our company that it's like just because you do a little bit of programming, or you're going to use some tool doesn't mean you're a programmer, doesn't mean you're going to be an engineer. That's not the point of it. It's just a toolset that's going to help you do whatever is your job better. So you don't need to think of that. That I have to be a programmer, or I have to be a software engineer to do it. It's just a toolset that you can learn just like any other tool set.

24:10 Michael Kennedy: Yeah, maybe you're into psychology, and you don't want to be a programmer. You want to be a psychologist. Well I'm sure there's awesome stuff you could automate and discover and make you a better psychologist, right? You don't have to give that up.

24:21 Panelists: Exactly.

24:22 Michael Kennedy: All right, so let me ask Derrick and Jim this one 'cause you guys are still doing sort of what you studied day to day. How has learning programming changed what you do day to day? Has it changed your job or has it not really?

24:35 Panelists: So I guess I'll start. I would say that it's completely changed almost everything that I do because ArcGIS is very good at what it is, but it's also very expensive, and so a lot of places don't necessarily want to use it, and it's also quite slow. And so there's a lot of things that I can do in Python that are a lot faster. But one of the things that you end up missing is that interface that ArcGIS provides. And so now I'm doing a whole lot of web programming and backend development to like build up a core system, but then you still need something that like can be provided to a UI so that when my teammate makes a really beautiful UI, it can consume everything that we've processed on the back end.

25:21 Michael Kennedy: Yeah, that makes a lot of sense. I think that's really great. Derrick?

25:25 Panelists: Yeah, so a lot of people that study seismology are already kind of halfway programmers. You can't really do much in seismology without having some programming knowledge. But a lot of tools that were common there were things like Fortran and scripting things together or stitching them together with Bash, that sort of thing. So I think Python has given me maybe an advantage over a lot of those old workflows in that I can prototype things faster, maybe try out a new research idea much faster than I would if I wasn't using Python.

25:55 Michael Kennedy: Yeah, there's probably a lot of MATLAB, those types of things going on there as well there.

25:59 Panelists: Right, yeah.

26:01 Michael Kennedy: So what type of packages and toolsets do you guys use? Is it like Anaconda? Is it Django? Derrick, keep going with you first I guess.

26:11 Panelists: Yeah, I mostly use Anaconda. That's been a really great system for us just to avoid a lot of the compilation especially when they have to work on Windows systems.

26:19 Michael Kennedy: Yeah, a lot of the scientific tools can be a pain to install, so getting them pre-compiled and just downloading them.

26:24 Panelists: Yeah, very much so.

26:26 Michael Kennedy: vcvarsall.bat or whatever that is.

26:30 Panelists: Yeah, I've seen that a few times.

26:31 Michael Kennedy: Yeah, for sure. Jim, how about you?

26:34 Panelists: So at first when I was just learning, really the only thing I had available was the Python installation that came with ArcGIS. So I had the core library and ArcPy, and nothing else for probably the first like two years of using Python. And then when I got Anaconda, it was fantastic because I had all these great extra libraries that I could use. And now at this point, it's a whole lot of web programming. So like right now, I'm working with Sanic. We're using Postgres, and the PostGIS extension. So because of that we're using asyncpg to connect to Postgres. We're using Docker and Kafka, and all these other things. So I'm using PyKafka to communicate with Kafka. And so it's gotten to be very, very diverse compared to where I started with Windows and ArcGIS.

27:22 Michael Kennedy: That is a serious change there. That's amazing. How do you like Sanic?

27:27 Panelists: So for the most part, I really do like Sanic. It's very responsive. Every once in a while I get down into a rabbit hole, and there's just not nearly as many things on Stack Overflow of like how do I fix this problem. And I'm still wrapping my head around the whole async thing. So it's definitely different, but I do enjoy it. It's fast.

27:48 Michael Kennedy: That's cool. Yeah I've definitely wanted to do more with Sanic. I actually have a little bit of that feeling. Like well not necessarily that there'll be not enough answers on Stack Overflow, but there's also all the other support stuff. Like here's an example of how you do this, or here's a library that you plug in that fills this gap and things like that. But it definitely looks like a cool framework.

28:10 Panelists: There's definitely some of that missing. And so there's been times where it's like I really wish I had just started this project using Flask instead of Sanic, but...

28:18 Michael Kennedy: But now, you could have used Flask.

28:19 Panelists: We had very good reasons for choosing Sanic at the beginning.

28:22 Michael Kennedy: Yeah absolutely. Arsh, how about you?

28:24 Panelists: Yeah, as far as toolset, we're mainly doing some of the typical Python web stuff. So we use Flask. Front end, we use Angular. We recently got into like Airflow which I don't know if everyone's familiar, but it's from Airbnb.

28:38 Michael Kennedy: Yeah, it's like a data pipeline thing, right?

28:40 Panelists: Yeah, it's data pipeline. So originally a lot of our data pipeline was, and still is, done with Pentaho, and I really hate that software. It's good at some stuff, but it's really annoying. I hate using it. So I started looking at Airflow. And that's really nice because it's Python. We know Python. It's code, and it's just a lot easier to kind of get it up and running. That also does suffer a little bit right now from, I mean it has decent documentation, but there's still a lot of stuff there that it's kind of hard to find when you run into some issues. But it's really growing, and I think Google just brought out a product which the underlying platform of it is using Airflow. So I think you know that Airflow is probably here to stay, and it's going to probably grow. And then we're also using Falcon framework for like an API that we've set up.

29:26 Michael Kennedy: Yeah Falcon, I had those guys on the show awhile ago. It's like a really low level, low latency API framework, right?

29:33 Panelists: Yeah, I mean I set it up maybe like two years ago when maybe I still wasn't little bit understanding of all the frameworks at the time. But I'm kind of glad I did actually 'cause it was strictly for an API for our company. And it is really fast, and it's held up. Although I probably if I had to do it over again, I'd probably use one of the other frameworks that's built on top of Falcon like is it Hugs?

29:56 Michael Kennedy: Yeah, Hug that's one of 'em yeah.

29:58 Panelists: Yeah, yeah, probably would've used one of those, but still it's a really good, fast framework for where we want it.

30:03 Michael Kennedy: Okay, yeah, that sounds really interesting. I haven't really talked about Airflow on the show. And maybe give people a quick example of like what problem or what job you're accomplishing with Airflow. Describe like why you're using it.

30:16 Panelists: Yeah, so I mean basically it's kind of in the realm, it's not technically an ETL which is extract, transform, load. So in a lot of companies often software engineering, you get data, but a lot of times you need to get data from one place and kind of transform it and do some things to it, and then bring it over to another place usually to another database. And so that's where like an ETL comes in. And so those are usually workflows that you want to set up that maybe go through many steps. And there are different sort of like softwares out there to accomplish this. Airflow is technically just like a scheduler of these flows, but it's got a nice UI to it. You can kind of put tasks you know one after another. It uses directed acyclic graphs. And then the thing is it's code. So because you write Python code for those tasks, you bring a lot of programmability where a lot of the other ETL software if you use it, they're not like that. They kind of come with prepackaged sort of modules, and you kind of just have to use those prepackaged modules. And then it gets transported. Like Pentaho is really Java XML. So it's taking XML code, and it's eventually you know it's run by Java. So it's not you know. Yeah, I hate it. I love Airflow so much better. Although there are differences, and there are some things that Pentaho is better at than Airflow. I mean right now just to give a use case which also brings in async is that one thing we're using Airflow mainly for right now is that some of our stuff is done in some other software, some email platforms. And so we need a lot of data through their API. So instead of getting that data from the API in real time, we're like making tasks where we hit those APIs, get all that data, transform it to whatever you want, and then put it in our database. So then when we run our application, we don't need to rely on that API anymore. We just pull it from that database.
And we actually ended up using asyncio and aiohttp to speed up those tasks which came really in handy and was a great example of using async at that point because we're making thousands of API requests. And instead of like a task originally we were running was taking two and a half hours, we got it down to like 10 or 15 minutes.
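
A minimal sketch of the idea described above: issuing many API calls concurrently with asyncio rather than one at a time. The `fetch_record` coroutine, the record IDs, the 0.05-second delay, and the concurrency limit are all hypothetical stand-ins. In the setup the panelist describes, the simulated wait would be a real HTTP request made through aiohttp.

```python
import asyncio
import time


async def fetch_record(record_id: int, limit: asyncio.Semaphore) -> dict:
    """Hypothetical stand-in for one API request."""
    async with limit:
        # Simulated network wait; a real task would await an HTTP client here.
        await asyncio.sleep(0.05)
        return {"id": record_id, "status": "ok"}


async def fetch_all(record_ids: list[int], max_concurrent: int = 50) -> list[dict]:
    """Fetch every record concurrently, capped at max_concurrent in flight."""
    limit = asyncio.Semaphore(max_concurrent)
    tasks = [fetch_record(rid, limit) for rid in record_ids]
    # gather() runs the coroutines concurrently and keeps the input order.
    return await asyncio.gather(*tasks)


def run_demo(n: int = 100) -> tuple[list[dict], float]:
    """Run the demo and return the records plus the elapsed seconds."""
    start = time.perf_counter()
    records = asyncio.run(fetch_all(list(range(n))))
    return records, time.perf_counter() - start


if __name__ == "__main__":
    records, elapsed = run_demo()
    print(f"fetched {len(records)} records in {elapsed:.2f}s")
```

Run one after another, 100 waits of 0.05 seconds would take about five seconds. Because the waits overlap, the concurrent version finishes in a fraction of that, which is the same effect that took the panelist's task from hours to minutes.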

32:31 Michael Kennedy: That's really awesome. Yeah, some of that aiohttp stuff, the client side stuff is really powerful.

32:36 Panelists: Yeah, so we're basically pulling all this stuff, pre-fetching all this stuff from APIs from our different softwares that we work with and then transforming it into the forms that we need and dumping it in our database so then that way we don't have to rely on them when we actually run our application.
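
The fetch, transform, and load steps described here form a small graph of dependent tasks. The sketch below is not Airflow code. It uses Python's standard-library `graphlib` (Python 3.9+) only to show how a directed acyclic graph fixes a valid run order. The task names and the toy email data are hypothetical.

```python
from graphlib import TopologicalSorter


def extract(state: dict) -> None:
    # Hypothetical raw rows, standing in for data pulled from an external API.
    state["raw"] = [{"email": " A@Example.com "}, {"email": "b@example.com"}]


def transform(state: dict) -> None:
    # Normalize the raw values into the form the application needs.
    state["clean"] = [row["email"].strip().lower() for row in state["raw"]]


def load(state: dict) -> None:
    # Stand-in for writing the cleaned rows to a database table.
    state["db"] = list(state["clean"])


TASKS = {"extract": extract, "transform": transform, "load": load}

# Each key lists the tasks that must finish before it can run.
DEPENDENCIES = {"transform": {"extract"}, "load": {"transform"}}


def run_pipeline() -> tuple[list[str], dict]:
    """Run every task in dependency order and return the order and final state."""
    state: dict = {}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name](state)
    return order, state


if __name__ == "__main__":
    order, state = run_pipeline()
    print(order, state["db"])
```

Because the tasks are plain Python functions, any step can contain arbitrary logic, which is the programmability the panelist contrasts with prepackaged ETL modules.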

32:50 Michael Kennedy: That sounds like a really cool use of Airflow. Nice. This portion of Talk Python to Me is brought to you by brilliant.org. Many of you have come to software development and data science through paths that did not include a full on computer science or mathematics degree. Yet in our technical field you may find you need to learn exactly these topics. You could go back to university, but then again this is the 21st century, and we do have the internet. Why not take some engaging online courses to quickly get just the skills that you need. That's where brilliant.org comes in. They believe that effective learning is active. So master the concepts you need by solving fun, challenging problems yourself. Get started today. Just visit talkpython.fm/brilliant and sign up for free. And don't wait either. If you decide to upgrade to a paid account for guided courses and more practice exercises, the first 200 people that sign up from Talk Python will get an extra 20% off an annual premium subscription. That's talkpython.fm/brilliant. Rob, how 'about you, toolset?

33:53 Panelists: Sounds like I need to learn Airflow. Yeah I primarily use Anaconda, and all the associated or many of the associated packages that come with it. So Pandas I'm using everyday, and I love Jupyter Notebooks or Jupyter Lab now. I have transitioned over to that. That's mainly what I'm using at work. And then on a personal level, you know still of course love Requests and Beautiful Soup. But that's personal projects.

34:21 Michael Kennedy: The whole web is your API. That's sweet.

34:24 Panelists: Yes, exactly.

34:25 Michael Kennedy: So you switched from Jupyter to Jupyter Lab. Why was that?

34:29 Panelists: That's kind of where Jupyter's headed from everything I've read. Eventually I think they're going to sunset Jupyter Notebooks, and it's going to move exclusively to Jupyter Lab. I don't know the timeline. It's probably a few years off still. But they had released the beta. I believe it was December or January. I can't remember for sure, but I jumped on board right away, and it works. I haven't had any trouble. There's a few features that are missing from Jupyter Lab still compared to Jupyter Notebooks. But the nice thing is if you have the latest version of Jupyter installed, you can run as long as the Jupyter server is running, you can run both. It doesn't care.

35:11 Michael Kennedy: It's just a UI front to the server.

35:13 Panelists: Yeah, it's just a different UI than the Jupyter Notebooks'. I mean it'll look familiar, but then there's like the file browser is built in right there and so is the command palette. So it's all just built in right there. The nicest thing about it is you can have multiple panes, so you can compare notebooks side by side. You can actually copy cells or drag cells from one notebook to the other.

35:39 Michael Kennedy: Oh, wow.

35:42 Panelists: Which is really nice for reproducibility 'cause you know sometimes I might get a request at work like hey can you you know run this report but change this. I actually had that happen this morning. It's basically just you know I had to drop in a lot of the same code, but then make a few small changes. So literally I just copy over three or four of the cells and drag 'em to the new notebook and run it again. It's really nice.

36:06 Michael Kennedy: It's cool. It's a little bit like an IDE version of Jupyter, right?

36:10 Panelists: Yeah, that's actually a really good description for it, yeah. Yeah, we started using that a little bit ourselves. We were also using Pandas and Jupyter Notebooks. We started using Jupyter Lab. And actually talk about empowering. This is another example. One of our colleagues who's working on this app. He would get requests of these files that they would send him that are like over a million rows. And so Excel actually has real trouble when you get to really large files, and he had to segment them. So it would crash constantly. So he kept sending it to me being like hey can you take care of this? So finally, I was like you know what I'm going to teach you Jupyter. It's not that hard. And I did teach him. And once every week, he comes to me and says, "Arsh, thank you so much for teaching me this. Now I can just take care of this stuff, and I feel more empowered." So that's another example of that empowerment. And Jupyter Lab, Notebook whichever one you use like really makes that really easy actually.

37:02 Michael Kennedy: I wonder if Jupyter for those types of folks almost is like a Trojan Horse for teaching them programming. Look you're not a programmer. You're not writing source code. You've got this thing, and it does computation, and you type in it. You know what I mean. Of course it's still Python n'stuff, but I wonder if it's perceived differently by people who come at it from that perspective fresh.

37:23 Panelists: Yeah, I think it could 'cause the interface. You know I guess there're some web versions or if you run it locally, you've got that kind of web looking interface. You know you don't have to pull up a dedicated editor that you have to install and things like that. So yeah, I think it probably is a good, and it deals with data. And most people in some area have to deal with manipulating data. Once in Excel, you start getting large files or having to do any kind of manipulation of sorting or any kind of complex thing in Excel, and it starts taking two minutes to do that, and you're sitting there with the spinner. Then you're like okay I've got to find something else.

37:59 Michael Kennedy: There's got to be a better way.

38:00 Panelists: I think on the flip side too though, a lot of people can use Jupyter as kind of a crutch and build some really big systems that really should be put into proper packages and things. So it's kind of hard to know once you've made the transition from exploratory data analysis into something that should be more software engineered.

38:18 Michael Kennedy: That's a really good point.

38:19 Panelists: I'm probably guilty of that myself. There's probably cases where I should just be putting them into a script and not running it in the Jupyter Notebook itself. But time is always... Yeah, it's hard to know though. It just sort of creeps up on ya. Yeah, definitely.

38:34 Michael Kennedy: Yeah, that's really interesting that you can take it too far. Like eventually you might want to break out of it. To me doing traditional web development and database stuff and so on, and that's usually across lots of files, it's like a different way of programming than people do with exploratory data. And it took me a little while to get my head around what is the real value? Like why do people love Jupyter and these notebooks so much? But when I saw people really doing the exploratory data stuff with it, oh this really is pretty amazing for that type of work.

39:05 Panelists: I agree. I didn't get Jupyter. I kept hearing people talking about it, and it was really only probably about a year and a half ago that I saw someone demonstrate it and then it clicked. And like oh that makes a lot of sense now. And it really has changed you know the way I can do a data analysis. I hardly ever open Excel anymore. I kind of hate it now.

39:27 Michael Kennedy: Yeah, you've got to be pretty good with Pandas for that, but that's awesome.

39:29 Panelists: Yeah, yeah for sure. I mean Pandas, but I'll echo, you know you get with large files, you know Excel just kind of starts crashing especially if you throw multiple VLOOKUPs on a really large dataset. But Pandas, unless you're dealing with gigabytes of data, it is really, really fast.
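
For readers coming from Excel, a rough Pandas counterpart to a VLOOKUP is a left join with `DataFrame.merge`. This is only an illustrative sketch: the two small tables and their column names are hypothetical, not data from the episode.

```python
import pandas as pd

# Hypothetical "orders" sheet: each row needs a customer name looked up.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "customer_id": [10, 20, 10],
        "amount": [25.0, 40.0, 15.0],
    }
)

# Hypothetical lookup table, like the range a VLOOKUP would search.
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["Acme", "Globex"]})

# Left join: keep every order and attach the matching customer name.
report = orders.merge(customers, on="customer_id", how="left")

# Summarize the joined data, for example total amount per customer.
totals = report.groupby("name")["amount"].sum()

if __name__ == "__main__":
    print(report)
    print(totals)
```

The join is computed for the whole column at once instead of once per cell, which is a large part of why it stays responsive on datasets where stacked VLOOKUPs slow Excel down.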

39:52 Michael Kennedy: Yeah, that's awesome. All right, so I want to sort of do a little bit of forward looking stuff with you guys here maybe by starting looking backwards. So I guess if you have people coming into your industry and they don't have any programming background what advice would you give them? Let's start with you, Derrick.

40:11 Panelists: Okay well, I think the first one. A lot of people that I've told about programming coming from my background have said things like, "Well I haven't had a class in that, or I don't know about that because no one has, I just don't have any experience with it." And so I think getting over this mental block that you have to have taken a university or a high school course on something in order to learn it is a real barrier for some people. So the first one would be just that you don't have to be taught everything formally. And the second one would be to be a little bit humble and try to be comfortable being uncomfortable. And what I mean by that is when you try something new especially when there's other people around you that are really good at it, you're kind of putting yourself in a position of inferiority, right? It can be kind of intimidating, maybe it's not the best feeling. But if you can really just kind of take it step by step and learn what you can in those situations I think you can really grow professionally and as a person as well.

41:04 Michael Kennedy: Yeah that's great advice especially in today's age. You know like if you could either take a semester long course and disrupt your life Monday, Wednesday, Friday for two hours plus homework and exams, or you could take a weekend and an online course and get pretty close honestly. You know even free stuff on YouTube, you may be able to sort of teach yourself. So it's definitely a different perspective I think that people probably got to wrap their minds around. Yeah cool.

41:31 Panelists: I think too that if you have a specific problem that you're trying to solve, that really is a good way to learn. You just learn enough to solve a problem, and then you get a harder problem, and you kind of build your knowledge of programming as you solve the problems you have to address anyway.

41:44 Michael Kennedy: Yeah, 'cause you can't learn it all at once. It just becomes overwhelming. So if you can focus it like I need to do these three things. How do I do those three things? Then it becomes way easier.

41:53 Panelists: I'll echo that you know the statement about solving a specific problem. That was my advice too. I set out to learn programming just because, but I didn't have a problem to solve at first. And I didn't really make a lot of progress until I did have a problem to solve. You know once I was doing that accounting job, and it's like oh I can automate this step, and that's where I made progress because I was figuring out how to do a specific task. And then I just, like he said, you move on from there and solve bigger problems and take it step by step. Don't feel like you have to learn everything all at once 'cause you're not going to. And it takes time for things to come automatically. I mean there's things that I do now with Python that you know just come naturally, but a year or two years ago I would've had to look it up on Stack Overflow again you know for the hundredth time.

42:41 Michael Kennedy: It's hard to realize that things like that will become automatic, and you'll sort of go on autopilot like driving a car. It's like there's so many inputs and things and stuff you got to juggle, but eventually most of it becomes sort of just the stuff you do automatically.

42:56 Panelists: Yeah, even Pandas, I mean I've only been using Pandas for just over a year, but because I've been using it everyday for almost a year, it's natural now. And I know you know the documentation's there. I know how to figure out how to do something even if I don't know how to do it off the top of my head.

43:13 Michael Kennedy: Yeah, that's excellent. Jim, what's your advice?

43:16 Panelists: So I would say for anyone who's working with ArcGIS, I would start by just looking at what your daily workflow is whatever you're doing day to day. Go look at Arc's help docs because they've got for every function that you can use, they've got a Python example of how you do it in Python. And it's not just the one line of how to use the function. It's like a short little script of beginning to end how you would use this. And then look at your workflow, start grabbing all those pieces of scripts, customize it to whatever you need, and then all of a sudden you've replaced an entire workflow with a Python script that's your own work. And after you've done that, learn a little bit more about just like generic Python. And then even though this isn't necessarily Python related, I would highly recommend learning PostGIS and SQL because there's a lot of things that you can do in PostGIS that just aren't readily easy to do in ArcGIS.

44:16 Michael Kennedy: Right, have a little bit of database knowledge possibly. Yeah, great, Arsh?

44:20 Panelists: I should probably have a lot to say on this topic, but I'll try to keep it to some main points. I think one of the two main points is it's a combination of learning and doing. And I think you've got to balance it in the sense that some people might really everything on skills, and some people might just rush in and just do stuff but not really even pick up a book and learn some fundamentals. And I think you've got to kind of do both. You've got to pick up the book and learn the fundamentals, but most of the time you won't truly understand all of it until you actually start doing some stuff. So you got to kind of do that back n'forth, and a lot of times decide to build the application from Flask while you're reading your Python book. And then after you finished up with Flask, you probably want to go back and read that Python book because then a lot of the things in there will start to click better for you after you've had the experience. And so you've got to kind of go back n'forth.

45:10 Michael Kennedy: Almost an iterative learning style, right?

45:12 Panelists: Yeah, it's like yeah because I know a lot of times there's stuff I read I thought I understood it, or you think you understand it, but you don't really understand it. So you actually are somewhere stuck, and you're trying to do something. You're like, oh. And then you refer to it, and you're like, oh now I understand what they were talking about in that book. And so going back and rereading you know some of the stuff that you're trying. So try to learn the fundamentals, but don't let that stop you from trying to build stuff. Because as I said, as everyone said here, especially if you have a specific problem or just decide on some project and just start working on it, and then try to read and work sort of concurrently as best as you can. I think that will set you on a good path to try to learn.

45:54 Michael Kennedy: Yeah, that's great. So the last one I guess is you know think about the industry that you came from originally, where do you see programming and things like artificial intelligence and just people having these programming skills increasingly over time pushing the industry and changing the way that people do their jobs there? Derrick, let's just start with you.

46:15 Panelists: Okay I've seen a lot of hype around artificial intelligence and machine learning and there's some bigger companies that are doing some things that are pretty significant with it. But I think there has to be a lot of footwork on the ground level. Maybe the average engineer is going to have to gain some more data literacy before you can get data into a format or a sensible database in order to be able to do these bigger analyses on them that really can add value. And so I see a lot of emphasis on data literacy, maybe simple scripting becoming part of the curriculum for a lot of engineering disciplines. And then I'd also like to see a lot more open source being used. A lot of companies will pay a lot of money for I would even say subpar proprietary packages that a lot of open source libraries can do for free if you just knew that they existed and knew how to install them.

47:04 Michael Kennedy: Yeah, that's a really good point about the open source stuff. And I think open source is definitely going to drive a lot of change for a lot of areas. Jim?

47:12 Panelists: In the GIS realm, one of the things that we've always had to deal with is the fact that we always have more data than we can actually handle. And because like the whole world is our subject. And so a lot of times we'll scale our data back to a lower resolution so that we can just even begin to do the processing. So as new things come out that lets us work with bigger and bigger datasets, we already have those datasets available. We've got petabytes of satellite imagery that just for the most part goes relatively unprocessed just because there's nothing that can process that much data all that easily.

47:50 Michael Kennedy: Yeah, and as the processing power gets better and people are more capable to take advantage of it, you know larger problems will be solved, right?

47:57 Panelists: Yep.

47:57 Michael Kennedy: Very cool, Arsh?

47:58 Panelists: Yeah actually, you know even though I'm not in organic chemistry anymore, I still read a few articles here and there about what's going on. And actually one that I saw which I was not surprised that things were heading in this direction was that at MIT, a group at MIT had come up with an artificial intelligence that was significantly better at predicting medicinal chemistry compounds for clinical trial type of stuff. So basically you know medicinal chemists if people don't know chemistry are the people who sort of investigate the type of compounds that eventually will become drugs. And that's a very slow, costly process of iteratively making compounds. Thousands of compounds to even see. Whereas eventually there is definitely a solution here for software to really step in and make that process faster and cheaper. And this MIT team came up with this article about it being vastly improved as far as predicting better compounds. And I see that's really where the direction is going. And there's already lots of really robotic automation happening in organic chemistry when I was there. And I'm sure that's expanded. So along with artificial intelligence, you can kind of see where now computers can fairly accurately design drugs for diseases and push it through the pipeline which you know hopefully should reduce, we would hope to reduce the cost of drugs and be able to discover drugs faster. I mean that's ultimately the goal. Right now the process takes anywhere from 10 to 20 years. I mean if we could discover a drug for a disease in two years, that would be a lot better.

49:25 Michael Kennedy: Yeah, that would be great. Maybe even lower the price of...

49:29 Panelists: Maybe.

49:30 Michael Kennedy: Probably not, just raise the profits probably. But still one thing I'm really surprised about is how much programming and automation are affecting medicine actually 'cause that seems like a really human, hands on sort of thing. But we've got AI doing the job of radiologists in cancer detection. Google did a thing on studying readmissions and like finding out that people were going to have to come back to the hospital sooner, I believe this example. It's really interesting how these sort of good doctor, medical jobs are almost under threat from software.

50:05 Panelists: Yeah I think it's a case where a lot of these fields like we talk medicinal chemists, I mean the lower ones, you had the old medicinal chemist guy who had been making compounds for you know 20 years, and he just knew which one was going to be the right compound, right. So that had been basically in operation for a really long time because there really wasn't an alternative. There wasn't the data to be able to crunch data, analyze it, and do the type of things predictively that we do now. That we are starting to be able to do now just wasn't available. So you kind of had to rely on the people who had just experience in the field and just sort of their experience guiding it. But now I think we have more data. We can analyze it. We can predict it. And that's going to start to replace those unfortunately on some level replace those old guys with the feel of yeah I know this is going to be the right thing, or that's going to be you know the right way to do it. And it's like, no, the data is going to drive us now.

51:00 Michael Kennedy: That's really interesting. Rob, how 'about you?

51:02 Panelists: Yes, I'm not sure from an accounting standpoint since I'm not in the industry anymore. But I would imagine there are companies already working on you know the AI, machine learning side of things to automate the more manual sides of doing accounting and bookkeeping and things like that and probably tax-wise too I would imagine. I think, what, H & R Block already advertises how they use IBM Watson for calculating your taxes or something like that. I'm not sure all the details.

51:30 Michael Kennedy: Oh, interesting.

51:32 Panelists: Yeah, so I'm sure you know down the road there's going to be a point where you click two buttons, and all your tax data is imported, and your return is filed automatically without having to do much of anything, but we're probably still a little ways off.

51:47 Michael Kennedy: Yeah, so kind of folks can become almost like advisors more than like computers, I guess.

51:54 Panelists: Yeah I think it kind of goes back to what some of the other guys were talking about where you know oh you're going to take away my job. Well no, not really. You're just going to work on higher level you know planning type of things versus the actual manual work of calculating taxes or doing bookkeeping.

52:10 Michael Kennedy: Yeah, and that sounds like a good thing.

52:12 Panelists: Yeah, probably.

52:14 Michael Kennedy: All right, so I think we have to leave it there. We're pretty much out of time, everyone. But it wouldn't be Talk Python without the final two questions. So we'll just quickly, I'll go down the list. You guys can let me know. So Derrick, favorite editor and notable PyPI package?

52:27 Panelists: I don't really have a favorite editor. I'll use Vim or Spyder or PyCharm just depending on what I'm doing. Notable package, I found this package called Sorted Containers which basically I had run into some software that had used this custom AVL implementation, and I was able to basically get rid of a lot of the compiled C code using this package. So check that out if you're in the need of something like that.

52:49 Michael Kennedy: That's sweet, Jim?

52:50 Panelists: So I spend most of my time using PyCharm if I can. And my notable package would be Hupper which is a great development package where if you've got like a web server running and anytime you make a change to your code, you want it to automatically restart. Hupper takes care of that for you.

53:06 Michael Kennedy: Yeah, that's really cool because normally you've got to restart the process for it to re-detect and reload the Python modules. But I think Flask does this automatically, right? You can say like --debug or reload or something, and it'll watch those changes. Maybe it's even using Hupper inside. That's cool. Definitely a nice feature. Arsh?

53:26 Panelists: I used to be a PyCharm person, but I have switched to VS Code. I'm really loving VS Code. So I think I'm pretty set on sticking with it. And then for a package, I'll reword Airflow. I think it's still fairly new. I don't know how many people are using it, but if you have data pipeline workflows and you need to do some data processing and you want to do some scheduling, check out Airflow. It's a really good package, and it can solve a lot of those kind of problems.

53:52 Michael Kennedy: Nice, it sounds like you guys are doing cool stuff with it. Rob?

53:55 Panelists: Favorite editor, if I'm doing, I mean a lot of my work is you know Jupyter now. But if I'm doing just stuff locally, it's Sublime generally or occasionally Vim.

54:06 Michael Kennedy: Okay great, and package?

54:07 Panelists: Yeah package, a little one called Pyperclip which is really simple. You just give it a path or anything, and it will copy something to your clipboard which is really handy when people send you files that you have to load into Pandas, and the name is never consistent. So literally just copy the filename and have the path set up, and you concatenate together, and you're all set to go.

54:33 Michael Kennedy: Oh, that's cool. I really like Pyperclip. I use it for some automation as well. And just like the results will pop out, and I don't want to save 'em to a file, but I just need to paste them somewhere else. And so the last bit in that step is just copy to clipboard, and you know if I want it, it's there. If not, like you know I just ignore what's in my clipboard.

54:51 Panelists: Yeah, I usually just use it to grab something and put it in my clipboard and grab it from there without having to paste it, so very nice.

54:57 Michael Kennedy: Great. Very nice, good recommendation. All right everyone, thank you for being on the show. This was a really interesting conversation, and looking forward to share it with everyone.

55:06 Panelists: Thanks. Thanks for having us. Thank you. Thanks.

55:08 Michael Kennedy: You bet, bye. This has been another episode of Talk Python to Me. Our guests on this episode have been Derrick Chambers, Jim Taysom, Arsh Soheili, and Rob Ward. It's been brought to you by Linode and brilliant.org. Linode is bulletproof hosting for whatever you're building with Python. Get four months free at talkpython.fm/linode. That's L I N O D E. Brilliant.org wants to help you level up your math and science through fun guided problem solving. Get started for free at talkpython.fm/brilliant. Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building 10 Apps or our brand new 100 Days of Code in Python. And if you're interested in more than one course, be sure to check out the everything bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm. This is your host Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
