
#392: Data Science from the Command Line Transcript

Recorded on Monday, Nov 28, 2022.

00:00 When you think of data science, Jupyter notebooks and associated tools probably come to mind. But I want to broaden your toolset a bit and encourage you to look around at other tools that are literally at your fingertips: the terminal and shell command-line tools. On this episode, you'll meet Jeroen Janssens, who wrote the book Data Science at the Command Line. There are a bunch of fun and useful small utilities that will make your life simpler that you can run immediately in the terminal. For example, you can query a CSV file with SQL right on the command line. That and much more on this episode, 392 of Talk Python to Me, recorded November 28, 2022.

00:50 Welcome to Talk Python to Me, a weekly podcast on Python.

00:52 This is your host, Michael Kennedy. Follow me on Mastodon, where I'm @mkennedy, and follow the podcast using @talkpython, both on fosstodon.org. Be careful with impersonating accounts on other instances.

01:06 There are many.

01:08 Keep up with the show and listen to over seven years of past episodes at Talkpython.FM. We've started streaming most of our episodes live on YouTube. Subscribe to our YouTube channel over at Talkpython.com/YouTube to get notified about upcoming shows and be part of that episode.

01:24 This episode is sponsored by Sentry. Don't let those errors go unnoticed. Use Sentry. Get started at Talkpython.FM/sentry. And it's brought to you by Microsoft for Startups Founders Hub. Check them out at talkpython.fm/Foundershub.

01:40 You get early support for your startup.

01:42 Transcripts for this episode are sponsored by AssemblyAI, the API platform for state of the art AI models that automatically transcribe and understand audio data at a large scale. To learn more, visit talkpython.FM/AssemblyAI.

01:57 Jeroen, welcome to Talk Python to Me.

01:59 Hey, thank you. I'm very happy to be here.

02:02 I saw your book and the title was Data Science at the Command Line. I thought, okay, that's different. There's a lot of people that talk about data science tools, and JupyterLab's amazing. And if you look over the fence, there's RStudio and those kinds of things. And yet so much of what we can kind of do and orchestrate and create as a building block happens in the terminal. And bringing some of these data science ideas and some of these concepts from the terminal to support data scientists, I think, is a really cool idea. So we're going to have a great time talking about it.

02:33 Yeah, I love to talk about this. And yeah, you're right. I still find it an interesting juxtaposition of these two terms, data science and the command line. One being, well, nowadays, let's say ten years old, at least the term is, and the other one, the command line, is over 50 years old.

02:52 The command line, it's ancient in computer terms. It's one of the absolute, very first ways of interacting with computers. You've got cards where you programmed on paper, and then you had the shell right after that.

03:06 Exactly.

03:06 Before there were any screens.

03:08 Really?

03:08 Yeah, when computers were green, they were all green. It was amazing. So I'm looking forward to diving into that, and it's going to be a lot of fun. And I just want to also put out there for people who are like, but I'm not a data scientist, so should I check out? I actually think there's a ton of cool ideas in there just for people who do all sorts of Python and other types of programming. It's not just data scientists, right?

03:31 No, absolutely.

03:34 I don't really care much for titles, but even when you're an engineer or a developer, you would be surprised if you really think about how much data you actually work with. I mean, just log files on the server.

03:49 That's data, too. So there are still a lot of opportunities to use the command line, even if you don't consider yourself to be a data scientist per se.

04:00 Yeah, I totally agree. All right, let's start with your story. How did you get into programming and data science? I know you do Python and R and some other things. So how do you get into programming?

04:11 Yeah, it actually started when I was about twelve years old. We got this old computer. It was already old by then, a 286, and I opened up this program. I wanted to write a story, and I was just typing, I was journaling. Then I got all these error messages. Turns out the program that I had opened was QBasic, and it didn't really like what I had to say.

04:37 And then I started reading the help, and then I realized, like, hey, I can make this computer do things. It just needs a particular language. And that's really how I got into programming. And then, of course, there's a whole range of programming languages that then come by: Visual Basic at a certain point, Pascal, Java, C.

05:01 And you know what? I've forgotten most of it, so if this sounds intimidating, then please don't worry. But yeah, nowadays Python plays a big role in my professional career. Also R. Right. And those two happen to be the most popular programming languages for doing data science. And JavaScript, obviously, when you're doing some more front-end work.

05:24 JavaScript finds its way into all these little cracks. You're like, why JavaScript?

05:28 Come on.

05:29 I was just looking at programmable Dynamic DNS as a service, and the way you program it is I know. And you jam in little bits of JavaScript to make a decision on how to route a DNS query and like JavaScript.

05:43 Oh, yeah.

05:43 I'm now using Eleventy, which is a static site generator, and ironically enough, it uses JavaScript.

05:51 Yeah, sure.

05:54 I've heard really good things about Eleventy, and I just started using Hugo, which is also a static site generator, but that one's written in Go. I just decided I care about writing and Markdown, and I want a static site, and I don't care as long as I run a command on the terminal. Actually, I want to tell the story a little bit about sort of coordinating over the shell for some of the static site things. But I don't care if the guts of it are in a language that I can program. It's a tool. If it's a good tool for me, I'm going to use it. Okay, so that's how you got into programming. How about day to day? What are you doing these days?

06:25 Yeah, so at this very moment, as we're recording this, I have my own company called Data Science Workshops, where I give training at companies to developers and researchers and occasionally managers. But I have decided to stop with that.

06:43 Okay.

06:43 So in a couple of weeks, I'll actually join another company. So in the past six years, we can talk about how that company came about. And it's probably related to it's related to everything, of course. But I just want to say that this is actually the very first time that I'm talking about this, but I'm going to be a machine learning engineer.

07:05 Okay.

07:05 Two reasons why I decided to stop with my own company: first of all, I really miss working with people, working with colleagues, and secondly, I miss building things. So that's why I'm joining, well, January 1, I'm joining Xomnia, a consultancy based in Amsterdam, the Netherlands.

07:26 Excellent. Well, that sounds really fun. I also run my own company and I really enjoy it, but I completely get what you're saying.

07:34 Sometimes it's nice to be with a team, and it also makes you learn different skills or hone different skills to show up at a client company where they've got a million requests an hour, trying to answer something with machine learning, versus doing some research and talking about how to improve the shell. These are two very different jobs. And so it's cool to mix up the career across those. Yeah, it's great to mix up your career and do some of both. Right? Because if all you do is work at a consultancy, you'd be like, I can't wait to start my own company and do something else.

08:07 Right? And, you know, the company just happened. That's actually thanks to the book that I wrote a long time ago. Once it was done, the first edition, that is, in 2014, and we're talking about Data Science at the Command Line here, I was asked to give a workshop. I'd never given a workshop before, but I was asked by a games company in Barcelona to give a one-day workshop, and I liked it, and I liked it so much that I started doing this more often. So I decided to do this full time. So I didn't choose the company life.

08:41 Startup life.

08:43 You don't really think it is a startup.

08:44 But these things are independent. The independent life. That's right.

08:49 Yes, exactly.

08:50 Yeah. Cool. All right, well, let's talk about the terminal. People on Windows might know it as the Command Prompt, although I would also recommend that people generally stay away from the Command Prompt, at least for some of these tools. But we do have the Windows Terminal, which is relatively new and much, much nicer, much, much closer to the way Mac... PowerShell, you mean? Well, there's PowerShell, but then there's also a new Windows Terminal application, and it can even do things like bash into Windows Subsystem for Linux. Right, so if you wanted to use some of these tools, you could fire up Windows Subsystem for Linux and then you would literally have the same toolchain, because it's just Ubuntu or something.

09:32 Oh, right, yeah. I mean, I'm familiar with WSL, but I haven't tried out this new Windows.

09:39 Yeah, the new Windows Terminal is pretty nice. Let me see if I can pull it up for everybody. Windows Terminal, yeah, it's just in the Windows 11 store, I guess you call it. I don't know, but it's a lot closer to the other tools. So if you're on Windows, you owe it to yourself to not use cmd.exe, but get this instead. So what I want to talk about just real quickly to set the stage is I just went through a period of, oh, my computer has had the same setup for a couple of years. It's getting crufty. I'm going to just format it. Not restore from some backup, but format it and set up everything again so it's completely fresh and, like, better. Because I really made some mistakes when I set it up. Now it's better, but I open up the terminal and it's this tiny font, dreadful white background with black text, and it has some old version of bash. And so I kind of wanted to get your thoughts on, like, what do you do to make your terminal better?

10:40 Right.

10:40 You probably do something, you probably install some extras and other things to make your experience on the Terminal nicer.

10:46 Yeah.

10:47 I'm guessing you're on macOS, then?

10:49 Yes, I do macOS and I do Linux for the servers and I think some combination thereof is pretty common for most of the listeners.

10:58 So for macOS, the biggest gain you get is when you install iTerm. Okay. A different terminal, right. The application that would launch your shell.

11:10 Yeah. The Terminal emulator, what do they call it? The macOS terminal replacement.

11:15 Yeah.

11:15 This is, I'd say, the most popular one on macOS. There are others, but yeah, that's what I start with. You mentioned the shell, which is it still bash? Is that still the default one on macOS?

11:28 Yeah, it's still bash by default, yeah. And I think it's an old version of bash.

11:32 So yeah, there are other shells out there. The Z shell is quite popular, largely compatible with bash. I've heard good things about Fish.

11:43 Yes, fish is good. Yeah.

11:45 Which actually is not really POSIX compliant, as they say. So it's quite different from what you might get from Bash or Zsh. But from what I've seen, the syntax, especially looping, might appeal to the Python developer out there. It's closer to Python, but I haven't tried it myself.

12:06 There's also the Xonsh shell. Is that how you say XONSH?

12:11 If you're willing to give up POSIX, then this is like literally Python in the shell. You can just type, like, import json and do a for loop. Right now, I've not gone this far. I'm still on the Z shell side of things. I really like how that works. But if you really wanted to embrace the sort of Python in the shell.

12:30 Exactly, it's this trade off of how far do you want to go, how much do you want to deviate from what is then considered to be the default?

12:40 Right.

12:40 Because you mentioned you also work a lot on servers. And there you are then presented with a completely different shell, perhaps a different set of tools. It's a trade-off. And also, how much time do you really want to spend customizing this? Because our time is precious.

12:58 Yeah.

13:00 And William out in the audience says, for the Windows people, Oh My Posh. Have you done stuff with Oh My Posh? This is also really nice.

13:09 I guess Oh My Posh is for any shell. So not just PowerShell.

13:14 Yes, I think it came out of PowerShell. So the Posh part, I think it originally was for that. But I use this with Z shell and Oh My Zsh together, and basically that controls my prompt. And Oh My Zsh is like, all the plugins and complete your Git branches type of thing. But yeah, this is really pretty neat, too. Works well for Windows people, I say that.

13:38 Indeed.

13:40 So your terminal is one thing where you can get a lot of benefit from customizing your prompt, so that it gives you a little bit more information and context about where you are, which state your Git repo is in, or which virtual environment you're working in. That can be helpful too, because context is something that you lose easily when you're working at the command line.

14:05 Right. I ran this command, and it's not working because actually, I forgot to activate the virtual environment. So it doesn't have the dependencies or the environmental variables that I set up in that virtual environment. Right, exactly. Let me give one more shout out for one other thing. While people are thinking about making their stuff better is Nerd fonts.

14:24 I'm always eager to learn these things. There is so much out there.

14:28 So, Nerd Fonts. If you're going to get, like, Oh My Posh and some of these other extensions that you want to make your shell better, so many of them depend on having what are called Nerd Fonts. Because if you look at, say, the Oh My Posh page, there's like these arrows with gaps in them. What font could possibly have, like, a Git branch symbol and these connecting arrows that have colors interwoven? All that stuff is Nerd Fonts. So if you're going to try to run them, download and install one of these Nerd Fonts, and then those will work. Otherwise, they're like those "I don't understand" Unicode square blocks, like when emojis go bad.

15:07 Oh, you still got to install individual fonts.

15:10 Yeah.

15:11 So it's kind of like you take Inconsolata or some other font, and it's patched with these additional... yeah, something like that.

15:20 You only need one, but you have to set your terminal to one of these to make a choice. Set it to one of them and then it'll work. But if you don't, then you'll end up with just like a lot of these extensions don't work.

15:34 This portion of Talk Python to Me is brought to you by Sentry. How would you like to remove a little stress from your life? Do you worry that users may be encountering errors, slowdowns or crashes with your app right now?

15:48 Would you even know it until they sent you that support email?

15:49 How much better would it be to have the error or performance details immediately sent to you, including the call stack and values of local variables and the active user recorded in the report? With Sentry, this is not only possible, it's simple. In fact, we use Sentry on all the Talk Python web properties. We've actually fixed a bug triggered by a user and had the upgrade ready to roll out as we got the support email. And that was a great email to write back: Hey, we already saw your error and have already rolled out the fix. Imagine their surprise. Surprise and delight your users. Create your Sentry account at talkpython.fm/sentry.

16:26 And if you sign up with the code Talkpython, all one word, it's good for two free months of Sentry's business plan, which will give you up to 20 times as many monthly events, as well as other features.

16:28 Create better software, delight your users, and support the podcast. Visit talkpython.fm/sentry and use the coupon code Talkpython.

16:50 Yeah.

16:51 So when it comes to customizing your shell, then if you still want to.

16:54 Talk about that yeah, let's keep going.

16:56 Right. One of the things I think everybody does most often is navigating around, so moving from one directory to another, and it can be quite cumbersome to keep on retyping all these long and deeply nested directories. So there are a number of solutions that can help with that. I use fasd.

17:20 Okay.

17:20 So that keeps track of what you've been visiting most often, most recently. So I wonder if that one also allows you to set bookmarks. That's what I used to do.

17:33 I would have this set of custom shell functions which actually made it into a plugin about nine years ago, into Oh My Zsh. So if you have Oh My Zsh, the jump plugin is still in there.

17:48 I see you can just jump around.

17:53 You would say, mark this directory under this alias, although it's not really an alias, but it's like a bookmark. And then you say jump to this directory. So that really helps.

18:04 Right.

18:04 So maybe the source directory for Talk Python, I would just mark it as tp, and on the terminal I could say j space tp and it would take me to this super long, complex directory. Bam, you're there, right?

18:17 Exactly. So I like it.

18:18 Okay, I might need to try this out. And it comes with Oh My Zsh.

18:21 This one? No, this one it doesn't. It's a separate tool, I believe. Although it might even be a plugin by now. I don't even know. It's been a long time since I installed it. But fasd, that's what you want to look for.

18:35 Okay, very cool. I have one that I use a lot called McFly.

18:40 Have you seen McFly?

18:41 No.

18:42 So it's similar. And what you do is, you know, if you type control R, it'll give you reverse incremental search or whatever that is.

18:50 So this overrides that. So if you type control R, it brings up an emacs like autocomplete type thing that has fuzzy searching. So you could type SSH and then like part of a domain name, and it would find you typed SSH root at some that domain name and it'll give you a list of all these smart options looking through your history.
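The fuzzy history search described here can be sketched in a few lines of Python. This is only an illustration of the general idea (match the typed characters in order, most recent commands first); it makes no claim about how McFly itself matches or ranks results, and the `fuzzy_search` helper and the sample history are hypothetical.

```python
def is_subsequence(query, text):
    """True if the characters of query appear in text in the same order."""
    remaining = iter(text)
    return all(ch in remaining for ch in query)


def fuzzy_search(query, history):
    """Return unique matching commands, most recent first.

    Spaces in the query are ignored, so "ssh exam" behaves like "sshexam".
    """
    needle = query.lower().replace(" ", "")
    matches = []
    for command in reversed(history):
        if command not in matches and is_subsequence(needle, command.lower()):
            matches.append(command)
    return matches


# Hypothetical shell history, oldest entry first.
history = [
    "ls -la",
    "ssh root@example.com",
    "git status",
    "ssh deploy@test.example.org",
]

for command in fuzzy_search("ssh exam", history):
    print(command)
```

Typing part of the command and part of the domain is enough to pull both ssh entries back out, which is the behavior being described in the conversation.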

19:10 Yeah, that's amazing. And even now, as we talk, I've learnt like a dozen new things. One thing I have noticed though, is that the next time you're setting up your system, you may feel very productive and leap like when you're installing all these tools, but you still got to make use of them. You've got to turn it into some kind of habit. And what I have noticed, for me at least, what works best is to just take it one tool at a time, make a little cheat sheet for yourself on a piece of paper and just see if you like that tool, if you get any benefit from it.

19:50 Yeah, absolutely.

19:51 So related to this, actually, is the concept of aliases, right. In the more generic sense, in the pure shell sense, that you can define an alias that would then be expanded into some command with zero or more arguments. So if you have commands that you would often type, like ls for listing your files, and you have all these arguments that you don't want to keep on typing, then aliases are the way to go.

20:20 I go crazy with aliases. I absolutely love this. Yeah, I have probably 150 aliases in my RC file.

20:27 Oh, that's nice. Yeah, that's nice. So at some point, what you may have done is go through your history and then see how often you use these aliases. That's always a fun thing to do.
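Going through your history to see which commands and aliases you actually use can be approximated with a simple counter. The sketch below assumes a plain history format with one command per line (some shells add timestamps or other prefixes, which this does not handle), and the `command_counts` helper and sample lines are made up for illustration.

```python
from collections import Counter


def command_counts(history_lines):
    """Count how often each command name (the first word of a line) appears."""
    counts = Counter()
    for line in history_lines:
        words = line.split()
        if words:  # skip blank lines
            counts[words[0]] += 1
    return counts


# Hypothetical history; in practice you might read the lines from a history file.
sample_history = ["hugo", "git status", "hugo", "ls -la", "git push", "hugo"]

for name, count in command_counts(sample_history).most_common(2):
    print(name, count)
```

An alias that never shows up near the top of this list is probably not earning its place in your RC file.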

20:40 Yes.

20:41 For me, it's kind of frustration. I'm like, God, I want to do this. I've got to remember, I've got to type, oh, no, I got to go into this directory, and then I got to first type this command, and then I can do this other thing. So, for example, we talked about the static site generator. So one of the things I have to do in order to create new content and see how it looks in the browser is I have to go to a certain directory, not where the content is, but a couple up, run hugo -D server there, and then it will auto reload. And as I edit the Markdown, it'll just refresh. So instead of always remembering how to find that directory and then go into the right sort of parent directory and running it, I just now type hugo. Right. And that's an alias, and it does that. Boom. It just pops open, and it's okay, it's running. I do my thing. Then I got to do a whole bunch of automation in Python on top of it and then build it and ship it to Git and push it for a continuous deployment. Now I have just hugo publish, boom. And these are all, like, aliases. The other thing, since we talked about single commands, is maybe to talk about chaining commands and multiple commands.

21:42 Yeah, because you just mentioned automation in Python. And then I, of course, immediately go, like, what's going on there? I've been calling radio.

21:51 Yeah.

21:51 So I've got a couple of, I guess they're Go commands, because they're Hugo. And then I've got some Python code that generates a tag cloud and then a Git command that will publish it. So it's like Hugo, Hugo, Git. No: Hugo, Hugo, Python, Hugo, Git. And that's all in one alias.

22:08 Right.

22:08 Which is beautiful.

22:10 Oh, nice.

22:10 Yeah, it's beautiful. I don't know if we've exactly I guess I opened a little bit talking about your book, but one of the really core ideas of your book is that the shell can be the integration environment across technologies like Go, Python and Git.

22:24 Exactly. The command line doesn't care in what language something has been written. It's like super glue, or duct tape more, really.

22:34 That binds everything together. Yeah, to a certain extent.

22:38 Right.

22:39 Like duct tape.

22:40 Yeah.

22:40 Well, it's loosely bound, but there's a ton of flexibility in there. And if you think, well, I really just want to do these four things, maybe that would be a macro in Excel or some kind of, like, scripting replay in Windows, but it's on the terminal. Programs can run it. You can run it. It's clearly editable. It's not some weird specific type of macro.

23:02 Right.

23:02 You're like, I want to do these four things. I just type the thing and go. I'm sure many people know, but if you have multiple commands, you want to run one, then the other. You can just say && between them, and it'll say run the first thing and then run the other. Those are independent. You can also pipe inputs and outputs between them. Right. I see that you've got some really interesting ways to do that multiline stuff in your book as well.

23:24 Yeah, well, yeah. So it depends on what kind of tools you want to combine.

23:28 Right.

23:29 So you just mentioned double ampersand. So that should be used when you only want to run the second command when the first one has succeeded.

23:39 Right.

23:39 If you want to run the second one, regardless of what the first one did, you can just use a semicolon. Or if you only want to run the second command when the first one failed, there might be a situation where you want to do that. You can use double pipe.
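The three ways of combining commands just described (`&&`, `;` and `||`) all come down to looking at the exit status of the first command. Here is a minimal Python sketch of that decision; `exit_with` and `chain` are hypothetical helpers, and the "commands" are just child Python processes that exit with a chosen status, where 0 means success.

```python
import subprocess
import sys


def exit_with(code):
    """Hypothetical stand-in command: a child process that exits with `code`."""
    return [sys.executable, "-c", f"import sys; sys.exit({code})"]


def succeeded(command):
    """Run the command and report whether its exit status was 0."""
    return subprocess.run(command).returncode == 0


def chain(first, operator, second):
    """Mimic `first OPERATOR second` and return the commands that actually ran."""
    ran = [first]
    ok = succeeded(first)
    run_second = (
        operator == ";"                   # always run the second command
        or (operator == "&&" and ok)      # only after success
        or (operator == "||" and not ok)  # only after failure
    )
    if run_second:
        succeeded(second)
        ran.append(second)
    return ran


failing, passing = exit_with(1), exit_with(0)
print(len(chain(failing, "&&", passing)))  # second command is skipped
print(len(chain(failing, "||", passing)))  # second command runs as a fallback
```

The shell does this bookkeeping for you; the sketch only makes the rule visible.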

23:54 Interesting. Okay.

23:57 You just mentioned piping, and that's well, a whole nother story. That's when you want to use the output from the first command as input to the second command. And this is where data again comes into play. And this is so you just also mentioned macros.

24:16 Right.

24:17 Another way to think of them are functions that you then combine. Yeah, incredibly powerful. But that goes a little bit beyond then, of course, you should be working with commands that produce some text that you want to then further work on.
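Piping, where the text output of one command becomes the input of the next, can also be reproduced from Python with the standard `subprocess` module. In this sketch the two "commands" are small child Python processes invented for the example: one prints three lines, the other sorts whatever arrives on its standard input. The `pipe` helper is hypothetical.

```python
import subprocess
import sys

# Hypothetical producer and consumer commands.
producer = [sys.executable, "-c", "print('b'); print('a'); print('c')"]
consumer = [sys.executable, "-c", "import sys; sys.stdout.writelines(sorted(sys.stdin))"]


def pipe(first, second):
    """Mimic `first | second` and return the final output as text."""
    p1 = subprocess.Popen(first, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(second, stdin=p1.stdout, stdout=subprocess.PIPE, text=True)
    p1.stdout.close()  # the parent no longer needs its copy of the pipe
    output, _ = p2.communicate()
    p1.wait()
    return output


print(pipe(producer, consumer))
```

Thinking of each command as a function from text to text, as suggested above, is what makes this composition work: neither process needs to know anything about the other.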

24:34 Yes.

24:35 You also talk about creating bash scripts, which is pretty interesting. I think many people probably know about that. Or shell scripts, .sh files. I guess it could be Z shell scripts as well. So you gave an interesting presentation back at the Strata Conference, and you had a lot of fun ideas that I think are relevant here. So maybe let me just throw out some one-liners, and you could maybe riff on that a little bit.

24:57 Okay, yeah, sure.

24:59 You gave 50 reasons that the shell was awesome, and I want to just highlight a couple and let you speak to them. So you said the shell is like a REPL that lets you just play with your data. We know the REPL from Python and also from Jupyter, but I never really thought of the shell as a REPL. But it kind of is, right?

25:17 Yeah.

25:17 I think the shell is the mother of all REPLs, the read-eval-print loop.

25:22 Right.

25:23 Having this short feedback loop of doing things and seeing output and then elaborating on that, I think that is tremendously valuable. And Python users, of course, may recognize this from Python itself. You just execute Python, you get a repl or Jupyter console. And to a certain extent, also Jupyter notebook or Jupyter Lab is there are some similarities there where you, again, have this quick feedback loop, and it's a very different experience from writing a script from top to bottom or starting at the top and then executing that script from the start every time you want to test something. So that's a different way of working. And I'm not saying one is better than the other, but what I do want to say is that there are situations where having such a tight feedback loop can be very efficient.

26:15 Yeah.

26:16 Especially in the exploration stage. Right?

26:18 Yeah, exactly.

26:19 Once you go to production. Right. Whatever that means.

26:23 Right.

26:23 Once you want things to be a bit more stable, you don't want to just use duct tape, but you want to use a proper construction, then yeah. Then, of course, the command line can have different roles there.

26:38 Yeah.

26:38 But it's kind of the RAD GUI, the rapid application development GUI, but for data exploration. Right. These REPLs, that's probably why Jupyter is so popular. It just lets you play and see and try, and just that quick feedback loop is amazing. Another reason that it's awesome: close to the file system.

26:55 Yeah.

26:56 I mean, in the end, it's all files, right? Whether you're producing code that lives somewhere, it's in the file. Or whether you're working with images or log files that get written to something or you have some configuration, it's all files. And we got to do things with these files. We have to move them around, we have to rename them, delete them, put them into git.

27:18 Yeah.

27:18 So you want to be close to your file system. You don't want to be importing a whole bunch of libraries before you can start doing things with these files.

27:28 Also, when you're doing data science, often it starts with this kind of ingest and understanding files, right. CSV or text or others.

27:37 Yeah.

27:38 I sometimes try to immediately do read_csv in pandas, but then very often I get some Unicode error, or it turns out the comma is not the delimiter being used. And, yeah, you can do that in a sort of trial and error way. You can fix that. But it really helps just being able to look at a file as it is. No parsing, just, boom, there's my file. And then yes, once you're comfortable, once you're confident, like, okay, this is what my file looks like, this is its structure, then, of course, you can always move on to using some other package like pandas.
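Looking at a file "as it is" before parsing can be done from Python too, using only the standard library. The sketch below is an assumption-laden illustration: the example file is generated on the spot (semicolon-separated, Latin-1 encoded), and `peek` and `guess_delimiter` are hypothetical helpers. `csv.Sniffer` is a heuristic, so treat its answer as a hint rather than a guarantee.

```python
import csv
import os
import tempfile
from itertools import islice


def peek(path, n=5):
    """Return the first n lines of a file as raw bytes, with no decoding or parsing."""
    with open(path, "rb") as f:
        return list(islice(f, n))


def guess_delimiter(sample, candidates=",;\t|"):
    """Ask csv.Sniffer which of the candidate characters separates the fields."""
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


# Hypothetical example file: semicolon-separated and Latin-1 encoded.
raw = "name;city;age\nZoë;Oslo;31\nBob;Rome;45\n".encode("latin-1")
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)

first_lines = peek(tmp.name, n=3)
for line in first_lines:
    print(line)  # raw bytes, so encoding surprises are visible up front

delimiter = guess_delimiter(b"".join(first_lines).decode("latin-1"))
print("delimiter:", repr(delimiter))
os.remove(tmp.name)
```

Once the delimiter and encoding are known, they can be passed explicitly to pandas through the `sep` and `encoding` parameters of `read_csv` instead of being discovered by trial and error.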

28:21 Okay. Another one that you've said, another recommendation you had, a way of playing with this, was to use Docker. I don't know how familiar people out there are who haven't done this. But basically when you start up a Docker image, you might say bash or zsh, and what you get is a basic shell inside the Docker container. But in that space, then, you can kind of go crazy and do whatever you want to the shell and try it out.

28:47 Right?

28:48 Exactly.

28:48 Yeah.

28:48 So there are two scenarios that I can think of. When you're just starting out with the command line, it's a very intimidating environment and it's quite easy to wreck your system if you're not careful. So being inside an isolated environment that is sort of shielded from your host operating system can be comforting. So that's one reason why I think you should use Docker, and the other one is reproducibility. Also in Python we're dealing with packages that get updated, that get different version numbers, where APIs change, and being able to reproduce a certain environment so that you get consistent results is also very valuable.

29:36 Yeah, and I'd like to sort of highlight the converse as well. You said playing with docker containers is a cool way to experiment with the shell. If you care about docker containers, you need to know the shell to do things to it because you might think, oh, I'm just going to make a docker file, I don't need to know the shell. What goes in the docker file? A whole bunch of commands that many of them look like exactly what you would run on the shell. You just put it in a certain location or as a command argument to some configuration thing in there. So if you're going to do things with containers, the way you speak to them is mostly through shell like commands.

30:13 This portion of Talk Python to Me is brought to you by Microsoft for Startups Founders Hub. Starting a business is hard. By some estimates, over 90% of startups will go out of business in just their first year. With that in mind, Microsoft for Startups set out to understand what startups need to be successful and to create a digital platform to help them overcome those challenges. Microsoft for Startups Founders Hub was born. Founders Hub provides all founders at any stage with free resources to solve their startup challenges. The platform provides technology benefits, access to expert guidance and skilled resources, mentorship and networking connections, and much more. Unlike others in the industry, Microsoft for Startups Founders Hub doesn't require startups to be investor-backed or third-party validated to participate. Founders Hub is truly open to all. So what do you get if you join them? You speed up your development with free access to GitHub and Microsoft cloud computing resources and the ability to unlock more credits over time. To help your startup innovate, Founders Hub is partnering with innovative companies like OpenAI, a global leader in AI research and development, to provide exclusive benefits and discounts. Through Microsoft for Startups Founders Hub, becoming a founder is no longer about who you know. You'll have access to their mentorship network, giving you a pool of hundreds of mentors across a range of disciplines and areas like idea validation, fundraising, management and coaching, sales and marketing, as well as specific technical stress points. You'll be able to book a one-on-one meeting with the mentors, many of whom are former founders themselves. Make your idea a reality today, with the critical support you'll get from Founders Hub. To join the program, just visit talkpython.FM/FoundersHub, all one word. The link's in your show notes. Thank you to Microsoft for supporting the show.

32:05 So one of the cool tools that you had in that presentation was you talked about explainshell.com.

32:12 Yeah.

32:12 What is this?

32:13 Well, you can try it out. So what you see here on the screen is explainshell.com, and it will break down a long command and start explaining. So what I think the authors have done is they have used all these manual pages and extracted bits and pieces that they then present to you in an order that corresponds to the command that you're pasting into this. So if you see on Stack Overflow, you see this incantation and you're like, all right, what does it mean? And you don't want to go through the manual page yourself. Right, okay, so what does f mean? What does this xzvf for the tar command mean? Then explainshell can do this trick for you.

33:00 Yeah, it's amazing. When I first saw it, I thought, okay, well, what this is going to be is like the man page. If you type ls, it'll show you a simple "list directory contents", and you click on it, it'll give you additional arguments you can pass. But, like you said, you could then say ls -l and it will say ls means list directory contents, and the -l means use the long listing format. Okay, hold on. What if I said git checkout main? And it'll say, okay, well, git checkout does this, and then main, it will actually parse it apart. And there are some really wild examples on here that are highlighted right on the homepage of that site. You click one and boom, it gives you this cool graph of like, what the heck? It even shows the ampersand, the double ampersand, and the double pipe (or) for combining, as you mentioned before. Yeah, this is amazing.

33:51 It's really useful, especially when you're just getting started with the command line and you're overwhelmed like we all are in the beginning and sometimes still are. Then adding some context like this really helps. I once wrote a utility that allowed you to use explainshell.com from the command line so you wouldn't leave the command line. I don't think it works any longer, but yeah, that was a fun exercise.

34:17 Yeah, very neat. One of the things I learned was parallel.

34:23 So tell us about parallel. This is a command you can run on the terminal and it sounds like it does stuff in parallel. That sounds amazing.

34:32 Yeah, like the name implies, parallel is a tool for that. And we're talking about GNU Parallel here. There's another version out there that is similar but different. GNU Parallel is a tool that doesn't do anything by itself, but it's a force multiplier for all the other tools. So what this tool is able to do is parallelize your pipeline. It will be able to run jobs on multiple cores and even distribute them to other machines if you have those available.

35:06 Right.

35:07 So, Michael, you mentioned you're working on a server. Well, if you can SSH into other servers as well, you can leverage those. That's something that GNU Parallel can do. The way it works is that you feed it a list of something. Could be a list of file names, could be a list of URLs, could be your log files. If you can then think of the problem that you want to solve and break it down into smaller chunks, then GNU Parallel might be able to help you out there.

35:38 These jobs should be working independently from each other. It can be nearly impossible to have those jobs communicate with each other. But let's say you have, for your blog, right, in Hugo, a whole bunch of PNG files that you want to convert to JPEG or WebP or something.

35:57 Yeah, sure.

35:58 Yeah.

35:58 I mean, it's a bad example because this particular tool that I would then use already supports doing multiple files. But let's just assume that this tool can only handle one file at a time.

36:09 Then you would specify your command and then at certain places where necessary use placeholders.

36:17 Okay.

36:17 This is where the file name goes. And this is where the file name goes with a new extension. So it's one of my favorite tools.
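The placeholder idea can be imitated in a few lines of Python. This is only a rough sketch of the concept, not how GNU Parallel itself works: GNU Parallel writes the two placeholders mentioned here as `{}` (the input) and `{.}` (the input without its extension), and the sketch borrows that notation. The function names and the demo command are made up for illustration, and a tiny Python one-liner stands in for a converter that handles one file at a time.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Stand-in for a one-file-at-a-time converter: it only prints its two arguments.
DEMO_TEMPLATE = [
    sys.executable, "-c", "import sys; print(sys.argv[1], '->', sys.argv[2])",
    "{}", "{.}.jpg",
]


def expand(template, filename):
    """Fill in the placeholders of a command template for one input file.

    "{}" becomes the file name and "{.}" the file name without its extension.
    """
    stem = str(Path(filename).with_suffix(""))
    return [part.replace("{.}", stem).replace("{}", filename) for part in template]


def run_all(template, filenames, jobs=4):
    """Run one independent process per file name, at most `jobs` at a time."""
    commands = [expand(template, name) for name in filenames]

    def run(command):
        return subprocess.run(command, capture_output=True, text=True)

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() returns the results in the same order as the inputs.
        return list(pool.map(run, commands))
```

Calling `run_all(DEMO_TEMPLATE, ["a.png", "b.png"])` starts two separate processes and collects what each one printed. The threads here only wait on the child processes; the actual work happens in those processes.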

36:24 Really? Yes.

36:25 That's fantastic. So, for example, if you had a bunch of web pages and you wanted to compute the sentiment analysis, right. As a data scientist, you want to download it, compute the sentiment analysis, and then save that to a CSV or append it to a CSV. Maybe somebody gave you that script, and it's only written to talk to one thing, and you don't want to rewrite it or touch it or get involved with it. This is your way to unlock the parallelism on that.

36:49 Right.

36:49 In fact, let's talk a little bit more about this, because I think this is an important point in that I'm sure that we've all come across when we're working in Python and you're thinking like, okay, I can speed this up. I want to do things in parallel. You know what I'm going to do. Multi threading or what is it that you use these days in Python?

37:08 Yeah, async and await, maybe, if it's I/O or something like that.

37:11 Yeah.

37:11 You got your pool of workers or I don't know, basically you're programming it yourself from the ground up.

37:17 Right.

37:17 Multiprocessing, potentially.

37:20 The trick then is to realize that there is already a tool out there that can do that for you. All that you need to do is make sure that your Python code becomes a command line tool, and we can talk a little bit more about that. But there are just five, six steps needed to make that happen. Once you realize that, then you can start turning existing Python code into command line tools and start combining it with all the other tools that are already available, including parallel.

37:49 Yeah, it's awesome. I think it's a really cool idea because maybe the person working with the code doesn't understand multiprocessing and thread synchronization and all these tricky concepts. Just give me a thing that does it once with command line arguments and I've got it.

38:07 Or you picked it up from somewhere out of the audience. The question is, is there a GIL associated with this? And I mean, technically, yes, but it's not interfering with the computation because it's multiple processes, it's not threads within the process. Right. So it should be able to just run.

38:22 There will be one GIL per process, right?

38:27 Yeah, that's right. And so it doesn't matter because if you say there's five jobs, you have five processes.

38:32 Right.

38:32 There's no contention there. Yeah, absolutely. All right, let's talk a little bit about this idea of turning Python scripts into command line tools. Yeah, I think that that's really valuable for people.
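The point about separate processes can be checked with a small experiment. This is a minimal sketch with made-up names: each job is its own Python interpreter, started with the standard `subprocess` module, and reports its own process ID. Separate process IDs mean separate interpreters, each with its own GIL.

```python
import subprocess
import sys

# The job each interpreter runs: report the ID of the process it lives in.
JOB = "import os; print(os.getpid())"


def start_jobs(count):
    """Start `count` independent interpreters and return the process IDs they report."""
    procs = [
        subprocess.Popen([sys.executable, "-c", JOB], stdout=subprocess.PIPE, text=True)
        for _ in range(count)
    ]
    # All jobs are already running; now wait for each and read its output.
    return [int(proc.communicate()[0]) for proc in procs]
```

With five jobs you get five different process IDs back, which is the "five jobs, five processes" situation described above.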

38:44 It is. And we can then put it in the show notes. I might have already given a talk about this, actually not sure if it's publicly available. Anyway, there are only a couple of steps and it's not that difficult. So first of all, let's assume that you have some Python code out there. You have it in a file and let's just, for simplicity's sake, assume it's a single file. So what would you then need to do to turn this into a commandline tool? Something that can be run on the command line. So the way that you can currently run this is by saying, okay, Python, and then the name of the file.

39:21 Right.

39:21 That doesn't sound like it's a command line tool. So the very first thing here, then, is to add one line at the very top. That would then start with a hash and an exclamation mark, or a hashbang or a shebang, as it's called. These are two special characters and they instruct the shell that this can be executed.

39:45 What is the binary that's going to do the executing?

39:47 Right?

39:48 Yeah, exactly. That's what would then come after that. So you would have the hashbang and then it would point to the Python executable. There's some details there.

39:58 It could be a certain version, it could be out of a virtual environment, potentially, and it could go wherever.

40:03 Right.

40:03 You don't want to overcomplicate it, probably, but you could point to different versions of Python, because you give it a full path to the executable.

40:11 Exactly.

40:13 There's some compatibility issues there, but essentially you ask yourself, OK, which program should interpret my code? And that is some Python out there that you have installed. So that's the first step. Then after you've done this, you no longer need to type python anymore, because the file itself contains which executable should be run. But then you'll notice that you don't have the necessary permission. You need to enable the execution bit. This would give you, as the user, permission to actually execute this file. You do that, of course, with a command line tool. It's called chmod, C-H-M-O-D, for change mode, and then u+x and the name of the file.

40:58 Right.

40:58 If you're really interested in these details, one place where you can find them is in chapter four of my book, Data Science at the Command Line, which you can read for free. Okay, but let's say that you've enabled the execution bit. Now you can run it. You would still need to type a period and a slash, because this file is presumably not yet on your search path. So your search path is a list of directories where your shell will be looking for the executable that you want to run. Where is your tool located? Well, it should be somewhere on the search path. So either you add another path to the search path, or you move the tool to one of the existing directories out there. That's about it for making your code executable. But then you probably want to change one or two things about the code itself. So one thing to do is look for any hard coded values that you actually want to make dynamic.
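The steps for making a file executable (a first line naming the interpreter, the execution bit, and the search path) can also be expressed in Python, which may help to see what each step changes. This is a sketch under some assumptions: it targets a Unix-like system, the `#!/usr/bin/env python3` line is one common convention rather than the only choice, and the helper names are made up for illustration.

```python
import os
import stat

# Contents of a tiny tool. The first line tells the system which program
# should interpret the file (assumed convention: look up python3 via env).
SCRIPT = """#!/usr/bin/env python3
print("hello from a command line tool")
"""


def make_executable(path):
    """Set the user's execution bit, like `chmod u+x` does."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)


def on_search_path(directory):
    """Report whether a directory is listed in the PATH environment variable."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    wanted = os.path.abspath(directory)
    return wanted in [os.path.abspath(entry) for entry in entries if entry]
```

After writing `SCRIPT` to a file and calling `make_executable` on it, the file can be run as `./name`. Once its directory passes the `on_search_path` check, the `./` prefix is no longer needed.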

41:59 Right.

42:00 These should be turned into command line arguments. And actually, you can take that one step further. If one portion of your file is doing something that can be done by another command line tool, then consider removing that. For example, downloading a file. Yes, there is, of course, a tool for that on the command line. Why would you then write this yourself? Of course, there is a time and a place for that, but let's say a very contrived example is a Python program that would count words.

42:31 If your code has some hard coded website, well, you would make your tool more generic by getting rid of that hardcoded URL and turning it into a command line argument. Which website would you like to download? Or, to go one step further, think, okay, you know what? I don't really care where the text is coming from. I just want to count words.

42:53 Give me text somehow. Sorry, just give me the text. Don't tell me. The URL.

42:58 So your tool should then be reading from standard input, which is a special channel from which you can receive data. And this is also where the piping would come in. So you would first use a tool that would get this text. Maybe it's some log file, so you want to count your errors. Or it's another website and you want to do stuff to that. Doesn't really matter, but that tool would then write to its standard output.

43:25 Yes.

43:25 And you would combine the standard output from the first tool with your standard input using the pipe operator. So that's basically it. Of course, if you want to take this further, you can think about adding some help, some nice-looking help. Think about the arguments themselves. Do you want to use short options or long options? Exactly right.
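The word-counting tool that reads from standard input could look roughly like this. It is a minimal sketch of the idea, with made-up function names: the counting function accepts any iterable of lines, so it does not care whether the text comes from a file, a pipe, or a list in a test.

```python
import sys


def count_words(lines):
    """Count whitespace-separated words in any iterable of lines."""
    return sum(len(line.split()) for line in lines)


def main(stdin=None, stdout=None):
    """Read text from standard input and write the word count to standard output."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    print(count_words(stdin), file=stdout)
```

With a call to `main()` at the bottom of the file, the output of any other tool can be piped into it, and the URL or file name never appears in the Python code at all.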

43:48 Something like Typer or Click or one of these formal CLI frameworks.

43:54 Python, of course, has argparse, but there are packages out there that can really help you build beautiful command line tools. Typer is one of them. I'm currently using Click, also Click in combination with Rich.
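For a sense of what the standard library option looks like, here is a small argparse sketch with one positional argument and one option that has both a short and a long form. The tool name and the option are hypothetical, chosen only to continue the word-count example. The help text behind `-h` and `--help` is generated automatically.

```python
import argparse


def build_parser():
    """Build the command line interface for a hypothetical word-count tool."""
    parser = argparse.ArgumentParser(
        prog="wordcount",  # hypothetical tool name
        description="Count words in a text file or on standard input.",
    )
    parser.add_argument(
        "file", nargs="?", default="-",
        help="input file, or - for standard input (the default)",
    )
    parser.add_argument(
        "-m", "--min-length", type=int, default=1,
        help="only count words with at least this many characters",
    )
    return parser
```

`build_parser().parse_args(["-m", "3", "notes.txt"])` gives back an object with `min_length` set to 3 and `file` set to `"notes.txt"`. Typer and Click cover the same ground with decorators and type hints instead.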

44:10 So of course, the author of Rich was on the show a couple of episodes ago.

44:15 Yeah, Will McGugan.

44:18 While we're talking about that, you know, the other thing that's really pretty interesting is the Rich CLI. Have you played with Rich CLI?

44:26 Yeah.

44:26 Okay, so that's indeed a command line tool in itself that can do a whole bunch of things.

44:31 Yeah.

44:31 You want to tell us something about it?

44:33 No, I haven't done much with it. But if you install the Rich CLI, and there's lots of ways to install it, then you could say, like, rich and then a Python file or a JavaScript file or a JSON file, and it will give you a pretty-printed, syntax-highlighted, colored printout. You can say rich and some CSV file, and it will give you a formatted table inside your terminal with colors and everything.

44:58 It understands markdown and renders markdown. So if you're kind of exploring files and you're happy with Python things, installing the Rich CLI is a pretty neat way to go as well.

45:10 Yeah, it's a nifty tool, but just not to get confused. So this tool is provided by Rich and it uses Rich to produce nice looking output. But just imagine that you can write your own command line tools that would also produce this nice looking output. And for that you can then use this package called Rich.

45:30 Right.

45:30 In combination perhaps with things like Typer or Click. And docopt is another way you can go. There are so many tools out there.

45:40 Yeah, there absolutely are. One other thing I would like to point out: just taking the script, making it executable, and putting it in the path is kind of a great way to take scripts that you have and make them CLI commands for you. If you want to formalize this a little bit more, I recently ran across this project called the Twitter Archive Parser, and I don't know if you've noticed, but there's a lot of turmoil at Twitter. And so what you can do is you go to Twitter and download your entire history of, like, thousands of tweets or whatever as an HTML file and some JSON files, and you can save them for yourself. But all of the links in the content are the shortened t.co Twitter short links, and if Twitter were to go away, you'd have no idea what any of the links you've ever mentioned were. And also the images that you get are the low-res images, and you can get the high-res images if you know how to download them. So this guy, Tim Hutton, created this really cool utility that can take that downloaded archive and upgrade it to be standalone, with high-res images and fully resolved links, not shortened links. Pretty cool, right? But if you look at how you use it. Okay, where does it say this? I'm not sure where it is. Yeah, so how do I use it? I download my Twitter archive and unzip it. Fine. And then I download the Python file to the working directory, and then I go in there and I type python and that file. And it has dependencies it has to install in order for it to run. Wouldn't it be better if I could just use this as a command? So what I did is I forked this and I said, I'm going to add a pyproject.toml to turn this into a package. And then in the pyproject.toml, you say project scripts, Twitter Archive, markdown Twitter archive images, and you map those to your package and then the functions that you want to call. And then once you pip install this, these commands become just CLI commands. And it doesn't matter how that happened, as long as your Python packages are in the path, which they generally have to be anyway, because you want to do things like pytest and Black. Then if you just pip install this project, it adopts all these commands here, which is pretty cool.
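The mapping from a command name to a function can be illustrated from the Python side. This is a hedged sketch, not the actual project's code: the command name, package name, and function below are placeholders. The `[project.scripts]` table in pyproject.toml is the standard place where such a mapping lives, and the function it points to is called without arguments, so it reads `sys.argv` itself.

```python
import sys


def main(argv=None):
    """Entry point that an installed command would call.

    A hypothetical pyproject.toml entry such as

        [project.scripts]
        archive-tool = "archive_tool.cli:main"

    asks the installer to create an `archive-tool` command that imports the
    module and calls main(). All names here are placeholders.
    """
    args = sys.argv[1:] if argv is None else list(argv)
    if not args:
        print("usage: archive-tool ARCHIVE_DIR", file=sys.stderr)
        return 2  # conventional exit status for a usage error
    print(f"would process the archive in {args[0]}")
    return 0
```

Accepting an optional `argv` keeps the function easy to call from tests, while the installed command simply calls it with no arguments.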

47:49 Nice. Is it necessary to add this bin directory once to your search path? Because it would live somewhere under site-packages, right?

48:00 Yes, exactly. And so if you have a Python installation and you try to pip install something, you'll get a warning that the site packages are not in the path. So you do have to do that. And then go one further. You could use pipx. I don't know if you've played with pipx. pipx is awesome. So it'll generate the package environment and install the dependencies in an isolated environment, and it'll set up the path. If you just say pipx ensurepath and then pipx install the thing with the commands in it, those automatically get managed and upgraded by pipx as just part of your CLI. That's a perfect chain, but you've got to have a formal package and, like, a place to install it from, Git or PyPI or whatever. But it's still like a neat pro level type of thing, I think.

48:43 Yeah, you can take this pretty far, make it really professional, and before you know it, you start maintaining it.

48:50 Yeah, exactly. Why am I doing PRs on this silly thing? I don't know.

48:54 Yes, but just to clarify, if you say for a one off or a two off, you want to make something that is reproducible.

49:02 Right.

49:02 So reusable command line tool, not reproducible, reusable. You don't really need any other packages. You can use sys.argv.

49:12 Right.

49:13 You import sys and then you have your sys.argv.

49:15 I do that. I do that a fair amount of times. Yeah, it's only for me. I've created an alias so it always gets the right argument. There's no ambiguity. sys.argv[1], let's go.

49:26 Exactly.

49:27 We've talked a lot about sort of around all the cool things we can do with the command line, but in your book, you actually talked about a bunch of surprising tools.

49:37 One of the things you talk about is obtaining data. And you hinted at this before, like you can just use curl for downloading those kinds of things. But if you get a little bit farther, like under Scrubbing Data, you talk about grep and AWK, which a lot of people maybe know. Then if we go a tad further over to, say, Exploring Data, then all of a sudden you can type things like head of some CSV file and it kind of does the same thing as Jupyter. Or there's things like csvcut and SQL CSV.

50:09 CSV.

50:10 SQL. Talk about some of these maybe more direct data science tools that people can use.

50:15 So let's see. Where to begin? You mentioned a couple of tools, right? head and AWK and grep. Those are, you know, I would consider them the classic command line tools, right? I would say coreutils, GNU coreutils. Right.

50:33 If you have a fresh install, then you can expect those tools to be present if you're not on Windows. So those tools, they operate on text, on plain text, and they have no notion of any other structure that might be present in this data. Say, CSV, for when you have some rectangular structure, or JSON, when you can have a potentially deeply nested data structure. These tools know nothing about that. That doesn't make them entirely useless, right? There are ways to work around that issue, but there are nowadays plenty of tools available that are able to work with this structure. And one of them is actually a suite of tools. It's called csvkit, and you can install it as a Python package through pip, which of course we do at the command line.

51:27 csvkit, you say? Yeah, exactly.

51:30 And then you get a whole bunch of tools that understand that lines are rows, the first line is a header, and all these fields are delimited by default by a comma. And then you can do things like extract columns or sort a file according to a certain column. So this is more difficult when you're working with coreutils. And of course, all of these things you can do in pandas, and it might even be faster in pandas as opposed to these CSV tools, not as opposed to the classic command line tools. But in order to get started with pandas, right, just imagine that you're given this file by your colleague and you're asked to quickly put some things together. What are the things that you need to do? Fire up JupyterLab, import pandas, and maybe a bunch of other things. There is of course also a time and a place for that. Definitely.

52:37 I always use the tool that gets the job done. Don't get me wrong here, but it's just so incredibly powerful.

52:44 If it solved the job to just whip up a command on the command line using a couple of tools there. If you're going that route, then csvkit is not the only suite of tools that you should know about. There's also xsv, written in Rust. But yeah, you shouldn't care about that because the command line doesn't care. It's generally faster. One thing that csvkit can do, by the way, and I'm actually kind of proud that I have been able to contribute that tool to the suite of tools, is csvsql, and it allows you to run a SQL query directly on the CSV file. So if you are familiar with SQL, then you can leverage that knowledge directly at the command line without first having to create a new database and import that CSV file in there and so forth.

53:36 So one of the things you can do on the command line is basically just give it, like, here's a SQLite database file, and now go insert all the things from the CSV file into it. Here in this example, it has this create table statement. Does it figure that out from the CSV or do you need to write that?

53:53 It figures it out. Yeah, it looks at the first, say, 1000 rows and then figures out, like, okay, this is a number, this is text.

54:02 I see.

54:03 Yeah.

54:03 Cool.

54:04 But I was actually talking about the other tool, and that's sql2csv. I always mix those up.

54:11 The reverse. Yeah.

54:12 Yes, exactly. This one and there, it still uses SQLite under the hood. But you don't need to worry about that. It takes care of all that boilerplate for you. You just say, okay, now select these columns from standard input. Order them by this column. This is the file. Or I've piped.
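The "SQLite under the hood" idea can be sketched with nothing but the standard library. This is an illustration of the concept, not csvkit's actual implementation: it loads CSV text into an in-memory SQLite table, guesses per field whether a value is a number or text (much cruder than looking at the first thousand rows), and runs one query. The function names are made up, the table name `stdin` is just a default chosen for the example, and the sketch assumes the CSV has a header row with plain column names.

```python
import csv
import io
import sqlite3


def _convert(value):
    """Guess a type for one field: integer first, then float, else keep the text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def query_csv(csv_text, sql, table="stdin"):
    """Load CSV text into an in-memory SQLite table and run one query on it."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    columns = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    con = sqlite3.connect(":memory:")
    try:
        con.execute(f'CREATE TABLE "{table}" ({columns})')
        con.executemany(
            f'INSERT INTO "{table}" VALUES ({marks})',
            ([_convert(value) for value in row] for row in body),
        )
        return con.execute(sql).fetchall()
    finally:
        con.close()
```

For example, `query_csv("name,age\nann,31\nbob,27\n", "SELECT name FROM stdin WHERE age > 30")` selects the rows where the age column, stored as a number, is above 30. The database disappears when the function returns, which is the boilerplate the command line tool hides.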

54:31 Yes, that's cool.

54:31 Yeah, it's pretty cool.

54:33 Yes. Maybe you've got like some production database and you want to filter out. I just need this table with this particular query, right. I only want to focus on my region of this data. Give it to me as a CSV file, then you can go work on it all you want. You don't have to be connected to the database or near it or any of those things, right? Potentially, if it doesn't have any sensitive data, you could share that, right. You would never share the connection straight to your database. That would be insane.

55:00 Yeah, exactly.

55:02 Okay, very cool. So what are some of the other tools?

55:06 If I go back to csvkit, you can see there's some of these you talked about. There's in2csv. That one takes an Excel XLS or XLSX and converts it to a CSV just on the command line or the terminal. Right?

55:22 Yes. Okay.

55:23 Also, I should point out that I'm not the author of csvkit.

55:27 Right.

55:28 I just contributed very small portion to it because of the ingredients that were already there. Still proud of it, though. But it's being created by many other people.

55:39 Sure. Of course, some other things it has are, like, csvstat and csvgrep. Yeah.

55:46 A lot of cool command line options to point out these things. Right.

55:50 I pulled out some others. Rush. So one of the areas is basically plotting. We are basically out of time, but I want to talk about two things really quick.

56:01 Right.

56:03 Which chapter did you put under where you have the pictures?

56:09 Seven visualizing exploring data, and then yeah, here we go.

56:14 Tell us a little bit about this. Like you can plot stuff in your terminal.

56:18 Yeah, it's kind of crazy. I should say that Rush is a proof of concept. It's one of those projects that have a lot of potential, but don't necessarily have enough users, and I don't necessarily have enough time to maintain it properly, but it does prove the concept. Rush, the name, I mean, it's for when you're in a rush. It's R on the shell. And what it does, it leverages R under the hood. And for plotting, it leverages a particular R package, ggplot2, which is the data visualization package for when you're working with R.

56:57 Yeah, kind of the sibling, where Matplotlib is a little bit derived from that, I believe.

57:02 Right, well, now that you're mentioning that, actually Matplotlib is very different.

57:09 Matplotlib is very low level and gives you a lot of flexibility, but also requires a lot of work. Now, if you want to visualize data in Python in a similar way to ggplot, then I can recommend plotnine. So that's a Python package that is modeled after ggplot2's API. But that was a little segue there. Now, somebody else created a back end for ggplot that allows you to create visualizations on the command line. What I then did was create this interface, so something that would translate arguments and their values to the appropriate function call, and that also does a lot of boilerplate when it comes to reading in the CSV file that you provide.

58:00 Right.

58:00 If you were to do this in R itself, it would require, let's say, about five lines of code in order to get started.

58:09 Right.

58:10 And the same holds for Python.

58:12 Right.

58:12 So similar concept. Right. Import the appropriate packages or modules, reading in some file, and there's all this setup. Again, that is probably what you want when you want things to be a little bit more robust. But when you want to get stuff done quickly, it really helps to be able to do that as a one-liner on the command line. And this is where Rush would then come in. So I make use of all this elaborate machinery in R just to use that at the command line.

58:45 A beautiful little wrapper around this complex thing, and you hide the complexity. Right, exactly.

58:52 So you can do beautiful, like, bar plots. There's a lot of neat stuff in here. I really like this.

58:58 It is really nice. And now that I see this again, I get excited again. There is definitely potential there. But you know, it's again, yet another open source project that has to be maintained. And unfortunately my time is limited like everybody else's.

59:17 Yeah, of course.

59:19 The last thing we have time for is this polyglot Data Science. Tell us a little bit about this.

59:24 Yes.

59:25 So polyglot data science is the idea that in order to get things done, you might need to use multiple tools, multiple languages, really. And throughout the book up until then, up until that chapter, we have mainly been focusing on using other languages from the command line. But this chapter considers the other way around. Right, using the command line from another language. So there might have been a situation where you're working in Python and then all of a sudden you're like, now I've got to do this regular expression, or I've got to do some globbing, or I have to call this other tool that is not written in Python but can be called from the command line.

01:00:12 Right.

01:00:12 You maybe use the subprocess module for that. These are situations where you want to leverage the command line, where you want to break out of Python and do parts of your computation on the command line. And in that chapter, chapter ten, I demonstrate this not only for Python itself, but also in other languages and tools, including JupyterLab, where you can pass around, say, a variable as standard input or also retrieve the output so that you can continue working in Python again with the output.

01:00:49 What is still very interesting to me is that even new languages and tools somehow still offer a way to leverage the command line. So Apache Spark has a pipe method where you can pass an entire data set, an RDD, through a command line tool. And I think that is just, maybe it was just a fun little hack, what the authors did. I don't know. I try to view it as a compliment, like, okay, sometimes we just need to go back to the basics and use the command line, because once you're there, you're back in this environment where you can use everything else.

01:01:32 So everything we've spoken about so far is now accessible as a command, be it Go, Python, or your own script or whatever.

01:01:40 Exactly. So let's say you've come across this really nice tool, but it's written in Ruby. Oh, no. What are you going to do?

01:01:50 Are you going to all of a sudden become involved in Ruby? No. Assuming that this tool can be used from the command line, you can of course relax, just use the subprocess module and still incorporate that Ruby tool into your own script. That's the idea.
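Passing a Python value to an external tool as standard input and getting its output back can be sketched with `subprocess.run`. The external tool here is a stand-in, not a real Ruby program: a small Python one-liner that upper-cases its input plays that role so the example runs anywhere Python does. The helper name is made up for illustration.

```python
import subprocess
import sys

# Stand-in for a tool written in another language: it upper-cases its input.
EXTERNAL_TOOL = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def run_tool(command, text):
    """Send text to a command's standard input and return its standard output."""
    done = subprocess.run(
        command,
        input=text,           # becomes the tool's standard input
        capture_output=True,  # collect standard output and standard error
        text=True,            # work with str instead of bytes
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return done.stdout
```

`run_tool(EXTERNAL_TOOL, "hello\n")` hands the string to the other process and brings the result back as a Python string, so the rest of the work can continue in Python.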

01:02:06 Yeah.

01:02:07 I do want to maybe point out just really quickly here, this has got a little bit of a Little Bobby Tables warning asterisk by it.

01:02:16 Right. So, for example, one of the things that's awesome here is I could run Jupyter console as you show. And then if you say exclamation mark and a command, that pumps it straight to the shell. So you could say !date, and it will show you the date. You could say !pip install --upgrade requests and that'll go and execute that command. Don't do that with user input.

01:02:36 Right. Because who knows what they're going to send you. You can also do that within Jupyter notebooks, you point out. Right. So you can do %%bash and then some interesting complicated thing there, right?

01:02:50 Yeah, that's the deal. The magic command that you can use in Jupyter notebook.

01:02:55 Right.

01:02:57 Bash yeah.

01:02:58 And so then you take what's left of that and then you head over to explain shell and figure out what the heck it means.

01:03:05 Yeah.

01:03:05 Maybe do that before you run it.
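The warning about user input applies to `subprocess` as well. The sketch below shows one common way to reduce the risk, with made-up function names: pass the untrusted text as a single item in an argument list so that no shell ever parses it. If a shell string really cannot be avoided, `shlex.quote` from the standard library quotes the untrusted part for POSIX shells.

```python
import shlex
import subprocess
import sys


def show_argument(user_input):
    """Pass untrusted text as one list item, so no shell ever interprets it."""
    command = [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input]
    done = subprocess.run(command, capture_output=True, text=True, check=True)
    return done.stdout.rstrip("\n")


def quoted_for_shell(user_input):
    """Build a shell command string with the untrusted part quoted (POSIX shells)."""
    return "echo " + shlex.quote(user_input)
```

With an input like `"report.txt; echo injected"`, the first function hands the whole string to the program as one argument, semicolon included, instead of letting a shell run a second command.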

01:03:07 Yeah, that's a good idea. And then also in Python, using subprocess, it's something that I've done several times. I need to automate generating some big import of, say, 150 video files across a bunch of directories to build a course that we're going to offer. Well, into the database I have to put how long each one of those is. I have no idea how to get the duration out of an MP4 or MOV file. You know what, there's a really cool command line program I can run. It will tell me. So I just use subprocess and call that and then I can script out the rest in Python. And subprocess is not to be underestimated, I think.

01:03:45 Yeah, exactly. No, it makes a lot of sense. At a certain point, shell scripts can get a little bit too hairy to work with. Being able to automate your things and use Python as your superglue.

01:03:58 Right.

01:03:58 So a little bit stronger than duct tape, I think. Makes a lot of sense.

01:04:02 Yeah.

01:04:03 We talked at the beginning about how you're in this exploration stage and you just want to run a bunch of stuff on the command line and figure it out. But when you go to production, and you said whatever that means, this could be one thing that it means: we're going to write formal Python code and then use subprocess to kind of bring in some of this functionality, potentially.

01:04:20 Yeah, exactly. I mean, the command line is by definition very ad hoc in nature. Still, if you're doing things in production, meaning you're interacting with other environments, with servers, or you have some kind of continuous integration going on, these are places where the command line keeps popping up, right, so even there it is useful to at least be comfortable with this stark and unforgiving environment.

01:04:51 I think it's really excellent. I think there's a lot of cool stuff that we talked about. I think there's a lot of value for people to learn this. I guess maybe we close this out with just one comment that I remember from your Strata presentation. You said the command line is like wine. Maybe it takes a while to appreciate, but it gets better with age. Certainly. My first experience was like, okay, I'm going to go from Windows and Mac development over to setting up and running servers over SSH. It was like, I am beyond lost. I have no idea even just how to get started, right. Many years ago. And now it's like, well, of course, that's a beautiful way, and you just slowly build up these skills and it's really lovely.

01:05:32 Yes, it is.

01:05:33 No, it took me a long time to get comfortable with the command line, actually, or Linux in the more generic sense. For the longest time I was running Windows and Linux in a dual boot machine and so I just couldn't make the jump. And this was over ten years ago, but no, it definitely didn't come overnight and I wasn't born with it. So I also believe that everybody is able to embrace the command line, if you will, but you've just got to make yourself a little bit comfortable there as well. We talked about that in the beginning, right? The right terminal, the right aliases can get you a long way.

01:06:09 They get you so far and tools like OhMyZsh and some of these others, the fast that will help you remember the thing you needed to type or like you said, aliases and kind of bring it all together. Like, I know I did that thing. Let me just do a quick search for there it is.

01:06:25 Five weeks ago I ran this and this is how I restart the web server. Oh, yeah. Now I remember.

01:06:30 Yeah.

01:06:31 Yeah, I can talk about this all night.

01:06:33 I think we're probably out of time though. Let me ask you the final two questions before you get out of here. You're going to do some editing or write some code. What editor do you choose?

01:06:43 These days I am torn between Visual Studio Code, Doom Emacs, and Neovim.

01:06:50 But wherever I am in these editors, I always have my Vim key bindings set up. So it kind of depends on the project. But yeah, as long as I have my Vim key bindings, I'm happy.

01:07:02 Yeah, absolutely. And then I usually ask for a notable PyPI package or library, but maybe broaden it a little bit: if you could recommend one tool, one library people could install for the command prompt or the shell, what would you say?

01:07:18 One tool or one command line tool that they could install on the shell?

01:07:24 Just something it doesn't have to be the most popular.

01:07:26 If I ran across this, it was delightful. People should know about X. Yeah.

01:07:31 GNU Parallel. Let's do it.

01:07:33 We talked about it, so it doesn't require any further explanation. It's a tool that makes every other tool way cooler. So yeah, if you have that one in your arsenal, you can become very powerful.

01:07:48 Good recommendation. All right, well, final call to action. People are excited about this. They want to learn more about it. What do you tell them?

01:07:53 Yeah, a couple of things they can do. So my book, Data Science at the Command Line, is freely available. The second edition came out a year ago. You can read it for free on datascienceatthecommandline.com. I also offer a cohort-based course that I do twice a year. The next cohort is coming up in April. We have six live sessions, and I will help a group of researchers and developers embrace the command line. It's a very different experience than reading a book. If you want to know more about that, then datascienceatthecommandline.com also has a link to that.

01:08:36 Apart from that.

01:08:37 Yeah, I mean, if you just follow Hacker News, now that you're aware of all these tools, you'll come across quite a lot of tools every now and then. There's not a week in which there's not a tool being mentioned. There are tools being developed every day. Even though the technology is over 50 years old, it's impossible to keep up.

01:08:58 It's only getting cooler.

01:08:59 It is only getting cooler, definitely. So yeah, that's my recommendation there.

01:09:05 All right, fantastic. Well, thanks for being here. It's been great. Congrats on the book and putting all this together.

01:09:10 Yeah, thank you very much for having me.

01:09:11 Yeah, you bet.

01:09:12 Bye bye.

01:09:14 This has been another episode of Talk Python to Me. Thank you to our sponsors. Be sure to check out what they're offering.

01:09:20 It really helps support the show.

01:09:22 Take some stress out of your life. Get notified immediately about errors and performance issues in your web or mobile applications with Sentry. Just visit talkpython.fm/sentry and get started for free. And be sure to use the promo code talkpython, all one word. Starting a business is hard. Microsoft for Startups Founders Hub provides all founders at any stage with free resources and connections to solve startup challenges. Apply for free today at talkpython.fm/foundershub. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show: open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:10:25 We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube.

01:10:35 This is your host, Michael Kennedy. Thanks so much for listening.

01:10:39 I really appreciate it.

01:10:40 Now get out there and write some Python code.
