
#359: Lifecycle of a machine learning project Transcript

Recorded on Tuesday, Mar 22, 2022.

00:00 Are you working on or considering a machine learning project? On this episode, you'll meet three people from the MLOps community: Demetrios Brinkmann, Kate Guznikov, and Vishnu Rachakonda. They're here to tell us about the lifecycle of a machine learning project. We'll talk about getting started with prototypes and choosing frameworks, the development process, and finally deployment and moving into production. This is Talk Python to Me, episode 359, recorded March 22, 2022.

00:39 Welcome to Talk Python to Me, a weekly podcast on Python. This is your host, Michael Kennedy. Follow me on Twitter, where I'm @mkennedy, and keep up with the show and listen to past episodes at talkpython.fm. Follow the show on Twitter via @talkpython. We've started streaming most of our episodes live on YouTube. Subscribe to our YouTube channel over at talkpython.fm/youtube to get notified about upcoming shows and be part of that episode.

01:05 This episode is brought to you by Sentry and their awesome error monitoring product, as well as the Stack Overflow podcast bringing you stories about software development.

01:15 Transcripts for this and all of our episodes are brought to you by Assembly AI. Do you need a great automatic speech-to-text API? Get human-level accuracy in just a few lines of code. Visit talkpython.fm/assemblyai.

01:28 Kate, Vishnu, and Demetrios.

01:30 What's up?

01:31 Welcome to Talk Python to Me.

01:32 Hey, how are you doing, man?

01:34 I'm doing great. It's fantastic to have you all here. I'm psyched to talk about MLOps and your community and some best practices and tooling and all that kind of stuff. And I think it's going to be a lot of fun, so hopefully you're looking forward to it as well.

01:49 Yeah, for sure. So think about one of the hottest areas of technology right now, whether it's trying to get a job or it's VC funding or whatever. Right? Machine learning and AI are among the peak buzzwords right now. And so I think it's going to be a fun conversation, both to talk about this hot topic, but also maybe to demystify it a little bit.

02:09 Yeah, I hope we can do that.

02:11 I think we can. I definitely think we can. And you all built this cool community, mlops.community, which we're going to talk about a lot. But before we get into that topic, let's just start with your backgrounds and how you got into machine learning and all this stuff. Kate, let's start with you.

02:26 All right. So my background is probably not so conventional for the field, although nothing is conventional for ML, I guess, at this point. So originally I actually started studying business and economics back in the day, and then I slowly shifted into basically digital analytics for marketing and stuff like that, at which point I realized that I'm more interested in the numbers than in influencing people. So I went through a boot camp, and then one thing led to another through consulting internships and so forth. Once you get your foot in the door, I guess, it's easier to establish yourself. And then I ended up continuing on that path, mostly data science. So no crazy robots or anything.

03:06 So, no self-driving cars right now?

03:09 No self-driving cars yet. So it's still mostly ML for commercial purposes, like working with sales data, with tech data and so forth. But that was it.

03:20 I think there are a lot of people who come from economics, get into computation, and then get into data science. It's a bit of a gateway, I think.

03:29 Yeah. Economics is also kind of this easy choice of subject when you're finishing school. You're kind of like, I can do maths, so I'm not going to do just social science. Okay, let's go for economics. And then you're like, what can I actually do with it? Do I want to work in a bank, or maybe something else?

03:46 Yeah, exactly. I had a similar experience with my math degree. But exactly, yeah, fantastic. So you said you did a boot camp, which I think is interesting. When I think boot camps, I think JavaScript front end, which is not necessarily the best thing, to just produce a whole world full of JavaScript front-end folks, but that's sort of the main boot camp story. It sounds like yours was in data science. Was that a good experience? What was that like?

04:11 So we actually did have those paths as well, with Java and front end. I just decided to go for the data science one. I think they even called it Big Data and AI or something like that. But technically it was just pure, nice Jupyter notebooks where you're massaging the data a little bit and then running your first couple of models. It was very nice in the sense that it was all of us struggling together. It was very intense, so you can test drive your idea of whether it's actually for you or not in a very short time. And ours lasted, I think, three or four weeks. And afterwards that same company, sorry, boot camp, would get you into an internship. If you pass the whole boot camp, pass the last test, do the project, and then do the interview, you can be a candidate there. And I think that was very appealing to me because it had some consequence. It wasn't just, oh, let's try this out, and then I don't know what I'm going to do, who's going to hire me, whatever. It was very well thought out. So in that sense, I think it's a great program that they put together. I think they're not doing it anymore specifically for data scientists, because they found out it's very hard, even in consulting, to employ very junior data scientists and get value out of them. So now they're training people in data engineering and then a little flavor of data science, so that later you can develop the skill and maybe apply that knowledge down the road.

05:35 Yeah, interesting. I do think it probably is a little bit challenging to get in as a junior, because a lot of times there's not a huge team of data scientists.

05:44 How about you?

05:45 Yeah. So I fell into this in a very happenstance kind of way: I was working on the sales side of a company that was selling tools to machine learning engineers. And so I was that guy that would spam people. So if I spammed you, please forgive me if, back in the day, I tried to reach out to you on LinkedIn and tricked you into connecting with me. That was my past life. I have repented for my sins, and now I do the community stuff full on. But yes, the company was doing ML tooling, and we were really focused on provenance and trying to be able to reproduce your runs, your data, your models, everything that comes along with the modeling side of machine learning. And so what happened was that when the pandemic hit, that company went out of business. And just about three weeks before the company went out of business, nobody was picking up the phone, nobody was connecting with me on LinkedIn. And so our CEO said, why don't we try and do something around the community?

06:54 But people don't want us to come to them. Maybe we can make a place for them to come to, right?

06:57 Exactly. And so our CEO, I've got to hand it to him. His name is Luke Marsden, and back in those days he was the CEO of this company called Dotscience, which is now defunct. He has open source in his blood; he can't help but do the open source thing. Even though this company wasn't open source, he wanted the community to be very open source and vendor neutral and open to anyone. And in the beginning, I didn't understand that. I was like, wait a minute, you're going to let our competitors come in here, and not only that, you're going to let them give a talk? No way, man. This is crazy. But he was all about it, and he helped me to understand the value of community and what it's about. And then, because of that, in those first weeks I was just interviewing people as a podcast host, because I needed to ramp up. I was basically learning on the air, talking to different machine learning engineers when we would have the meetups, and we would run them almost like a podcast meetup, a live podcast, not too dissimilar to what we are doing right now. And then I was able to talk to enough people, and more people started joining the community, and I started learning more about it. And now here we are, two years later, and I'm still running with the community. It became something that is my pride and joy, and it's part of my identity now, which I don't know if that's a good or a bad thing.

08:22 It sounds really fun. And these communities are super fun to build. Two years, that's a long time to be building, and it's the kind of stuff that doesn't necessarily explode overnight. But if you sort of take the long view and look back, like, wow, look what we built, that's pretty neat.

08:39 Yeah, totally. And Vishnu was in it at the very beginning, and Vishnu and I talked quite a bit. I remember in the first couple of months, I was sitting there thinking, wow, if we hit 600 people in Slack, that will be, like, incomprehensible. And like you said, it's just grown from there. And you look back and you go, how did we get here?

09:05 Wow.

09:06 This is incredible.

09:07 Yes. I think one of the things that's easy to lose sight of for creators and community members and organizers and stuff, it's really easy to get tied up. You look at a famous YouTube person with a million views on their videos, or you look at some crazy TikTok thing or something, but 600 people, that's pretty close to a keynote for a lot of conferences. That's like the premier group, where you try to get them all together. That's pretty amazing.

09:35 That's true.

09:36 And it's grown from there, which is awesome. Yes. Vishnu, how about you?

09:39 Well, how I got started in ML, I think if I really had to trace it back, it was maybe around 2017.

09:47 Around that time, I was really thinking about what I should try and do in my career. I was, and continue to be, very passionate about healthcare and biotech and the life sciences field. I was reading a lot and thinking, what do I want to do with my career? Maybe similar to Demetrios, I was actually contemplating the sales development or product management side of things in the industry, but I was like, it doesn't feel tangible enough. It doesn't feel like I'm actually helping make people healthier in a very hands-on way. And so I started to read more papers, as part of a master's program, on the impact of machine learning in healthcare and biotechnology, and decided, okay, this is what I want to do, and spent some time teaching myself how these things work. Took some courses, became a machine learning engineer at a medical device company, and quickly after starting that realized, well, there's a lot more to machine learning than just calling model fit. And that's when I joined the MLOps community, back when it was about 300 people in May of 2020, and I've been in it since. Demetrios and I are pretty tied to that. And that's kind of tracked my growth as a machine learning engineer, understanding, okay, how do we go from model, to model in production, to system, to business value? And that's been my journey in terms of this industry to date.

11:04 Yes, super neat. I think around 2017 was certainly when the promise of machine learning for healthcare really started to gain public awareness. You had, I don't remember exactly the timing, but that XPRIZE around mammography images and stuff like that, where this is starting to beat doctors, right? And it's automatic. Could it be both fast and sort of beating radiologists' estimates of whether or not someone has cancer, and things like that? That's pretty neat. I haven't been tracking those sorts of developments that closely, but I'm sure it's just grown from there.

11:39 Yeah.

11:39 I mean, I think around that time, good memory with the XPRIZE, that was sort of out there. There was also chest X-ray work that was open sourced from the NIH, big data sets around...

11:51 ...what is the promise of machine learning modeling for that problem, diagnosing pneumonia. And then AlphaFold also came out around that time from DeepMind, and for biotech, protein engineering, on all sorts of different vectors, you had a lot of energy, and I guess I was sucked in.

12:05 Yeah, that's cool. I suspect that the protein folding understanding is even more important, really. It's not just, you have this problem; it's, here we could understand how to fix it, maybe.

12:14 For sure.

12:14 Yeah.

12:15 I think the protein folding stuff, especially with the open sourcing of AlphaFold 2 recently, the buzz is pretty incredible.

12:21 Yes, absolutely. Well, let's start off by talking a bit more about the MLOps community, and then we can dive into a couple of layers of best practices and machine learning in production and stuff like that. So maybe, Demetrios, we talked a bit about it, but do you want to introduce what it's all about for folks?

12:40 Yeah.

12:42 So we're really trying to just be... the landing page is funny, because it's something that we put together however many years ago, and now we definitely need an upgrade. And so we're doing that right now, because I realized, after I surveyed the whole community, a lot of people are like, wow, we're in this community, we're in the Slack, but we didn't realize that there are newsletters you've got going on, you've got roundtable discussions happening, there are reading groups, there are all kinds of different initiatives that people from the community are doing. And we don't actually make that very clear for people when they join, right? So maybe they know about the Slack, or maybe they know about the meetups or the podcasts that we have, but that's it, and they don't necessarily get to go through all of that and have a better overview. So that's what we're doing now. And really, the community for us, it's trying to find, or trying to define and create, this space where people can come together, they can learn, they can engage with practitioners who are on the front lines and get their questions answered, but also meet others and network with people. And we're doing that in this virtual space. We're trying to really share, collaborate, learn, and trying to make it fun also, so it's not just crunching numbers, stale and boring.

14:11 This portion of Talk Python is brought to you by Sentry. How would you like to remove a little stress from your life? Do you worry that users may be encountering errors, slowdowns, or crashes with your app right now? Would you even know it until they sent you that support email? How much better would it be to have the error or performance details immediately sent to you, including the call stack and values of local variables and the active user recorded in the report? With Sentry, this is not only possible, it's simple. In fact, we use Sentry on all the Talk Python web properties. We've actually fixed a bug triggered by a user and had the upgrade ready to roll out as we got the support email. That was a great email to write back: hey, we already saw your error and have already rolled out the fix. Imagine their surprise. Surprise and delight your users. Create your Sentry account at talkpython.fm/sentry, and if you sign up with the code talkpython, all one word, it's good for two free months of Sentry's business plan, which will give you up to 20 times as many monthly events as well as other features. Create better software, delight your users, and support the podcast. Visit talkpython.fm/sentry and use the coupon code talkpython.

I'll throw an idea out there and you tell me what you think; I already sort of hinted at this. Software developers are often on teams where there's a group of five or ten of them working together, but I feel like that's less true for machine learning folks, who are more often in a specialized role. Not always, I know there are teams and stuff, but a lot of companies have a data scientist, or one or two data scientists. And it seems to me like communities like this would be even more valuable than, say, a software development one, where it's maybe easier to find peers in your environment, at a meetup or at a company that you already work at or something like that.

16:03 Totally.

16:04 I resonate with your description there, Michael, because that was me when I found the community, right? I was a machine learning engineer who was putting together some models that I thought could be used better by the company I was at, that could be integrated into our hardware device a little bit better than I thought

16:21 was currently being done. And trying to talk to our software engineers, I realized, hey, there's kind of a translation gap. What is this field of taking a model that exists and has some value in terms of accuracy and precision or whatever else, and turning that into an artifact that's consumed by software teams and then ultimately by business teams? I read about MLOps in 2020, but it was really not clear what exactly that meant. It felt more like a buzzword than a real thing at that time. And I joined the community, and it essentially became a real-time Stack Overflow for me, where people are asking and answering questions daily about how do you actually implement production ML, what does it look like, what are the best practices, and how can we help each other do that at work and be better ML practitioners day to day? So it's been great from that standpoint.

17:09 Yeah. Fantastic. Kate, how about you? How did you come to the community?

17:13 I'm not actually sure exactly, but I think it was also the whole pandemic; we tried to connect to the world, including MLOps. And at the time, a friend of mine was also starting a community in Riga, my hometown. It wasn't MLOps, it was data science, because again, we couldn't meet in person. So we did that around the same time, we discovered that there are other Slack communities, and it was just fun.

17:36 We had a different focus, so it's just fun to be part of both, in different directions. And I do agree with the Stack Overflow notion. Sometimes, especially if you're working alone on a problem, you just go there to feel like you're not crazy, first of all, for thinking what you're thinking. And secondly, I can search; right now there's such a huge knowledge base, because so many questions have been asked, so I can go there to quickly validate some ideas, see which angles have been taken into consideration. So in that sense, it was fun, and definitely the networking aspect. Luckily, Berlin is full of ML people, and that's definitely one of the places you can reach out with questions. I had some people reach out to me to ask about what companies to apply to, are there any jobs, would you refer me, and vice versa. So it's definitely fun.

18:24 Yeah.

18:24 Nice. I don't know why this is, but you're right, it does seem like Berlin and sort of generally Germany as well.

18:33 A lot of times I'm talking to people about data science and machine learning. I'm like, wait, you're both in Berlin, all the places in the world. It does seem like a hotbed of machine learning. That's pretty cool.

18:44 And there's a lot of companies that just source their talent here and work in completely different markets because they know that there are people here that people will be willing to relocate here as well. So that's definitely adding to the whole dynamic.

18:58 Yeah.

18:58 Absolutely.

18:58 It's a cool place to live. And I think probably a lot of folks in Germany, I know the education system there, they end up with very good and strong backgrounds in math and stuff compared to other places in the world, like the US. Certainly coming out of high school there, you come out with very strong math, I know that much. So, pretty cool. All right, well, let's start by talking about... I wanted to cover your thoughts on some different layers or stages of a machine learning project, so we could structure a conversation around that. So I wanted to talk about prototyping, coming up with ideas, choosing frameworks, things like that, and then talk about building the project with development, and then running and maintaining it over time, the whole Ops, machine-learning-in-production side of things. So let's start with prototyping. I guess the first question I wanted to ask, and this might be, I don't know, you all can laugh at me or whatever, but: deciding, is a problem actually a machine learning problem? Because there is so much hype around AI and machine learning that sometimes I think a series of if statements is called artificial intelligence. I remember there was this scandal, I think it was in the UK. There was some airline company that was specifically finding families traveling together and booking them apart from each other, and then charging them money to book themselves back together, like a premium, so they could sit with their family. Like, sorry, your seat didn't get put next to your family, whoopsie, but you can pay $50 or 50 quid or whatever and get back by your family. Right? And the lawmakers were like, they're using artificial intelligence to do this.

20:34 Terrible.

20:35 That sounds like an If statement.

20:36 Yeah.

20:37 I think deciding should I just encode some decision making into software, which we often call programming, versus is it a real machine learning problem where neural networks or NLP? How do you all think about this is a machine learning problem versus this is just a software problem, traditional style.

20:56 I'll say one thing before I let Vishnu and Kate jump in, and it comes from one of the community members, Eugene Yan, who has a famous tweet and blog post about how the first rule of machine learning is: don't use machine learning. So if you can avoid it at all costs, it's probably better, just because you're adding additional complexity, and if you don't need to do that, then it's going to be infinitely easier on you not to. That being said, there are a lot of times where you do need to add it, and there are a lot of businesses that are realizing business value from it. So that's kind of a hand-wavy answer, but I'll see if Vishnu or Kate have anything more concrete.

21:43 Yeah, absolutely. Also, I just want to give a quick shout out: I did have Eugene on the podcast in March of last year. We talked about the seven lessons that machine learning can teach us about life, or something like that, which was pretty great. He's a very thoughtful guy.

21:56 Yeah, he's great.

21:57 He is awesome.

21:59 Your question is about whether or not to use machine learning versus maybe more traditional approaches, is that correct? Right.

22:05 Someone comes to you with a problem: we need to make our API do this, or our website do this, or our app do that. And how do you decide? Like, this is a machine learning problem versus you just need an if statement or a switch statement or whatever.

22:19 I deal with this problem a lot in my role right now. I'm a data scientist at a healthcare services company, actually the first data hire. And a lot of times I have people come to me who say, hey, this is a clinical population, this is a group of people that we want to serve.

22:33 How do you think about delivering our intervention to them? I'm speaking in generalities, but I can talk a little more specifically about some examples of this later on. And to me, what matters most is driving value on a single business metric, because, to be honest, that's the simplest way that your company does better and you do better. It's not through, let's say, implementing the fanciest algorithm. We're not working in research environments anymore when it comes to machine learning, as often as we used to. If you're in a research role, be my guest, use the most advanced algorithm or whatever the coolest, latest model might be. But for me, the number one thing is: what is the business metric I'm trying to drive? Is that engagement for our population? Is that some sort of quality metric, in the healthcare sense, that I'm trying to change? And then, what is the simplest possible way that I can advance that number? Is it just a standard SQL query I run? Is it some kind of templated SQL query, or is it a machine learning model? And the idea is: simple is reliable and scalable, much more so than something complicated. That's the way I triage whether or not machine learning is the right intervention for a problem. To recap: what's the business metric I'm trying to push, and then what's the simplest possible thing, knowing that it needs to be reliable and maintainable for the long term? Yeah.
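
To make that triage concrete, here is a minimal sketch of "simplest thing first": score a plain rule-of-thumb baseline on the business metric before reaching for a model. The file, column names, and threshold are invented for the example, not taken from Vishnu's actual work.

```python
# Hedged sketch: try the simplest intervention first and measure it on the one
# business metric that matters. All names and the threshold are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("members.csv")                      # hypothetical data set
X, y = df[["visits_last_90d", "age"]], df["engaged_next_month"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: a single if-statement style rule, no ML at all.
rule_pred = (X_test["visits_last_90d"] > 2).astype(int)
print("rule-based recall:", recall_score(y_test, rule_pred))

# Only if the rule falls short do we reach for a model.
model = LogisticRegression().fit(X_train, y_train)
print("model recall:     ", recall_score(y_test, model.predict(X_test)))
```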

23:54 Excellent, Kate.

23:55 I agree with Vishnu. Well, my advantage as a newcomer to this career path, I guess, was that I was just unable to go very fancy in many of the cases. So it was a no-brainer to ask, what is the simplest way, and what is actually the way I can solve the problem? And just by having adopted that mentality early on, you basically hack the problem, right? And then, also coming from more of a business and marketing background, again, I'm not thinking about what fancy, shiny new tool I want to use. Sometimes I do now, because I'm bored, but that's more for the cases where nobody cares what I'm really doing and then I show them something sparkly from my side. I would say it's very important to communicate the solution rather than thinking about whether it's an ML problem or not at the end of the day. So if you give people what they want, with the shine and sparkle that they want, they will forget whether there's AI behind it or not. Having said that, though, in my consulting experience, sometimes people really come to you and they're like, I want this predictive model, take it or leave it, and you do it for them even though it's not the best approach, especially if you try to explain that maybe it is not the best approach. Sometimes people just need, I don't know, maybe they have a KPI of having developed and introduced one ML project in their teams, and this is it, whatever. But in most cases, I think, people can be reasoned with.

25:28 Yeah, the stakeholders demand more AI, so we need more AI. I don't know what that means, but we definitely need it. That's great. And I do resonate with the idea of just starting simple and evolving. So many people perceive software and software projects, in the broader sense, as "I have to think about it for a long time and get it right" rather than "let me try to build something over a day, and if that doesn't work, I'll try something else, or we can evolve it over time." And maybe that comes from back when software was harder to change and the tools were more primitive or whatever, but I think: just jump in, try something, and go with that.

26:05 Yeah, I wanted to say something real fast on machine learning especially.

26:09 And this is something that I think a lot of people don't realize until later on. And I see it time and time again from the veterans in the community coming in and answering the questions that are being proposed. And that is really taking it back: when you're proposing a solution or when you're asking a question in the community, it's like, but what exactly is the business value here that you're trying to affect? Because machine learning is so closely tied to business; that Venn diagram is so overlapping on the business side, and it also has the technology side. But a lot of times we just take for granted that it's technology, and you can't do that, especially with machine learning. If you're an SRE and you're just focusing on the SLOs and all of that, it's a lot easier to be like, okay, cool, this is a very technology problem. But with machine learning, like Vishnu said, you really have to know what kind of metrics you're trying to move the needle on and how you're going to go about that. So that was just something I noticed comes up quite a bit from the veterans saying, wait a minute, slow down before you go for that next step. What is it that you're trying to do here?

27:28 Yeah, probably someone comes in with a question like, I'm trying to make PyTorch do this thing and I'm having a hard time, and it's like, do you really need to do it?

27:38 Maybe it's cool or whatever, but is that really what you need in this situation? Could you just use Pandas?

27:43 That's exactly what it is. And oftentimes it's something even more complicated than that, where it's like, I'm trying to use Kubernetes to do this retraining job 1,000 times so I can see which of these models is most effective on this particular metric. And you're like, whoa, you're about to undertake a huge engineering project with all kinds of complicated tools. Why?

28:07 But if you ask why, you send people into an existential crisis. So that is interesting as well.

28:13 Out in the audience, Kam asks, or says rather: a lot of times I have researchers propose algorithms that are too computationally complex to run at the speed we need for production. I think that's also an interesting thing to consider when you're doing the prototyping and getting things started. Maybe here's an algorithm that will give you a great answer, but, you talked about retraining a thousand times, can you really spend $25,000 on compute in the cloud or something just to decide if this is even going to work? Or could you do that in production over the data you have or something? What are your thoughts on this?

28:52 I'd say, a lot of times in machine learning, our default in terms of technical requirements is pretty unilateral, you could say. It's like, increase the accuracy or the precision or whatever the score is. But in software engineering and in other forms of engineering, they're much better at understanding multivariate requirements. And I think that's where you see the tension. The data scientist says, but this model works on the problem, it does the thing it's supposed to, it's better on this one thing. And the engineer says, well, wait, no, there are a million things: I need to think about latency, I need to think about reliability, I need to think about how big the file might be. There might be a million different things. And I think that's where you see the kind of thing Kam is talking about.

29:30 Yeah. Interesting.

29:31 I think it comes from academia, actually. I remember a year ago we were doing a meta-science course where we discussed ML model assessment, and how the same researchers could not reproduce their results because they realized they had used different compute, so they would get different results. And in each of those cases, they realized they needed to include that information in the academic paper to have comparable scenarios. So maybe with a little compute, this is the best model, and in the other scenario, the other one is the better model. And I think now that this is changing in academia, it will become more standard to consider elsewhere as well.

30:15 Yeah, it's a good point.

30:16 How many times have we heard stories, and I'm sure people listening have lived these stories, of a data scientist coming with a model, or they get tasked with a problem, they go work on a model for a while, and they come back and the accuracy or F-score is like perfect, and then they give it to whoever the stakeholder is. And that stakeholder is like, but this is useless to me. I don't care what your accuracy score is, you're not actually solving the problem here. And it just goes back to this: make sure you're very clear on what you're optimizing for. And then another point I wanted to make, which I feel is super important, is that depending on your use case, you really have to be vigilant about what you're trying to do and what you use to get there. Because if you're doing autonomous vehicles, that is a whole other world compared to what Vishnu is doing or what Kate is doing. Even if you're just dealing with unstructured data and you have a computer vision problem that's in healthcare and it's deciding if someone has cancer, that takes a lot longer. You put out like one model and it has to get approved by the FDA, and that takes a long time to go through that process. So you don't really care about updating that model in real time and gathering that data on the fly. But if you're in autonomous vehicles, that's a whole other set of problems that you need to get into. So really, recognize: what is your use case, what are the big things that you need to take into account as you're looking into these use cases, and where do you want to optimize?

31:53 For sure. I definitely want to talk about that trade-off of building the perfect model versus evolving it later.

32:02 This portion of Talk Python to Me is brought to you by the Stack Overflow Podcast.

32:07 There are few places more significant to software developers than Stack Overflow, but did you know they have a podcast?

32:14 For a dozen years, the Stack Overflow Podcast has been exploring what it means to be a developer and how the art and practice of software programming is changing our world. Are you wondering what skills you need to break into the world of technology or level up as a developer? Curious how the tools and frameworks you use every day were created? The Stack Overflow Podcast is your resource for tough coding questions and your home for candid conversations with guests from leading tech companies about the art and practice of programming. From Rails to React, from Java to Python, the Stack Overflow Podcast will help you understand how technology is made and where it's headed. Hosted by Ben Popper, Cassidy Williams, Matt Kiernander, and Ceora Ford, the Stack Overflow Podcast is your home for all things code. You'll find new episodes twice a week, wherever you get your podcasts. Just visit talkpython.fm/stackoverflow and click your podcast player icon to subscribe. One more thing: I know you're a podcast veteran, and you could just open up your favorite podcast app and search for the Stack Overflow Podcast and subscribe there. But our sponsors continue to support us when they see results, and they'll only know you heard about it from Talk Python if you use our link. So if you plan on listening, do use our link, talkpython.fm/stackoverflow, to get started. Thank you to Stack Overflow for sponsoring the show.

33:32 So another area that I think is interesting to consider as you're getting started has to do with how much compute it takes to solve some of these problems and build some of these models. More than a lot of other areas of software development, training models takes a ton of energy, which can mean money or time or both, putting aside the carbon cost of spending a lot of time on servers. How are you going to accomplish that? How do you think about that trade-off? Can I train this up on my laptop, or do I need to get a GPU cluster, or what are your thoughts there?

34:08 It's funny, I actually think the compute part of MLOps is perhaps the most solved portion of the stack. We usually think about things in terms of data, model, and code in machine learning. Your data is changing, you want your model to change as the data changes, and your code is just sort of the way that you control the model you're creating based on that data. And when you think about what AWS has done over the last few years to make it more possible than ever, companies like Paperspace as well, and even Google making Colab and GPU instances free, it's very easy to experiment with compute, much more so than ever before. And that's usually the hardest part of the process, right? Freedom to experiment is usually the constraint.

34:56 Right.

34:57 And I'll just contrast that with data. It's very difficult, I would say, to experiment in the machine learning process right now with different data sets, with trying to work with synthetic data, or with using different cuts of data that are correlated with interpretability. That is a much more complicated area, one where I think a lot of machine learning professionals spend a lot of time doing manual work, as opposed to compute nowadays, which is, by and large, I think, a largely solved problem.

35:26 Yeah.

35:27 Excellent. Do you do any edge computing with your... do you do, like, medical devices or anything? Little devices that people walk around with that you've got to put real-time stuff onto?

35:37 I used to. I worked at a medical device company that was doing an imaging device, and we used to put a machine learning model on the imaging device itself. And it was interesting. It really gave me a lot of appreciation for what some of the folks at Apple and Nvidia and Fitbit and some of the other bigger companies are doing with machine learning on device.

36:01 It takes a particularly talented group of hardware and software and machine learning professionals to make all that stuff work together. And I would definitely suggest anybody interested in this field check out what Nvidia is doing; they're doing a lot of really cool stuff. From my time doing it, I learned that simplicity is really the way you get things done. We were just exporting model weights to a pickle file and then doing some very low-level computation to make it as fast as we needed. So that's about my experience working at that level.
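
A rough sketch of the kind of setup Vishnu describes, exporting weights to a pickle file and keeping the on-device math as plain as possible; the weights, file names, and dot-product model here are purely illustrative, not the actual device code.

```python
# Hedged sketch: pickle the trained weights, load them on the device, and keep
# inference to bare-minimum computation. Everything here is illustrative.
import pickle
import numpy as np

# --- training side ---
weights = {"coef": np.array([0.8, -1.2, 0.05]), "intercept": 0.3}
with open("model_weights.pkl", "wb") as f:
    pickle.dump(weights, f)

# --- device side ---
with open("model_weights.pkl", "rb") as f:
    w = pickle.load(f)

def predict(features: np.ndarray) -> float:
    # A plain dot product keeps the on-device work fast and easy to reason about.
    return float(features @ w["coef"] + w["intercept"])

print(predict(np.array([1.0, 0.5, 10.0])))
```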

36:31 Interesting. Okay. Yeah, I'm always blown away that pickle is still used, but especially in this area, it's pretty interesting.

36:36 It works.

36:37 I know. It definitely does. It's easy, it's just, whatever, just save that, we'll deal with it later. Let's talk real quick about one thing you have over on the community. You have a couple of things up at the top here, and one of them is called the feature store. It sounds like it might be helpful for people getting started. Do you want to tell us about this a little bit?

36:56 Yeah. So we set out to demystify the space. And I have so many stories of how I just didn't know what I was getting myself into when I started creating this, because MLOps, first of all...

37:14 Right now in machine learning, there aren't clear spaces of, oh, this is this tool, and you need this tool if you're doing X, Y, Z, and here's the space that it occupies. There's this tooling sprawl, and some tools do a little bit of this and other tools do a little bit of that, and maybe this tool does all of these things but doesn't quite get you there. And so what I've been able to converge on, or landed on, is a few things that are different spaces that are clearly defined. One is the monitoring space, because I think that's just really easy for people to comprehend: monitoring. It's like software development monitoring, but then you add the machine learning aspect to it, and you add that data, and then boom, you've got lots of new stuff to monitor. Then there's the feature store part. There's also the metadata management or experiment tracking piece, and then there's the deploy piece. And you can ask Vishnu, I've gone back and forth on how we're going to present these things, what we're going to do in order to create... should we create the framework and basically do Gartner's job for them and say, these are the things you should be looking for in the tooling if you want a feature store or a monitoring tool? That's what we've been trying to do. We've been working with all the different companies in the community, and also with practitioners who use this stuff, and we've been asking them, hey, is this what you would expect from a feature store, or a monitoring company, or metadata management, or a deployment tool? The problem is that there are a lot of tools right now that are, say, specializing in optimizing compute, and then they also have deployment as a feature. So is that a deployment tool?

39:04 Yeah, it's a tricky one. But I guess the takeaway, maybe, is that over on your website you've got some different categories that try to do that comparison, to help people pick some of these tools.

39:15 Yeah, exactly. That's where I've been pulling my hair out for the last year and a half, just trying to figure it out, because I know that if I'm struggling to differentiate all of these tools, I'm not the only one. This is what I go on: people reach out to me quite a bit to tell me about their new MLOps tool, and then I go onto their website and I'm like, what exactly do they do? It's always the same stat you hear, like, 80% of machine learning models never make it into production, and that's what they have as their H1 on their website, and then it's like, what do you do? I don't understand. So we tried to take an unbiased approach to figuring that out.

39:54 Yeah, very cool. Let's move over to the development side. So you've sort of figured out your path, you decided it is a machine learning problem, how you want to approach it, and so on. What are some of the recommendations or techniques that you have for people in that stage of these ML projects?

40:15 One thing I hear a lot of times for this category, I guess, is that there are a lot of software engineering practices that don't necessarily get brought into data science as often as they should, for example unit testing or version control or stuff like that. What are your thoughts, Kate and Vishnu especially?

40:32 I'll let Kate go first.

40:33 Yeah, it's a good one. I wouldn't call myself a super awesome engineer. I'm still on that path, just trying to be tolerant and understanding toward the other folks on the team who actually know what they're doing, and not to make their lives too hard. My personal, super tiny hack, especially in the days where, well, I think people are still using notebooks, is that I'm using VS Code with that hash-percent-sign marker, which turns your script into cells. And when I'm done, I just clean it up and it becomes a normal script again, or at least I can basically copy it and do that, and still run it and test it out that way. And that, I think, really speeds up the process. And of course, version control. Currently I'm working with teams, basically either data engineers or SREs, who are helping me make things a bit more polished or deploy them a bit better, so I don't have to think about it that much. So I'm happy to hear from Vishnu.
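
For anyone who hasn't seen the trick Kate mentions: VS Code (and several other editors) treat a `# %%` comment in an ordinary .py file as a cell marker, so you get notebook-style, cell-by-cell execution while the artifact stays a plain, diff-friendly script. A tiny illustration, with a made-up CSV file and columns:

```python
# %% load the data (this comment marks the start of an interactive cell)
import pandas as pd
df = pd.read_csv("sales.csv")   # hypothetical file

# %% explore interactively, cell by cell
print(df.describe())

# %% when you're done, the file is already a normal script you can run end to end
summary = df.groupby("region")["revenue"].sum()
print(summary)
```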

41:34 Yeah.

41:34 To me, the components of the software engineering workflow that I suggest data scientists and machine learning engineers heavily leverage are version control, CI/CD, testing, and, in general, clean code best practices. Right.

41:49 I mean, like, code reuse: functions rather than just top-to-bottom copy-paste.

41:54 Yeah. If you do those three or four things, that's actually, I would say, a good bit of the hard day-to-day stuff.

42:02 Right.

42:03 Just following those best practices will really accelerate you. I try to set those things up early on in the process of running a project: setting up the repo the right way, setting up CI/CD to the extent that makes sense. You don't want to over-engineer infrastructure around your project before you even have a project. But I think those kinds of things are very helpful. And Demetrios and I had a guest on our MLOps Community podcast named Jesse Johnson, who is the VP of software engineering and data science at a company called Dewpoint Therapeutics, and he has this concept known as building software from the outside in. What that means is you start at the finish: when you're building a product, maybe an API, define what the API looks like, define what you'd like your end user's experience to be, and then build backwards to all the things that allow that to happen. And I like to think about the same thing with a machine learning model, or with whatever kind of data science output I have: what do I want the end experience to look like, how elegant should it feel, how reliable, maintainable, and well architected should it be, and then build inwards towards that. So that's another software engineering concept that I try to apply.
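
A rough sketch of that outside-in idea applied to an ML service: write the interface callers will program against first, then fill in the internals behind it. The names and types below are a hypothetical illustration, not Dewpoint's or anyone's actual API.

```python
# Outside-in: start from the caller's experience and build backwards toward the model.
from dataclasses import dataclass

@dataclass
class RiskScore:
    member_id: str
    score: float          # 0.0 - 1.0
    model_version: str

def score_member(member_id: str) -> RiskScore:
    """The contract the rest of the system uses; written before the internals exist."""
    features = _load_features(member_id)      # to be built inwards
    score = _model_predict(features)          # to be built inwards
    return RiskScore(member_id, score, model_version="v0")

def _load_features(member_id: str) -> list[float]:
    raise NotImplementedError("feature store lookup goes here")

def _model_predict(features: list[float]) -> float:
    raise NotImplementedError("model inference goes here")
```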

43:17 Yeah. A lot of people talk about how unit testing makes your code better, and I think you've actually touched on the key reason why that is: because instead of just thinking about the details of the algorithm, you have to think about how this is going to look when somebody tries to use it.

43:32 Because that's what you've got to do in the test: use it a little bit.

43:34 Exactly.

43:35 Kate, do you want to go?

43:36 I just wanted to make a quick joke: it runs on my machine, so it must be good.

43:41 It must work. Do you know about the "it works on my machine" certification program?

43:46 No. What is that?

43:48 I need this.

43:48 Oh, it's brilliant. It has some interesting rules, so any time somebody says that, you give them this stamp. It's brilliant. To be certified, you have to compile your application. Getting the latest version of any code changes from any other developers? Purely optional. Launch the app, cause at least one path to be executed; the preferred way to do this is ad hoc manual testing. You can omit this step if the code change was fewer than five lines or if, in the developer's professional opinion, the code change cannot possibly result in an error. Check in your code. Certified! Oh, boy. Yeah, I know, it's the problem that we all run into, right?

44:30 Reproducibility and shareability and so on. And you're right that source control and CI and these things are absolutely important.

44:40 One thing... Demetrios, go ahead and jump in with your thought, then I want to talk about a tool here real quick. Oh, yeah.

44:44 I was just going to say that one of the guests on the community podcast made an excellent point. We've had two that have come on and talked about testing specifically for machine learning, and one of them, Mohammed, was talking about how, when you properly test... actually, both of them kind of said the same thing in different words: when you have tests set up properly, and they're not just unit tests but also tests of the data and that kind of stuff, one of the side effects you get is documentation. You have these artifacts that are left behind, so as someone who's coming in and jumping in new, you get to see, okay, this is what's happening with this slice of data. And when you're trying to debug something, it gives you a much better picture, and it's more clearly laid out than if you were to just leave it all up to chance.
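
A hedged example of that "tests as documentation" idea for data: a few pytest-style checks that both guard a training slice and record what it is supposed to look like. The column names, file name, and valid ranges are invented for the sketch.

```python
# test_training_data.py: data checks that double as documentation of assumptions.
# Column names, file name, and ranges are hypothetical.
import pandas as pd

def load_training_data() -> pd.DataFrame:
    return pd.read_csv("training_data.csv")

def test_no_missing_labels():
    df = load_training_data()
    assert df["label"].notna().all(), "every row must be labeled"

def test_age_in_plausible_range():
    df = load_training_data()
    assert df["age"].between(0, 120).all(), "ages outside 0-120 suggest a broken upstream join"

def test_expected_columns_present():
    df = load_training_data()
    assert {"age", "visits_last_90d", "label"} <= set(df.columns)
```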

45:38 We could say, yeah. And the open source angle: if you're looking at some package or library to use for your code, and you go check it out and it has no tests, there's a good chance you're like, this thing is not really ready. People are not putting in the effort to make sure it's good enough, to keep it good enough over time, and to help onboard new contributors, such that you'd really want to depend upon it. So I think there's that angle that's important as well. The two tools I wanted to talk about really quickly in this area are this thing called DVC, which is open source version control for machine learning projects, and, before that, nbdev from fast.ai. Vishnu, you spoke about using Git and checking in your stuff and so on.

46:24 I think notebooks are great, but one of the challenges with them is that they save their output in the file, and every run potentially generates different output, which means meaningless Git merge conflict problems you've got to deal with. Do you have ideas to fix some of that stuff, or do you use anything like nbdev to make that simpler?
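
One common workaround for the merge-conflict problem described here, offered as a hedged aside rather than something the guests recommend, is simply stripping saved outputs before committing. A minimal sketch with nbformat; the notebook name is a placeholder:

```python
# Clear saved outputs so the committed .ipynb only changes when code changes.
# nbformat ships with Jupyter; the file name is illustrative.
import nbformat

nb = nbformat.read("analysis.ipynb", as_version=4)
for cell in nb.cells:
    if cell.cell_type == "code":
        cell.outputs = []
        cell.execution_count = None
nbformat.write(nb, "analysis.ipynb")
```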

46:43 Good question. I like Papermill a lot. I follow the Netflix approach to working with notebooks. I think they're a very useful tool.

46:51 I think that if you're prescriptive... Can you give us a quick summary of what the Netflix philosophy there is?

46:57 Sure. Yeah, I can do that.

46:59 Basically.

46:59 Netflix's point is that notebooks are a really effective interactive prototyping tool that, given really prescriptive guardrails, can be part of a production process. They use notebooks in production at Netflix and really encourage their use internally, and they've developed a lot of tools to help make sure that the known flaws of notebooks, such as the state element you pointed out, don't lead to things falling off the rails. And that's the way I like to think about it as well. I don't use nbdev myself, but I do think it's a great project that people who are thinking about using notebooks should check out. Another tool that I've heard of is called Ploomber. But the most important thing when you're using notebooks, rather than getting into a holy war about whether or not it's the best tool, is understanding: is it the right tool for my job? And I view the right context for that as being when iteration and speed are of the essence; there is no tool better than a Jupyter notebook. There just isn't, and we have the empirical proof for that. So let's just find ways of putting process around it to make it work.
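
Papermill's core idea in one small sketch: run a notebook headlessly as a parameterized, repeatable job, keeping the executed copy as a record. The notebook names and parameters here are placeholders, not Netflix's or Vishnu's actual pipeline.

```python
# Execute a notebook with injected parameters and keep the output notebook as an artifact.
import papermill as pm

pm.execute_notebook(
    "train_model.ipynb",                    # source notebook with a "parameters" tagged cell
    "runs/train_model_2022_03.ipynb",       # executed copy, kept as a record of the run
    parameters={"learning_rate": 0.01, "n_estimators": 200},
)
```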

48:08 Sure. Yeah, that makes sense. What about tools like DVC, some of these version control systems? Part of the problem is the data sets, in data science in general but machine learning as well; it doesn't necessarily make sense to check them directly into Git.

48:26 Some tools like this have sprung up to allow for sort of a side-by-side store of your data, where it's tracked in Git but not entirely stored in Git.
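
As a hedged sketch of what that looks like in practice with DVC: the large file lives in remote storage, Git only carries a small pointer, and the data can be pulled back by path and revision. The repository URL, file path, and tag below are placeholders.

```python
# Read a DVC-tracked file pinned to a specific Git revision. Requires `pip install dvc`;
# the repo URL, path, and rev are placeholders.
from io import StringIO

import dvc.api
import pandas as pd

csv_text = dvc.api.read(
    "data/training_set.csv",                    # tracked by DVC; Git holds only a .dvc pointer
    repo="https://github.com/example/ml-repo",  # placeholder repository
    rev="v1.2",                                 # tag or commit that pins the exact data version
)
df = pd.read_csv(StringIO(csv_text))
print(df.shape)
```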

48:36 Okay, this is funny because using this.

48:39 I'm trying not to, but this is exactly the space that that company from the beginning of the podcast, the one that went out of business, was playing in and trying to build something like this.

48:49 Interesting. Okay.

48:50 But yeah, I'll let Kate and Vishnu give their opinions on DVC.

48:54 It's also the kind of tool that you would have to categorize into your Gartner-style quadrant somehow.

49:02 What I found so interesting, and this is horrible to say, is that it feels like people have so many other problems with getting their machine learning into production that this is almost a secondary thing that comes afterwards. Or they don't necessarily care as much about this, because it's not mission critical next to significant problems that are just like...

49:27 We've got to solve this first; does this even matter?

49:29 That's not to say there aren't a lot of people out there. I know there's a ton of people that use DVC. Their community is huge, it's a great tool. But I didn't think about that sometimes. Like, yeah, there are mission critical things that are really top of mind, and everybody likes to talk about how reproducibility is very important for them, but unless you're in banking and you need it because of the law, it's not necessarily the most important thing for you.

49:56 Sure. I would say, in my experience, versioning and lineage and stuff tends to be the province of data engineering teams a lot more than machine learning engineers. And the way that we think about data in a machine learning context, as sort of one-off artifacts that we need to associate with a particular experiment, is not compatible with the way a lot of data engineering teams think about their data sets, which is as a baked cake that is ready for consistent use going forward, to use that cake analogy that's common in data engineering. I personally have used DVC. I think it's a well crafted tool; it didn't exactly solve my problems. I'm a big believer that if you're a machine learning engineer and you're having data versioning problems, you should go learn a little bit about data engineering, learn about data warehouses and databases and how you can leverage the existing tools in that field, rather than trying to use something that might have been spun up for a very specific purpose, or adopt a platform that takes an end-to-end approach to the entire machine learning pipeline: one that does data set versioning, helps you do your experiment tracking, and helps you with your deployment. That's kind of my approach to thinking about data set versioning as a component of the overall ML workflow.

51:09 Yeah, so we don't use any specific tools. Like many larger companies, we are building our own set of tools, of course, built by people who might not ever touch machine learning in their whole lives, together with some other people who complain about how they work, so we're trying to make everyone happy. But essentially it's our data engineers trying to integrate some kind of extra metadata into our deployment tools, so that we know we can trace those specific training sets back to the models that we are training or deploying and so forth. So basically it's, as Vishnu said, in the realm of the data engineers who are making things happen. They're the greatest people in the world.
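
A minimal sketch of the kind of extra metadata Kate is talking about: record exactly which data produced a model so a deployed model can be traced back later. The hash-plus-JSON-sidecar approach and all names here are an illustration, not her team's actual tooling.

```python
# Write a small metadata sidecar next to the model artifact for later lineage lookups.
# Paths and fields are illustrative.
import hashlib
import json
from datetime import datetime, timezone

def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

metadata = {
    "model_artifact": "model_v7.pkl",
    "training_data": "data/train_2022_03.csv",
    "training_data_sha256": sha256_of("data/train_2022_03.csv"),
    "trained_at": datetime.now(timezone.utc).isoformat(),
    "git_commit": "abc1234",   # placeholder; normally read from the repo
}
with open("model_v7.metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```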

51:53 Two thoughts.

51:54 One, before people get too tied up about trying to be perfect: just putting your notebooks and your scripts into version control, straight into Git, is like 80% of it. It's got to be so much better than not doing anything. And then two, I'm fascinated by this comment, Kate, about tools built by people who don't do machine learning, for machine learning. How does that sort of interaction go? Is it challenging to cross that boundary, to work with folks at companies like that who are not totally experienced in what you're building?

52:25 I think it's funny for me, because I come from consulting, where we worked closely together with the engineers and that was always an overlap of your tasks. And now I'm coming to a larger organization where people's tasks are pretty well outlined and their responsibilities are more narrow. And sometimes people treat me like, I don't know, something between an alien and a dinosaur, because they have never interacted with a data scientist or machine learning person, and they ask me questions like, I don't know, as if I'm going to just tell them the meaning of life.

52:56 All of a sudden, we have this data, find the answer.

52:58 And usually it's very basic stuff, but I think it's very important to have those people in the organization. Some people call them data strategists, some people just call them fine people to talk to, like teachers who will bridge that gap and make sure that everybody's heard and translated into each other's language, to get that set of requirements and not build just what the engineers want to build, or just what the data scientist wants to build, and to create that connection between the two. So that is important.

53:28 Yeah, very cool. So let's move over to the operations side now. On the Ops side, I guess the first thing I really want to think about is: where do you run your code? Where do you run your models and the APIs that back them and stuff? And I looked around here and I see a lot of stuff about Kubernetes on the community for you all. So Kubernetes, Linux virtual machines running in the cloud? Does that take too much in terms of compute cost, so you would rather run it locally on your own servers? What are your thoughts on this?

54:01 To me, it depends on how intensive your training efforts are and how many models you have in production. Setting up a Kubernetes cluster for one model seems to me like overkill. But for a company that has tens or hundreds of models in production, it might make a lot of sense to try and set up some sort of distributed training...

54:18 Architecture, et cetera, especially if someone else is maintaining that cluster for you.

54:23 It's one thing to say, I want to create and babysit a Kubernetes cluster, versus, I want to be able to give my job to Kubernetes.

54:31 Not the same, right?

54:32 Exactly. Yeah.

54:33 So I think it really depends on what scale you're at. That's kind of what we talk a lot about in the community. If you're confused about what scale you're at or what maturity level you're at, I invite you, please join the MLOps community Slack, and I'm sure people will be able to provide some very tactical suggestions for your size. But yeah, my answer to you is: it depends.

54:52 I would say I use the path of least resistance. So whatever is there, whatever is available, what is the easiest way to source your data?

55:01 What are the connections that are already set up? Because sometimes you work with different teams, or in consulting, with different companies. That's about it. And if I'm lucky enough, like right now, to have a cluster that is babysat by another person, well, good. Then it's a deal.

55:16 Just take it. This runs on my machine.

55:20 Exactly.

55:20 We should like pour out some whiskey or whatever, some drink for lots of data scientists that have gone into the Kubernetes world thinking they were going to pick it up and get their models out because they needed to. For some reason they got brainwashed into thinking like, oh yeah, I should just learn Kubernetes real fast and they never were able to come back.

55:44 And it's like such a huge detour. Right. And so we talk about this a lot. And that is like, first of all, what should the team composition look like as a machine learning, like squad or team or someone trying to just get value out of it? And should a data scientist have to know Kubernetes? And so that's kind of like the joke here is like Kubernetes is a gateway drug. If you can get past it, if you can get into it, then you're probably going to go a lot deeper. But it's really hard to pick up, especially if you're coming from a data science background.

56:17 Yeah. It's a whole nother thing to learn.

56:20 Right on, Kate.

56:21 Exactly.

56:22 Pinpoint in the audience has an interesting comment that I would love to echo as well. Also, don't forget, you're not Google, right? And you're not Facebook, and you're not Instagram, and all these companies that are trying to run at such an insane scale that they have to come up with very interesting deployment and DevOps practices, whereas maybe just running on a server is fine.

56:44 I don't know, that's probably the most shared blog post in the community, that "You Are Not Google" post.

56:49 Yeah, that was a great article. Came out like a year and a half ago or something like that.

56:54 He's a member of our community, but also very well known in the general ML community: Jacopo Tagliabue, who's a director at Coveo. He's been putting out a series of articles known as MLOps at Reasonable Scale, and I think it talks a lot about this entire concept, which is that a lot of the discussion about ML is driven by ML at unreasonable scale. It's the Googles and Ubers that are doing hundreds of millions and billions of events, and for many of us, that's not the reality. I might just pull it up here; it's a great series, and I encourage everyone to read it. If you feel like, hey, that's not applicable to me, what is this? It answers that. And we had him on for a podcast as well. Nice.

57:32 Yeah. The subtitle here is "MLOps without too much Ops." It's good. I like it. And it's the same idea. It just sounds very focused on machine learning rather than just broad software development. Do any of you all use software-as-a-service type places? I'm thinking of stuff like Streamlit, for example, Streamlit Cloud, and these types of things where you're going to build your model and you're like, yeah, go run. I mean, Heroku would be sort of the website equivalent of that.

58:00 So we do use Streamlit, but we don't use the cloud. We use it just as an interface.

58:06 Right. You can self-host it.

58:08 We host it ourselves, because then authentication and everything else is kind of linked to it, and we get just a nice interface.

58:16 Yeah. What are your thoughts on Streamlit?

58:17 It's awesome.

58:18 Yeah. I haven't had to use it for anything, but my experience is that the amount of code that you have to write to create these really interactive dashboards, it's super short.

58:28 Yeah. And the alternative usually in the organization is that, oh, we have this Tableau dashboard, right? And that requires licenses and training for people. And again, it has its own limitations. At some point, with a certain amount of data, it just starts crashing. And this is an easy way for machine learning professionals to showcase whatever they're working on. And it's been a lifesaver for us. Sometimes people just don't get what the hell you're talking about. So we were working on an NLP tool that would analyze a lot of unstructured text in a lot of different ways, and we just plugged in all the possible ways to visualize the results, and it just clicked for people. And then you had a nice little video next to it. It's just so flexible. It's really cool.

59:13 Yes. The nice part is it's not just a static notebook output, right? You can sort of play with the drop-downs, click in the places, and it becomes a live little interactive thing, without you having to learn some sort of front-end programming framework like Vue.js or something.
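
To make that concrete, here's a tiny Streamlit sketch of the kind of interactive front end being described: a drop-down and a slider driving a chart, with no front-end framework involved. The CSV file and column names are made-up placeholders; you'd run it with `streamlit run app.py`.

```python
# app.py - a minimal Streamlit sketch: widgets plus a chart, no JavaScript required.
# The predictions file and its columns ("segment", "score") are hypothetical.
import pandas as pd
import streamlit as st

st.title("Model results explorer")

df = pd.read_csv("predictions.csv")   # hypothetical model output

segment = st.selectbox("Segment", sorted(df["segment"].unique()))
threshold = st.slider("Score threshold", 0.0, 1.0, 0.5)

# Streamlit re-runs this script top to bottom whenever a widget changes.
filtered = df[(df["segment"] == segment) & (df["score"] >= threshold)]

st.write(f"{len(filtered)} rows above the threshold")
st.bar_chart(filtered["score"])
st.dataframe(filtered.head(50))
```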

59:26 Yeah, exactly. And also, on the other hand, we have developed this package to do the analysis, but this gives a no-code interface for people to completely just do it by themselves. And they feel really cool about interfacing with machine learning without having that background, without having to learn Python.

59:43 That's cool.

59:44 Think of it like a hybrid, right? Like, build a real software development piece and then expose it through things like Streamlit. Yeah. Cool. What are your thoughts?

59:53 I agree with everything he had said.

59:55 I remember when it came out, it came out with a lot of great examples. At the time, we were still on TensorFlow 1.0, 2.0 hadn't come out yet, and it was really hard to work with machine learning models.

01:00:08 It was like you had these underlying graphs that you had to kind of struggle with, and people spent so much time just wrangling, like, how do I get an output from this model training exercise? But to have this really neatly written package that would just take that as a back end and then let me make interactive visualizations, like magic. All great tools feel like magic, and I think that's what Streamlit did. Well, as Kate said, it allows non-technical users to interact with outputs of models very naturally, which I think is a really important part of connecting the technical work to the business value. There's a whole set of tools like Streamlit that have since come out that are making various parts of the machine learning workflow or the operations workflow a lot easier, whether it's letting you set up APIs on top of those models, or visualization, or all kinds of other aspects. So Streamlit started a revolution.

01:00:59 Yes, absolutely. For me, the magic is you write a function that looks like it just takes arguments and you don't have to deal with callbacks and interactive stuff, and it just kind of adds that into there. What are some of the other tools like this that people might be using out there to get their ML models online?

01:01:15 FastAPI right now is really popular as, like, a way of serving your model. FastAPI.

01:01:20 Not fast.ai, right?

01:01:21 Yeah. FastAPI.

01:01:22 Yeah, absolutely.

01:01:23 FastAPI is really popular. There's this new tool I heard of called Banana.

01:01:27 Which also does this? Banana.dev. Banana. Okay.

01:01:31 I think they're still early on. I'm struggling to remember off the top of my head right now. But just in general, this paradigm of saying like, hey, if I can write a neat sort of like modern Python library where I can write, as you said, a function with a couple of arguments and then do things that are associated with the machine learning workflow, that's a model that works. So like Great Expectations kind of does that with data testing.

01:01:53 Right.

01:01:55 It kind of works like pytest for data. And so I think what we're trying to do overall, the big picture, is bring very Pythonic ways of working with code into the realm of data and machine learning models, to the extent that makes sense. Yeah.
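
As a rough sketch of the FastAPI-style serving mentioned a moment ago, this is roughly what wrapping a trained model in an HTTP endpoint can look like. None of it comes from the show: the pickled model file and the feature names are assumptions for illustration, and you'd run it with something like `uvicorn serve:app`.

```python
# serve.py - a minimal sketch of serving a pickled scikit-learn-style model with FastAPI.
# The model file and feature names are hypothetical placeholders.
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

with open("model.pkl", "rb") as f:   # assumes a previously trained, pickled model
    model = pickle.load(f)

class Features(BaseModel):
    age: float
    income: float
    num_purchases: int

@app.post("/predict")
def predict(features: Features):
    # FastAPI validates the JSON body against the Features schema automatically.
    row = [[features.age, features.income, features.num_purchases]]
    return {"prediction": float(model.predict(row)[0])}
```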

01:02:09 Banana.dev is totally new to me. Interesting: ML model hosting as an API, instant, scalable inference hosting for your machine learning models on serverless GPUs. There are a lot of words there, but interesting stuff about running in production. Are you familiar with Banana, or do you have other tools like this you think people need?

01:02:30 I cannot name anything off the top of my head, but I always like to give a shout-out to what's amazing.

01:02:38 Google Colab is the most amazing thing that ever happened, I think, to data scientists, apart from the original notebooks, because it's right there with all your random files in the Drive. You can just hook it up, test something quickly. But that's not related to this.

01:02:53 What role do places like Google Colab and these other hosted notebook services play on the production side? I clearly see the value when I'm doing some development and prototyping and trying to figure stuff out, but would you ever try to make that the final version in production?

01:03:11 I'm not sure how it interacts with GCP in production, but I tried it for my pet projects. It was very fun, because I had to transcribe some random lectures and I decided, what if Google can do that for me? And I literally had it all in my Drive and ran some code in Colab, linked it to GCP, and it gave me all the results. So that was pretty neat, because I really didn't have to put any kind of configuration, any kind of thought into it. So for pet projects, that was amazing. For production, Databricks notebooks kind of work, especially if you're in Azure. I'm not sure how it interacts with other clouds, whether it's as convenient, but with Azure cloud it was just amazing. It was like Glitch: you have your credentials figured out, everything works in the background, you just do it as you would in a normal notebook, and you can put it in prod, no problem.

01:04:06 Fantastic. That's a good recommendation. All right, I think we are just about out of time, so I want to let you all get back to things before you've got to run. But Kate, Vishnu, thanks for being here. Demetrios had to run off, but thank you to him as well. Your final thoughts? Maybe people want to get started with the MLOps community and maybe get their models into production. Give us your final thoughts. Kate, you want to start?

01:04:29 I think the best thing, if you're really just starting out, is to go and meet people in person, and the MLOps community, or any other community that is available in your area, will give you the right boost of motivation and also knowledge, and just create a support network for you to be able to get through the blocks that you will definitely have on your path. So that would be my word of advice: don't stay alone and isolated.

01:04:53 Yeah, for sure.

01:04:55 I think that's a great tip. To anybody starting out in MLOps, I would say certainly join the community and search for the answer to that question, because it's been asked before in the Slack community and there are a lot of great answers. I would also say check out some established resources. I think a guy named Goku Mohandas has been putting together a website, Made With ML. Made With ML is a great resource. Chip Huyen's Stanford class on machine learning systems design. Eugene Yan. Our Slack community. These are places where you can pick up a lot of knowledge very quickly. It's organized for you and structured for you, so definitely make use of those existing resources.

01:05:32 Thank you for being here. It's been really great to hear your experience and thoughts.

01:05:36 Thanks for having me.

01:05:37 Thanks for coming by.

01:05:37 Thanks for having me too. My pleasure.

01:05:40 This has been another episode of Talk Python to Me. Thank you to our sponsors. Be sure to check out what they're offering. It really helps support the show. Take some stress out of your life. Get notified immediately about errors and performance issues in your web or mobile applications with Sentry. Just visit talkpython.fm/sentry and get started for free, and be sure to use the promo code talkpython, all one word. For over a dozen years, the Stack Overflow podcast has been exploring what it means to be a developer and how the art and practice of software programming is changing the world. Join them on that adventure at talkpython.fm/stackoverflow. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show, open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:06:50 We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
