#51: SigOpt: Optimizing Everything with Python Transcript
00:00 You've heard that machine intelligence is going to transform our lives any day now.
00:03 This is usually presented in a way that's vague and nondescript.
00:07 This week, we look at some specific ways machine learning is working for humans.
00:11 On Talk Python to Me, you'll meet Patrick Hayes, the CTO of SigOpt, whose goal is to accelerate machine learning by optimizing everything.
00:19 That's a pretty awesome goal.
00:21 Listen in to this episode and learn all about it.
00:24 It's episode number 51, recorded March 3rd, 2016.
00:29 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.
01:00 This is your host, Michael Kennedy.
01:02 Follow me on Twitter, where I'm @mkennedy.
01:04 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.
01:10 This episode is brought to you by Hired and SnapCI.
01:14 Thank them for supporting the show on Twitter via @Hired_HQ and @Snap_CI.
01:20 A couple of quick updates for you before we get to the interview with Patrick.
01:25 First, I want to thank all of you who participated in helping me launch Talk Python training through Kickstarter.
01:30 I finished the course, got the website online.
01:32 It's training.talkpython.fm, and added all the Kickstarter backers who have filled out the survey to the course.
01:38 The feedback so far has been really positive.
01:40 I got this from one student just today.
01:42 Absolutely loving the course.
01:44 I learned so much, not just Python principles, but your methodology with designing an app.
01:49 Your lessons in PyCharm have really helped a ton, too.
01:52 I've bought so many books and online classes, but nothing has brought it all together like you have.
01:56 So a big thanks to you, Michael.
01:58 To celebrate the launch of my course, I'm giving away a free seat to a friend of the show.
02:01 Just enter your email on talkpython.fm to be eligible to win.
02:05 I'll draw a winner at the end of the week.
02:06 Next, I had the honor of telling the story of my career, how I went from super junior,
02:11 know-nothing developer, to developer and instructor, to owning my own business and launching the show.
02:16 I was on Developer on Fire podcast last week.
02:19 If you enjoy these kinds of stories, check out Dave Rael's podcast at developeronfire.com.
02:23 My episode was 112.
02:26 Now, let's get to this interview with Patrick.
02:28 Patrick, welcome to the show.
02:30 Thanks, Michael.
02:30 Yeah, it's super exciting you're here.
02:32 I'm ready to optimize everything.
02:34 Perfect.
02:34 That's what we're here for.
02:35 Very cool.
02:36 Very cool.
02:36 Now, before we optimize everything, let's start at the beginning.
02:40 What's your story?
02:41 How did you get into programming?
02:41 Sure.
02:42 So I started when I was a kid.
02:43 You know, I was young.
02:44 On my parents' computer, I played around in Visual Basic or Java or things like that.
02:49 Kind of just playing around, making little toys myself.
02:51 They sent me to computer camp at one point, and that's really where I started to get excited about it.
02:55 Then I went to the University of Waterloo, where I studied computer science.
02:58 And that, of course, is where I got, you know, much better fundamentals and started, you know, to build my career from there.
03:03 And then I started with Python on one of my first internships.
03:06 So I was an intern at BlackBerry, and this was probably back in 2007.
03:09 And at that point, my, you know, programming experience, I was still pretty junior.
03:12 And I was used to Java and these kind of very verbose languages.
03:15 And then one of my previous colleagues had written a script in Python, and one of my tasks was to update it.
03:21 So I got in there, and I started playing with it, and I realized, oh, actually, this Python stuff is pretty cool.
03:25 You know, I can be more expressive, and I can work more quickly.
03:28 And that's when I started to become a big fan, and I've used it in most of my personal projects since.
03:32 A lot of people who don't work in Python have this sense that it's kind of like Bash, but you can make websites with it.
03:38 Right.
03:40 You know, they don't see the entire sort of landscape of all the amazing stuff you can do.
03:47 And so for a lot of us, when we came to it, it was like, oh, there's a little script I've got to maintain.
03:51 Oh, my goodness.
03:52 Look, there's this thing called PyPI.
03:54 And oh, my gosh, look what else you can do.
03:55 Exactly.
03:56 That's awesome.
03:57 The increase in productivity, you know, it was kind of like night and day from what I was used to.
04:00 So that's definitely.
04:01 So what did you study in college and work on in BlackBerry?
04:04 I studied mathematics in university.
04:06 So I did computer science and also pure mathematics.
04:08 Yeah, cool.
04:09 So I went to University of Waterloo.
04:10 And then while you're at Waterloo, you do six internships.
04:12 So I was at BlackBerry for my first one.
04:15 And there I was working on the automated testing team.
04:17 So, you know, one of the things.
04:18 Oh, that's cool.
04:19 Yeah, it was, you know, I was just getting started kind of with my very first internship.
04:22 So it was a good way to get my feet wet.
04:23 I think it's cool that the university has the six internships.
04:26 Yeah.
04:27 Because a lot of times you come out of college and you're like, okay, now what?
04:30 Yeah, it makes a huge difference.
04:32 Definitely.
04:32 I felt that in my career.
04:34 You know, you graduate and you have six four-month internships.
04:36 So you basically have almost two full years of work experience at this point.
04:40 So it's really kind of night and day versus the practical experience you'd have if you hadn't done these internships.
04:45 And it really does complement the education well.
04:48 Because what you learn in class is very theoretical, but good fundamentals.
04:52 Yeah, it's somewhat detached some of the time.
04:54 Yeah, exactly.
04:55 But then when you get into industry, you're learning totally different things, and you need both to be a good engineer in this day and age.
05:01 Right, of course.
05:01 So the company that you guys founded, SigOpt, has machine learning as its sort of central feature that it provides, right?
05:11 And so could you tell everybody a little bit about just, you know, the mechanics of machine learning and describe how some of the pieces work?
05:17 Just so, before we get into the details.
05:20 How machine learning works?
05:21 Well, yeah.
05:21 So let's just take a concrete example, right?
05:24 Suppose I've got some data about a bunch of shoppers and I've got a store and I'd like to optimize something.
05:31 You know, give me a sort of how you'd go through that with machine learning.
05:34 Yeah, sure.
05:35 So a great example, if you have a store, would be you might want to predict fraud.
05:38 So if you have, you know, a lot of historical purchase data, maybe you want to minimize the number of fraudulent purchases you receive.
05:45 Because that, you know, affects your bottom line.
05:46 Every fraudulent purchase costs you money.
05:49 The problem that machine learning can help you solve is, given that you have all this historic data that you've built up over the years, how can you use that to, in the future, make predictions about events you haven't seen yet?
05:58 So if I've seen a million purchases in the past, I know things about those purchases.
06:03 Like I know the country that it came from.
06:04 I know how much money the purchase was for.
06:07 Whether it was from a sketchy IP address or something like that.
06:09 And I also know the truth you need to learn from, which is whether that purchase was fraudulent or not.
06:14 And now you have this data set of past truth, and then you can apply machine learning methods on that data to produce a model that will make predictions in the future.
06:22 Okay.
06:22 So basically, as people are checking out, you can ask, hey, should I let this go through, even though the credit card system said yes, right?
06:29 Exactly.
06:29 So even though you don't know for sure yet whether it is fraudulent, you can make a guess, a good guess, based on the data you've seen.
06:34 A good machine learning model, you know, will be accurate, as accurate as possible.
06:37 You'll always have sort of false negatives and false positives.
06:40 So, you know, saying a purchase is fraudulent when it might not be, or vice versa, thinking a fraudulent purchase is okay when it is fraudulent.
06:46 But you want to, of course, minimize that.
06:48 And a good machine learning model and machine learning process will reduce that.
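The workflow Patrick describes, learning from labeled historical purchases and then scoring new ones, can be sketched with scikit-learn. The feature names and toy data below are invented for illustration; a real fraud model would use far more data and features.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical historical purchases: [amount_usd, sketchy_ip, country_risk]
X_train = [
    [25.0, 0, 0.1],   # past purchase, known legitimate
    [30.0, 0, 0.2],
    [950.0, 1, 0.9],  # past purchase, known fraudulent
    [880.0, 1, 0.8],
]
y_train = [0, 0, 1, 1]  # the "past truth": 1 means fraud

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Score a new, unseen purchase at checkout time
new_purchase = [[900.0, 1, 0.85]]
print(model.predict(new_purchase))        # predicted label
print(model.predict_proba(new_purchase))  # estimated class probabilities
```

The key point from the conversation is the split: training happens on purchases whose outcome is already known, and prediction happens on purchases whose outcome is not known yet.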
06:50 Let's talk about SigOpt a little bit.
06:52 Your theme or your tagline is to help customers optimize everything.
06:57 You got it.
06:58 That's a pretty audacious goal.
07:00 What is this company you guys founded?
07:03 So SigOpt is a SaaS optimization platform that helps companies build better machine learning models with less trial and error.
07:10 So we have an API that exposes the most cutting edge optimization research to our users to help them increase the accuracy of their machine learning models as quickly as possible.
07:17 So in this example of a fraud detector, our APIs would help them tune that model to really reduce those cases where it would be inaccurate.
07:26 And we want to get our users to these most accurate models as quickly as possible.
07:29 We don't want them to spend days or weeks or months trying different possibilities kind of in the dark.
07:33 All right.
07:33 That makes sense.
07:34 It sounds like a big challenge, but it makes sense.
07:37 So you said you're a SaaS company.
07:39 Do you actually run the machine learning algorithms?
07:42 Do you provide like the compute and data processing?
07:45 Or do you somehow just provide, hey, you should feed these inputs to scikit-learn along with your data?
07:54 Yeah, that's a great question.
07:55 The way we help our customers is they have their own models that they're running on their own clusters with their own data.
08:01 So we're not actually running the models for them.
08:03 And we're also not looking at any of their sensitive data.
08:05 What we're doing is our APIs are able to suggest the next version of your model you should try.
08:12 So the models have parameters or variables that affect how well the model works.
08:17 And finding the best version, the best values for those parameters is a very tough problem.
08:21 And that's what SigOpt solves in the most efficient way possible.
08:23 Okay.
08:24 That's really interesting.
08:25 Where did this idea for this company come from?
08:27 Yeah.
08:28 My co-founder, Scott, he was doing a lot of research on this field during his PhD at Cornell.
08:33 So he was doing a lot of research in this field and had this great idea for how to build this kind of black box optimizer.
08:38 That was the beginning of that idea.
08:40 And we've built the company around it now, putting an API around it to make it really easy to use,
08:45 and to administer all the difficult computation that's required to serve these optimal suggestions.
08:51 Okay.
08:51 Do you have some interesting stories around stuff that you've helped people optimize?
08:56 I heard one of them was something like shaving cream and another was synthetic rhinoceros horns.
09:02 It's quite varied, right?
09:04 Exactly correct.
09:05 So I've been talking about machine learning here, but really it's able to optimize
09:10 any problem in which trying something new is very expensive, because we want to get you to the best version of your product in as few tries as possible.
09:17 So that is true for a machine learning model that you have to train and test on real user traffic.
09:22 And that's very expensive.
09:23 But another thing that's very expensive is these chemical processes.
09:26 So we have one customer who makes shaving cream and the expensive part for him is he has to go into the lab,
09:31 mix up his ingredients into a jar or vat or something, wait for 100 hours for it to settle,
09:37 and then test the quality of the shaving cream.
09:39 And then based on that, he wants to find a shaving cream that makes him the most profit.
09:43 So in this case, there's however many ingredients that go in the shaving cream and mixing them in different amounts is a very chaotic process.
09:50 So as a chemist, he's an expert in his field.
09:52 So he's able to say, oh, the ranges should probably be somewhere specific.
09:56 This acid should probably be between five milliliters and seven milliliters.
09:58 But then at some point, he gets to this area where he doesn't have as much knowledge about the interactions.
10:05 So this is where SigOpt comes in.
10:07 Right.
10:07 It gets down to actual experimentation rather than theory, right?
10:11 Yeah.
10:11 Yeah.
10:12 Trial and error.
10:12 So now SigOpt, what we're able to do is we're able to model these chaotic effects and are able to suggest,
10:17 here's the next version of your shaving cream you should try.
10:19 So then he goes back in the lab, mixes up a batch with what we suggested, tells us how well it did,
10:25 and then we use that to update our model and give another best suggestion.
10:28 So before using SigOpt, this chemist had mixed up, you know, 100 different trials just by like keeping track of it in his lab notebook.
10:34 He hadn't really gotten anywhere.
10:35 But then he uploaded all his data to SigOpt, and now SigOpt is able to take in that data and analyze it and say, hey, here's what you should try next.
10:41 And then he did that.
10:42 And then within just five new batches, he already had a new version of his shaving cream that was 20% more profitable.
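The loop described above, ask for a suggestion, run the expensive experiment, report the result back, can be sketched generically. Here plain random search stands in for SigOpt's model-based optimizer, and `measure_quality` is a made-up stand-in for the 100-hour lab test; the parameter names and ranges are invented for illustration.

```python
import random

random.seed(42)

# Hypothetical parameter ranges the chemist might specify
bounds = {"acid_ml": (5.0, 7.0), "base_ml": (10.0, 20.0)}

def measure_quality(params):
    # Stand-in for the expensive lab measurement; in reality this takes
    # days to run and the true objective function is unknown
    return -((params["acid_ml"] - 6.2) ** 2 + (params["base_ml"] - 14.0) ** 2)

def suggest():
    # SigOpt uses model-based optimization; random search shown for brevity
    return {name: random.uniform(lo, hi) for name, (lo, hi) in bounds.items()}

best = None
for batch in range(20):                # each iteration = one batch in the lab
    params = suggest()                 # "what should I try next?"
    quality = measure_quality(params)  # mix it up and test it
    if best is None or quality > best[1]:
        best = (params, quality)       # keep track of the best so far

print(best)
```

The whole value proposition in the story is replacing the `suggest()` above with something smarter than random draws, so that far fewer expensive batches are needed.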
10:47 Wow.
10:47 That's really cool.
10:48 And you're modeling physical stuff.
10:51 It's not just like you're taking in a bunch of data and you're processing it.
10:55 Like you're helping this guy, you know, make shaving cream.
10:58 Yeah, exactly.
10:58 In this case, it's helping with this very physical world problem.
11:00 And you also mentioned this.
11:02 You're right.
11:02 There's another customer using us for synthetic rhinoceros horns.
11:05 And this is the same thing.
11:06 It's, you know, they want to make a rhino horn that has these properties, and there's various variables that go into that.
11:11 Like how much of different chemicals do you use?
11:13 And we're helping them with that process as well.
11:14 That's cool.
11:14 I have to ask.
11:15 Like, why do they want a synthetic rhinoceros horn?
11:17 It's a great question.
11:18 My understanding is there's a large black market for rhinoceros horns.
11:21 There's two things at play here.
11:23 One is they want to reduce that black market because they want to stop poaching of rhinos.
11:26 And one way to do that is to flood the market with these synthetic horns.
11:30 So right now, my understanding is you can sell these synthetic horns for a very high price because they mimic the very expensive real rhino horns.
11:37 But the synthetic ones can be produced at a much larger scale.
11:41 So now the market's flooded with horns that are identical chemically, but much cheaper to produce.
11:45 And now it's no longer profitable to actually poach a rhino.
11:48 That's a really good thing for everyone.
11:50 That's awesome.
11:51 What other sort of stories do you have about what you've helped people optimize?
11:54 Yeah.
11:54 We also have customers doing synthetic egg whites, music recommendation models.
12:00 It's really being applied across the board on all these very black box, difficult problems.
12:05 Yeah.
12:05 Okay.
12:05 One of the things I think is pretty awesome is some of the stuff at the heart of what you're doing is actually open source software, right?
12:12 That's correct.
12:13 Some of the algorithms and whatnot.
12:14 And yet you're building a business on top of it.
12:18 And I've talked to a couple of companies lately, a show that's about to come out next week.
12:22 It's got a really cool sort of story of taking something open source and turning it into a business.
12:27 And I want to dig into that.
12:28 But one of the places you guys started was at Y Combinator, right?
12:31 Yeah, that's correct.
12:32 What was it like being at YC?
12:34 It was a lot of fun.
12:35 It was very exciting, definitely a lot of emotions.
12:39 You have a lot of fun, but you also have a lot of, you know, stress and you have a lot of, you're very busy.
12:43 You learn a lot.
12:44 So it was a really fantastic experience.
12:45 You know, when we started, me and my co-founder didn't really know much of anything about how to build or run a business.
12:52 And just the first three months of Y Combinator, it's very transformative.
12:57 You know what to think about and what not to think about.
12:59 You know how to stay focused.
13:00 You know how to talk to users and build something that your customers want.
13:04 And then it culminates in demo day.
13:06 And that's when you present to a bunch of angel investors.
13:08 And that's what really starts your company from there, getting the fundraising you need.
13:11 So it definitely makes a huge difference.
13:13 We definitely wouldn't be where we are today if we hadn't gone through Y Combinator.
13:15 That's awesome.
13:16 So how do you actually get into Y Combinator?
13:19 Is there like a call for applications?
13:21 Yep.
13:22 Are there like open demos?
13:23 Like what's the story?
13:24 Like how would someone out there is listening?
13:26 Like I want to be part of Y Combinator.
13:27 What do they do?
13:28 They may have changed it a little bit since I did it.
13:30 So this was I applied in 2014.
13:32 But essentially, yeah, twice a year they have an open call for applications.
13:35 And they say, what are you building?
13:37 Who are you building with?
13:39 Send us an application.
13:40 They ask you, do you have any customers?
13:42 Do you have any traction?
13:42 Do you have any revenue?
13:44 What have you built so far?
13:45 And then they also ask a lot of questions about you as the founders.
13:48 They ask, what have you built in the past?
13:49 Where have you worked?
13:51 One of the kind of notable questions they ask is, what's a system you've hacked in the past?
13:55 So not hacking in the kind of infiltrating-servers sense, but what's the creative thing you've done in the past to skirt around some, I shouldn't say rules, not breaking the rules, but doing something clever or creative.
14:07 Yeah, that's really cool.
14:08 Would you recommend other people go in there and try it?
14:11 Like you found it to be a pretty positive experience?
14:12 Yeah, I would say it was positive.
14:13 And I mean, I can't say I can recommend it for everyone.
14:15 I don't know everyone's specific situation, but I would recommend it definitely.
14:19 It really did take, yeah, me and my co-founder from a state of no experience to, you know, competent at building a company in a very short time.
14:31 Yeah, I think one of the challenges of starting a company, especially technical companies, the founders, they're technical, right?
14:38 And part of starting a company, especially one like yours that is a highly technical science, math sort of company requires that.
14:46 But that's not enough for like user growth and business deals and marketing and growth hacking and connections.
14:56 Absolutely.
14:57 That's a real big challenge.
14:58 And YC helps with that, right?
14:59 So both me and my co-founder were technical co-founders.
15:03 So we had to pick up a lot of skills that we had no experience with.
15:06 What I commonly say is, before starting SigOpt, at my last job I got to work on what I was good at every day.
15:10 I was an engineer and I worked on engineering problems.
15:12 Now as a co-founder, I work on things I'm bad at every day, which is very different.
15:16 You're spending more time, you know, trying to get customers or build a company or think about culture or think about hiring or think about marketing or think about sales.
15:23 All these things that I had no experience with prior to starting the company.
15:27 So you definitely get a lot of experience with these things.
15:30 Yeah, exposure to totally new things, which, as you say, are essential to running a successful business.
15:34 But you kind of have to learn it on the fly.
15:36 So Y Combinator, they do prepare you for this.
15:38 Also, the network is really, really helpful.
15:40 So there's probably over a thousand people in the Y Combinator network at this point, you know, past founders.
15:46 And it's very friendly and welcoming community.
15:48 You can message them if you need any advice or you have questions or things like that.
15:53 Oh, yeah.
15:53 That sounds super helpful.
16:05 This episode is brought to you by Hired.
16:08 Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.
16:14 Each offer you receive has salary and equity presented right up front, and you can view the offers to accept or reject them before you even talk to the company.
16:24 Typically, candidates receive five or more offers in just the first week, and there are no obligations, ever.
16:30 Sounds pretty awesome, doesn't it?
16:32 Well, did I mention there's a signing bonus?
16:34 Everyone who accepts a job from Hired gets a $2,000 signing bonus.
16:38 And as Talk Python listeners, it gets way sweeter.
16:42 Use the link Hired.com slash Talk Python to me, and Hired will double the signing bonus to $4,000.
16:50 Opportunity's knocking.
16:52 Visit Hired.com slash Talk Python to me and answer the call.
16:55 When you got through with Demo Day, I saw that you guys actually had two seed rounds.
17:09 One for like $120,000 and one for $2 million.
17:12 Congratulations on that.
17:14 Thank you.
17:14 That's pretty exciting, right?
17:15 Was that first part sort of the end of YC?
17:18 Yeah, the first part is from Y Combinator.
17:20 And then at the end of Y Combinator, there's Demo Day.
17:22 And that's when we raised our larger round.
17:25 And that was to get us, you know, over the next, yeah, get started from there.
17:29 Yeah.
17:29 So what's it like taking VC money?
17:31 That's another skill that you don't have any exposure to before starting a company, or at least I didn't.
17:37 So yeah, fundraising is a very kind of interesting beast.
17:39 You have to have your pitch ready, and you spend a lot of time talking to investors, and you're dealing with rejection, and also, yeah, communicating in a way that you're not used to, this very salesy way.
17:51 Yeah, you have to have a very crisp message, right?
17:53 But it's also different from selling to customers, right?
17:55 You're, you know, for customers, you're talking about what's their immediate needs, and how can I satisfy them now?
17:59 And when talking to VCs, you're talking about, oh, what's kind of the grand vision of what's the app going to look like in 10, 20 years?
18:06 What's the, how big could we possibly be?
18:09 Yeah, the customer wants to know, how are you going to help me make my store or chemical process better?
18:14 The VCs want to know how you're going to reach, like, large-scale growth and continue user acquisition.
18:20 Like, the customer doesn't care at all about that, right?
18:21 Exactly.
18:22 It's good talking to investors, you know, and they can really be a lot of help.
18:25 They can give you a lot of great resources, especially if you get good investors, which we were very fortunate enough to do.
18:30 Another set of connections and network, almost as much as anything, right?
18:35 Cool.
18:36 So you guys started in November 2014, right?
18:39 That's correct.
18:39 And you started with Python from day one.
18:42 What was the reaction within YC for using Python?
18:45 They give you a lot of autonomy to build what you want, so I don't think that anyone else really would have cared what we built it in.
18:51 But it made a lot of sense for us as a company, you know.
18:53 So both me and my co-founder knew Python, but I could also tell that it, you know, would be really great for getting started and moving quickly.
19:00 So especially when we have this kind of short timeline of three months to get out the door before demo day, what can we use to really make sure
19:09 we're as productive as possible right from day one?
19:11 And so Python was a pretty natural choice.
19:13 Yeah, I guess, especially in the sort of accelerator type scenario, speed to market is more important than CPU cycles.
19:22 Exactly.
19:22 Or memory usage or whatever.
19:23 Yeah.
19:24 But you guys are sticking with it, right?
19:25 You've been going strong.
19:26 Yeah, exactly.
19:27 So we have, yeah, almost everything is in Python.
19:30 We have some things in C++ for some very high-end optimization that needs to be very performant.
19:36 And we use, you know, Python and C++ bindings for that to interact with the rest of our stack.
19:40 We also, you know, we have JavaScript, of course, because we have a website as well.
19:44 But everything back end is Python.
19:46 Okay.
19:47 That's cool.
19:47 And is that Python 2 or Python 3?
19:49 We are Python 2.
19:50 Okay.
19:51 Right on.
19:52 And you said you had considered maybe using Go or Scala or some other languages.
19:57 Can you just sort of discuss those trade-offs?
20:00 Sure.
20:00 So I definitely, I had some experience with both of those.
20:03 I think Scala is really nice when I did use it.
20:06 It's very safe.
20:07 The type safety is really good.
20:08 But it's very, it's still very expressive, much more expressive than Java or something, which it's based off of.
20:14 And that's really nice.
20:15 I think I would be less confident we could move as quickly as we could with Python there.
20:18 You can be very expressive, but there's a lot of kind of tooling issues, like the compile times can be very slow.
20:23 And also, it's perhaps not as well known.
20:26 So you spend some amount of time training new hires or other people to use Scala as well.
20:31 And I don't think that's necessarily a bad investment to make always.
20:34 But when we were at this stage where we want to move very quickly, we should stick with something that we can be confident in.
20:42 And then I had a similar opinion about Go, which is that it's a great language and really has nice features.
20:47 But at the time, I believe it was still, you know, less mature than it is today.
20:51 I recall wanting to use an AWS library.
20:54 And for Go, there wasn't one at the time.
20:55 I think there is now.
20:56 But Python, of course, had a great AWS library just ready to go.
20:59 So those little decisions of just like, I want to spend my time working on the core business and not tooling and not, you know, building third-party packages, which could already exist.
21:10 Right.
21:11 There's zero business benefit in you building a Go implementation of Boto or whatever, right?
21:18 Exactly.
21:18 So I definitely think, you know, there's lots of trade-offs to be made.
21:21 And that's kind of where we were leaning at that time.
21:23 And I'm really happy with it now.
21:25 So I think it's, you know, we've definitely, we've built a lot.
21:28 And I don't think we'd be where we are today if we had perhaps started with a different language.
21:32 Sure.
21:32 So how much does SciPy and NumPy and all those pieces sort of play into this?
21:39 Was that like pretty critical to making it work?
21:40 Yeah.
21:41 So we use a lot of that for kind of, yeah, high capacity computation and optimization.
21:46 They're, you know, really great tools for that, that we love, both in the kind of prototyping phase where we want to try something new on some data set or something like that.
21:53 SciPy, scikit-learn, and NumPy are fantastic.
21:55 And then also, you know, they have the performance we need to actually put them in production when appropriate.
22:01 So we also have, of course, things that we built ourselves that are very well tuned to our own problem.
22:04 And as I mentioned before, typically those are in C++.
22:07 But having this wide array of tools available to us in Python is essential.
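As a small, generic example of the kind of tooling being referred to here (this is plain SciPy usage, not SigOpt's internal code), `scipy.optimize` can minimize a smooth function directly:

```python
import numpy as np
from scipy.optimize import minimize

# The Rosenbrock function, a classic optimization benchmark with
# a known minimum of 0 at the point (1, 1)
def rosenbrock(x):
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

result = minimize(rosenbrock, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
print(result.x)  # converges near (1, 1)
```

Tools like this suit the prototyping phase Patrick mentions; the distinction he draws is that the very performance-critical pieces get reimplemented in C++ with Python bindings.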
22:12 Cool.
22:12 And I heard you say AWS in there.
22:14 Is that where you guys are hosting your SaaS product?
22:16 Yeah, that's correct.
22:17 Okay.
22:17 Are you doing like multiple data centers and things like this?
22:21 Or is it US East one?
22:22 US West.
22:23 Yeah.
22:24 Yeah, I would say we're still pretty small.
22:28 So there's lots of improvement to do in a lot of areas.
22:31 But specifically, yeah, AWS, we're big fans of it.
22:37 Okay, cool.
22:37 Yeah.
22:37 So am I.
22:52 Gone are the days of tweaking your server, merging your code, and just hoping it works in your production environment.
22:58 With SnapCI's cloud-based, hosted, continuous delivery tool, you simply do a get push, and they auto-detect and run all the necessary tests through their multi-stage pipelines.
23:08 Something fails?
23:10 You can even debug it directly in the browser.
23:12 With a one-click deployment that you can do from your desk or from 30,000 feet in the air, Snap offers flexibility and ease of mind.
23:20 Imagine all the time you'll save.
23:22 Thanks, SnapCI, for sponsoring this episode by trying them for free at snap.ci slash talkpython.
23:38 You guys have to really focus on API design because that's one of the primary ways that people interact with your entire business as a SaaS product.
23:47 How do you guys think about that?
23:48 Yeah.
23:49 Using open source optimization tools like SciPy, one of the problems with them is that they're very, very difficult to use, for a lot of reasons.
23:59 One of them is, you know, the API might be, you know, very obtuse or not very well optimized to your problem or things like that.
24:08 Another problem is administration.
24:10 You have to have the servers and set them up and you have to know what kind of capacity you need to optimize your problem.
24:16 Like if you use machines that are too small, then it will be very slow and so on.
24:20 Right.
24:20 But if you pick ones too big, you're just going to waste money on big machines doing nothing, right?
24:24 So what we wanted to do was make sure that we can take, you know, this very valuable optimization tool and find the easiest possible way to expose it to our customers.
24:32 And so for that reason, it's one of our top priorities is having an API that's very, very clean, very predictable, very easy to use, does what you want, a small number of endpoints.
24:42 You're not having to, you know, dig into the nitty gritty.
24:46 You're not tweaking flags in our optimization.
24:48 You're just saying, here's my problem.
24:50 Tell me what to do next.
24:52 Right.
24:52 Do you really try to optimize for the simple getting started case and then provide additional features as needed or how do you do that?
25:00 We want to make sure that, yeah, getting started is as easy as possible.
25:04 So if you have your problem, you can use our API to define your problem and say, here are the parameters that I'm searching over.
25:11 You can also use our API to say, here's what I've tried so far.
25:13 And then our API will tell you, here's what to try next.
25:16 So it's this very simple, like three calls, one to get started and then one to ask for a suggestion and one to report an observation.
25:23 And that's really all a user needs to get started with SigOpt.
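The three-call loop Patrick describes (define the problem, ask for a suggestion, report an observation) can be sketched generically. The class below is a toy stand-in for a black-box optimization service, not SigOpt's actual client; every name here is illustrative, and the suggestion strategy is plain random search rather than the Bayesian methods a real service would use:

```python
import random

class ToyOptimizer:
    """Toy stand-in for a black-box optimization service,
    mimicking the three-call workflow described above."""

    def __init__(self):
        self.experiments = {}

    def create_experiment(self, name, bounds):
        # Call 1: define the problem -- the parameter range being searched.
        self.experiments[name] = {"bounds": bounds, "observations": []}
        return name

    def suggest(self, name):
        # Call 2: ask what to try next (random search stands in for
        # the real service's smarter optimization methods).
        lo, hi = self.experiments[name]["bounds"]
        return random.uniform(lo, hi)

    def observe(self, name, x, value):
        # Call 3: report how the suggested point performed.
        self.experiments[name]["observations"].append((x, value))

opt = ToyOptimizer()
exp = opt.create_experiment("tune-x", bounds=(-5.0, 5.0))
for _ in range(20):
    x = opt.suggest(exp)
    opt.observe(exp, x, value=-(x - 2.0) ** 2)  # objective: maximize -(x-2)^2

best_x, best_val = max(opt.experiments[exp]["observations"], key=lambda o: o[1])
```

The loop shape is the key idea: suggest, evaluate, observe, repeat. Only the suggestion strategy changes between this toy and a hosted optimizer.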
25:26 And then to the extent that there are, you know, expert level flags, like if you really know that you want this particular optimization thing, we want to make sure it's possible.
25:34 But we really don't want users to have to know about it or be confused by it just to get access to the really powerful tooling inside SigOpt.
25:42 Yeah, that seems like it's keeping with your overall mission of democratizing this machine learning and optimization in general, right?
25:51 Yeah.
25:52 Very, very cool.
25:53 So let's talk a little bit about the open source engine that's kind of at the heart of a lot of what you're doing.
25:58 So your co-founder, Scott, he used to be at Yelp.
26:01 Is that right?
26:01 Yes, that's correct.
26:02 Yeah, he worked on this thing called the Metric Optimization Engine, or MOE.
26:06 Yeah, that's right.
26:07 Okay.
26:07 And so this is sort of one of the core components of your system, right?
26:11 Yeah, so we definitely have, we've built more on top of it.
26:14 And, you know, SigOpt these days is powered by like an ensemble of multiple optimization methods.
26:18 But how we got started from the very beginning was, yeah, this MOE that Scott had built using his expertise in the field.
26:24 And that was the first prototype for can this black box optimization really be, can it be profitable?
26:29 Can it be easy to use?
26:30 Can it be something that people would want to want to interact with?
26:33 And so MOE was the first prototype for that.
26:34 And, you know, MOE was very well received on GitHub.
26:36 Lots of companies and individuals were using it.
26:38 And then some of the very common feedback was, oh, but it's really hard to use.
26:42 Or, oh, perhaps it doesn't work.
26:44 And then the reason it didn't work was because this one flag wasn't set.
26:47 So he and I knew and could see that there's real, real value here.
26:51 But the biggest problem is how can it be even easier to use so that companies of any size can have access to this really, really powerful stuff.
26:59 Yeah, that's great.
26:59 There's this opportunity.
27:01 You've got this great open source tool, but you want to build a business.
27:06 Can you kind of take me through the thinking of like, all right, we're still giving away this thing for free.
27:11 And yet we need, you know, to make something special that customers will buy and love.
27:16 What was the thought process?
27:18 Just that it's too hard.
27:19 How can we make it not so hard and accessible?
27:22 Yeah, I would say that's, yeah, the biggest barrier to using MOE is that it's very hard to use.
27:28 I think if people still want to use MOE, then they can, but they're going to have a lot of headaches with it.
27:34 Our goal is that we want to be the people who remove those headaches.
27:37 But as I said, we kind of, we started with MOE.
27:40 And now, in addition to that, we have a wide variety of other optimization methods that we've built in-house.
27:46 Those are not open source.
27:48 So using SigOpt also gives you this kind of wider array of benefits, other techniques, other optimizations, kind of cutting edge research.
27:55 But you're right that like this, it is still open source.
27:58 It is still available.
27:59 And like it is still good for the community to have access to this open source stuff to kind of, you know, they can see where we're coming from and what's been built and the kind of background behind it.
28:06 Yeah, I think there's a bunch of stories that are coming out of people building really amazing businesses on top of things that they're giving away.
28:14 And I think that's becoming much more common.
28:16 But still, I think it's really special to see examples of it in action.
28:21 Like with you guys, recently I spoke to the Scrapinghub guys that took Scrapy and turned that into kind of a SaaS platform, you know, web scraping as a platform, if you will.
28:33 You know, then there's like more obvious examples like MongoDB, Red Hat.
28:38 But it's really cool to see you guys sort of turning this into a business starting from this sort of kernel of open source.
28:44 Do you still make a lot of contributions to MOE or are there a lot of other people working on it?
28:49 What's the story there?
28:49 Some people work on it.
28:50 Some individuals kind of like in the open source community have been contributing to it.
28:54 These days, our contributions are mostly focused on these other methods that we kind of have diagnosed.
28:59 Like we're able to diagnose like these are the problems that MOE worked well for.
29:02 And then these are the problems that MOE doesn't work well for.
29:04 And then how can we as a company attack those and help our customers who have those kinds of problems?
29:08 And so we spend a lot of time focusing on that.
29:10 And that's sort of perhaps tangential to the public MOE.
29:13 So we're still working on that.
29:14 At this time, that's still closed source at SigOpt.
29:16 Right.
29:17 Of course, you got to keep the secret sauce a little bit secret, right?
29:20 Awesome.
29:21 So can you talk a little bit about how some of the internal systems work?
29:25 We talked about your deployments being on AWS, but is there like interesting architecture that you might want to talk about?
29:31 Sure. So we have our web and API.
29:33 That's, as I mentioned before, all Python.
29:35 We use Flask in production to do kind of web serving.
29:39 For the most part, we have like our API, which is a very thin layer that just accepts, you know, API requests from our customers of the form I said before where they're describing their problem.
29:48 But then most of the work we do is we offload that asynchronously to these high capacity compute machines that are doing the optimization.
29:55 So for these problems that are very expensive, we want to be able to, you know, run SciPy or these custom C++ algorithms that we're working on.
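The split Patrick describes, a thin API layer that accepts requests and hands the expensive work off asynchronously to separate compute machines, can be sketched with stdlib primitives. This is a toy in-process version under made-up names; in a real deployment the queue would be a message broker and the worker a separate high-capacity machine:

```python
import queue
import threading

jobs = queue.Queue()
results = {}

def api_endpoint(job_id, payload):
    # Thin API layer: accept the request, enqueue it, return immediately.
    jobs.put((job_id, payload))
    return {"status": "accepted", "job_id": job_id}

def compute_worker():
    # Stand-in for a high-capacity compute machine doing the real work.
    while True:
        job_id, payload = jobs.get()
        if job_id is None:  # sentinel value: shut down
            break
        results[job_id] = sum(x * x for x in payload)  # the "expensive" step
        jobs.task_done()

worker = threading.Thread(target=compute_worker)
worker.start()

resp = api_endpoint("job-1", [1.0, 2.0, 3.0])  # returns before work finishes
jobs.join()                                    # wait for the worker to drain
jobs.put((None, None))                         # tell the worker to stop
worker.join()
```

The design point is that the endpoint's latency is decoupled from the cost of the computation, which is what lets the API machines stay small while the compute machines stay big.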
30:03 Okay. Yeah. Of course, you definitely want to buy the highest end compute machine you can.
30:08 It seems to make a really big difference.
30:10 If you sort of double your VM on AWS, it seems like the low end is really low.
30:16 Yeah.
30:17 Yeah. It's great. You have this kind of flexibility to split up your architecture that way for these machines that don't need it and machines that do.
30:24 Right. Do you, I mean, that's part of the beauty of the cloud, right? Like it's a checkbox or an API call.
30:29 Are you guys doing anything with GPUs given how computational this is?
30:33 We do. We have experimented with that in the past, definitely. Like some of these algorithms can be really parallelized over GPUs.
30:40 That's still something we're working on and experimenting with to see if it's worth productionizing.
30:43 Cool. So maybe you could just, like, in a few sentences, tell people what's the deal with doing computation on GPUs? Like, they're not just for graphics.
30:50 Yeah. So I'll be honest and say that this is not one of my deep areas of expertise.
30:55 But my understanding is that your GPUs, you know, they're very highly optimized for this widely parallel computation.
31:03 So definitely when you're doing these kind of expensive CPU bound optimization techniques, then you can use GPUs to get those done as quick as possible.
31:12 Yeah. Yeah. It's pretty amazing when you look at it. I mean, when I first heard about it, I was like, really? What?
31:17 Or doing like math on the GPU? I mean, of course, if you look at video games.
31:22 Yeah. Like for a while I worked on 3D simulators and the amount of computation those graphics cards do just to render a scene is mind boggling for one scene.
31:32 And then they do it like 60 or 100 times a second. It's crazy.
31:35 And that was, you know, so long ago, many years ago that I was doing that. I was still super impressed.
31:40 If you look at the parallelism, like, I've got a MacBook Pro Retina and it's got a pretty high end CPU, which I think has four real cores and each one is hyper-threaded.
31:50 So it looks like eight.
31:51 But some of those graphics cards have like over a thousand cores. So you're trying to decide between eight and a thousand for parallelism.
31:58 Wow.
31:59 Yeah. It makes a big difference.
32:00 That's pretty insane. So, yeah. I remember a few years ago I saw on AWS as one of the machine types, like a clustered GPU thing.
32:09 What is that doing in the cloud?
32:11 But yeah. Yeah. Very cool.
32:13 So you said you experimented with it. Is it looking promising or?
32:16 I mean, so far so good. Yeah. It definitely seems like something we might want to go forward with.
32:20 Yeah. I haven't really tried this computational stuff with it. There's some really interesting projects, but I just haven't had a use for that much computation, I guess.
32:27 But it seems like if you find a case where it works, it works crazy good.
32:32 Definitely.
32:33 But it's not like a general computer, right? You can't just give it any problem. So there's certain types of problems that are really appropriate or algorithms and certain ones that aren't. So I guess, you know, that's kind of a big decision, whether it makes sense or not. Right.
32:44 We're getting kind of near the end of the show. Let me ask you a few questions I ask all my guests. When you're going to write some Python code, what editor do you open up?
32:52 I use Vim. I've been using Vim, yeah, for about eight years now. I think I'm, you know, got it pretty well optimized. I feel more productive in that than just about anything else. So that's what I use for just about everything.
33:03 Yeah. Very cool. You know, it's very unscientific. Maybe I could pass a few data points off to your system and ask it. But I would say I think Vim seems to be winning the popularity battle among my guests anyway. I'm not sure if they're representative or not of the overall community. But yeah, very, very cool.
33:20 And, you know, PyPI has 75,000 packages now. It's just insane how many things are out there that you can just grab and bring into your apps in Python.
33:31 What ones would you recommend or what are really important things people should know about?
33:35 I would be self-serving and say the SigOpt API client Python package.
33:39 Ones that I use in my own personal day to day. Definitely IPython. I get a huge amount of value out of just for the REPL and using IPython notebooks.
33:48 Like increasing productivity over the regular REPL is kind of astounding.
33:52 This is obviously a pretty common one, but I think requests, that's just like, it's become such a necessary part of my toolkit.
33:57 It feels shocking that it's not part of the standard library at this point.
34:00 Yeah, that's a really interesting comment. I agree that requests absolutely should be up there. It's the most popular package on PyPI.
34:07 Oh, is it?
34:07 By the way.
34:08 I believe it.
34:08 Yeah, it's so clean and so useful. And I was talking to Kenneth Reitz. He was on the show.
34:15 And he's the creator. He was saying they were considering making requests part of the standard library.
34:22 But they decided not to because they wanted to be able to rev the features and security fixes and various things of requests faster than they do Python itself.
34:32 So they decided to keep it separate, but to make it the recommendation, basically: don't use urllib, things like that.
34:40 Go, no, no. pip install requests. This is how we do it.
34:43 Let's just all agree on that.
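As a tiny offline illustration of why requests earns that recommendation: it assembles query strings and headers declaratively, where urllib makes you build them by hand. The URL below is a placeholder, and nothing is sent over the network since we only prepare the request:

```python
import requests

# Declare the request; requests handles encoding the params and headers.
req = requests.Request(
    "GET",
    "https://api.example.com/search",
    params={"q": "python", "page": 2},
    headers={"Accept": "application/json"},
)
prepared = req.prepare()  # builds the final URL without sending anything
```

`prepared.url` now carries the fully encoded query string, ready to send with a `requests.Session`.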
34:45 But yeah, that's pretty interesting, right?
34:46 They actually considered making it part of Python, but for versioning and agility reasons...
34:51 Well, it makes a lot of sense.
34:51 Yeah, yeah, yeah. They decided not to.
34:54 Okay, awesome. If people want to get started with SigOpt, what do they do? How do they get started?
34:59 Head over to sigopt.com, sign up, you get started. We have a free trial.
35:03 And if you have any of these problems, whether it's a machine learning model that you want to make more accurate or some other kind of process that you're trying to optimize,
35:10 you just sign up, get started with our API, and you're off to the races.
35:14 Okay, that sounds great. Yeah. So pip install sigopt, sign up, off you go, right?
35:18 You got it.
35:19 All right, Patrick. It's been great having you on the show. This is really interesting.
35:22 You guys are doing some cool stuff and continue to optimize everything.
35:26 All right, we will. Thanks, Michael.
35:27 All right. See you later.
35:29 This has been another episode of Talk Python to Me.
35:32 Today's guest was Patrick Hayes, and this episode has been sponsored by Hired and SnapCI.
35:36 Thank you guys for supporting the show.
35:38 Hired wants to help you find your next big thing.
35:40 Visit Hired.com slash Talk Python to me to get five or more offers with salary and equity presented right up front,
35:45 and a special listener signing bonus of $2,000.
35:48 SnapCI is modern continuous integration and delivery.
35:51 Build, test, and deploy your code directly from GitHub, all on your browser with debugging, Docker, and parallelism included.
35:57 Try them for free at Snap.CI slash Talk Python.
36:00 Are you or a colleague trying to learn Python?
36:02 Have you tried boring books and videos that just cover a topic point by point?
36:07 Check out my online course, Python Jumpstart by Building 10 Apps, at training.talkpython.fm,
36:12 for a different take on learning Python.
36:14 You can find the links from today's show at talkpython.fm/episodes/show/51.
36:21 Be sure to subscribe to the show.
36:22 Open your favorite podcatcher and search for Python, which will be right at the top.
36:26 You can also find the iTunes and direct RSS feeds in the footer of the website.
36:29 Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.
36:34 You can hear the entire song on talkpython.fm.
36:37 This is your host, Michael Kennedy.
36:39 Thanks so much for listening.
36:40 Smix, take us out of here.
36:42 Stating with my voice, there's no norm that I can feel within.
36:45 Haven't been sleeping, I've been using lots of rest.
36:48 I'll pass the mic back to who rocked it best.
36:51 On first developers, developers, developers, developers.
36:54 On first developers, developers.
36:56 On first developers, developers, developers.
37:00 developers.