WEBVTT

00:00:00.001 --> 00:00:03.880
You've heard that machine intelligence is going to transform our lives any day now.

00:00:03.880 --> 00:00:07.200
This is usually presented in a way that's vague and nondescript.

00:00:07.200 --> 00:00:11.820
This week, we look at some specific ways machine learning is working for humans.

00:00:11.820 --> 00:00:16.060
On Talk Python To Me, you'll meet Patrick Hayes, the CTO of SigOpt,

00:00:16.060 --> 00:00:19.920
whose goal is to accelerate machine learning by optimizing everything.

00:00:19.920 --> 00:00:21.660
That's a pretty awesome goal.

00:00:21.660 --> 00:00:24.640
Listen in to this episode and learn all about it.

00:00:24.640 --> 00:00:29.260
It's episode number 51, recorded March 3rd, 2016.

00:00:29.260 --> 00:00:57.280
Welcome to Talk Python To Me, a weekly podcast on Python,

00:00:57.280 --> 00:01:00.360
the language, the libraries, the ecosystem, and the personalities.

00:01:00.360 --> 00:01:02.460
This is your host, Michael Kennedy.

00:01:02.460 --> 00:01:04.460
Follow me on Twitter, where I'm @mkennedy.

00:01:04.460 --> 00:01:08.340
Keep up with the show and listen to past episodes at talkpython.fm,

00:01:08.340 --> 00:01:10.880
and follow the show on Twitter via @talkpython.

00:01:10.880 --> 00:01:14.120
This episode is brought to you by Hired and SnapCI.

00:01:14.120 --> 00:01:20.880
Thank them for supporting the show on Twitter via @Hired_HQ and @Snap_CI.

00:01:20.880 --> 00:01:24.960
A couple of quick updates for you before we get to the interview with Patrick.

00:01:25.300 --> 00:01:30.020
First, I want to thank all of you who participated in helping me launch Talk Python training through Kickstarter.

00:01:30.020 --> 00:01:32.540
I finished the course, got the website online.

00:01:32.540 --> 00:01:38.240
It's training.talkpython.fm, and added all the Kickstarter backers who have filled out the survey to the course.

00:01:38.240 --> 00:01:40.760
The feedback so far has been really positive.

00:01:40.760 --> 00:01:42.620
I got this from one student just today.

00:01:42.620 --> 00:01:44.960
Absolutely loving the course.

00:01:44.960 --> 00:01:49.460
I learned so much, not just Python principles, but your methodology with designing an app.

00:01:49.460 --> 00:01:52.200
Your lessons in PyCharm have really helped a ton, too.

00:01:52.200 --> 00:01:56.660
I've bought so many books and online classes, but nothing has brought it all together like you have.

00:01:56.660 --> 00:01:58.000
So a big thanks to you, Michael.

00:01:58.000 --> 00:02:01.980
To celebrate the launch of my course, I'm giving away a free seat to a friend of the show.

00:02:01.980 --> 00:02:05.120
Just enter your email on talkpython.fm to be eligible to win.

00:02:05.420 --> 00:02:06.980
I'll draw a winner at the end of the week.

00:02:06.980 --> 00:02:11.980
Next, I had the honor of telling the story of my career, how I went from super junior,

00:02:11.980 --> 00:02:16.540
know-nothing developer to DevelopMentor instructor to owning my own business and launching the show.

00:02:16.540 --> 00:02:19.540
I was on the Developer On Fire podcast last week.

00:02:19.540 --> 00:02:23.880
If you enjoy these kinds of stories, check out Dave Rael's podcast at developeronfire.com.

00:02:23.880 --> 00:02:25.520
My episode was 112.

00:02:26.580 --> 00:02:28.260
Now, let's get to this interview with Patrick.

00:02:28.260 --> 00:02:30.260
Patrick, welcome to the show.

00:02:30.260 --> 00:02:30.880
Thanks, Michael.

00:02:30.880 --> 00:02:32.800
Yeah, it's super exciting you're here.

00:02:32.800 --> 00:02:34.400
I'm ready to optimize everything.

00:02:34.400 --> 00:02:34.900
Perfect.

00:02:34.900 --> 00:02:35.800
That's what we're here for.

00:02:35.800 --> 00:02:36.520
Very cool.

00:02:36.520 --> 00:02:36.920
Very cool.

00:02:36.920 --> 00:02:40.140
Now, before we optimize everything, let's start at the beginning.

00:02:40.140 --> 00:02:41.180
What's your story?

00:02:41.180 --> 00:02:41.920
How did you get into programming?

00:02:41.920 --> 00:02:42.200
Sure.

00:02:42.200 --> 00:02:43.480
So I started when I was a kid.

00:02:43.480 --> 00:02:44.280
You know, I was young.

00:02:44.280 --> 00:02:49.220
On my parents' computer, I played around in Visual Basic or Java or things like that.

00:02:49.220 --> 00:02:51.500
Kind of just playing around, making little toys myself.

00:02:51.500 --> 00:02:55.100
They sent me to computer camp at one point, and that's really where I started to get excited about it.

00:02:55.220 --> 00:02:58.040
Then I went to the University of Waterloo, where I studied computer science.

00:02:58.040 --> 00:03:03.460
And that's, of course, where I got, you know, much better fundamentals and started, you know, to build my career from there.

00:03:03.460 --> 00:03:06.480
And then I started with Python on one of my first internships.

00:03:06.480 --> 00:03:09.580
So I was an intern at BlackBerry, and this was probably back in 2007.

00:03:09.580 --> 00:03:12.500
And at that point, my, you know, programming experience, I was still pretty junior.

00:03:12.500 --> 00:03:15.420
And I was used to Java and these kind of very verbose languages.

00:03:15.420 --> 00:03:21.760
And then one of my previous colleagues had written a script in Python, and one of my tasks was to update it.

00:03:21.760 --> 00:03:25.000
So I got in there, and I started playing with it, and I realized, oh, actually, this Python stuff is

00:03:25.000 --> 00:03:25.560
pretty cool.

00:03:25.560 --> 00:03:28.120
It's much, you know, I can be more expressive, and I can work more quickly.

00:03:28.120 --> 00:03:32.140
And that's when I started to become a big fan, and I've used it in most of my personal projects since.

00:03:32.140 --> 00:03:38.820
A lot of people who don't work in Python have this sense that it's kind of like Bash, but you can make websites with it.

00:03:38.820 --> 00:03:39.980
Right.

00:03:40.080 --> 00:03:47.200
You know, they don't see the entire sort of landscape of all the amazing stuff you can do.

00:03:47.200 --> 00:03:51.400
And so a lot of us, when we come to it, we're like, oh, there's a little script I've got to maintain.

00:03:51.400 --> 00:03:52.160
Oh, my goodness.

00:03:52.160 --> 00:03:54.100
Look, there's this thing called PyPI.

00:03:54.100 --> 00:03:55.880
And oh, my gosh, look what else you can do.

00:03:55.880 --> 00:03:56.240
Exactly.

00:03:56.240 --> 00:03:56.940
That's awesome.

00:03:57.220 --> 00:04:00.480
Increase in productivity, you know, it was kind of like night and day from what I was used to.

00:04:00.480 --> 00:04:01.580
So that's definitely.

00:04:01.580 --> 00:04:04.720
So what did you study in college and work on at BlackBerry?

00:04:04.720 --> 00:04:06.680
I studied mathematics in university.

00:04:06.680 --> 00:04:08.640
So I did computer science and also pure mathematics.

00:04:08.640 --> 00:04:09.080
Yeah, cool.

00:04:09.080 --> 00:04:10.480
So I went to University of Waterloo.

00:04:10.480 --> 00:04:12.720
And then while you're at Waterloo, you do six internships.

00:04:12.720 --> 00:04:15.140
So I was at BlackBerry for my first one.

00:04:15.140 --> 00:04:17.120
And there I was working on the automated testing team.

00:04:17.120 --> 00:04:18.460
So, you know, one of the things.

00:04:18.460 --> 00:04:19.220
Oh, that's cool.

00:04:19.360 --> 00:04:22.260
Yeah, it was, you know, I was just getting started kind of with my very first internship.

00:04:22.260 --> 00:04:23.920
So it was a good way to get my feet wet.

00:04:23.920 --> 00:04:26.840
I think it's cool that the university has the six internships.

00:04:26.840 --> 00:04:27.140
Yeah.

00:04:27.140 --> 00:04:30.500
Because a lot of times you come out of college and you're like, okay, now what?

00:04:30.500 --> 00:04:32.500
Yeah, it makes a huge difference.

00:04:32.500 --> 00:04:32.920
Definitely.

00:04:32.920 --> 00:04:34.000
I felt that in my career.

00:04:34.000 --> 00:04:36.420
You know, you graduate and you have six four-month internships.

00:04:36.420 --> 00:04:40.060
So you basically have almost two full years of work experience at this point.

00:04:40.060 --> 00:04:45.720
So it's really kind of night and day versus if you hadn't done these internships, what kind of experience, practical experience you have.

00:04:45.720 --> 00:04:47.940
And it really does complement the education well.

00:04:48.040 --> 00:04:52.460
Because what you learn in class is very kind of theoretic and good fundamentals.

00:04:52.460 --> 00:04:54.520
Yeah, it's somewhat detached some of the time.

00:04:54.520 --> 00:04:55.080
Yeah, exactly.

00:04:55.080 --> 00:05:01.100
But then when you get into the real, the industry, you're really learning totally different things that you, and you need both to be a good engineer in this day and age.

00:05:01.100 --> 00:05:01.820
Right, of course.

00:05:01.820 --> 00:05:11.220
So the company that you guys founded, SigOpt, has machine learning as its sort of central feature that it provides, right?

00:05:11.220 --> 00:05:17.820
And so could you tell everybody a little bit about just, you know, the mechanics of machine learning and describe how some of the pieces work?

00:05:17.820 --> 00:05:20.100
Just so, before we get into the details.

00:05:20.100 --> 00:05:21.280
How machine learning works?

00:05:21.280 --> 00:05:21.860
Well, yeah.

00:05:21.860 --> 00:05:23.940
So let's just take a concrete example, right?

00:05:24.020 --> 00:05:31.620
Suppose I've got some data about a bunch of shoppers and I've got a store and I'd like to optimize something.

00:05:31.620 --> 00:05:34.600
You know, give me a sort of how you'd go through that with machine learning.

00:05:34.600 --> 00:05:35.140
Yeah, sure.

00:05:35.140 --> 00:05:38.140
So a great example, if you have a store, would be you might want to predict fraud.

00:05:38.140 --> 00:05:45.220
So if you have, you know, a bunch of historical purchase data, maybe you want to minimize the number of fraudulent purchases you receive.

00:05:45.220 --> 00:05:46.780
Because that, you know, affects your bottom line.

00:05:46.780 --> 00:05:49.460
Every kind of, every fraud, fraudulent purchase costs you money.

00:05:49.460 --> 00:05:58.840
The problem that machine learning can help you solve is, given that you have all this historic data that you've built up over the years, how can you use that to, in the future, make predictions about events you haven't seen yet?

00:05:58.840 --> 00:06:03.240
So if I've seen a million purchases in the past, I know things about those purchases.

00:06:03.240 --> 00:06:04.960
Like I know the country that it came from.

00:06:04.960 --> 00:06:07.080
I know how much money the purchase was for.

00:06:07.080 --> 00:06:09.540
Whether it was from a sketchy IP address or something like that.

00:06:09.540 --> 00:06:14.220
And I also know the truth you need to learn from, which is: was that purchase fraudulent or not?

00:06:14.460 --> 00:06:22.000
And now you have this data set of past truth, and then you can apply machine learning methods on that data to produce a model that will make predictions in the future.
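
NOTE
Editor's sketch, not from the episode: one very simple way to turn labeled historical purchases into a predictive model.
It assumes a toy frequency model over hypothetical (country, sketchy_ip) buckets; the data, names, and threshold are all illustrative, and a real fraud system would use a proper learning library.
```python
from collections import defaultdict
#
# Hypothetical historical purchases: (country, sketchy_ip, was_fraud).
HISTORY = [
    ("US", False, False), ("US", False, False), ("US", True, True),
    ("CA", False, False), ("CA", True, True), ("CA", True, False),
    ("US", True, True), ("CA", False, False),
]
#
def train(history):
    """Learn the observed fraud rate for each (country, sketchy_ip) bucket."""
    counts = defaultdict(lambda: [0, 0])  # bucket: [fraud_count, total_count]
    for country, sketchy_ip, was_fraud in history:
        counts[(country, sketchy_ip)][0] += int(was_fraud)
        counts[(country, sketchy_ip)][1] += 1
    return {bucket: fraud / total for bucket, (fraud, total) in counts.items()}
#
def predict(model, country, sketchy_ip, threshold=0.5):
    """Flag a new purchase when its bucket's past fraud rate exceeds the threshold."""
    # Buckets never seen in the history default to a fraud rate of 0.0.
    return model.get((country, sketchy_ip), 0.0) > threshold
#
model = train(HISTORY)
```
With this toy data, predict(model, "US", True) flags the purchase, because every past US purchase from a sketchy IP was fraudulent.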

00:06:22.000 --> 00:06:22.320
Okay.

00:06:22.320 --> 00:06:29.360
So basically, as people are checking out, you can ask, hey, should I let this go through, even though the credit card system said yes, right?

00:06:29.360 --> 00:06:29.720
Exactly.

00:06:29.720 --> 00:06:34.340
So even though you don't know for sure yet whether it is fraudulent, you can make a guess, a good guess, based on the data you've seen.

00:06:34.340 --> 00:06:37.140
A good machine learning model, you know, will be accurate, as accurate as possible.

00:06:37.140 --> 00:06:39.760
You'll always have sort of false negatives and false positives.

00:06:40.120 --> 00:06:46.540
So, you know, saying a purchase is fraudulent when it might not be, or vice versa, thinking a purchase is okay when it is fraudulent.

00:06:46.540 --> 00:06:48.000
But you want to, of course, minimize that.

00:06:48.000 --> 00:06:50.600
And a good machine learning model and machine learning process will reduce that.
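
NOTE
Editor's sketch, not from the episode: counting the two kinds of error mentioned here, false positives and false negatives.
The prediction and truth lists are hypothetical; True means "fraudulent".
```python
def error_counts(predicted, actual):
    """Return (false_positives, false_negatives) for a fraud classifier."""
    # False positive: flagged as fraud, but the purchase was legitimate.
    false_pos = sum(p and not a for p, a in zip(predicted, actual))
    # False negative: allowed through, but the purchase was fraudulent.
    false_neg = sum(a and not p for p, a in zip(predicted, actual))
    return false_pos, false_neg
#
predicted = [True, False, True, False, False]
actual = [True, False, False, True, False]
false_pos, false_neg = error_counts(predicted, actual)
```
Tuning a model, in the sense discussed in this interview, means searching for settings that push both counts down on data the model has not seen.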

00:06:50.600 --> 00:06:52.560
Let's talk about SigOpt a little bit.

00:06:52.560 --> 00:06:57.840
Your theme or your tagline is to help customers optimize everything.

00:06:57.840 --> 00:06:58.360
You got it.

00:06:58.360 --> 00:07:00.020
That's a pretty audacious goal.

00:07:00.020 --> 00:07:03.240
What is this company you guys founded?

00:07:03.240 --> 00:07:09.700
So SigOpt is a SaaS optimization platform that helps companies build better machine learning models with less trial and error.

00:07:10.020 --> 00:07:17.360
So we have an API that exposes the most cutting edge optimization research to our users to help them increase the accuracy of their machine learning models as quickly as possible.

00:07:17.360 --> 00:07:26.080
So in this example of a fraudulent, a fraud detector, our APIs would help them tune that model to make, to really reduce that, those cases where it would be inaccurate.

00:07:26.080 --> 00:07:29.140
And we want to get our users to these most accurate models as quickly as possible.

00:07:29.140 --> 00:07:33.560
We don't want them to spend days or weeks or months trying different possibilities kind of in the dark.

00:07:33.560 --> 00:07:33.940
All right.

00:07:33.940 --> 00:07:34.560
That makes sense.

00:07:34.560 --> 00:07:36.920
It sounds like a big challenge, but it makes sense.

00:07:37.300 --> 00:07:39.100
So you said you're a SaaS company.

00:07:39.100 --> 00:07:42.700
Do you actually run the machine learning algorithms?

00:07:42.700 --> 00:07:45.900
Do you provide like the compute and data processing?

00:07:45.900 --> 00:07:54.180
Or do you somehow just provide, hey, you should feed these inputs to scikit-learn along with your data?

00:07:54.180 --> 00:07:55.200
Yeah, that's a great question.

00:07:55.200 --> 00:08:01.780
The way we help our customers is they have their own models that they're running on their own clusters with their own data.

00:08:01.940 --> 00:08:03.980
So we're not actually running the models for them.

00:08:03.980 --> 00:08:05.940
And we're also not looking at any of their sensitive data.

00:08:05.940 --> 00:08:12.300
What we're doing is our APIs are able to suggest what's the next version of your model you should try.

00:08:12.300 --> 00:08:17.060
So the models have parameters or variables that affect how well the model works.

00:08:17.060 --> 00:08:21.640
And finding the best version, the best values for those parameters is a very tough problem.

00:08:21.640 --> 00:08:23.920
And that's what SigOpt solves in the most efficient way possible.

00:08:23.920 --> 00:08:24.240
Okay.

00:08:24.240 --> 00:08:25.600
That's really interesting.

00:08:25.600 --> 00:08:27.780
Where did this idea for this company come from?

00:08:27.880 --> 00:08:28.000
Yeah.

00:08:28.000 --> 00:08:33.140
My co-founder, Scott, he was doing a lot of research in this field during his PhD at Cornell.

00:08:33.140 --> 00:08:38.540
So he was doing a lot of research in this field and had this great idea for how to build this kind of black box optimizer.

00:08:38.540 --> 00:08:40.660
That was the beginning of that idea.

00:08:40.660 --> 00:08:45.160
And we've built the company around it now to put an API around it to make it really easy to use,

00:08:45.160 --> 00:08:51.440
to administrate all the difficult computation that is required to serve these optimal suggestions.

00:08:51.440 --> 00:08:51.840
Okay.

00:08:51.840 --> 00:08:56.480
Do you have some interesting stories around stuff that you've helped people optimize?

00:08:56.480 --> 00:09:02.920
I heard one of them was something like shaving cream and another was synthetic rhinoceros horns.

00:09:02.920 --> 00:09:04.400
It's quite varied, right?

00:09:04.400 --> 00:09:05.300
Exactly correct.

00:09:05.300 --> 00:09:10.860
So the optimization, so I've been talking about machine learning here, but really it's actually able to optimize

00:09:10.860 --> 00:09:17.300
any problem in which trying something new is very expensive, because we want to get you to the best version of your product in as few tries as possible.

00:09:17.400 --> 00:09:22.680
So that is true for a machine learning model that you have to train it or test it on real user traffic.

00:09:22.680 --> 00:09:23.520
And that's very expensive.

00:09:23.520 --> 00:09:26.020
But another thing that's very expensive is these chemical processes.

00:09:26.020 --> 00:09:31.200
So we have one customer who makes shaving cream and the expensive part for him is he has to go into the lab,

00:09:31.200 --> 00:09:37.100
mix up his ingredients into a jar or vat or something, wait for 100 hours for it to settle,

00:09:37.100 --> 00:09:39.000
and then test the quality of the shaving cream.

00:09:39.000 --> 00:09:42.900
And then based on that, he wants to find a shaving cream that makes him the most profit.

00:09:43.340 --> 00:09:50.520
So in this case, there's however many ingredients that go in the shaving cream and mixing them in different amounts is a very chaotic process.

00:09:50.520 --> 00:09:52.840
So as a chemist, he's an expert in his field.

00:09:52.840 --> 00:09:56.160
So he's able to say, oh, the ranges should probably be somewhere.

00:09:56.160 --> 00:09:58.640
This acid should probably be between five milliliters and seven milliliters.

00:09:58.640 --> 00:10:05.600
But then at some point, he gets to this area where the knowledge, he doesn't have as much knowledge as possible about the interactions.

00:10:05.600 --> 00:10:07.140
So this is where SigOpt comes in.

00:10:07.140 --> 00:10:07.320
Right.

00:10:07.320 --> 00:10:11.240
It gets down to actual experimentation rather than theory, right?

00:10:11.620 --> 00:10:11.840
Yeah.

00:10:11.840 --> 00:10:12.340
Yeah.

00:10:12.340 --> 00:10:12.880
Trial and error.

00:10:12.880 --> 00:10:17.500
So now SigOpt, what we're able to do is we're able to model these chaotic effects and are able to suggest,

00:10:17.500 --> 00:10:19.860
here's the next version of your shaving cream you should try.

00:10:19.860 --> 00:10:25.340
So then he goes back in the lab, mixes up a batch with what we suggested, tells us how well it did,

00:10:25.340 --> 00:10:28.220
and then we use that to update our model and give another best suggestion.
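
NOTE
Editor's sketch, not from the episode: the suggest, try, report loop described here, in miniature.
This is not SigOpt's API or algorithm. Uniform random search stands in for the optimizer, and measure_quality is a hypothetical stand-in for an expensive lab trial. Only the 5 to 7 milliliter range comes from the conversation.
```python
import random
#
def measure_quality(acid_ml):
    """Hypothetical lab trial: quality peaks at an assumed 6.2 ml of acid."""
    return -(acid_ml - 6.2) ** 2
#
def suggest(rng, low=5.0, high=7.0):
    """Stand-in optimizer: pick a value uniformly within the expert's range."""
    return rng.uniform(low, high)
#
rng = random.Random(0)  # fixed seed so the run is repeatable
observations = []  # (suggested value, measured quality) pairs
for _ in range(20):
    candidate = suggest(rng)  # ask for the next version to try
    quality = measure_quality(candidate)  # run the expensive experiment
    observations.append((candidate, quality))  # report the result back
#
best_value, best_quality = max(observations, key=lambda obs: obs[1])
```
Random search ignores the reported results; an optimizer of the kind discussed here would instead use them to update its model and choose the next suggestion, which is how it can get by with far fewer trials.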

00:10:28.220 --> 00:10:34.500
So before using SigOpt, this chemist had mixed up, you know, 100 different trials just by like keeping track of it in his lab notebook.

00:10:34.500 --> 00:10:35.580
He hadn't really gotten anywhere.

00:10:35.580 --> 00:10:41.660
But then he uploaded all his data to SigOpt, and now SigOpt is able to take in that data and analyze it and say, hey, here's what you should try next.

00:10:41.660 --> 00:10:42.620
And then he did that.

00:10:42.620 --> 00:10:47.500
And then within just five new batches, he already had a new version of his shaving cream that was 20% more profitable.

00:10:47.500 --> 00:10:47.980
Wow.

00:10:47.980 --> 00:10:48.840
That's really cool.

00:10:48.840 --> 00:10:51.320
And you're modeling physical stuff.

00:10:51.320 --> 00:10:55.760
It's not just like you're taking in a bunch of data and you're processing it.

00:10:55.840 --> 00:10:58.040
Like you're helping this guy, you know, make shaving cream.

00:10:58.040 --> 00:10:58.660
Yeah, exactly.

00:10:58.660 --> 00:11:00.820
In this case, it's helping with this very physical world problem.

00:11:00.820 --> 00:11:02.120
And you also mentioned this.

00:11:02.120 --> 00:11:02.520
You're right.

00:11:02.520 --> 00:11:05.260
There's another customer using us for synthetic rhinoceros horns.

00:11:05.260 --> 00:11:06.540
And this is the same thing.

00:11:06.540 --> 00:11:11.260
It's, you know, they want to make a rhino horn that has these properties, and there's various variables that go into that.

00:11:11.260 --> 00:11:13.260
Like how much of different chemicals do you use?

00:11:13.260 --> 00:11:14.500
And we're helping them with that process as well.

00:11:14.500 --> 00:11:14.840
That's cool.

00:11:14.840 --> 00:11:15.500
I have to ask.

00:11:15.500 --> 00:11:17.580
Like, why do they want a synthetic rhinoceros horn?

00:11:17.580 --> 00:11:18.240
It's a great question.

00:11:18.240 --> 00:11:21.760
My understanding is there's a large black market for rhinoceros horns.

00:11:21.760 --> 00:11:23.320
There's two things at play here.

00:11:23.400 --> 00:11:26.960
One is they want to reduce that black market because they want to stop poaching of rhinos.

00:11:26.960 --> 00:11:30.180
And one way to do that is to flood the market with these synthetic horns.

00:11:30.180 --> 00:11:37.600
So right now, you actually, my understanding is you can sell these synthetic horns for a very high price because they mimic the very expensive real rhino horns.

00:11:37.600 --> 00:11:41.260
But if you can, but the synthetic ones can be produced at a much larger scale.

00:11:41.260 --> 00:11:45.720
So now the market's flooded with horns that are identical chemically, but much cheaper to produce.

00:11:45.720 --> 00:11:48.340
And now it's no longer profitable to actually poach a rhino.

00:11:48.340 --> 00:11:50.160
That's a really good thing for everyone.

00:11:50.160 --> 00:11:50.800
That's awesome.

00:11:51.200 --> 00:11:54.440
What other sort of stories do you have about what you've helped people optimize?

00:11:54.440 --> 00:11:54.700
Yeah.

00:11:54.700 --> 00:12:00.320
We also have customers doing synthetic egg whites, music recommendation models.

00:12:00.320 --> 00:12:05.160
It's really, it can be applied across the board on all these very black box, difficult problems.

00:12:05.160 --> 00:12:05.400
Yeah.

00:12:05.400 --> 00:12:05.660
Okay.

00:12:05.660 --> 00:12:12.520
One of the things I think is pretty awesome is some of the stuff at the heart of what you're doing is actually open source software, right?

00:12:12.520 --> 00:12:13.020
That's correct.

00:12:13.120 --> 00:12:14.460
Some of the algorithms and whatnot.

00:12:14.460 --> 00:12:18.300
And yet you're building a business on top of it.

00:12:18.300 --> 00:12:22.820
And I've talked to a couple of companies lately, a show that's about to come out next week.

00:12:22.820 --> 00:12:27.580
It's got a really cool sort of story of taking something open source and turn it into a business.

00:12:27.580 --> 00:12:28.740
And I want to dig into that.

00:12:28.740 --> 00:12:31.940
But one of the places you guys started was at Y Combinator, right?

00:12:31.940 --> 00:12:32.840
Yeah, that's correct.

00:12:32.840 --> 00:12:34.040
What was it like being at YC?

00:12:34.040 --> 00:12:35.500
It was a lot of fun.

00:12:35.500 --> 00:12:39.060
It was very kind of exciting, very, definitely a lot of emotions.

00:12:39.060 --> 00:12:43.060
You have a lot of fun, but you also have a lot of, you know, stress and you have a lot of, you're very busy.

00:12:43.060 --> 00:12:44.140
You learn a lot.

00:12:44.140 --> 00:12:45.920
So it was a really fantastic experience.

00:12:45.920 --> 00:12:52.900
You know, I think it took what me and my co-founder, when we started, you know, didn't really know much of anything about how to build or run a business.

00:12:52.900 --> 00:12:57.400
And just the first three months of Y Combinator, it's very transformative.

00:12:57.400 --> 00:12:59.540
You know what to think about and what not to think about.

00:12:59.540 --> 00:13:00.960
You know how to stay focused.

00:13:00.960 --> 00:13:04.880
You know how to talk to users and build something that your customers want.

00:13:04.880 --> 00:13:06.560
And then it culminates in demo day.

00:13:06.560 --> 00:13:08.480
And that's when you present to a bunch of angel investors.

00:13:08.480 --> 00:13:11.420
And that's when it really starts your company from there, getting the fundraising you need.

00:13:11.420 --> 00:13:13.040
So it definitely makes a huge difference.

00:13:13.040 --> 00:13:15.880
We definitely wouldn't be where we are today if we hadn't gone through Y Combinator.

00:13:15.880 --> 00:13:16.400
That's awesome.

00:13:16.400 --> 00:13:19.420
So how do you actually get into Y Combinator?

00:13:19.420 --> 00:13:21.800
Is there like a call for applications?

00:13:21.800 --> 00:13:22.160
Yep.

00:13:22.160 --> 00:13:23.900
Are there like open demos?

00:13:23.900 --> 00:13:24.900
Like what's the story?

00:13:24.900 --> 00:13:26.380
Like, if someone out there is listening,

00:13:26.380 --> 00:13:27.840
Like I want to be part of Y Combinator.

00:13:27.840 --> 00:13:28.700
What do they do?

00:13:28.700 --> 00:13:30.320
They may have changed it a little bit since I did it.

00:13:30.360 --> 00:13:32.120
So this was I applied in 2014.

00:13:32.120 --> 00:13:35.460
But essentially, yeah, twice a year they have an open call for applications.

00:13:35.460 --> 00:13:37.980
And they say, what are you building?

00:13:37.980 --> 00:13:39.200
Who are you building with?

00:13:39.200 --> 00:13:40.240
Send us an application.

00:13:40.240 --> 00:13:42.260
They ask you, do you have any customers?

00:13:42.260 --> 00:13:42.980
Do you have any traction?

00:13:42.980 --> 00:13:44.200
Do you have any revenue?

00:13:44.200 --> 00:13:45.760
What have you built so far?

00:13:45.760 --> 00:13:48.340
And then they also ask a lot of questions about you as the founders.

00:13:48.340 --> 00:13:49.960
They ask, what have you built in the past?

00:13:49.960 --> 00:13:50.700
Where have you worked?

00:13:51.040 --> 00:13:55.700
One of the kind of notable questions they ask is, what's a system you've hacked in the past?

00:13:55.700 --> 00:14:07.900
So not hacking in the kind of infiltrating server sense, but like what's the kind of creative thing you did in the past to skirt around some, I shouldn't say rules, but not like breaking the rules, but doing something clever or creative in the past.

00:14:07.900 --> 00:14:08.700
Yeah, that's really cool.

00:14:08.700 --> 00:14:11.140
Would you recommend other people go in there and try it?

00:14:11.140 --> 00:14:12.920
Like you found it to be a pretty positive experience?

00:14:12.920 --> 00:14:13.860
Yeah, I would say it was positive.

00:14:13.860 --> 00:14:15.840
And I mean, I can't say I can recommend it for everyone.

00:14:15.840 --> 00:14:19.660
I don't know everyone's specific situation, but I would recommend it definitely.

00:14:19.660 --> 00:14:31.720
It really did take, yeah, me and my co-founder from a state of no experience to, you know, competent at building a company in a very short time.

00:14:31.720 --> 00:14:38.680
Yeah, I think one of the challenges of starting a company, especially technical companies, the founders, they're technical, right?

00:14:38.680 --> 00:14:46.700
And part of starting a company, especially one like yours that is a highly technical science, math sort of company requires that.

00:14:46.700 --> 00:14:56.080
But that's not enough for like user growth and business deals and marketing and growth hacking and connections.

00:14:56.080 --> 00:14:57.660
Absolutely.

00:14:57.660 --> 00:14:58.940
That's a real big challenge.

00:14:58.940 --> 00:14:59.980
And YC helps with that, right?

00:14:59.980 --> 00:15:03.300
So both me and my co-founder were technical co-founders.

00:15:03.300 --> 00:15:06.160
So we had to pick up a lot of skills that we had no experience with.

00:15:06.160 --> 00:15:10.420
What I commonly say is before working at SigOpt, at my last job, I got to work on what I was good at every day.

00:15:10.420 --> 00:15:12.480
I was an engineer and I worked on engineering problems.

00:15:12.480 --> 00:15:16.500
Now as a co-founder, I work on things I'm bad at every day, which is very different.

00:15:16.580 --> 00:15:23.160
You're spending more time, you know, trying to get customers or build a company or think about culture or think about hiring or think about marketing or think about sales.

00:15:23.160 --> 00:15:27.580
All these things that I had no experience with prior to starting the company.

00:15:27.580 --> 00:15:30.040
So you definitely get a lot of experience with these things.

00:15:30.040 --> 00:15:34.380
Yeah, exposure to totally new things, which, you're right, are essential to running a successful business.

00:15:34.380 --> 00:15:36.260
But you kind of have to learn it on the fly.

00:15:36.260 --> 00:15:38.220
So Y Combinator, they do prepare you for this.

00:15:38.220 --> 00:15:40.920
Also, the network is really, really helpful.

00:15:40.920 --> 00:15:46.140
So there's probably over a thousand people in the Y Combinator network at this point, you know, past founders.

00:15:46.140 --> 00:15:48.440
And it's very friendly and welcoming community.

00:15:48.440 --> 00:15:53.040
You can message them if you need any advice or you have questions or things like that.

00:15:53.040 --> 00:15:53.420
Oh, yeah.

00:15:53.420 --> 00:15:54.900
That sounds super helpful.

00:16:05.580 --> 00:16:08.400
This episode is brought to you by Hired.

00:16:08.400 --> 00:16:14.860
Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.

00:16:14.860 --> 00:16:24.040
Each offer you receive has salary and equity presented right up front, and you can view the offers to accept or reject them before you even talk to the company.

00:16:24.600 --> 00:16:30.400
Typically, candidates receive five or more offers in just the first week, and there are no obligations, ever.

00:16:30.400 --> 00:16:32.480
Sounds pretty awesome, doesn't it?

00:16:32.480 --> 00:16:34.520
Well, did I mention there's a signing bonus?

00:16:34.520 --> 00:16:38.620
Everyone who accepts a job from Hired gets a $2,000 signing bonus.

00:16:38.620 --> 00:16:42.960
And as Talk Python listeners, it gets way sweeter.

00:16:42.960 --> 00:16:50.540
Use the link hired.com/talkpythontome, and Hired will double the signing bonus to $4,000.

00:16:50.540 --> 00:16:52.260
Opportunity's knocking.

00:16:52.740 --> 00:16:55.860
Visit hired.com/talkpythontome and answer the call.

00:16:55.860 --> 00:17:09.840
When you got through with Demo Day, I saw that you guys actually had two seed rounds.

00:17:09.840 --> 00:17:12.900
One for like $120,000 and one for $2 million.

00:17:12.900 --> 00:17:14.080
Congratulations on that.

00:17:14.080 --> 00:17:14.200
Thank you.

00:17:14.200 --> 00:17:15.300
That's pretty exciting, right?

00:17:15.300 --> 00:17:18.460
Was that first part sort of the end of YC?

00:17:18.780 --> 00:17:20.680
Yeah, the first part is from Y Combinator.

00:17:20.680 --> 00:17:22.480
And then at the end of Y Combinator, there's Demo Day.

00:17:22.480 --> 00:17:25.760
And that's when we raised our larger round.

00:17:25.760 --> 00:17:29.100
And that was to get us, you know, over the next, yeah, get started from there.

00:17:29.100 --> 00:17:29.400
Yeah.

00:17:29.400 --> 00:17:31.560
So what's it like taking VC money?

00:17:31.840 --> 00:17:37.220
That's another skill that you don't have any exposure to before starting a company, or at least I didn't.

00:17:37.220 --> 00:17:39.740
So yeah, fundraising is a very kind of interesting beast.

00:17:39.740 --> 00:17:51.180
You have to have your pitch ready, and you spend a lot of time talking to investors, and you're dealing with rejection, and also, yeah, communicating in this way that you're not used to this very, yeah, salesy way.

00:17:51.400 --> 00:17:53.460
Yeah, you have to have a very crisp message, right?

00:17:53.460 --> 00:17:55.280
But it's also different from selling to customers, right?

00:17:55.280 --> 00:17:59.900
You're, you know, for customers, you're talking about what's their immediate needs, and how can I satisfy them now?

00:17:59.900 --> 00:18:06.880
And when talking to VCs, you're talking about, oh, what's kind of the grand vision of what's the app going to look like in 10, 20 years?

00:18:06.880 --> 00:18:09.320
What's the, how big could we possibly be?

00:18:09.440 --> 00:18:14.360
Yeah, the customer wants to know, how are you going to help me make my store or chemical process better?

00:18:14.360 --> 00:18:20.120
The VCs want to know how you're going to reach, like, large-scale growth and continue user acquisition.

00:18:20.120 --> 00:18:21.960
Like, the customer doesn't care at all about that, right?

00:18:21.960 --> 00:18:22.260
Exactly.

00:18:22.260 --> 00:18:25.780
It's good, you know, talking to investors, you know, and they can really, they can be a lot of help.

00:18:25.780 --> 00:18:30.840
They can give you a lot of great resources, especially if you get good investors, which we were very fortunate enough to do.

00:18:30.840 --> 00:18:35.340
Another set of connections and network, almost as much as anything, right?

00:18:35.340 --> 00:18:36.080
Cool.

00:18:36.080 --> 00:18:39.100
So you guys started in November 2014, right?

00:18:39.260 --> 00:18:39.640
That's correct.

00:18:39.640 --> 00:18:42.420
And you started with Python from day one.

00:18:42.420 --> 00:18:45.380
What was the reaction at YC to using Python?

00:18:45.380 --> 00:18:51.180
They give you a lot of autonomy to build what you want, so I don't think that anyone else really would have cared what we built it in.

00:18:51.180 --> 00:18:53.580
But it made a lot of sense for us as a company, you know.

00:18:53.580 --> 00:19:00.340
So both my co-founder and I knew Python, but I also could tell that this, you know, would be really great for getting started and moving quickly.

00:19:00.340 --> 00:19:09.240
So especially when we have this kind of short timeline of three months to get out the door before demo day, what can we use to really make sure

00:19:09.240 --> 00:19:11.180
we're as productive as possible right from day one?

00:19:11.180 --> 00:19:13.120
And so Python was a pretty natural choice.

00:19:13.120 --> 00:19:22.000
Yeah, I guess, especially in the sort of accelerator type scenario, speed to market is more important than CPU cycles.

00:19:22.000 --> 00:19:22.340
Exactly.

00:19:22.340 --> 00:19:23.680
Or memory usage or whatever.

00:19:23.680 --> 00:19:24.180
Yeah.

00:19:24.280 --> 00:19:25.400
But you guys are sticking with it, right?

00:19:25.400 --> 00:19:26.500
You've been going strong.

00:19:26.500 --> 00:19:27.060
Yeah, exactly.

00:19:27.060 --> 00:19:30.020
So we have, yeah, almost everything is in Python.

00:19:30.020 --> 00:19:36.020
We have some of our, we have some things in C++ for some very high end optimization that needs to be very performant.

00:19:36.020 --> 00:19:40.380
And we use, you know, Python and C++ bindings for that to interact with the rest of our stack.

00:19:40.380 --> 00:19:44.200
We also, you know, we have JavaScript, of course, because we have a website as well.

00:19:44.200 --> 00:19:46.200
But everything back end is Python.

00:19:46.200 --> 00:19:47.020
Okay.

00:19:47.020 --> 00:19:47.700
That's cool.

00:19:47.700 --> 00:19:49.420
And is that Python 2 or Python 3?

00:19:49.420 --> 00:19:50.860
We are Python 2.

00:19:50.860 --> 00:19:51.360
Okay.

00:19:51.360 --> 00:19:52.180
Right on.

00:19:52.180 --> 00:19:57.960
And you said you had considered maybe using Go or Scala or some other languages.

00:19:57.960 --> 00:20:00.140
Can you just sort of discuss those trade-offs?

00:20:00.140 --> 00:20:00.460
Sure.

00:20:00.460 --> 00:20:03.560
So I definitely, I had some experience with both of those.

00:20:03.560 --> 00:20:06.240
I think Scala is really nice when I did use it.

00:20:06.240 --> 00:20:07.520
It's very safe.

00:20:07.520 --> 00:20:08.900
The type safety is really good.

00:20:08.900 --> 00:20:14.060
But it's very, it's still very expressive, much more expressive than Java or something, which it's based off of.

00:20:14.440 --> 00:20:15.180
And that's really nice.

00:20:15.180 --> 00:20:18.520
I think I would be less confident we could move as quickly as we could with Python there.

00:20:18.520 --> 00:20:23.500
You can be very expressive, but there's a lot of kind of tooling issues, like the compile times can be very slow.

00:20:23.500 --> 00:20:26.140
And also, it's perhaps not as well known.

00:20:26.140 --> 00:20:31.840
So you spend some amount of time training new hires or other people to use Scala as well.

00:20:31.840 --> 00:20:34.680
And I don't think that's necessarily a bad investment to make always.

00:20:34.680 --> 00:20:41.820
But when we were at this stage where we wanted to move very quickly, we figured we should stick with something that we could be confident in.

00:20:42.540 --> 00:20:47.180
And then I had a similar opinion about Go, which is that it's a great language and really has a lot of nice features.

00:20:47.180 --> 00:20:51.320
But at the time, I believe it was still, you know, less mature than it is today.

00:20:51.320 --> 00:20:54.160
I recall wanting to use an AWS library.

00:20:54.160 --> 00:20:55.740
And for Go, there wasn't one at the time.

00:20:55.740 --> 00:20:56.560
I think there is now.

00:20:56.560 --> 00:20:59.960
But Python, of course, had a great AWS library just ready to go.

00:20:59.960 --> 00:21:10.660
So those little decisions of just like, I want to spend my time working on the core business and not tooling and not, you know, building third-party packages, which could already exist.

00:21:10.660 --> 00:21:11.580
Right.

00:21:11.580 --> 00:21:18.280
There's zero business benefit in you building a Go implementation of Boto or whatever, right?

00:21:18.340 --> 00:21:18.700
Exactly.

00:21:18.700 --> 00:21:21.380
So I definitely think, you know, there's lots of trade-offs to be made.

00:21:21.380 --> 00:21:23.860
And that's kind of where we were leaning at that time.

00:21:23.860 --> 00:21:25.380
And I'm really happy with it now.

00:21:25.380 --> 00:21:28.240
So I think it's, you know, we've definitely, we've built a lot.

00:21:28.240 --> 00:21:32.660
And I don't think we'd be where we are today if we had perhaps started with a different language.

00:21:32.660 --> 00:21:32.960
Sure.

00:21:32.960 --> 00:21:39.260
So how much does SciPy and NumPy and all those pieces sort of play into this?

00:21:39.260 --> 00:21:40.920
Was that like pretty critical to making it work?

00:21:40.920 --> 00:21:41.400
Yeah.

00:21:41.400 --> 00:21:46.260
So we use a lot of that for kind of, yeah, high capacity computation and optimization.

00:21:46.260 --> 00:21:53.500
They're, you know, really great tools for that, that we love, both in the kind of prototyping phase where we want to try something new on some data set or something like that.

00:21:53.500 --> 00:21:55.920
Using SciPy, scikit-learn, NumPy are fantastic.

00:21:55.920 --> 00:22:01.000
And then also, you know, they have the performance we need to actually put them in production where appropriate.

00:22:01.000 --> 00:22:04.980
So we also have, of course, things that we built ourselves that are very well tuned to our own problem.

00:22:04.980 --> 00:22:07.120
And as I mentioned before, typically those are in C++.

00:22:07.120 --> 00:22:12.340
But having this wide array of tools available to us in Python is essential.

00:22:12.340 --> 00:22:12.620
Cool.

00:22:12.620 --> 00:22:14.540
And I heard you say AWS in there.

00:22:14.540 --> 00:22:16.700
Is that where you guys are hosting your SaaS product?

00:22:16.700 --> 00:22:17.560
Yeah, that's correct.

00:22:17.560 --> 00:22:17.880
Okay.

00:22:17.880 --> 00:22:21.200
Are you doing like multiple data centers and things like this?

00:22:21.200 --> 00:22:22.340
Or is it US East one?

00:22:22.340 --> 00:22:23.200
US West.

00:22:23.200 --> 00:22:24.300
Yeah.

00:22:24.300 --> 00:22:28.260
So we're still, yeah, there's lots, I would say we're still pretty small.

00:22:28.260 --> 00:22:31.580
So there's lots of, there's lots of improvement to do in a lot of fields.

00:22:31.580 --> 00:22:37.060
But specifically, yeah, AWS, we're big fans of it.

00:22:37.060 --> 00:22:37.660
Okay, cool.

00:22:37.660 --> 00:22:37.920
Yeah.

00:22:37.920 --> 00:22:38.380
So am I.

00:22:52.380 --> 00:22:58.820
Gone are the days of tweaking your server, merging your code, and just hoping it works in your production environment.

00:22:58.820 --> 00:23:08.740
With SnapCI's cloud-based, hosted, continuous delivery tool, you simply do a git push, and they auto-detect and run all the necessary tests through their multi-stage pipelines.

00:23:08.740 --> 00:23:10.060
Something fails?

00:23:10.060 --> 00:23:12.260
You can even debug it directly in the browser.

00:23:12.600 --> 00:23:20.220
With a one-click deployment that you can do from your desk or from 30,000 feet in the air, Snap offers flexibility and ease of mind.

00:23:20.220 --> 00:23:22.040
Imagine all the time you'll save.

00:23:22.040 --> 00:23:28.860
Thank SnapCI for sponsoring this episode by trying them for free at snap.ci slash talkpython.

00:23:38.020 --> 00:23:47.480
You guys have to really focus on API design because that's one of the primary ways that people interact with your entire business as a SaaS product.

00:23:47.480 --> 00:23:48.700
How do you guys think about that?

00:23:48.700 --> 00:23:49.100
Yeah.

00:23:49.280 --> 00:23:59.900
With optimization tools like SciPy or other open source optimization tools, one of the problems is that they're very, very difficult to use for a lot of reasons.

00:23:59.900 --> 00:24:08.920
One of them is, you know, the API might be, you know, very obtuse or not very well optimized to your problem or things like that.

00:24:08.920 --> 00:24:10.160
Another problem is administration.

00:24:10.160 --> 00:24:16.220
You have to have the servers and set them up and you have to know what kind of capacity you need to optimize your problem.

00:24:16.220 --> 00:24:20.480
Like if you use machines that are too small, then it will be very slow and so on.

00:24:20.480 --> 00:24:20.720
Right.

00:24:20.720 --> 00:24:24.580
But if you buy too many, you're just going to waste money on big machines doing nothing, right?

00:24:24.580 --> 00:24:32.340
So what we wanted to do was make sure that we can take, you know, this very valuable tool, this optimization tool, and really find the easiest possible way to expose it to our customers.

00:24:32.460 --> 00:24:42.660
And so for that reason, one of our top priorities is having an API that's very, very clean, very predictable, very easy to use, does what you want, with a small number of endpoints.

00:24:42.660 --> 00:24:46.240
You're not having to, you know, dig into the nitty gritty.

00:24:46.240 --> 00:24:48.860
You're not tweaking flags in our optimization.

00:24:48.860 --> 00:24:50.940
You're just saying, here's my problem.

00:24:50.940 --> 00:24:52.220
Tell me what to do next.

00:24:52.220 --> 00:24:52.460
Right.

00:24:52.460 --> 00:25:00.680
Do you really try to optimize for the simple getting started case and then provide additional features as needed or how do you do that?

00:25:00.680 --> 00:25:04.640
We want to make sure that, yeah, getting started is as easy as possible.

00:25:04.640 --> 00:25:11.000
So if you have your problem, you can use our API to define your problem and say, here are the parameters that I'm searching over.

00:25:11.000 --> 00:25:13.860
You can also use our API to say, here's what I've tried so far.

00:25:13.860 --> 00:25:16.760
And then our API will tell you, here's what to try next.

00:25:16.760 --> 00:25:23.320
So it's this very simple, like three calls, one to get started and then one to ask for a suggestion and one to report an observation.

00:25:23.320 --> 00:25:26.300
And that's really all a user needs to get started with SigOpt.

00:25:26.300 --> 00:25:34.980
And then to the extent that there are, you know, expert level flags, like if you really know that you want this particular optimization thing, we want to make sure it's possible.

00:25:34.980 --> 00:25:42.700
But we really don't want users to have to know about it or be confused by it just to get access to the really powerful tooling inside SigOpt.
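
NOTE
The three-call flow described above (define the problem, ask for a suggestion, report an observation) can be sketched roughly as follows. This is a toy under stated assumptions, not the SigOpt client: ToyOptimizer, suggest, observe, best, and objective are hypothetical names, and plain random search stands in for the real optimization methods, which the episode does not detail.
```python
import random
# Hypothetical stand-in for a hosted optimizer; NOT the SigOpt client.
class ToyOptimizer:
    def __init__(self, bounds, seed=0):
        # Call 1: define the problem as {name: (low, high)} bounds.
        self.bounds = bounds
        self.rng = random.Random(seed)
        self.observations = []
    def suggest(self):
        # Call 2: ask what to try next (plain random search here).
        return {name: self.rng.uniform(lo, hi) for name, (lo, hi) in self.bounds.items()}
    def observe(self, params, value):
        # Call 3: report how the suggested parameters performed.
        self.observations.append((params, value))
    def best(self):
        # Highest-valued (params, value) pair reported so far.
        return max(self.observations, key=lambda obs: obs[1])
#
def objective(params):
    # Toy metric to maximize; it peaks at x = 2.
    return -(params["x"] - 2.0) ** 2
#
opt = ToyOptimizer({"x": (-5.0, 5.0)})
for _ in range(200):
    params = opt.suggest()
    opt.observe(params, objective(params))
best_params, best_value = opt.best()
print(best_params, best_value)
```
The loop only ever calls suggest and observe, which is the point of a small API surface: the search strategy behind suggest could change without touching user code.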

00:25:42.860 --> 00:25:51.900
Yeah, that seems like it's keeping with your overall mission of democratizing this machine learning and optimization in general, right?

00:25:51.900 --> 00:25:52.560
Yeah.

00:25:52.560 --> 00:25:53.280
Very, very cool.

00:25:53.280 --> 00:25:58.520
So let's talk a little bit about the open source engine that's kind of at the heart of a lot of what you're doing.

00:25:58.520 --> 00:26:01.300
So your co-founder, Scott, he used to be at Yelp.

00:26:01.300 --> 00:26:01.760
Is that right?

00:26:01.760 --> 00:26:02.420
Yes, that's correct.

00:26:02.500 --> 00:26:06.600
Yeah, he worked on this thing called Metric Optimization Engine, or MOE.

00:26:06.600 --> 00:26:07.400
Yeah, that's right.

00:26:07.400 --> 00:26:07.740
Okay.

00:26:07.740 --> 00:26:11.300
And so this is sort of one of the core components of your system, right?

00:26:11.300 --> 00:26:14.100
Yeah, so we definitely have, we've built more on top of it.

00:26:14.100 --> 00:26:18.140
And, you know, SigOpt these days is powered by like an ensemble of multiple optimization methods.

00:26:18.140 --> 00:26:24.640
But how we got started from the very beginning was this, yeah, this MOE that Scott had built using his expertise in the field.

00:26:24.640 --> 00:26:29.200
And that was the first prototype for can this black box optimization really be, can it be profitable?

00:26:29.200 --> 00:26:30.220
Can it be easy to use?

00:26:30.220 --> 00:26:33.400
Can it be something that people would want to interact with?

00:26:33.400 --> 00:26:34.940
And so MOE was the first prototype for that.

00:26:34.940 --> 00:26:36.540
And, you know, MOE was very well received on GitHub.

00:26:36.540 --> 00:26:38.880
Lots of companies and individuals were using it.

00:26:38.880 --> 00:26:42.680
And then some of the very common feedback was, oh, but it's really hard to use.

00:26:42.680 --> 00:26:44.220
Or, oh, perhaps it doesn't work.

00:26:44.220 --> 00:26:47.160
And then the reason it didn't work was because this one flag wasn't set.

00:26:47.160 --> 00:26:51.380
So he and I knew and could see that there's real, real value here.

00:26:51.380 --> 00:26:59.080
But the biggest problem is how can it be even easier to use so that companies of any size can have access to this really, really powerful stuff.

00:26:59.080 --> 00:26:59.860
Yeah, that's great.

00:26:59.860 --> 00:27:01.620
There's this opportunity.

00:27:01.620 --> 00:27:06.020
You've got this great open source tool, but you want to build a business.

00:27:06.020 --> 00:27:11.420
Can you kind of take me through the thinking of like, all right, we're still giving away this thing for free.

00:27:11.420 --> 00:27:16.920
And yet we need, you know, to make something special that customers will buy and love.

00:27:16.920 --> 00:27:18.440
What was the thought process?

00:27:18.440 --> 00:27:19.860
Just that it's too hard.

00:27:19.860 --> 00:27:22.320
How can we make it not so hard and accessible?

00:27:22.320 --> 00:27:28.220
Yeah, I would say that's, yeah, the biggest barrier to using MOE is that it's very hard to use.

00:27:28.520 --> 00:27:34.920
I think if people still want to use MOE, then they can, but they're going to have a lot of headaches with it.

00:27:34.920 --> 00:27:37.480
Our goal is that we want to be the people who remove those headaches.

00:27:37.480 --> 00:27:40.040
But as I said, so we kind of, we started with MOE.

00:27:40.040 --> 00:27:46.540
And now, in addition to that, we have a wide variety of other optimization methods that we've built in-house.

00:27:46.540 --> 00:27:47.980
Those are not open source.

00:27:48.580 --> 00:27:55.720
So using SigOpt does also give you this kind of wider array of benefits, other techniques, other optimization, kind of cutting edge research.

00:27:55.720 --> 00:27:58.040
But you're right that like this, it is still open source.

00:27:58.040 --> 00:27:59.200
It is still available.

00:27:59.200 --> 00:28:06.880
And like it is still good for the community to have access to this open source stuff to kind of, you know, they can see where we're coming from and what's been built and the kind of background behind it.

00:28:06.880 --> 00:28:14.360
Yeah, I think there's a bunch of stories that are coming out of people building really amazing businesses on top of things that they're giving away.

00:28:14.360 --> 00:28:16.860
And I think that's becoming much more common.

00:28:16.860 --> 00:28:21.040
But still, I think it's really special to see examples of it in action.

00:28:21.040 --> 00:28:33.020
Like with you guys, recently I spoke to the Scrapinghub guys who took Scrapy and turned that into kind of a SaaS platform, you know, web scraping as a platform, if you will.

00:28:33.920 --> 00:28:38.380
You know, then there's like more obvious examples like MongoDB, Red Hat.

00:28:38.380 --> 00:28:44.720
But it's really cool to see you guys sort of turning this into a business starting from this sort of kernel of open source.

00:28:44.720 --> 00:28:49.040
Do you still make a lot of contributions to MOE or are there a lot of other people working on it?

00:28:49.040 --> 00:28:49.940
What's the story there?

00:28:49.940 --> 00:28:50.920
Some people work on it.

00:28:50.920 --> 00:28:54.100
Some individuals kind of like in the open source community have been contributing to it.

00:28:54.100 --> 00:28:59.780
These days, our contributions are mostly focused on these other methods that we kind of have diagnosed.

00:28:59.780 --> 00:29:02.700
Like we're able to diagnose like these are the problems that MOE works well for.

00:29:02.960 --> 00:29:04.600
And then these are the problems that MOE doesn't work well for.

00:29:04.600 --> 00:29:08.300
And then how can we as a company attack those and help our customers who have those kinds of problems?

00:29:08.300 --> 00:29:10.240
And so we spend a lot of time focusing on that.

00:29:10.240 --> 00:29:13.340
And that's sort of perhaps tangential to the public MOE.

00:29:13.340 --> 00:29:14.280
So we're still working on that.

00:29:14.280 --> 00:29:16.900
At this time, that's still closed source to SigOpt.

00:29:16.900 --> 00:29:17.220
Right.

00:29:17.220 --> 00:29:20.900
Of course, you got to keep the secret sauce a little bit secret, right?

00:29:20.900 --> 00:29:21.680
Awesome.

00:29:21.680 --> 00:29:25.740
So can you talk a little bit about how some of the internal systems work?

00:29:25.740 --> 00:29:31.600
We talked about your deployments being on AWS, but is there like interesting architecture that you might want to talk about?

00:29:31.740 --> 00:29:33.320
Sure. So we have our web and API.

00:29:33.320 --> 00:29:35.960
That's, as I mentioned before, all Python.

00:29:35.960 --> 00:29:39.760
We use Flask in production to do kind of web serving.

00:29:39.760 --> 00:29:48.040
For the most part, we have like our API, which is a very thin layer that just accepts, you know, API requests from our customers of the form I said before where they're describing their problem.

00:29:48.080 --> 00:29:55.100
But then most of the work we do is we offload that asynchronously to these high capacity compute machines that are doing the optimization.

00:29:55.100 --> 00:30:03.700
So for these problems that are very expensive, we want to be able to, you know, run SciPy or these custom C++ algorithms that we're working on.
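
NOTE
The split described above, a thin API layer that hands expensive work to separate compute workers, might look something like this minimal sketch. Everything here is an assumption for illustration: handle_request, fetch_result, and expensive_optimization are hypothetical names, and a local thread pool stands in for the high capacity compute machines. The episode does not describe SigOpt's actual queueing or job system.
```python
from concurrent.futures import ThreadPoolExecutor
import itertools
# Hypothetical names throughout; this is not SigOpt's code.
_pool = ThreadPoolExecutor(max_workers=2)  # stands in for the compute machines
_jobs = {}
_ids = itertools.count(1)
#
def expensive_optimization(values):
    # Placeholder for the heavy numerical work done off the request path.
    return min(values)
#
def handle_request(values):
    # Thin API layer: enqueue the work and return a job id right away.
    job_id = next(_ids)
    _jobs[job_id] = _pool.submit(expensive_optimization, values)
    return job_id
#
def fetch_result(job_id):
    # Blocks until the background job has finished, then returns its result.
    return _jobs[job_id].result()
#
job = handle_request([3.0, 1.5, 4.2])
print(fetch_result(job))
```
Because the request handler only enqueues work, the web tier can stay on small machines while the worker tier is sized for the computation.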

00:30:03.960 --> 00:30:08.900
Okay. Yeah. Of course, you definitely want to buy the highest end compute machine you can.

00:30:08.900 --> 00:30:10.600
It seems to make a really big difference.

00:30:10.600 --> 00:30:16.240
If you sort of double your VM in AWS, it seems like the low end is really low.

00:30:16.240 --> 00:30:16.680
Yeah.

00:30:17.680 --> 00:30:24.000
Yeah. It's great. You have this kind of flexibility to split up your architecture that way for these machines that don't need it and machines that do.

00:30:24.000 --> 00:30:29.540
Right. Do you, I mean, that's part of the beauty of the cloud, right? Like it's a checkbox or an API call.

00:30:29.540 --> 00:30:33.720
Are you guys doing anything with GPUs given how computational this is?

00:30:33.760 --> 00:30:40.260
We do. We have experimented with that in the past, definitely. Like for some of these algorithms, which can be really parallelized over GPUs.

00:30:40.260 --> 00:30:43.940
That's still something we're working on and experimenting with to see if it's worth productionizing.

00:30:43.940 --> 00:30:50.380
Cool. So maybe you could just, like, in a few sentences, tell people what's the deal with computation and GPUs? Like, aren't those for graphics?

00:30:50.380 --> 00:30:55.800
Yeah. So I'll be honest and say that this is not one of my deep areas of expertise.

00:30:55.800 --> 00:31:02.480
But my understanding is that your GPUs, you know, they're very highly optimized for this widely parallel computation.

00:31:03.000 --> 00:31:12.180
So definitely when you're doing these kind of expensive CPU bound optimization techniques, then you can use GPUs to get those done as quick as possible.

00:31:12.180 --> 00:31:17.240
Yeah. Yeah. It's pretty amazing when you look at it. I mean, when I first heard about it, I was like, really? What?

00:31:17.240 --> 00:31:22.060
We're doing, like, math on the GPU? I mean, of course, if you look at video games.

00:31:22.060 --> 00:31:32.020
Yeah. Like for a while I worked on 3D simulators and the amount of computation those graphics cards do just to render a scene is mind boggling for one scene.

00:31:32.120 --> 00:31:35.300
And then they do it like 60 or 100 times a second. It's crazy.

00:31:35.300 --> 00:31:40.440
And that was, you know, so long ago, many years ago that I was doing that. I was still super impressed.

00:31:40.440 --> 00:31:50.680
If you look at the parallelism, like, I've got a MacBook Pro Retina and it's got a pretty high end CPU, which I think has four real cores and each one is hyper threaded.

00:31:50.680 --> 00:31:51.600
So it looks like eight.

00:31:51.900 --> 00:31:58.800
But some of those graphics cards have like over a thousand cores. So trying to decide between eight and a thousand for parallelism.

00:31:58.800 --> 00:31:59.500
Wow.

00:31:59.500 --> 00:32:00.960
Yeah. It makes a big difference.

00:32:00.960 --> 00:32:09.300
That's pretty insane. So, yeah. I remember a few years ago I saw on AWS as one of the machine types, like a clustered GPU thing.

00:32:09.300 --> 00:32:11.380
What is that doing in the cloud?

00:32:11.380 --> 00:32:13.300
But yeah. Yeah. Very cool.

00:32:13.360 --> 00:32:16.320
So you said you experimented with it. Is it looking promising or?

00:32:16.320 --> 00:32:20.960
I mean, so far so good. Yeah. It definitely seems like something we might want to go forward with.

00:32:20.960 --> 00:32:27.920
Yeah. I haven't really tried this computational stuff with it. There's some really interesting projects, but I just haven't had a use for that much computation, I guess.

00:32:27.980 --> 00:32:32.640
But it seems like if you find a case where it works, it works crazy good.

00:32:32.640 --> 00:32:33.040
Definitely.

00:32:33.040 --> 00:32:44.700
But it's not like a general computer, right? You can't just give it any problem. So there's certain types of problems that are really appropriate or algorithms and certain ones that aren't. So I guess, you know, that's kind of a big decision, whether it makes sense or not. Right.

00:32:44.700 --> 00:32:52.920
We're getting kind of near the end of the show. Let me ask you a few questions I ask all my guests. When you're going to write some Python code, what editor do you open up?

00:32:52.920 --> 00:33:03.720
I use Vim. I've been using Vim, yeah, for about eight years now. I think I've, you know, got it pretty well optimized. I feel more productive in that than just about anything else. So that's what I use for just about everything.

00:33:03.720 --> 00:33:20.240
Yeah. Very cool. You know, it's very unscientific. Maybe I could pass a few data points off to your system and ask it. But I would say I think Vim seems to be winning the popularity battle among my guests anyway. I'm not sure if they're representative of the overall community or not. But yeah, very, very cool.

00:33:20.240 --> 00:33:31.380
And, you know, PyPI has 75,000 packages now. It's just insane, you know, how many things are out there that you can just grab and bring into your apps in Python.

00:33:31.380 --> 00:33:35.660
What ones would you recommend or what are really important things people should know about?

00:33:35.800 --> 00:33:39.740
I would be self-serving and say the SigOpt API client Python package.

00:33:39.740 --> 00:33:48.320
Ones that I use in my own personal day to day. Definitely IPython. I get a huge amount of value out of just for the REPL and using IPython notebooks.

00:33:48.320 --> 00:33:52.180
Like, the increase in productivity over the regular REPL is kind of astounding.

00:33:52.180 --> 00:33:57.520
This is obviously a pretty common one, but I think requests, that's just like, it's become such a necessary part of my toolkit.

00:33:57.520 --> 00:34:00.480
It feels shocking that it's not part of the standard library at this point.

00:34:00.480 --> 00:34:07.180
Yeah, that's a really interesting comment. I agree that requests absolutely should be up there. It's the most popular package on PyPI.

00:34:07.180 --> 00:34:07.640
Oh, is it?

00:34:07.640 --> 00:34:08.080
By the way.

00:34:08.080 --> 00:34:08.640
I believe it.

00:34:08.640 --> 00:34:15.520
Yeah, it's so clean and so useful. And I was talking to Kenneth Reitz. He was on the show.

00:34:15.520 --> 00:34:22.780
And he's the creator. He was saying they were considering making requests part of the standard library.

00:34:22.780 --> 00:34:32.760
But they decided not to because they wanted to be able to rev the features and security fixes and various things of requests faster than they do Python itself.

00:34:32.760 --> 00:34:40.400
So they decided to keep it separate, but to make it the recommendation, to basically not recommend using urllib, things like that.

00:34:40.400 --> 00:34:43.500
Go, no, no. pip install requests. This is how we do it.

00:34:43.500 --> 00:34:45.060
Let's just all agree on that.

00:34:45.060 --> 00:34:46.520
But yeah, that's pretty interesting, right?

00:34:46.520 --> 00:34:51.060
They actually considered making it part of Python, but for versioning and agility.

00:34:51.060 --> 00:34:51.860
Well, it makes a lot of sense.

00:34:51.860 --> 00:34:54.320
Yeah, yeah, yeah. They decided not to.

00:34:54.320 --> 00:34:59.460
Okay, awesome. If people want to get started with SigOpt, what do they do? How do they get started?

00:34:59.460 --> 00:35:03.860
Head over to sigopt.com, sign up, you get started. We have a free trial.

00:35:03.860 --> 00:35:10.240
And if you have any of these problems, whether it's a machine learning model that you want to make more accurate or some other kind of process that you're trying to optimize,

00:35:10.240 --> 00:35:14.160
you just sign up, get started with our API, and you're off to the races.

00:35:14.160 --> 00:35:18.760
Okay, that sounds great. Yeah. So pip install sigopt, sign up, off you go, right?

00:35:18.760 --> 00:35:19.280
You got it.

00:35:19.280 --> 00:35:22.540
All right, Patrick. It's been great having you on the show. This is really interesting.

00:35:22.540 --> 00:35:26.340
You guys are doing some cool stuff and continue to optimize everything.

00:35:26.340 --> 00:35:27.880
All right, we will. Thanks, Michael.

00:35:27.880 --> 00:35:29.300
All right. See you later.

00:35:29.300 --> 00:35:32.260
This has been another episode of Talk Python To Me.

00:35:32.260 --> 00:35:36.320
Today's guest was Patrick Hayes, and this episode has been sponsored by Hired and SnapCI.

00:35:36.320 --> 00:35:38.020
Thank you guys for supporting the show.

00:35:38.020 --> 00:35:40.140
Hired wants to help you find your next big thing.

00:35:40.240 --> 00:35:45.760
Visit Hired.com slash Talk Python To Me to get five or more offers with salary and equity presented right up front,

00:35:45.760 --> 00:35:48.080
and a special listener signing bonus of $2,000.

00:35:48.080 --> 00:35:51.360
SnapCI is modern continuous integration and delivery.

00:35:51.360 --> 00:35:57.240
Build, test, and deploy your code directly from GitHub, all on your browser with debugging, Docker, and parallelism included.

00:35:57.240 --> 00:36:00.260
Try them for free at Snap.CI slash Talk Python.

00:36:00.780 --> 00:36:02.820
Are you or a colleague trying to learn Python?

00:36:02.820 --> 00:36:07.200
Have you tried boring books and videos that just cover a topic point by point?

00:36:07.200 --> 00:36:12.400
Check out my online course, Python Jumpstart by Building 10 Apps, at training.talkpython.fm,

00:36:12.400 --> 00:36:14.440
for a different take on learning Python.

00:36:14.440 --> 00:36:21.000
You can find the links from today's show at talkpython.fm/episodes/show/51.

00:36:21.000 --> 00:36:22.960
Be sure to subscribe to the show.

00:36:22.960 --> 00:36:26.260
Open your favorite podcatcher and search for Python. We should be right at the top.

00:36:26.760 --> 00:36:29.800
You can also find the iTunes and direct RSS feeds in the footer of the website.

00:36:29.800 --> 00:36:34.420
Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.

00:36:34.420 --> 00:36:37.220
You can hear the entire song on talkpython.fm.

00:36:37.220 --> 00:36:39.080
This is your host, Michael Kennedy.

00:36:39.080 --> 00:36:40.240
Thanks so much for listening.

00:36:40.240 --> 00:36:42.180
Smix, take us out of here.

00:36:42.180 --> 00:36:45.980
Stating with my voice, there's no norm that I can feel within.

00:36:45.980 --> 00:36:48.820
Haven't been sleeping, I've been using lots of rest.

00:36:48.820 --> 00:36:51.680
I'll pass the mic back to who rocked it best.

00:36:51.680 --> 00:36:54.980
On first developers, developers, developers, developers.

00:36:54.980 --> 00:36:56.680
On first developers, developers.

00:36:56.680 --> 00:37:00.680
On first developers, developers, developers.

00:37:00.680 --> 00:37:02.680
developers.
