#187: Secure all the things with HubbleStack Transcript
00:00 How do you keep track of your security, configuration states, and even out-of-date system-level packages in your servers?
00:06 Well, what if you had 40,000 or more servers to manage?
00:10 How would your process scale then?
00:12 I'll tell you, mine would take a few tweaks.
00:14 On this episode, you'll meet Colton Myers, who is a co-creator of HubbleStack.
00:18 HubbleStack is an open-source security compliance framework.
00:21 It provides on-demand profile-based auditing, real-time security event notifications, alerting, and reporting.
00:27 And yes, Colton's group has over 40,000 servers, and HubbleStack is watching over all of them.
00:34 Learn about this cool Python-based framework on this episode of Talk Python to Me.
00:38 It's episode 187, recorded November 14th, 2018.
00:56 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.
01:02 This is your host, Michael Kennedy.
01:04 Follow me on Twitter, where I'm @mkennedy.
01:06 Keep up with the show and listen to past episodes at talkpython.fm.
01:10 And follow the show on Twitter via @talkpython.
01:12 This episode is sponsored by Linode and Rollbar.
01:16 Please check out what they're offering during their segments.
01:18 It really helps support the show.
01:19 Colton, welcome to Talk Python.
01:22 Let's get started with your story and how you got into Python.
01:25 I've been using computers since my parents got a black and white Macintosh.
01:29 I don't actually know what it is.
01:30 I should go back and figure out what Macintosh it was so I can tell people.
01:33 But I would play Lode Runner on that when I was little.
01:36 I'm, what, 30?
01:38 Yeah, 30.
01:38 I can never remember.
01:39 So I'm younger than a lot.
01:43 I guess I'm starting to get into the middle of the pack these days.
01:45 But yeah, I remember playing with that when I was like seven on this little black and white Macintosh.
01:49 Then we had like a Windows 3.1 machine that I'd play games on.
01:52 And I mostly played games until high school.
01:54 And then in high school, for half of my senior year, I did the applied technology course in computer programming, which was really great, actually.
02:03 Like half the day I would spend over there.
02:04 It was C#.
02:05 They had just switched from C++ to C#.
02:07 Wow, that's a pretty big improvement, actually.
02:09 Oh, yeah, yeah, yeah.
02:10 I'm with you on that.
02:11 And C# is a great language.
02:12 And back then I was Windows, Windows, Windows, you know, because I played video games, right?
02:17 So it worked out just fine.
02:19 And it was a really good course and really gave me like a jump start.
02:22 And from there on, I knew that's what I wanted to do.
02:25 So went into college.
02:26 That's awesome.
02:27 Let me ask you more about that because that sounds so awesome.
02:30 This was a high school program?
02:32 Yeah, yeah.
02:33 It was like...
02:33 And how long did you get to spend programming each day?
02:36 Half the day.
02:37 Like you could either do morning.
02:39 They had a morning session or an afternoon session.
02:41 And so like basically at lunchtime, you would switch either back to your high school or to
02:45 the applied technology center.
02:46 And it was difficult because you had to like structure your whole schedule around it.
02:50 Like you had to start planning.
02:51 You could technically do it as a junior and a senior if you like really plan.
02:54 But then you'd have to like...
02:55 I was also in like band and stuff like that.
02:57 And I'd have to have dropped that.
02:58 And I was like, no, I'm just going to do it as a senior.
03:00 But you still have to like make sure that you have all your other like things to graduate,
03:04 you know, in the rest of your schedule.
03:06 Your four years of math or three years of English, whatever, right?
03:10 But it was a super cool program.
03:11 Like and our professor, I think he's now working for Microsoft or something.
03:16 But he also used to teach at university.
03:18 Like he was really good.
03:19 And he did a great job of like teaching us the right way.
03:23 And I don't know, man.
03:24 It was a really good introduction to programming for me.
03:27 I am so jealous.
03:28 When I was...
03:30 This wasn't high school.
03:31 It was my last year in middle school, I think.
03:33 I'm pretty sure.
03:34 But, you know, it could have been the same in high school.
03:36 My teacher, you know, I took sort of an intro to programming course, right?
03:39 And my teacher was kind of okay.
03:42 And I was just really loving it.
03:44 I said, oh, I'd love to write this kind of program.
03:46 And like, how can I learn more?
03:47 And they're like, well, you can do that.
03:48 But you got to understand, like real programming is actually super hard.
03:51 And I don't know if you really can do it.
03:52 Like, I mean, that was my idea.
03:53 Like, okay, great.
03:54 Oh, well, that's awesome.
03:57 I don't know.
03:58 I'm blessed.
03:58 Programming just clicks with my brain.
04:00 Like, it just is...
04:01 It has never been a struggle for me to learn how to program.
04:04 Like, you know, none of the conditionals, none of the advanced logic.
04:07 Like, obviously, I struggle with like advanced algorithms like anybody does.
04:10 But like, the basics of programming just came to me immediately.
04:13 And so...
04:15 And my teacher actually recognized that.
04:16 And it was super nice because he like had me helping other students, which was frustrating as much as it was.
04:21 But it was useful, right?
04:22 Like, I've always been a bad teacher with stuff that comes so easily to me, because I can't bridge that gap.
04:28 But he had me helping other students.
04:30 But then he also like had me doing some of like the second year curriculum along with the first year curriculum.
04:34 Because, like, I would just finish so early on these projects.
04:38 And he'd just be like, well, you can either just sit here and do whatever you want.
04:41 Or I can get you started on this other stuff.
04:43 And it was a real blessing like to have that start and to have somebody so supportive.
04:48 Yeah, I've never heard of anything like that.
04:51 I've heard of like vocational stuff like shop and, you know, carpentry, but never software.
04:57 So, man, that's awesome for you.
04:59 All right.
05:00 So, that was C# back in the day and Windows.
05:03 And somehow, your project is not in C#.
05:05 No.
05:06 So, college, I went straight into computer science.
05:09 I knew exactly what I wanted to do.
05:10 And I managed to make a couple of friends there that were super useful because I went to the University of Utah.
05:16 And at the time I went, it was still, you know, there.
05:18 They would start in C#.
05:20 And then you go to C++.
05:21 There was a lot of Java involved, like, through all their classes.
05:25 But Python hadn't at that point really pierced, you know, education, which is a whole different conversation as to whether we should actually be teaching Python as a first language.
05:36 I almost think it's almost doing a disservice because it's so easy to program in Python.
05:40 But I don't know.
05:41 It's a hard conversation.
05:42 But I made some good friends who basically we kind of pushed each other to explore outside of what our classes had.
05:50 So, like, in some classes you could choose the language, and most people chose C# or Java because that's what we were learning, right?
05:54 But we decided to try Python at one point.
05:56 And it was, you know, really awesome.
05:57 And we were using Git long before they, you know, taught us how to use Git and all these things that really, like, helped jumpstart that and helped me get a base in stuff that maybe I wouldn't have otherwise.
06:07 And I found Python.
06:09 I got my laptop stolen while I was on vacation in Hawaii.
06:13 And I had a good friend at work who convinced me to buy a Mac instead of a Windows machine, which was a great decision.
06:21 I really like macOS in general, especially because it's just Unix underneath.
06:25 And so, you know, that got me into more of the Unix side of things and kind of away from Windows.
06:30 And you're like, hey, this C# stuff doesn't really work so well.
06:33 There's Mono, but it kind of is clunky over there.
06:36 And so now what, right?
06:37 Well, once I found Python, like Python, you just get rid of so much of that boilerplate.
06:41 Like Java and C# have all this boilerplate and it's just so verbose.
06:45 And Python, like, any time, like, Python was really interesting because, like, I would be like, oh, I need to do this.
06:52 I need to, like, open a file or whatever.
06:53 And I'd be like, okay, so I'm betting it's, like, this is the function name.
06:57 And it turns out most of the time it was right.
06:59 Like, it just, like, it felt like it was written out of my, straight out of my brain.
07:02 And it was just so nice.
07:04 But, yeah, like, with C# back then, you couldn't really write C# in Linux.
07:08 Like, Mono was just getting off, just getting started.
07:11 And so I didn't really, yeah, I basically avoided it wherever I could from then on.
07:18 That's awesome.
07:18 Yeah, it was really great.
07:19 Yeah.
07:19 And of course, you know, those other libraries didn't have nearly the package management story and the ecosystem story that Python does.
07:26 Sure.
07:27 Yeah.
07:27 And then the way, it's really interesting because I just feel like, I don't know, I just have, like, there are so many things in my life that if they'd changed just a little bit, I could have been on such a different path, right?
07:38 And I think that's true of all of us.
07:39 But, like, when I was getting close to graduating, I was on a local Python user group.
07:43 And a local startup advertised on that user group, hey, we're looking for Python developers.
07:49 And I was like, well, I'd love to do Python.
07:52 But, like, my internship that I'm working on is in C# and C++, right?
07:56 I'm working, like, at this, you know, on this legacy code base and this DoD contractor, this local or whatever.
08:01 And I was like, I don't have any experience.
08:05 Like, I've just written some Python for, like, class projects.
08:07 And they're like, well, come in and interview.
08:09 I was like, okay.
08:10 And apparently, I interviewed pretty well.
08:12 They were impressed.
08:13 So, they actually hired me on part-time until I graduated with the expectation that, assuming I did well, you know, they'd hire me full-time.
08:20 And it worked out.
08:20 And I, that's how I joined SaltStack.
08:22 So, I worked for SaltStack for three years and really, like, got my Python, you know, really solid and was able to expand my ops knowledge, you know, because it's a configuration management tool.
08:35 Yeah, it's not just the Python you got to learn.
08:37 Right.
08:37 It's also all the Linux and, you know, DevOps side of things.
08:41 Yeah.
08:41 So, that was, like, that was hugely important to my career.
08:45 Because without that, like, I don't know.
08:47 I could have found something, I'm sure, as, like, a junior Python dev.
08:50 But it just really jump-started it and actually directly led to my position at Adobe, too.
08:54 So, yeah.
08:55 That's kind of where my career path has been.
08:57 No, that's a super interesting story.
08:59 So, speaking of where you are today, you know, so what do you do day-to-day at Adobe?
09:03 So, I'm a senior software engineer at Adobe.
09:05 And I actually mostly work on Hubble.
09:08 I joined a, I joined actually, the other creator of Hubble.
09:13 It was kind of his brainchild, but he's not really a developer.
09:15 So, he was able to, like, you know, say, hey, I want to do this proof of concept.
09:19 Can I hire somebody for it?
09:20 And he hired me.
09:21 We had met because he was actually in the SaltStack community.
09:24 And his name is Christopher Edwards, by the way.
09:26 I should give him credit.
09:27 But this is his baby.
09:29 And so, he hired me to do this.
09:30 And the idea was to base it on Salt.
09:32 And basically, we were trying to replace an existing vendor solution with this proof of concept because the vendor solution wasn't very popular and was very expensive.
09:41 And so, yeah, we were able to...
09:44 So, he brought me in on this.
09:45 And it became just almost overnight.
09:48 Like, we wrote this proof of concept.
09:49 And it was good, right?
09:50 And people liked it.
09:52 And all of a sudden, it was like, oh, hey.
09:54 Other people heard about it.
09:55 And they're like, hey, by the way, our contract with CloudPassage is up next.
09:59 And we're losing this promotional pricing.
10:02 It's going to get like three times more expensive.
10:04 So, could you maybe make it so that Hubble is everywhere at Adobe within like, you know, the next six months?
10:10 Like, that'd be great.
10:11 It's like...
10:12 No pressure or anything, right?
10:15 Oh, it was crazy.
10:17 It was a crazy, crazy time.
10:18 But we were able to...
10:20 Like, we had it open source from the beginning, which was super awesome.
10:22 And because it's grown so much inside of Adobe and is so important to our security posture,
10:28 it's my full-time job.
10:29 I get paid full-time to write open source software in Python.
10:33 So, you can't really go wrong with that.
10:35 Yeah.
10:36 I feel like that's a lot of Python developers' dreams, right?
10:40 They're like, I want to work on this open source thing I love.
10:42 And I want to get paid full-time for it.
10:44 Like, there's not that many people that get to do that.
10:46 And I just like fell into it.
10:47 Like, I...
10:48 So much, so much...
10:49 Yeah, I don't know.
10:50 I don't even know how to approach how lucky I am that way.
10:54 But it's been really great.
10:56 So, yeah, that's kind of what I do.
10:57 I just work on HubbleStack.
10:58 Okay.
10:59 Super.
10:59 So, maybe let's start with a super high-level conversation about this.
11:05 Just even...
11:06 Not just Hubble and HubbleStack, but just this idea in general.
11:10 So, what it is, is an open source security compliance monitoring and auditing system.
11:15 Like, tell us just sort of about that world.
11:17 Like, what kind of...
11:18 So many words.
11:19 What kinds of things...
11:20 Yeah, I know.
11:21 Yeah.
11:21 So, just tell us, you know, what's that world like?
11:23 And what kind of role does this try to play in there?
11:26 So, there's a couple pieces, right?
11:27 We...
11:28 Security has multiple facets.
11:29 And one of those facets is compliance.
11:32 And really, the reason compliance is so important...
11:35 I'm going to put air quotes around important.
11:37 Because the reason compliance is so important is for these certifications, right?
11:41 If you deal with customer data, especially like credit card data and that kind of stuff,
11:45 then you need like PCI.
11:46 Right.
11:47 Exactly.
11:47 You know, you might have SOX compliance.
11:50 You might have HIPAA, right?
11:51 All these different things that if you can check that box, you can have customers that you might
11:55 not otherwise, right?
11:55 And each of these has different requirements for compliance.
12:00 A lot of them are a lot less fleshed out than you'd expect.
12:03 Most of them, like, they just say, like, the audit...
12:05 The auditors come in and they...
12:07 And the audit control says something like, you have a baseline security compliance check thing that you audit against regularly and
12:16 make sure that everything is in compliance, right?
12:17 And it's just like written like that.
12:19 And so at Adobe, we use the CIS compliance checks, Center for Internet Security.
12:24 They are a nonprofit that publishes these profiles that allow you to...
12:29 You know, if you follow all these checks in here, then you'll be more secure than if you
12:33 didn't.
12:34 That's the idea, right?
12:34 Right.
12:35 So does it include, like, known vulnerabilities or is it more like guidelines on, like, hardening
12:41 of the server or what's...
12:42 Yeah, it's more along the lines of hardening.
12:44 Like, I don't think CIS...
12:46 I don't actually know if they do, but I don't think they publish, like, a vulnerability list
12:49 or a CVE list or anything like that.
12:50 It's more like, hey, do you have root login enabled on your server, right?
12:55 In SSH.
12:56 And if you do, that's a problem, right?
12:57 Do you have passwordless login?
13:00 Do you have password login even, right?
13:02 Like, or is it just SSH?
13:03 Like, you know, those are easy ones to point out for SSH, but it's also like, hey, do you
13:08 have Telnet installed, right?
13:09 Do you have all these different things that are just like, you know, do you have FTP?
13:13 Are you serving anything via FTP as opposed to SFTP, right?
13:17 Like, just the gimmies, right?
13:19 And obviously, they get a lot worse, like...
13:22 Or a lot worse.
13:23 A lot more stringent.
13:24 Like, oh, you need to have SSH timeouts on your host set to these low values, right?
13:29 So that it quickly kicks people off if they're not typing and that kind of stuff.
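Editor's sketch: the hardening checks Colton lists (root login over SSH, password login, idle timeouts) can be pictured as simple pass/fail tests against a host's SSH daemon configuration. The Python below is a minimal illustration under stated assumptions. The check names, the 300-second threshold, and the choice to treat an unset option as a failure are all hypothetical; this is not HubbleStack's profile format or the actual CIS benchmark text.

```python
# Illustrative only: a tiny pass/fail audit of sshd_config-style text.
# Check names and thresholds are assumptions made for this sketch.

def parse_sshd_config(text):
    """Return a dict of lowercase option name -> value (first occurrence wins)."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        options.setdefault(key, value)  # keep the first value seen for a keyword
    return options


def audit_sshd(text, max_idle_seconds=300):
    """Run three hardening checks; an unset option counts as a failure here."""
    options = parse_sshd_config(text)
    interval = options.get("clientaliveinterval", "0")
    return {
        "root_login_disabled": options.get("permitrootlogin", "").lower() == "no",
        "password_login_disabled": options.get("passwordauthentication", "").lower() == "no",
        "idle_timeout_set": interval.isdigit() and 0 < int(interval) <= max_idle_seconds,
    }


SAMPLE = """
# example config for the sketch
PermitRootLogin yes
PasswordAuthentication no
ClientAliveInterval 900
"""

report = audit_sshd(SAMPLE)
for name, passed in sorted(report.items()):
    print(name, "PASS" if passed else "FAIL")
```

With this sample, only the password-login check passes: root login is still enabled and the 900-second interval exceeds the assumed 300-second limit.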
13:32 And those, you know, obviously, as you get further and further into these checks, these
13:36 different levels, they actually split them up into levels.
13:38 It becomes less convenient, right?
13:40 Because security is always a compromise between convenience and security, right?
13:44 Right.
13:45 So we basically choose a subset of those and we audit against them.
13:47 And that is actually where like the big money is.
13:49 Like, that's where we get like customers, right?
13:51 But if I'm being honest, it's not actually making us a lot more secure.
13:56 Like, obviously, if we have root login enabled, we should not, right?
13:59 But like you fix that once.
14:00 But that doesn't mean you're secure now, right?
14:03 And so the other piece, the rest of Hubble is just to collect other data that our blue team
14:08 can use to watch for actual like ongoing attacks, big holes, you know, that maybe an audit wouldn't
14:15 have caught otherwise.
14:16 Right.
14:16 So give me an example.
14:18 This is something I wonder all the time.
14:20 If you don't know, that's fine.
14:21 But give me an example of what an attack looks like, right?
14:26 Like if I log into my server, it seems okay.
14:28 You know, nobody's saying, hey, our data seems to be gone.
14:31 But like, how do I, you know, shared, right, like stolen, but how do I look at the server
14:36 and know outside of say web logs?
14:39 Yeah, I mean, that's kind of a hard question.
14:41 And that's actually not my expertise.
14:42 Like I'm not a security engineer, but I work with them a lot.
14:46 One that's pretty common in 2018 is if somebody gets access to your server, they might install
14:51 a cryptocurrency miner on your server, right?
14:53 And so, you know, one of the ways we can look for that kind of stuff is, oh, you know, in
14:58 one piece of Hubble, we collect like the established outbound connections on your host, right?
15:02 Because certain services have outbound connections, like that's a thing that they should
15:09 have.
15:09 But a lot of them don't.
15:09 A lot of them just accept inbound, right?
15:11 Especially if it's just like a web server, right?
15:12 It's accepting inbound connections, but not reaching out.
15:15 So if we see an outbound connection from a process we don't recognize or an uncommon process,
15:21 then we might look into that and say, hey, you know, what's going on here?
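Editor's sketch: the idea described here, collecting established outbound connections and flagging the ones owned by a process you do not expect, can be illustrated in a few lines of Python. Everything below is hypothetical: the `Connection` record, the process names, and the allowlist are invented for the example and do not reflect how Hubble actually gathers or stores this data.

```python
from collections import namedtuple

# Hypothetical record for one established outbound connection.
Connection = namedtuple("Connection", "process remote_host remote_port")


def unexpected_outbound(connections, expected_processes):
    """Return the connections whose owning process is not on the expected list."""
    return [c for c in connections if c.process not in expected_processes]


# Made-up observations: a log shipper we expect, and a process we do not.
observed = [
    Connection("log-shipper", "10.0.0.5", 443),
    Connection("xmr-worker", "203.0.113.7", 3333),
]

flagged = unexpected_outbound(observed, expected_processes={"log-shipper", "chronyd"})
for c in flagged:
    print(f"review: {c.process} -> {c.remote_host}:{c.remote_port}")
```

In practice the allowlist would differ per service, and a flagged connection is a prompt for a human to look closer, not proof of compromise.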
15:25 I see.
15:25 That makes a lot of sense.
15:26 Yeah, I mean, there are a lot of like tells like you can.
15:28 You could have increased CPU usage or like, you know, stuff slowing down.
15:32 Like, obviously, they're trying not to leave tracks, right?
15:34 So they will tend, you know, attackers will tend to try to not let you know that they're
15:40 there.
15:40 So it's hard.
15:41 Like, it's a hard problem.
15:42 And that's why we have a whole security team.
15:44 We have the team I work with just with the digital marketing.
15:47 They must have 30 security engineers, both like red team and blue team, like just constantly
15:52 combing this data looking for patterns.
15:54 And we have a whole security operations center looking for these alerts.
15:57 And it's like it's a big thing because it's harder and harder, you know, in 2018 to keep
16:03 your stuff secure.
16:03 It definitely is.
16:04 So maybe just to define some terms for folks who are not super up to date on the security
16:09 side of things.
16:10 Red team is the penetration testers and the folks trying to break in and the blue team
16:15 are people like using Hubble to say, you know, wait, wait, wait.
16:18 Why is it making all these connections over this VPN to somewhere we don't
16:22 recognize?
16:23 Yeah, exactly.
16:23 The red team is trying to break it.
16:25 They're the hackers.
16:26 They're the white hat hackers, right?
16:27 That say, we're just going to spend all day trying to get into Adobe's hosts.
16:31 And then if we find a way in, then we can patch that hole.
16:34 The blue team can patch that hole.
16:35 And they actually have like these whole things where like red team will set up these different
16:38 things and make sure the blue team catches them.
16:40 Like sometimes red team will work with access to the box and just say, let's assume I get access
16:45 to the box.
16:46 I don't you know, I don't have a way right now.
16:47 We've patched all those holes, but let's assume and I'm going to just set up these breadcrumbs
16:51 and make sure that blue team is catching them because blue team says they're watching for
16:54 these things, but are they actually watching?
16:55 You know?
16:55 Yeah.
16:56 And it's this back and forth.
16:57 And then blue team is trying to catch red team.
16:59 But then they're also trying to catch, like, black hats, you know, like the actual attackers,
17:03 you know, because people are always trying to get into our data, right?
17:07 Like they it could be really lucrative.
17:08 And so we've had so many hacks that we see in the in the industry.
17:11 And, you know, nobody wants to be next on that list.
17:14 So we have a substantial team.
17:16 You do not want to be in those headlines.
17:17 That's right.
17:17 So no, no.
17:19 This portion of Talk Python to Me is brought to you by Linode.
17:25 Are you looking for bulletproof hosting that's fast, simple and incredibly affordable?
17:29 Look past that bookstore and check out Linode at talkpython.fm/Linode.
17:34 That's L I N O D E.
17:36 Plans start at just $5 a month for a dedicated server with a gig of RAM.
17:41 They have 10 data centers across the globe.
17:43 So no matter where you are, there's a data center near you.
17:45 Whether you want to run your Python web app, host a private Git server or file server, you'll
17:51 get native SSDs on all the machines, a newly upgraded 200 gigabit network, 24/7 friendly support, even on holidays, and a seven-day money-back guarantee.
18:01 Do you need a little help with your infrastructure?
18:03 They even offer professional services to help you get started with architecture, migrations and more.
18:09 Get a dedicated server for free for the next four months.
18:12 Just visit talkpython.fm/Linode.
18:17 Do you think containers and Docker and things like that make this harder or easier to protect?
18:23 It's a combination.
18:24 One thing that we've been struggling with is, so, you know, Docker is 12-factor apps, right?
18:30 Like, are you familiar with the 12 factor thing?
18:31 Yeah.
18:32 But maybe give the listeners a quick, just a super quick rundown.
18:35 What is the site?
18:36 It's like, there's a site, 12factor.net, like one-two-factor dot net.
18:40 And basically it's like this, the systems of things that like make, let's read their introduction.
18:45 Software as a service: the 12-factor app is a methodology for building software-as-a-service apps that have these different things.
18:51 And really it's where containers come from and where these microservices come from.
18:55 Like you want everything to be immutable and like easy to, you know, set up fresh, you know, instead of changing a host when it starts to misbehave, you just burn it down and create a new one.
19:03 Don't embed all the configuration information and put that like in the environment.
19:07 And so it's separate from, say, your source code and your app code and whatnot.
19:11 And there are some great advantages to that because if you start with an image that like, for example, in auditing, right?
19:16 If you start with an image that has all of those audit checks already passing, right?
19:20 Then you're just starting from a much better place, right?
19:23 And you can make sure that everybody is starting from this, you know, hardened, secure image, right?
19:26 And then past that, when you're not messing with hosts, you have a lot fewer snowflakes.
19:31 I mean, you have a lot fewer inconsistencies across your hosts in theory, right?
19:35 And really where the attackers get in is in those snowflake servers that like have kind of been forgotten about, you know, sometimes they get in via dev, you know, whatever.
19:43 Nobody wants to upgrade it because it might break the thing.
19:46 Right.
19:46 That person used to maintain, but they left the company and no one wants to make it their problem.
19:50 So they don't touch it.
19:51 But it has problems too.
19:52 Like, for example, the established way to pass secrets into a container via the 12 factor method is via environment variables.
20:00 And security engineers don't like that.
20:02 They don't like secrets and environment variables.
20:04 Because as soon as you're on the box, you can just, you know, query the environment variables and see all the logins and stuff, right?
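Editor's sketch: the trade-off being discussed is that 12-factor apps read their configuration, secrets included, from environment variables, and anything that can read that environment can read the secrets too. The Python below is a minimal illustration. The variable names (`APP_DATABASE_URL`, `APP_API_TOKEN`) and the marker list are hypothetical, and a plain dict stands in for the real process environment.

```python
import os


def load_settings(environ=os.environ):
    """Read configuration from the environment, 12-factor style."""
    return {
        "database_url": environ["APP_DATABASE_URL"],
        "api_token": environ["APP_API_TOKEN"],
    }


def visible_secrets(environ=os.environ, markers=("TOKEN", "PASSWORD", "SECRET")):
    """Names of variables that look like secrets to any code sharing this environment."""
    return sorted(name for name in environ if any(m in name.upper() for m in markers))


# A stand-in environment so the sketch does not depend on the real one.
fake_env = {
    "APP_DATABASE_URL": "postgres://db.example.internal/app",
    "APP_API_TOKEN": "not-a-real-token",
    "PATH": "/usr/bin",
}

settings = load_settings(fake_env)
print(visible_secrets(fake_env))
```

The app gets its settings with no config files baked into the image, which is the appeal. The same lookup is available to an attacker who lands in that process or container, which is the concern the security engineers raise.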
20:10 Exactly.
20:10 And theoretically, you know, you don't have the ability to log into the actual container, right?
20:16 Because you don't ever need to.
20:17 And so theoretically, and then of course, you're using CoreOS or something that is theoretically more secure, right?
20:23 So like there's frameworks around that to try to prevent that from being as bad of a problem as it is.
20:27 But it still is, you know, a hard thing that we're trying to deal with.
20:29 So I do think 12 factor and containers and immutable images and those kinds of things do have a lot of advantages.
20:37 And probably the advantages override the disadvantages.
20:40 But the other problem is that not every use case fits into that, right?
20:45 Containers don't like, you know, we have 40,000 some odd servers across our analytics product.
20:51 And like using those, like it's just not, you know, it would take a huge rewrite of that product to make it a microservices, you know, product kind of a thing.
21:00 So like, so like it's going to take many years.
21:02 And even then there are certain applications where like just being on bare metal is just better, more performant, etc.
21:09 So it's never going to be like the only thing that we do.
21:12 I don't think, I don't know.
21:13 I'm not going to try to guess.
21:14 But I do think from a security perspective, it is better overall.
21:18 Yeah, interesting.
21:19 I think it probably is as well.
21:20 It's easier to set one baseline and just create the images from that.
21:24 But still, I think it also, it seems like things change more often.
21:28 And somebody could flip a switch in that image that you don't realize or in a base image, you know, somewhere down that chain.
21:33 And, you know, all of a sudden, everything's like that.
21:36 Not just that snowflake server, right?
21:38 But that's also easy to undo when you find it, right?
21:40 Because again, you just switch it on a higher level.
21:43 And the next time they build their image, they get that, right?
21:44 So I don't know.
21:45 But yeah.
21:47 Yeah, a bit of a diversion.
21:48 But so let's, if you don't mind, tell us as much as you're happy to share.
21:53 Just a little bit about, you talked about 40,000 servers, something like that.
21:57 When I think of Adobe, I think of Photoshop and things like that that are more apps on like a desktop or a tablet or something.
22:06 Give us a sense of what a cloud and server world looks like over there.
22:10 So I have less of a sense on the, so we have two major business units.
22:14 We have our digital media, which is what you referred to, the Photoshop, etc, etc.
22:20 And then we have our digital marketing or digital experience as it's been rebranded recently.
22:23 And that actually started from an acquisition of a company called Omniture here in Utah.
22:28 And it turned into like, Adobe is one of the big players in marketing analytics.
22:32 They do everything from helping customers see what's working as far as advertising on their sites to helping serve social media stuff.
22:42 There's a whole suite of stuff that you can get from Adobe.
22:44 And it's much less known, even though it's a pretty big business unit.
22:47 And that's actually, on the digital media side, we have servers, right?
22:50 Because we have now Creative Cloud, right?
22:52 And we have to serve all that data.
22:53 We have to keep your...
22:54 You can keep your files in our Creative Cloud, all that stuff, right?
22:57 Right.
22:57 But it's much bigger on the digital marketing side because we're doing the software as a service for all these big companies.
23:03 Yeah.
23:04 So on the digital marketing side, we have something like 100, 120,000 servers or so across a whole bunch of teams.
23:10 We just acquired...
23:11 Adobe just finished acquiring Marketo, which added a whole bunch of new infrastructure to our...
23:16 That we have to worry about.
23:18 So how does that kind of stuff sort of shape what you're building and what Hubble is, where somebody outside of the company says,
23:28 Oh, we just acquired this other company.
23:29 And now they use this technology and here's all their servers.
23:31 It changed a lot.
23:32 It affected us a lot.
23:33 Livefyre, we acquired a few years ago.
23:36 And that one was a big one because they did a lot of auto scaling in AWS.
23:39 And our original Hubble was built on top of Salt directly, right?
23:44 You would deploy it as a series of modules inside of Salt.
23:48 And that worked well.
23:50 Not nearly as well as today's product, but it worked well.
23:53 But then we get to Livefyre and it's going to be prohibitively expensive for them to actually deploy SaltStack everywhere.
23:58 Because what they use, they use AWS a lot like containers, right?
24:02 They use AMIs.
24:03 They use images, right?
24:05 And they don't really modify their host.
24:07 They don't really have a good configuration management story in place because they don't need to, right?
24:10 They use it like containers.
24:11 So we basically realized that we needed a product that could be standalone, right?
24:15 That did not rely on Salt.
24:16 Because as this went wider and wider, a lot of teams at Adobe don't use Salt.
24:20 And we couldn't necessarily force them to.
24:22 Or it wasn't going to be prudent, right?
24:24 So we created Hubble, which we use PyInstaller to compile.
24:30 So that it basically, we do our best to compile everything into one directory.
24:35 All of our dependencies, our packages have no dependencies, except glibc, which we can assume is going to be on a Linux host.
24:42 It has no dependencies.
24:44 It ships with its own Python.
24:45 It doesn't mess with existing Pythons.
24:47 It doesn't mess with existing Salt.
24:48 It doesn't mess with anything.
24:49 And that was hugely advantageous because now we can just say, hey, teams, all you need to do is install this package, put the configuration in place once, and restart Hubble.
25:00 And all of a sudden, you were getting all of your reporting, right?
25:03 And we're not adding a whole bunch of new stuff to overwhelm your Salt masters, right?
25:08 We talk directly from those hosts to Splunk, which is the tool we use to collate all of those results.
25:14 And it turned out to be just way better.
25:18 And so eventually, we actually...
25:19 Oh, and by the way, that version of Hubble does still use Salt.
25:23 It just uses it as a library.
25:24 It just imports Salt.
25:25 Right.
25:25 But we were able to just slim it down and make it so much more targeted and make it work so much better and make the onboarding story so much better that we eventually deprecated Hubble Salt and just focused on the standalone.
25:38 Right.
25:39 So Hubble Salt was the original one where it was just completely assuming this is going to be built on SaltStack, right?
25:46 And now you kind of...
25:47 Yeah, you just add it as like a remote in your SaltStack master and it just like pulls down the stuff.
25:51 Right, right.
25:52 But with all these acquisitions and disparate infrastructure, it's like, okay, well, that's not working.
25:57 So let's not require that.
25:59 Let's just do this straight Hubble.
26:01 And what I'm really impressed with is your ability to take this and deploy it.
26:06 No Python dependency on the machine.
26:09 No pip install -r requirements dependency on the machine.
26:14 Just here is a binary, run it.
26:16 Like that seems just golden for Python.
26:18 Oh, it's amazing.
26:19 Like I, you know, yeah, pip is great.
26:21 But, like, if you talk to a lot of SREs, they don't like pip in general.
26:27 Yeah.
26:27 They want to install via system packages.
26:30 Why wouldn't they want to just download arbitrary code from the internet and run it?
26:33 That's so, so paranoid.
26:35 It's just, I mean, and pip has gotten so much better.
26:38 But like, you know, it still doesn't even, it has like some uninstall issues.
26:41 Like these days, it's a lot better than it used to be.
26:43 But yeah, like they just want to do it as a package because that's what they're doing with everything else.
26:47 Right.
26:47 They just want to do RPM or deb.
26:49 Like that's it.
26:50 And so it's been really advantageous to do that.
26:52 And we actually, we also use, we cheat a lot, right?
26:55 We actually don't use proper Debian standards for packaging.
26:59 We just use FPM because FPM is amazing and just makes it so easy to package for any operating system.
27:05 And it's just been really good.
27:07 And it was so much better.
27:08 Like even if you're using Salt, like my team uses Salt.
27:11 And upgrading Hubble Salt was much more of a pain to get consistent across our infrastructure than just doing this package.
27:18 And so, you know, I immediately switched us to Hubble instead.
27:22 And I was like, this is so much better.
27:23 So, you know, we're actually in the process of completely phasing out Hubble Salt.
27:27 It's mostly gone out of Adobe's infrastructure and Hubble is king.
27:31 So is Hubble Salt still a thing in open source?
27:34 Or are you trying to push everyone over to Hubble outside Adobe as well?
27:37 It's not a thing.
27:38 Well, if somebody wanted to fork it, it's Apache 2.
27:40 Go for it.
27:41 But we have stopped maintaining it.
27:43 I put a note in the readme and, you know, said we're deprecating this.
27:48 Like I encourage you to try Hubble because it's just better.
27:52 Like it's just a better story for users.
27:54 So obviously, I don't know if we said it explicitly, but Hubble is written in Python.
27:58 Yeah.
27:58 Yes.
27:59 So that's great.
28:00 But I do love that you don't have to, you know, as a consumer of it for your servers,
28:06 You don't have to care that it's in Python.
28:08 You just take the thing and run it.
28:09 It's great that we have pip and it's in Python and all that for developing it.
28:14 But it's, I just don't feel like pip is the right answer for consumers and users in general.
28:20 And it's cool that you guys made that work.
28:22 Yeah.
28:22 We like it a lot.
28:23 Yeah.
28:23 So when I go to the GitHub organization, I see three things at least.
28:28 I see Hubble and Hubble Salt.
28:30 Those are the two we talked about, right?
28:31 And then there's also hubblestack_data.
28:34 Now, do you want to maybe tell us about that?
28:36 So the idea behind Hubble is that we, because upgrading is expensive, even though it shouldn't
28:41 be, even though it should just be a, an RPM upgrade, right?
28:43 It's expensive to work with all of these teams to upgrade because they're busy with their own
28:46 stuff, right?
28:47 So the idea was to try to make it as much as possible that we could safely change the information
28:53 that we're gathering from a source that, you know, security engineering controls.
28:58 Right.
28:58 So this is something like the vulnerable, new vulnerabilities or new things to check
29:03 for.
29:04 Exactly.
29:04 So we find out that we need to check for this new file or whatever.
29:08 We want to watch this file that we haven't been watching, or we want to add this new audit
29:11 check or whatever.
29:11 We don't want to necessarily have to deploy new code to the servers, right?
29:15 But on the other hand, we also don't want to give ourselves the ability to run arbitrary
29:19 code on the servers.
29:21 It's this compromise, right?
29:22 We work as hard as we can to give ourselves as much flexibility in data gathering, but we also
29:26 work as hard as we can to make sure that we can't modify the server from hubblestack_data
29:30 and that any changes to hubblestack_data will be safe, right?
29:34 So, you know, we have an agreement with our customers, which are other users at Adobe, that
29:39 we won't push code to that hubblestack_data repo.
29:42 And then we also, you know, we'll notify them and do it with a change request, a change management
29:46 request inside of Adobe whenever we merge to like the master branch, right?
29:50 So we treat it almost like a code deploy, even though it's not.
29:52 And that way, you know, we mitigate the risk and we do testing around it before we merge
29:58 those things.
29:58 And that has just allowed us to create trust with our customers, right?
30:02 That we're not going to, that Hubble is not going to mess up their servers.
30:04 And so, yeah, hubblestack_data is where all of the profiles live. Profiles is the word we've chosen.
30:09 They basically tell Hubble what data to collect.
30:12 And inside of there, we have a couple of different components inside of Hubble.
30:16 I guess I should have given like a high level review earlier than this, probably right?
30:20 35 minutes in here.
30:21 It's all right.
30:22 Like I mentioned, Hubble does a couple of things, right?
30:24 It does these audits, which are success/fail checks against, say, CIS or Red Hat STIG,
30:31 any, you know, of these different compliance standards.
30:34 And the code name for that is Nova.
30:36 Hubble, you know, is named after the telescope.
30:38 The idea is that we are looking with our telescope into the black hole of our infrastructure,
30:42 right?
30:43 We hope it's not a black hole, but sometimes it feels that way.
30:45 But it can be vast, right?
30:46 I mean, like trying to maintain 40,000 servers is not an obvious thing to do.
30:52 It's crazy.
30:52 And again, you know, yes, most of the servers are fine, but it's the edges that are hard.
30:56 And those edges are the hardest thing for SREs to keep track of, right?
30:59 So everything is space themed.
31:01 We may be moving away from that in the near future just because it's really hard to keep
31:04 track of.
31:05 Like if I say Nova, you're like, I don't know what that is.
31:07 But if I say audit, you can say, oh, I get an idea of what that is.
31:11 So Nova is the audit piece and that's all success/fail stuff.
31:15 It's built on this.
31:16 You basically write these YAML profiles in kind of an ugly format.
31:20 We're going to hopefully improve that in the near future.
31:23 But it works well.
31:24 Just returns, you know, you can see all the failures.
31:27 And then, you know, we then can ticket, you know, in JIRA, the teams that need to make changes
31:32 based on those failures.
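To make the success/fail idea concrete, here is a minimal sketch of a profile-driven audit. It does not use Hubble's actual YAML profile format, which is not shown in this conversation; the profile structure, the check IDs, and the `run_audit` helper are all hypothetical, and the profile is a plain Python list so the example stays self-contained.

```python
import re

# Hypothetical profile: each check passes when the pattern is found in the file.
PROFILE = [
    {"id": "ssh-no-root-login", "path": "/etc/ssh/sshd_config",
     "pattern": r"^PermitRootLogin\s+no\b"},
    {"id": "ssh-protocol-2", "path": "/etc/ssh/sshd_config",
     "pattern": r"^Protocol\s+2\b"},
]


def read_text(path):
    """Default file reader; a missing or unreadable file counts as empty."""
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            return handle.read()
    except OSError:
        return ""


def run_audit(profile, reader=read_text):
    """Run every check and sort the check IDs into success and failure."""
    results = {"success": [], "failure": []}
    for check in profile:
        content = reader(check["path"])
        passed = re.search(check["pattern"], content, flags=re.MULTILINE) is not None
        results["success" if passed else "failure"].append(check["id"])
    return results


if __name__ == "__main__":
    print(run_audit(PROFILE))
```

Because the checks are data rather than code, changing what gets audited means editing the profile only, which is the property described above: new checks without deploying new code to the servers.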
31:33 And then we have Nebula, which is kind of our like data gathering segment of Hubble.
31:39 It primarily uses osquery, which is a really cool open source project by Facebook.
31:45 And basically osquery allows you to query your system as if it was an SQL database.
31:51 So you can query everything.
31:53 Right.
31:53 I've seen that.
31:54 That is amazing.
31:55 Yeah, it's really cool.
31:56 You can query everything from like network connections to like installed packages to
32:01 all sorts of stuff.
32:02 It's crazy.
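To give a feel for "query your system as if it was an SQL database," the sketch below imitates the idea with Python's built-in `sqlite3` module and a few made-up rows. The table and column names mirror osquery's `processes` and `process_open_sockets` tables, but the data is invented and nothing here talks to osquery itself; treat it as an illustration of the query style only.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with invented process and socket rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE processes (pid INTEGER, name TEXT);
        CREATE TABLE process_open_sockets (
            pid INTEGER, remote_address TEXT, remote_port INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO processes VALUES (?, ?)",
        [(101, "gunicorn"), (202, "sshd"), (303, "miner")],
    )
    conn.executemany(
        "INSERT INTO process_open_sockets VALUES (?, ?, ?)",
        [(101, "10.0.0.5", 5432), (303, "203.0.113.9", 3333)],
    )
    return conn


# Which process is talking to which remote endpoint?
QUERY = """
SELECT p.name, s.remote_address, s.remote_port
FROM process_open_sockets AS s
JOIN processes AS p ON p.pid = s.pid
ORDER BY p.name
"""


def outbound_connections(conn):
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in outbound_connections(build_demo_db()):
        print(row)
```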
32:03 It's actively developed.
32:03 They fixed... for a while,
32:05 we had some drama with Facebook around their...
32:08 Oh, it was like a patents clause in their projects.
32:12 And it was a bigger deal for, is it Angular that they do?
32:16 Or is it?
32:16 I don't remember what JS project it is.
32:19 It's React.
32:19 I think React.
32:20 Thank you.
32:21 And it was this big deal because it's like if Facebook ever ends up in
32:25 litigation with your company, right, then this is revoked and you can't use our
32:29 thing.
32:30 Right.
32:30 And so it's this big thing that like, you know, a bunch of companies were like,
32:33 we got to get React out of our infrastructure because of all this stuff, like the lawyers
32:37 were like, no, it cannot be that.
32:38 And anyway, Facebook has fixed that since, including with osquery.
32:41 So it's been good.
32:42 But yeah, it's a really cool product.
32:44 And we use that to just collect terabytes of data every day from our hosts, everything like
32:49 from the easy gimmies like established outbound connections to like all the packages installed
32:55 on the system to all like the Docker processes running to shell histories to like we just collect
33:00 all this stuff and shove it into Splunk.
33:02 And then our blue team can sift through it and try to try to find actionable insights.
33:05 That's not what makes us the money.
33:07 But that's where the blue team is super excited about the data, right?
33:10 That's the data that's useful to them.
33:12 Right.
33:12 And then we have Pulsar, which is our file integrity monitoring or FIM.
33:16 We use inotify on Linux and NTFS journaling on Windows.
33:21 By the way, Hubble runs on Windows.
33:22 Yeah, I was going to ask you that.
33:23 So obviously it runs on Linux, but also runs on like Windows Server.
33:27 Yeah, yeah.
33:27 Mac, macOS.
33:28 It could run on Mac.
33:30 There's nothing like there's nothing stopping it from running on Mac.
33:33 It runs great.
33:33 We don't have any audit profiles that are Mac specific in hubblestack_data yet.
33:38 And we don't have like packages and like all the plist stuff for services.
33:43 Yeah.
33:43 I mean, there's very few servers running macOS, but there are some, I think, like some of these
33:49 build systems.
33:50 Yeah.
33:50 Well, and theoretically, like some people, some people would like to use Hubble for like,
33:55 like laptops and that kind of stuff.
33:58 And it's really not like desktop management for like, I want to know what my machines
34:02 are up to.
34:03 And it could be useful that way.
34:04 It would take some work to make it like more user friendly.
34:06 Like you can't just like this is meant to be run on the CLI and doesn't have a UI at all.
34:12 You know what I mean?
34:12 Like it would take some work, but potentially, but I've tried to push back on that because
34:15 I don't think it's useful.
34:17 But anyway, so Pulsar.
34:20 Okay.
34:20 Yeah.
34:21 Keep going on the infrastructures.
34:22 Yeah.
34:22 Yeah.
34:22 Pulsar, FIM.
34:23 We just basically collect file changes on the host and you can have it
34:28 send checksums of the new files.
34:29 And that way you can compare, you know, is this, does this match our whitelist of checksums
34:33 for the specific binary or whatever?
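The checksum comparison can be sketched in a few lines. This is a generic illustration using SHA-256 from Python's standard library, not Pulsar's implementation; the whitelist structure (a path mapped to a set of accepted digests) and the function names are assumptions made for the example.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_whitelist(path, whitelist):
    """Return True only if the file's current digest is an accepted one.

    whitelist maps a file path to the set of SHA-256 hex digests that are
    known to be good for that path (hypothetical structure).
    """
    return sha256_of(path) in whitelist.get(path, set())
```

A changed file whose new digest is not in the set is what would be surfaced for review. As noted in the conversation, this alone does not distinguish an attacker's change from an expected configuration management push.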
34:34 We don't have a good answer on that one for like changes that are supposed to happen.
34:40 Right.
34:40 There's no, I don't have a good way to turn off our monitoring during say a configuration
34:45 management push.
34:46 Right.
34:47 Like, obviously I could say, oh, if you lay down this file, then we'll say, oh, that
34:50 file's there.
34:50 Let's stop collecting for a minute.
34:51 But like then an attacker could lay down that file.
34:53 Right.
34:53 So there's like, we haven't figured out a secure way for like a Salt master to tell Hubble to
34:58 stop collecting because Hubble doesn't accept incoming connections.
35:02 Right.
35:02 It only shoves data out and pulls data from, from GitHub.
35:05 So, or from S3 or from wherever else.
35:08 And so those are the big pieces.
35:10 The other code name is Quasar, which is, I don't know if it qualifies as being its own
35:15 section.
35:15 It's basically just, that's the returners in Salt.
35:18 That's the SaltStack terminology.
35:20 But basically that's the glue that allows us to send the data to a destination.
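As a rough picture of what such glue might look like for Splunk, the sketch below builds (but does not send) a request for Splunk's HTTP Event Collector, which accepts JSON events at `/services/collector/event` with an `Authorization: Splunk <token>` header. Whether Hubble's returner works exactly this way is not stated in the conversation; the host name, token, sourcetype, and function name are placeholders.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype):
    """Build a POST request for Splunk's HTTP Event Collector without sending it."""
    payload = {"event": event, "sourcetype": sourcetype}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint and token; nothing is transmitted here.
    request = build_hec_request(
        "https://splunk.example.invalid:8088",
        "REPLACE-WITH-TOKEN",
        {"check": "ssh-no-root-login", "result": "failure"},
        "example_audit",
    )
    print(request.full_url)
```

The design point that matters is the direction of traffic: the host only pushes data out, so the agent never needs to accept incoming connections.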
35:25 We support Splunk.
35:26 That's what we use in Adobe.
35:27 We also support ELK and Logstash.
35:29 So that's what most of our open source users are using, I think, because Splunk is very
35:33 expensive.
35:34 Yeah.
35:34 Yeah.
35:35 So that's, that's kind of the high level overview of the pieces of Hubble.
35:38 This portion of Talk Python to Me is brought to you by Rollbar.
35:44 Got a question for you.
35:45 Have you been outsourcing your bug discovery to your users?
35:48 Have you been making them send you bug reports?
35:50 You know, there's two problems with that.
35:52 You can't discover all the bugs this way.
35:54 And some users don't bother reporting bugs at all.
35:57 They just leave sometimes forever.
35:59 The best software teams practice proactive error monitoring.
36:02 They detect all the errors in their production apps and services in real time and debug important
36:07 errors in minutes or hours, sometimes before users even notice.
36:11 Teams from companies like Twilio, Instacart, and CircleCI use Rollbar to do this.
36:16 With Rollbar, you get a real-time feed of all the errors so you know exactly what's broken
36:21 in production.
36:22 And Rollbar automatically collects all the relevant data and metadata you need to debug the
36:27 errors so you don't have to sift through logs.
36:29 If you aren't using Rollbar yet, they have a special offer for you and it's really awesome.
36:33 Sign up and install Rollbar at talkpython.fm/Rollbar.
36:38 And Rollbar will send you a $100 gift card to use at the Open Collective, where you can donate
36:43 to any of the 900 plus projects listed under the Open Source Collective or to the Women Who
36:49 Code organization.
36:49 Get notified of errors in real time and make a difference in Open Source.
36:53 Visit talkpython.fm/Rollbar today.
36:56 All right.
36:59 So give me an example of maybe how once this data is collected, like what kinds of questions
37:06 would you ask?
37:07 So I'm not familiar with Splunk.
37:09 You said this is not an Open Source project.
37:11 It's like a paid commercial thing that's kind of pricey.
37:13 Yeah.
37:14 So Splunk is...
37:15 How do they...
37:16 What do they call themselves?
37:16 Describe themselves.
37:17 Search, analyze, and visualize the machine data generated by your organization.
37:22 Right?
37:22 So the idea behind Splunk is it's kind of the same story as like an ELK stack, right?
37:27 You have the collection, you know, with Logstash.
37:30 You have the coordination with Elasticsearch that...
37:33 Sorry, not coordination.
37:34 Queryability with...
37:35 Yeah.
37:36 Correlations, I guess, is what I meant to say.
37:38 Yeah, yeah.
37:38 Okay.
37:39 With Logstash...
37:39 Or sorry, with Elasticsearch.
37:41 And then you have Kibana for like reports and stuff.
37:45 And Splunk does all of those things, right?
37:46 You can do these elaborate searches and correlations across the data.
37:52 And then you can turn that into, you know, alerts.
37:55 Or you can turn that into dashboards.
37:57 And they have, you know, machine learning aspects that try to help you do some of these security
38:02 things without having to hire your own machine learning team.
38:06 They're still young, but it's pretty expensive.
38:09 I mean, I think your first few gigabytes are free or whatever.
38:12 But then it's basically billed by the amount of data that you send to it.
38:17 And, you know, it can get out of hand pretty quickly.
38:20 So you save your company money by only upgrading your servers like once a year, things like that?
38:24 Yeah.
38:26 I mean, I don't deal with the Splunk stuff.
38:28 They just tell me I can send as much data as I want to them, basically.
38:31 Send it here and they go do what they do, right?
38:33 Right.
38:33 And so, yeah, we send terabytes of data to our Splunk clusters.
38:38 Like the gimmies.
38:39 I keep going back to this established outbound connections because it's such an easy thing
38:42 for people to see.
38:44 Like if something is calling home that shouldn't, right?
38:47 That's actually pretty easy to spot in general, right?
38:50 Yeah.
38:50 Obviously, on something like, what's a good example of something that reaches out?
38:54 Well, let's talk about, say, a web server, right?
38:56 So it's got like a uWSGI or a Gunicorn worker process.
39:00 But that worker process is probably talking to a database.
39:02 Maybe it's talking to like a Stripe credit card API or an S3 bucket or something like that.
39:10 Maybe that thing, right, would be reaching out.
39:12 Maybe you've got, obviously, all the upgrades, right?
39:16 If you're going to upgrade, just apt upgrade, you know, that sort of thing that's going to
39:20 be reaching out.
39:21 But those are the few, right?
39:23 Those are easy to like limit to a specific whitelist, right?
39:26 Like, you know, the server needs to reach out to the database servers and you know what all
39:30 the database servers are.
39:31 So you say, okay, in the search, I'm going to eliminate all of those.
39:34 We don't care about those.
39:34 You know, it's going to be reaching out to the package servers, right?
39:38 And theoretically, hopefully you're rehosting those package servers, right?
39:41 So that you know, so you can keep that list, you know, you don't have to whitelist every
39:45 mirror in the world, right?
39:46 Right.
39:46 Or maybe your cloud server is, right?
39:48 Like something like that.
39:50 And then, you know, all these other things.
39:52 But then past that, like the only thing that it accepts, like from any location is incoming
39:57 connections, right?
39:58 On 80 or 443 or whatever, right?
40:00 So if you can create a search such that it says, hey, we know about all these different
40:05 things on these servers that are good reaching out or okay reaching out.
40:09 But if we see something from a new process that isn't in our whitelist or from an existing
40:15 process, because somebody could say, oh, I'm Apache too, right?
40:18 Like, you know, if you see something from an existing process that is reaching out to
40:23 somewhere we don't know about, then we send an alert to the security operations center and
40:26 they contact the team and say, hey, is this normal?
40:29 Like, is this something we should add to our whitelist?
40:31 Can we look into this?
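The search described here boils down to filtering connection events against a whitelist and alerting on whatever is left. At Adobe this is done as a Splunk search; the Python sketch below only illustrates the logic, and the record fields, the whitelist entries, and the function name are invented for the example.

```python
from typing import NamedTuple


class Connection(NamedTuple):
    host: str
    process: str
    remote_address: str
    remote_port: int


# Known-good (process, destination, port) combinations, e.g. the database
# servers and the internal package mirror. All values are made up.
WHITELIST = {
    ("gunicorn", "10.0.0.5", 5432),
    ("apt", "10.0.0.20", 443),
}


def unexpected_connections(events, whitelist):
    """Return the events whose process and destination are not whitelisted.

    Matching on the process name as well as the destination means a known
    process reaching out to an unknown address is flagged too.
    """
    return [
        event
        for event in events
        if (event.process, event.remote_address, event.remote_port) not in whitelist
    ]


if __name__ == "__main__":
    events = [
        Connection("web-01", "gunicorn", "10.0.0.5", 5432),
        Connection("web-01", "gunicorn", "198.51.100.7", 443),
        Connection("web-02", "miner", "203.0.113.9", 3333),
    ]
    for event in unexpected_connections(events, WHITELIST):
        print("needs review:", event)
```

Every whitelist entry is also a blind spot, which is the trade-off raised a little later in the conversation.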
40:31 You might find that you have a cryptocurrency miner on there or something, right?
40:36 Like it's, you know.
40:36 Sure.
40:37 So that's like the easy one.
40:38 Yeah.
40:39 So you start with, let's just see what connections are being made.
40:42 All right.
40:42 These look valid.
40:43 Hide them.
40:44 Don't, you know, don't worry about those.
40:46 And then you just slowly either go through till there's no more.
40:49 There's none in your list or you start to get worried and figure out what the abnormal ones
40:54 are.
40:54 Exactly.
40:55 And it's a hard problem, right?
40:57 Right.
40:57 Like I mentioned, we have like 30 security engineers.
41:00 And it's because every time that you put something on that whitelist, you've given yourself a
41:05 blind spot.
41:05 Right.
41:06 Yeah.
41:06 So like, you still like, you know, ideally, you know, if we would have like some crazy AI
41:12 that could go through and be like, hey, let's investigate this data.
41:15 And it could notice things that we couldn't, and, you know, that's something
41:20 that Adobe is working on.
41:21 And so many companies are working on, right?
41:22 Machine learning and AI is huge everywhere.
41:24 Right.
41:24 But yeah, it's a really difficult problem.
41:27 And one that I'm not super qualified to talk about because yeah, no worries.
41:31 Not what I do all day.
41:31 So what we've spoken about so far to me sounds like one, a configuration of the machine and
41:39 two, oh crap, something is bad here.
41:41 At that point, I feel like it's not necessarily too late, but you're way far down the road of
41:47 it's not good.
41:48 You know, if there's a process connecting to something else on your server and it really is
41:54 malicious, like that's a long ways down the path to not good.
41:56 So one of the things that might be really helpful is to just say, I'm using this version
42:02 of Django or that version of Requests.
42:04 And it turns out that there's a critical security problem found in it.
42:08 We had better upgrade that before someone else, you know, figures it out.
42:11 Right.
42:12 Does it support searching for that kind of stuff?
42:14 Yeah.
42:14 So we don't actually use it for that in Adobe.
42:17 We have a different tool that they've been using for many years.
42:20 And so we didn't see the need to change on that one yet.
42:22 But we do have support for it in Hubble.
42:24 It's not perfect.
42:25 We use a site called Vulners, Vulners.com.
42:28 Originally, we were downloading... like, their original API had the ability to download
42:33 the whole vulnerability list, basically.
42:37 And then you could audit it locally and look against the existing packages and see if you
42:42 had any issues and then report those.
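The local audit described here, where a vulnerability list is downloaded and compared against installed packages on the host, can be sketched as below. This is not the Vulners API or its data format; the advisory structure, the IDs, and the function names are hypothetical, and the version comparison handles only simple dotted numbers, whereas real RPM or deb versions (epochs, release suffixes) need the package manager's own comparison rules.

```python
def parse_version(text):
    """Turn '4.4.12' into (4, 4, 12). Simple dotted numeric versions only."""
    return tuple(int(part) for part in text.split("."))


def vulnerable_packages(installed, advisories):
    """Return (package, installed_version, advisory_id) for each affected package.

    installed maps package name to version string; each advisory names a
    package and the first version in which the issue is fixed.
    """
    findings = []
    for advisory in advisories:
        version = installed.get(advisory["package"])
        if version is None:
            continue  # package not installed, nothing to report
        if parse_version(version) < parse_version(advisory["fixed_in"]):
            findings.append((advisory["package"], version, advisory["id"]))
    return findings


if __name__ == "__main__":
    installed = {"bash": "4.3.0", "openssl": "1.1.1"}
    advisories = [
        {"id": "EXAMPLE-0001", "package": "bash", "fixed_in": "4.3.30"},
        {"id": "EXAMPLE-0002", "package": "openssl", "fixed_in": "1.0.2"},
    ]
    print(vulnerable_packages(installed, advisories))
```

The appeal of doing the comparison locally is that the package list never leaves the host.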
42:43 And that was nice because it's a hard sell to send your list of installed packages to an
42:49 API.
42:50 Right.
42:51 Because now they know if you have any vulnerabilities.
42:53 Right.
42:54 I'm not saying that Vulners can't be trusted, but I would, you know, I would hesitate to use
42:58 them without paying for them.
42:59 Right.
43:00 Like it just in general.
43:01 Yeah.
43:01 Just so that you can have somebody to blame.
43:03 You know what I mean?
43:05 So basically, with that original version, we were just hammering their servers and they
43:08 were like, please stop.
43:09 Please don't do this.
43:10 And I was like, OK.
43:12 And so like their new method is you actually get an API key from them and then you plug
43:17 that into our Vulners module configuration and then it will securely send
43:24 your list of installed packages to them and they'll report if you have any vulnerabilities.
43:27 And that's not perfect.
43:28 Yeah.
43:29 How does it do that?
43:29 Does it look at the pinned versions in the requirements?
43:32 Does it look at your virtual environments or your system level one and see what's installed
43:37 or what's the process of even knowing what to send them?
43:40 So I don't think it does stuff like Python packages.
43:43 I think it's mostly dealing with system packages.
43:45 Right.
43:46 Like we know, you know, so they can keep a list of, hey, like bash or something.
43:50 Yeah.
43:50 CVE came out on.
43:53 Yeah.
43:53 And, you know, a lot of these things are repackaged.
43:55 Right.
43:55 Like, you can do yum install python-requests.
43:59 Right.
44:00 And you can get requests in its packaged Python version.
44:03 Right.
44:03 And so, you know, that kind of stuff would be reported.
44:06 Right.
44:06 If there's a big Requests vulnerability, then you can know that you're vulnerable
44:10 and you need to upgrade.
44:11 But yeah, I don't think it does the Python stuff.
44:13 So, like, this is something where we don't have a super great answer because it's
44:18 arguably not a great thing to send all of your security data off to Vulners, who you maybe
44:22 don't necessarily trust.
44:23 Right.
44:23 Yeah.
44:23 And I haven't figured out a good way to do that because I don't want to be in the business
44:27 of trying to compile that list of vulnerabilities because that's why companies get paid big bucks
44:34 is because it would just consume all my time.
44:36 Right.
44:36 Like, you know, that's right.
44:37 Right.
44:38 Whole companies like Vulners are designed just to do that.
44:41 Right.
44:41 And they make money doing just that.
44:43 So I don't know where that's going to go in the future with Hubble.
44:46 But, you know, for now, you can do it with Vulners, which is great.
44:49 You know, as long as you're OK with that compromise.
44:52 So sure.
44:53 That's interesting.
44:54 So one of the things I do for my servers and projects is I use a thing called pyup.io.
45:00 And basically, it looks at your requirements.txt, pins the versions.
45:06 And if there's new ones, it'll do a PR to automatically increment that.
45:09 And it'll let you know if there's a security vulnerability in them.
45:13 But that's only the Python side.
45:14 And you still have the problem if you're sending them your requirements file.
45:18 And GitHub has been doing great work recently.
45:20 Have you seen that?
45:21 Where they'll just like, they'll inspect your requirements file for you and say,
45:25 hey, there's a CVE on this.
45:26 You should upgrade this.
45:27 And it's like, whoa, thank you, GitHub.
45:28 Like, that's awesome.
45:29 It is really good.
45:30 And it's a huge like warning across when you log in and they update it weekly.
45:34 You're not going to miss it.
45:35 Yeah.
45:36 Yeah.
45:36 That's been around for about a year, I think.
45:39 But it was only for JavaScript and Ruby.
45:41 I think it's doing it for Python now.
45:43 Yeah.
45:43 Now they added Python support, which is great.
45:45 Just like a few months ago.
45:46 So yeah, that's a really good one as well.
45:49 I guess maybe we could talk just a little bit about some of the pieces involved and some
45:54 of the building blocks.
45:55 So we talked about Salt and we also talked about Vulners.
45:58 So what else is involved here?
46:00 It's your Python code, those two projects.
46:02 And what else?
46:03 Like, how have you built it?
46:04 We actually use Docker to create our packages, which I recommend highly.
46:07 It's really great.
46:08 Like I can just basically do a Docker build and then a Docker run.
46:12 And I just have packages.
46:13 It's beautiful.
46:14 We have a security engineer who's all about like not trusting what's on the host, which
46:17 is smart, right?
46:18 Like theoretically, an attacker could compromise the tools that you're using to try to track
46:24 down the attacker, right?
46:25 So, like, in a security utopia, Hubble would have nothing that comes from the host.
46:30 It would have all its own stuff.
46:31 It would lock itself down immediately and like, you know, not be modifiable.
46:35 So we've worked to one of the easy ways to serve profiles to Hubble is via Git.
46:40 And we include, apparently I'm still going through puberty.
46:44 We include the Git libraries, you know, the Git Python libraries.
46:49 We can use GitPython or libgit2.
46:51 We prefer libgit2.
46:52 And so we actually ship that with Hubble.
46:53 But that also requires Git to be installed on the host.
46:57 And Git is a really easy requirement to source, right?
47:01 Like if you add a dependency for Git, it's almost certain that the host will be able to
47:06 serve that dependency, right?
47:08 Because that's a problem with like something like osquery, which is not necessarily in
47:11 all of the normal repos, right?
47:12 So like, that's harder to assume that it's going to be there.
47:15 But even with stuff like Git, we actually, we semi recently added code into our build process
47:21 to actually just compile Git from source.
47:23 So we don't even rely on Git on the host, right?
47:27 We compile osquery from source, not because we need it to be newer than what they do, than
47:32 the stuff that they provide.
47:33 But just because we can add some different compile flags that are useful to our like little siloed,
47:39 you know, osquery over here.
47:40 So we compile a bunch of stuff from source as part of our build process.
47:44 But yeah, the big pieces are Salt and its dependencies.
47:47 The reason we use salt-ssh is because we don't need any of the transport stuff, which is where
47:52 all of the difficulties in sourcing come from.
47:55 Like all of a sudden, when you're sourcing ZeroMQ and these other things, you're talking about
48:00 C code and binaries, right?
48:01 Like it's not just pure Python and pure Python is easier.
48:04 Right.
48:04 And so we use salt-ssh because it doesn't have any of those dependencies.
48:08 Vulners, we now package their Python package in with our stuff so you can hit
48:14 their API easily.
48:15 And we package osquery in there.
48:17 And yeah, those are really the biggest pieces that we add.
48:20 Yeah.
48:21 And that's cool.
48:21 It's pretty standalone then.
48:22 It doesn't have a huge list of dependencies.
48:24 You know, you open this whole section by saying, well, there's this trade-off like security
48:29 generally makes usability worse, but you kind of want it.
48:33 I know, but what's surprising is like in your description here, your security considerations,
48:38 like let's just package Git in there, even though it's a pain for us.
48:41 And let's package in osquery and all that.
48:43 Like that actually makes it easier to deploy and maintain.
48:45 So it's not always true, right?
48:46 You're certainly right.
48:47 Yeah.
48:47 Like, but it does still have trade-offs.
48:49 Like let's, let's be clear because now our package is, you know, 50 megabytes instead
48:54 of five, right?
48:55 Like, so theoretically, if we could use the stuff that's already on the host, like that could
48:58 make our packages smaller and stuff like that.
49:00 But let's face it, it's 2018 and a hundred megabyte package is just not really a big deal
49:06 in general, like in most cases.
49:08 Yeah.
49:09 Yeah.
49:09 What is that?
49:10 Two, three seconds to download on a good connection.
49:11 Yeah.
49:12 It's just, I mean, yeah.
49:13 And, and, you know, usually you're going to rehost these in your data center, you know,
49:17 cause you usually will have mirrors of the repos or whatever.
49:20 And it's just, it's just not a big deal.
49:21 Yeah.
49:21 So it sounds like such a cool project.
49:23 People might be interested in participating, maybe bringing it over into their organization
49:28 or just working on it in general.
49:30 Are you looking for contributors?
49:32 And if so, how, what things like that?
49:35 So we definitely are.
49:36 Our repo is, I'll make sure it's in the show notes, but it's all open source.
49:41 It's all Apache 2.
49:42 You can use it.
49:43 You don't have to worry about licensing issues.
49:45 It's not GPL or anything like that.
49:46 And we would love contributors.
49:48 I have been so swamped with like actually doing this at Adobe that I have, I have not been able
49:54 to spend as much time community building as I wish.
49:57 Right.
49:57 I've given a few talks and been on, you know, FLOSS Weekly earlier this year and now this podcast,
50:03 but like, I'm not great at, you know, doing the social media stuff, the aspect and all those
50:07 different things.
50:08 So I apologize if it doesn't look like stuff's going on, but you can see in the PRs that we
50:11 are actively developing and we would love people to report issues.
50:14 We'd love people to add pull requests and it's all Python.
50:17 It's a little bit salty.
50:19 Like it's a little, there's some Salt stuff, Salt-isms in there, which might put some people
50:23 off.
50:23 But like when you get actually down to like the modules that do the work, it's pretty much
50:28 just straight Python.
50:29 So it can be pretty fun to write.
50:30 And we try to keep some of our issues labeled with like the "help wanted" and "good first issue"
50:35 labels.
50:35 So that's a good place to start.
50:37 We have a Slack workspace, which you can get to from our, from our homepage, which will also
50:43 be in the show notes.
50:44 And we'd love to see it.
50:46 Like the Slack workspace is the best place.
50:48 How about writing tutorials or something like that?
50:51 Yeah, sure.
50:51 Like documentation.
50:53 I recently did a big overhaul of our documentation, which was way out of date.
50:56 And so it's all hosted on Read the Docs now.
50:59 And, but obviously that doesn't update itself.
51:01 You know, some of it does if you, if you base it off of the docstrings, but like we'd love
51:05 work there.
51:06 You know, if you, if you get into Hubble and run into issues and you're like, this was
51:09 not clear in the documentation, please submit a PR and let's fix the documentation or at least
51:14 an issue.
51:14 Like I don't even care if you don't want to like submit a PR that's, that takes time if you're
51:17 not set up for it.
51:18 But if you just file an issue and say, Hey, this could be better in this way.
51:21 Like I'll love you forever.
51:23 So yeah, we'd love to have people contribute.
51:26 We'd love to grow the community and make this just like a really good tool.
51:29 And yeah, I'm looking forward to your feedback.
51:32 Sounds good.
51:33 So when I was researching this project, I stumbled upon the Adobe security blog and it looks like
51:39 you guys are actually doing a lot of interesting stuff out in public.
51:43 Do you want to maybe just talk a little bit about that?
51:45 Cause it seems way more open and sort of put together for public consumption than I would
51:51 have expected.
51:52 Yeah.
51:52 Adobe is, is huge.
51:55 Right.
51:55 And, and all of a sudden it was so funny cause we did all this Hubble stuff and we're like
51:58 deploying it everywhere at Adobe.
52:00 And then my, my coworker, Christer, who came up with the idea for Hubble, he's like, Hey,
52:04 the security at Adobe guys want to sponsor, have HubbleStack sponsor SCaLE over in California,
52:10 the Southern California Linux Expo.
52:13 And they're going to sponsor a booth and give us some materials to take down and stuff like
52:17 that.
52:17 And I was like, wait, that's a thing?
52:19 Like we have like a, like a marketing team dedicated to security at Adobe?
52:23 And he's like, yeah, dude.
52:24 And I'm like, okay.
52:24 So like, they've been great.
52:26 That's pretty cool.
52:27 They blogged about Hubble a couple of times and they kind of ping us on their social media
52:30 and stuff like that.
52:31 Open source at Adobe.
52:32 You had a note here about that.
52:34 That was another thing that I was super surprised about because like, I don't think of Adobe as
52:38 an open source, as an open source company.
52:40 Right.
52:40 And it's really been in the last few years that it's really grown.
52:43 We have like an open source committee inside that both goes for like open source into the
52:48 like open, open source to the world, but then also encourages like open source inside of
52:52 Adobe.
52:53 Right.
52:53 Which, you know, sometimes teams make their stuff.
52:56 Like we have our own enterprise GitHub instance and, you know, you might be tempted to make your
53:02 stuff private, even though it's in Adobe only.
53:04 And they're like, no, make it public.
53:06 Like let's, let's share, let's, you know, collaborate inside of Adobe.
53:09 And, and it's a really cool effort.
53:11 And in fact, we're having our Adobe open source summit internally tomorrow, which will be
53:16 really fun.
53:16 And they like have been helping teams through the legal stuff, you know, cause you have to
53:20 get legal approval to, you know, open source a project, make sure that Adobe doesn't want
53:23 to actually, you know, compete in that space, that kind of stuff.
53:26 But anyway, so Adobe does have a, an org that's pretty substantial these days on GitHub.
53:32 So it's github.com/Adobe.
53:34 And you can check out all of our open source stuff there.
53:36 HubbleStack is not on that org because we have multiple repos.
53:39 So it doesn't gel well with having a giant, you know, being part of a giant org.
53:43 If it was one repo, it would be there, but we have our own org.
53:46 So yeah, cool.
53:47 Yeah.
53:48 Yeah.
53:48 I was surprised to see that.
53:49 And it's, it's a good sign.
53:50 So let's close this out with sort of a high level conversation.
53:53 You know, you've been building all the software to defend the servers and sound the alarms when
53:59 needed.
53:59 Are you hopeful about security, cybersecurity stuff in 2018?
54:04 Things get better.
54:05 Are they getting worse?
54:06 What do you think?
54:07 I think they're getting better.
54:08 I mean, it's so hard with all the legacy stuff, right?
54:11 Like that's, that's really where the problems come in is when you have this product that's
54:15 been around for 10 years and rewriting it from scratch is super expensive.
54:19 And that's a hard sell until you actually get owned.
54:22 Right.
54:23 And then it's like, okay, well, this is, this is going away in the year.
54:25 Right.
54:25 And so, you know, I think it's still okay.
54:29 The thing that I probably worry about more is if quantum computing, you know, breaks SSL
54:34 or something like that.
54:35 Like, I think that's probably the bigger risk to like the overall internet.
54:39 Like, cause if you don't have SSL, like, I don't, I don't know what you do at that point.
54:43 Yeah.
54:43 Economies come to a halt.
54:44 I just try not to think about it too much.
54:46 Yeah.
54:47 That's like thinking about the 9.7 magnitude earthquake.
54:51 It's like, well, there's nothing we can do about it.
54:53 So let's just hope that doesn't happen.
54:54 Yeah.
54:55 I feel like I'm not super pessimistic about quantum computing breaking the internet because
55:01 I feel like when that breakthrough happens, it's, there's going to be like one or two
55:05 quantum computers in the world.
55:07 And there's going to be a long time before that becomes something in the hands of standard
55:12 people.
55:12 And maybe by the time it's something in the hands of standard people, we'll have something
55:16 that is, we'll come up with another math problem that will solve that for us.
55:20 You know what I mean?
55:20 Like, you know.
55:21 Yeah, exactly.
55:21 Like TLS three, that's quantum safe, you know, encryption or something like this.
55:25 But yeah, that is, that is a little, a little concerning, but yeah, it's, I'm generally
55:30 hopeful for it, but I don't know.
55:32 I think I would say I'm generally hopeful as well.
55:33 That's just the one thing that like niggles at the back
55:35 of my mind.
55:35 I'm like, Oh, that would be really bad.
55:37 This is like, this is like apocalypse, you know, fiction level stuff right here.
55:41 But hopefully, yeah, it is.
55:43 I mean, we've had, we've had plenty of those kinds of problems that we've, I don't know
55:48 if we've had problems like that.
55:49 We've been able to weather a lot of things already.
55:50 So I'm not, I'm not super worried about it.
55:52 And I think, I think we'll always have something that we can, that we can add on top and, and
55:56 weather the next storm.
55:57 So it would definitely send people scrambling and it would make the front page of a lot of news.
56:02 Yeah.
56:02 If it happened though.
56:03 Yeah.
56:04 Cool.
56:04 All right.
56:04 Well, I guess we'll leave it there for now.
56:06 We could talk about this last part for a whole nother episode.
56:08 I'm sure.
56:09 So before, before we move on though, let me ask you the final two questions.
56:12 When you write Python code, what editor do you use?
56:15 I use Vim.
56:15 I keep trying to, if I were going to switch to an IDE, it would be PyCharm.
56:20 They have the best Vim integration I've ever, or Vim emulation that I've ever used.
56:24 And they have a lot of other cool tools, but I hate not having it in my terminal.
56:27 So I stick to Vim.
56:29 Yeah.
56:29 I think a lot of people like the, it's just always around everywhere.
56:33 So I'm going to use that, but I hear you about PyCharm.
56:35 Okay.
56:36 And then notable PyPI packages.
56:38 Obviously you could install Hubble from.
56:42 Yeah.
56:43 PyPI, but you're probably better off getting one of the prebuilt binaries, right?
56:46 I definitely recommend using the prebuilt binaries because that way you get osquery, you get all
56:50 these different things.
56:50 I have been lax and like, there's no actual reason that Hubble shouldn't be on PyPI and
56:56 I just haven't done it yet.
56:57 So look for it in the near future, hopefully.
57:00 Cause I just, I just need to do it because it's not that hard, but yeah, technically you
57:04 can source install Hubble without any difficulty.
57:07 So yeah, cool.
57:08 And so another one besides Hubble, I don't have any others off the top of my head and I love
57:12 requests.
57:13 Like I know the whole world loves requests, but it's great.
57:16 Well, Hubble is my, is my main one right now.
57:17 I worked for a long time on SaltStack, which is a pretty good product.
57:20 I don't love some of the directions that the company's going, that backs it, but it is
57:24 open source, Apache 2.
57:26 So, and it has a great community.
57:27 So, check out SaltStack as well.
57:28 Right on.
57:29 All right.
57:30 So final call to action, maybe there's a bunch of companies or organizations out there that
57:34 are nowhere near as organized as you guys are at Adobe.
57:38 They maybe want to start using Hubble.
57:40 They want to start putting some of these practices in place.
57:42 What do you tell them?
57:43 It's never too early to start thinking about security.
57:45 That's, I guess, I guess is where I'd go.
57:47 Like it's expensive.
57:48 I know like to actually start looking at this stuff, you have to like hire security engineers
57:52 and stuff.
57:53 And you know, how do you have the money to do that?
57:55 But if you lose a whole bunch of customer data, it's going to be way more expensive.
57:58 So yeah, think about it early. Use, you know, the tools are so good these days, you know,
58:03 HubbleStack included.
58:03 I think Hubble is a pretty great tool and it's pretty easy to deploy.
58:06 So there are things you can do from the beginning just to do a baseline thing that will just at
58:12 least give you something.
58:13 Right.
58:13 And so just start.
58:15 That's all.
58:15 Sounds good.
58:16 All right, Colton.
58:17 Thanks for being on the podcast.
58:18 Yep.
58:18 Thanks so much.
58:19 Yep.
58:19 Bye.
58:20 This has been another episode of Talk Python to Me.
58:23 Our guest on this episode was Colton Myers, and it's been brought to you by Linode and Rollbar.
58:27 Linode is bulletproof hosting for whatever you're building with Python.
58:31 Get four months free at talkpython.fm/Linode.
58:36 That's L-I-N-O-D-E.
58:37 Rollbar takes the pain out of errors.
58:40 They give you the context and insight you need to quickly locate and fix errors that might have
58:45 gone unnoticed until your users complained, of course.
58:48 Track a ridiculous number of errors for free as Talk Python to Me listeners at
58:53 talkpython.fm/rollbar.
58:54 Want to level up your Python?
58:57 If you're just getting started, try my Python Jumpstart by Building 10 Apps or our brand new
59:02 100 Days of Code in Python.
59:04 And if you're interested in more than one course, be sure to check out the Everything Bundle.
59:08 It's like a subscription that never expires.
59:10 Be sure to subscribe to the show.
59:12 Open your favorite podcatcher and search for Python.
59:14 We should be right at the top.
59:16 You can also find the iTunes feed at /itunes, Google Play feed at /play, and
59:21 direct RSS feed at /rss on talkpython.fm.
59:25 This is your host, Michael Kennedy.
59:27 Thanks so much for listening.
59:28 I really appreciate it.
59:29 Now get out there and write some Python code.
59:35 I'll see you next time.
59:51 Thank you.