Secure code lessons from Have I Been Pwned

Episode #136, published Thu, Nov 2, 2017, recorded Sat, Oct 28, 2017
Do you run any code that listens on an open port on the internet? This could be a website, a RESTful web service, or (gasp) even a database endpoint.

Troy Hunt, a renowned security expert, likes to say that you're doing "free penetration testing for that product right there".

Join Troy and me on this episode of Talk Python To Me. We discuss lessons learned from running the vulnerability monitoring website Have I Been Pwned, as well as other lessons for developers to keep your code safe while providing public services.

Episode Deep Dive

Guest Introduction and Background

Troy Hunt is a renowned web security expert and the founder of Have I Been Pwned. He spent many years as a developer before moving into security education and advocacy. Troy runs popular developer workshops focusing on ethical hacking and secure coding practices. He also blogs frequently about data breaches and helps organizations respond to and learn from them.

What to Know If You're New to Python

If this episode is your first deep dive into Python-based web development or security, here are a few quick pointers and resources:

  • Understanding HTTP and basic web concepts (URLs, ports, HTTP vs HTTPS) is foundational.
  • Knowing how databases connect to your Python code (ORM vs direct SQL) helps you avoid common pitfalls like SQL injection.
  • Comprehending virtual environments and package management (pip, requirements.txt) will make it easier to install and test security-related packages.
  • Familiarize yourself with security-centric tooling in Python, such as requests, security linters, or frameworks that enforce best practices around input validation and encryption.
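
To make the SQL injection pitfall from the list above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, the sample rows, and the function names (build_demo_db, find_user_unsafe, find_user_safe) are made up for illustration and do not come from the episode; the point is only the contrast between concatenating untrusted input into a query and passing it as a parameter.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, secret TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (email, secret) VALUES (?, ?)",
        [
            ("alice@example.com", "alice-secret"),
            ("bob@example.com", "bob-secret"),
        ],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, email: str) -> list:
    # VULNERABLE: untrusted input becomes part of the SQL text itself,
    # so a crafted value can change what the query means.
    query = "SELECT email, secret FROM users WHERE email = '" + email + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, email: str) -> list:
    # The ? placeholder passes the value separately from the SQL text,
    # so the database always treats it as data, never as SQL.
    return conn.execute(
        "SELECT email, secret FROM users WHERE email = ?", (email,)
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    payload = "nobody@example.com' OR '1'='1"
    print("unsafe rows:", len(find_user_unsafe(conn, payload)))
    print("safe rows:", len(find_user_safe(conn, payload)))
```

With the crafted payload, the concatenated query matches every row in the table, while the parameterized query matches none. An ORM typically gives you the parameterized behavior by default, which is one reason the episode recommends it.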

Key Points and Takeaways

  1. Have I Been Pwned and Security Awareness Troy Hunt’s motivation for creating Have I Been Pwned (HIBP) grew from seeing massive data breaches and wanting to offer a free, user-friendly service. HIBP has expanded to over 4 billion compromised accounts, drawing attention to the scope of breaches and encouraging password hygiene. Troy highlights the reality that running any public-facing code basically provides “free penetration testing” to attackers if not secured properly.
  2. SQL Injection Remains a Top Threat Despite SQL injection being one of the oldest vulnerabilities, it still ranks #1 on the OWASP Top 10. Troy explains how careless string concatenation and improper query parameterization allow attackers to exfiltrate or delete data. He strongly advocates for parameterized queries or ORMs to mitigate these attacks.
  3. Importance of Patching and Updates A major theme throughout the conversation is the stark reminder of high-profile attacks (e.g., WannaCry, NotPetya) that exploited unpatched systems. In many of these cases, patches had been available for weeks or even months. Large organizations, especially those with older infrastructure, often struggle with the friction of updates, but ignoring them can lead to massive breaches and ransomware devastation.
  4. Bug Bounties and Responsible Disclosure Bug bounties offer security researchers and ethical hackers a legitimate avenue to report vulnerabilities, often with financial rewards. Troy notes that big organizations like Tesla and even the Pentagon run formal bug bounty programs, encouraging researchers to disclose issues responsibly rather than selling them on the black market.
  5. Ashley Madison Breach and Sensitive Data The Ashley Madison incident stands out because it exposed personal and highly sensitive details of millions of users. Troy discussed the additional complexities in verifying sensitive breaches such as this and how HIBP implemented a “sensitive breach” category that requires email verification rather than open searching.
  6. Cloudflare and Securing at the Edge Troy’s site, Have I Been Pwned, uses Cloudflare for multiple benefits: DDoS protection, caching static content, and programmatically filtering out bad traffic. By placing a reverse proxy in front of his origin servers, Troy can reduce cost, offload SSL, and block suspicious patterns before they reach the main infrastructure.
  7. The Equifax Breach and Struts Vulnerability Although discussed briefly, Equifax’s massive data breach illustrates how failing to update a vulnerable library (Apache Struts) can lead to catastrophic consequences. The friction of testing, recompiling, and upgrading can stall patching, and this delay can expose organizations to large-scale exploits. This example underscores the importance of proactive upgrades and thorough QA processes.
  8. Nissan Leaf Example: IoT Gone Wrong Troy shared a story of the Nissan Leaf app allowing remote access to anyone who knew the car’s VIN, a value often visible through the windshield. This highlights the frequent lack of security design in IoT solutions and how critical it is to authenticate API calls with a real secret rather than an identifier printed on the device itself.
  9. IoT and Default Insecure Configurations The conversation frequently points to IoT as a hotbed for vulnerabilities. Many products hit the market with inadequate security, rely on default credentials, or store data in unprotected ways (e.g., children’s info, usage stats). Troy likens IoT disclaimers to putting warnings on cigarettes: if users fully understood the risk, they might think twice before plugging everything into the internet.
  10. Using Password Managers Reusing passwords across multiple sites remains a common security mistake. Troy and many security professionals advocate password managers like 1Password or LastPass, generating long random passwords to minimize fallout from breaches. Keeping distinct and lengthy credentials for every service drastically lowers the risk when data is inevitably leaked.
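
The Nissan Leaf story in point 8 comes down to treating a visible, guessable identifier as if it were a credential. The sketch below shows one common alternative: issue a random token when a device is paired and require it on every request. It is an illustration only, not how Nissan's service works or was fixed; the in-memory registry and the names pair_vehicle and is_authorized are hypothetical, and a real service would persist tokens securely and tie them to a user account.

```python
import hmac
import secrets

# Hypothetical in-memory registry: vehicle identifier -> token issued at pairing.
_tokens: dict[str, str] = {}


def pair_vehicle(vin: str) -> str:
    """Issue a random, unguessable token for a vehicle and remember it."""
    token = secrets.token_urlsafe(32)
    _tokens[vin] = token
    return token


def is_authorized(vin: str, presented_token: str) -> bool:
    """Allow a request only if the caller presents the token issued for this VIN."""
    expected = _tokens.get(vin)
    if expected is None:
        return False
    # compare_digest takes time independent of where the values differ,
    # which avoids leaking information through response timing.
    return hmac.compare_digest(expected.encode(), presented_token.encode())


if __name__ == "__main__":
    token = pair_vehicle("VIN-EXAMPLE-00001")
    print(is_authorized("VIN-EXAMPLE-00001", token))                # owner's app
    print(is_authorized("VIN-EXAMPLE-00001", "VIN-EXAMPLE-00001"))  # VIN alone
```

Knowing or enumerating the VIN is no longer enough here, because the identifier only says which car is meant; the token is what proves the caller is allowed to control it.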

Interesting Quotes and Stories

"If you find a vulnerability in software, you’re doing free penetration testing for that product right there." -- (Troy Hunt’s principle on running code online)

"SQL injection is still number one in the OWASP Top 10, even though we've known about it for decades." -- (Troy emphasizing how old vulnerabilities persist)

"When you put a service like Cloudflare in front, suddenly you can start to programmatically exclude nasty stuff, free up resources, and significantly reduce cost." -- (On leveraging a powerful reverse proxy)

Key Definitions and Terms

  • SQL Injection: An attack where malicious SQL statements are inserted into an entry field, compromising databases.
  • OWASP Top 10: A standard awareness document for developers and security professionals focusing on the top ten most critical web application security risks.
  • Bug Bounty: A program where companies pay security researchers for responsibly disclosing vulnerabilities instead of exploiting or selling them.
  • DDoS (Distributed Denial of Service): An attack aiming to overwhelm a target with a flood of requests from many distributed machines.
  • IoT (Internet of Things): Everyday devices connected to the internet, often lacking robust security features.

Learning Resources

  • Python for Absolute Beginners: A comprehensive introduction to Python. Perfect for anyone just getting started and who wants a solid foundation in the language.
  • Getting started with pytest: If you’re looking to boost your application security practices, thorough testing is key. This course shows you how to effectively create and run tests in Python.
  • Rock Solid Python with Python Typing: Type hints can help make your code more robust and maintainable, which in turn can improve security audits and reviews.
  • Modern APIs with FastAPI and Python: Learn to build high-performance and secure Python APIs using FastAPI. This can be critical for ensuring security best practices around user input and data handling.

Overall Takeaway

Security is not optional in today’s connected world; it’s a continuous process that needs constant attention, especially if your code runs on the open internet. From ensuring we patch libraries promptly to adopting strong password policies and robust architecture, each layer contributes to a safer online ecosystem. As Troy Hunt’s example with Have I Been Pwned shows, understanding and proactively addressing real-world breaches can lead to better security outcomes for everyone. By staying informed, automating defenses where possible, and embracing best practices like parameterized queries, HTTPS, and regular updates, developers can significantly reduce their exposure to modern threats.
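
As a small illustration of the "long random passwords" advice, here is a sketch of how such a password could be generated with Python's standard secrets module. Password managers like the ones mentioned above do this for you; the function name generate_password, the 24-character default, and the 16-character minimum are assumptions chosen for the example, not recommendations from the episode.

```python
import secrets
import string

# Candidate characters: letters, digits, and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 24) -> str:
    """Return a random password drawn from a cryptographically secure source."""
    if length < 16:
        # Assumed minimum for this sketch; pick your own policy.
        raise ValueError("use at least 16 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```

The secrets module is used instead of random because random is designed for simulation, not for values an attacker must be unable to predict.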

Troy Hunt: troyhunt.com
Troy on Twitter: @troyhunt
Have I been pwned?: haveibeenpwned.com
Disqus Demonstrates How to Do Breach Disclosure Right: troyhunt.com/disqus-demonstrates-how-to-do-data-breach-disclosure-right
Everything you need to know about the WannaCry / Wcry / WannaCrypt ransomware: troyhunt.com/everything-you-need-to-know-about-the-wannacrypt-ransomware
What Would It Look Like If We Put Warnings on IoT Devices Like We Do Cigarette Packets?: troyhunt.com/what-would-it-look-like-if-we-put-warnings-on-iot-devices-like-we-do-cigarette-packets
Careers in security, ethical hacking and advice on where to get started: troyhunt.com/careers-in-security-ethical-hacking-and-advice-on-where-to-get-started

Some of Troy's Courses
What Every Developer Must Know About HTTPS: troyhunt.com/new-pluralsight-course-what-every-developer-must-know-about-https
Web Security and the OWASP Top 10: The Big Picture: troyhunt.com/new-pluralsight-course-web-security-and
Crafting a Brand for Growth and Prosperity: troyhunt.com/new-pluralsight-course-crafting-a-brand-for-growth-and-prosperity
Exploring the Internet of Vulnerabilities: troyhunt.com/new-pluralsight-course-exploring-the-internet-of-vulnerabilities-2
Deconstructing the Hack: troyhunt.com/new-pluralsight-course-deconstructing-the-hack
Getting to grips with cloud computing security: troyhunt.com/getting-to-grips-with-cloud-computing-security-on-pluralsight

Little Bobby Tables (SQL Injection Cartoon): xkcd.com/327

Episode #136 deep-dive: talkpython.fm/136
Episode transcripts: talkpython.fm

---== Don't be a stranger ==---
YouTube: youtube.com/@talkpython

Bluesky: @talkpython.fm
Mastodon: @talkpython@fosstodon.org
X.com: @talkpython

Michael on Bluesky: @mkennedy.codes
Michael on Mastodon: @mkennedy@fosstodon.org
Michael on X.com: @mkennedy

Episode Transcript

00:00 Do you run any code that listens on an open port on the internet?

00:03 This could be a website, a RESTful web service, or, gasp, even a database endpoint.

00:08 Troy Hunt, a renowned security expert, likes to say that you're doing free penetration testing for that product right there.

00:14 Join Troy and me on this episode of Talk Python To Me.

00:17 We discuss lessons learned from running the vulnerability monitoring website Have I Been Pwned,

00:22 as well as other lessons for developers to keep your code safe while providing public services.

00:26 This is episode 136, recorded October 26, 2017.

00:32 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

00:52 This is your host, Michael Kennedy.

00:55 Follow me on Twitter, where I'm @mkennedy.

00:56 Keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython.

01:03 This episode has been sponsored by Rollbar and GoCD.

01:07 Thank them both for supporting the podcast by checking out what they're offering during their segments.

01:13 Troy, welcome to Talk Python.

01:15 Hey, thank you very much for having me.

01:16 Yeah, it's really great to have you on as a guest.

01:19 I've respected the work you've done in the security space immensely, and I'm looking forward to sharing what you've learned about software security and developers with the audience.

01:28 Cool. Awesome. Let's do it.

01:29 Yeah, let's do it.

01:30 Now, before we get into all the details, let's start with how you got into programming in the first place.

01:34 What's your story?

01:34 That's a good question.

01:36 So I was a little bit anti-computer when I was a kid, probably up until about teenage years.

01:43 And I was actually very frustrated by my friends who'd be inside on the computer.

01:46 I'm like, come on, man.

01:47 I want to go outside and play football or something like that.

01:50 What are you nerds doing?

01:52 And then I guess I got into it by, I moved overseas when I was a kid.

01:57 So this was when I was almost 14.

01:59 My family moved to the Netherlands.

02:01 And it's kind of cold over there.

02:04 And like you get into it.

02:05 A lot of dark evenings.

02:06 The sun goes down to what, like 2.30?

02:08 Yeah, well, this is a little later, man.

02:10 No, just kidding.

02:10 Yeah, yeah.

02:11 But it gets, it's pretty dark in the winter there, right?

02:13 It is.

02:14 I mean, for folks in the UK, it's basically like the UK.

02:17 But, well, I won't say something derogatory about the UK.

02:20 You can read my Twitter feed for that.

02:21 No, we love them, honestly.

02:22 So anyway, there's a lot of time spent indoors.

02:25 And I just started getting involved.

02:26 I think I must have started doing a bit of BASIC back then.

02:29 So this would have been sort of early 90s as well.

02:31 And I think really for me, though, so I was doing a bunch of sort of PC-related stuff in

02:37 my teenage years and doing part-time jobs at PC repair stores and that sort of thing.

02:41 But the thing that really hooked me was seeing the web.

02:44 So I started using the web in 95 when I started uni.

02:48 And it was just like, wow, this is awesome.

02:50 This is amazing.

02:51 I want to build stuff for this.

02:52 And really, that was the start of modern day life as I know it in terms of what I do.

02:56 That's awesome.

02:57 Yeah, I had the same experience.

02:59 I was a sophomore or junior at university when the Mozilla?

03:04 What was the first before Netscape?

03:06 Before Netscape?

03:07 Yeah.

03:07 Something very real.

03:08 Mosaic.

03:08 Oh, Mosaic.

03:10 Yeah.

03:10 Yes.

03:11 When Mosaic came out.

03:12 And I was just like, oh, my gosh.

03:14 Like, the entire world opened up.

03:17 It was such an amazing time.

03:18 And it seemed like it's funny because technology was so much more limited.

03:22 And what you can do on the web now is way more impressive.

03:24 But the world seemed so open back then, right?

03:27 Yeah, yeah.

03:28 I don't want to reminisce in so far as, oh, it was so awesome back then.

03:31 Because frankly, it's so awesome now.

03:32 And it's really a lot more awesome by any reasonable measure.

03:35 Yeah, for sure.

03:36 But it was just an exciting time where we're sort of seeing stuff that didn't even resemble

03:41 the things that we'd done before.

03:42 Yeah, I had the same feel.

03:43 And that was awesome.

03:44 So now, what do you do day to day?

03:47 More stuff on the web?

03:48 More stuff with security?

03:49 Yeah.

03:50 So look, it's a bit of both.

03:52 It's funny.

03:52 In fact, I've been thinking the last couple of days because I've had, I don't want to say

03:55 downtime, but I haven't been rushing somewhere.

03:57 And I was sort of going, what am I going to do today?

04:00 Like, I'm entirely independent, so I can do pretty much whatever I want.

04:05 And that consists of sort of several main things.

04:08 So I still do a lot of traveling and speaking at conferences.

04:11 I'm trying to do less of the traveling just because of how much I've been away.

04:15 But I'm still doing a lot of that.

04:17 I'm still doing a lot of workshops.

04:19 I was down interstate last week doing a workshop.

04:22 I think I've probably done this same workshop.

04:24 I must have done it 25 times this year.

04:26 25 two-day events.

04:28 The workshop is Hack Yourself First.

04:31 So I usually go into, say last week it was into a large financial institution and then

04:37 another online retailer.

04:38 And I'll go in there for two days and I'll sit down with usually developers, but also regularly

04:44 project management or managers, QA people, DBA, security folks, and go, okay, let's spend

04:51 two days figuring out the mechanics of things like SQL injection and cross-site scripting.

04:56 Here's how they work.

04:57 Here's what you do to be resilient to them.

04:58 And it's very, very sort of hands-on, active sort of stuff.

05:02 And it's just like, it's a huge amount of fun for everyone because we get to go in and break

05:06 stuff in ways that they've never been able to before.

05:08 And everyone leaves with this sort of new appreciation of the mechanics of how these

05:12 attacks work.

05:14 And they end up building better software for it, which is a good win.

05:17 Yeah, it's definitely a great win.

05:18 And I think sometimes you just have to see these things in action to really appreciate how your

05:24 security and the code you're writing is not actually secure.

05:27 I mean, even something as sort of old school as SQL injection attacks, like when you first

05:32 start writing SQL strings, you're like, well, I need to put the variable here.

05:35 So plus, plus quote thing, you know, how could that be wrong?

05:38 Right?

05:38 It just works.

05:39 But I think totally seeing it go, oh my gosh, do we do that?

05:43 Because if we do that, I just see how horrible this is going to be.

05:45 That's it.

05:46 It's the hands-on, you know.

05:47 It's like the light bulb moments when people do that.

05:50 And SQL injection is a great example because, you know, like you said, it's something that's

05:54 been around for a long time.

05:55 And we sort of think of this as an old thing.

05:57 It's still number one in the OWASP top 10 most critical web app security risks.

06:01 You know, it's still up there, even in their 2017 revised edition, it's still number one.

06:06 And when I do the workshop and I demonstrate SQL injection, I show things like there's a blog post

06:12 I refer to written by a guy who's trying to teach people how to do password resets in

06:16 ASP.net.

06:17 And literally within the one screen, he's got good, resilient, parameterized SQL statements.

06:23 And we're looking at this going, is this okay?

06:25 Yeah, yeah, it's okay.

06:26 Because you've got parameterization.

06:27 And then like the next line, there's an update statement.

06:30 And it's just like inline concatenated SQL with untrusted data that you can totally own.

06:35 And then he connects to the database with a privileged account.

06:37 And it's like, yep, good night.

06:39 Like the whole thing's over.

06:39 And there's still material, new material out there teaching people that.

06:43 And here's another fun one.

06:45 You can go and do, and be careful with this, folks, if you go and do this.

06:48 You can go to Google and do a Google dork search.

06:51 So one of these sort of searches that turn up things that aren't really meant to be there.

06:55 And you can do a search for "inurl:.php?id=".

07:00 And I think about the first half a dozen results have got an integer somewhere in a query string.

07:05 You load the page up.

07:07 You put an apostrophe on the end of it.

07:08 And bang, there's an internal SQL exception.

07:10 So very, very high likelihood of having a SQL injection risk.

07:14 And this sort of stuff is just absolutely rampant.

07:17 Yeah, that's really scary.

07:18 Why do you think it still is a problem?

07:21 I mean, it's 2017.

07:23 We've got ORMs and ODMs that generally protect us against these things.

07:28 We've got an XKCD, Little Bobby Tables.

07:31 I know.

07:32 If that doesn't teach people the lesson, I don't know what will.

07:34 We've had it for years.

07:35 I got him on a T-shirt, actually.

07:37 I used to wear it at conferences.

07:38 People love it.

07:39 Oh, that's wonderful.

07:39 I'll have to put a link to that in the show notes.

07:41 So why does this all happen?

07:44 I think there are multiple factors.

07:46 So, you know, I just gave one example.

07:48 People are still creating training material that's vulnerable.

07:51 And in fact, one of the reasons why I show this blog post is that I actually left the

07:55 guy a really nice comment.

07:56 And I literally said, here's some friendly feedback.

07:59 And being very constructive, I gave him this feedback.

08:02 It's just like the comment sitting there.

08:03 The guy's never replied.

08:05 And then after that, there's been a whole bunch of people that have chimed in and said,

08:08 thank you.

08:09 This is very useful.

08:10 And I'm going, didn't you just read like the big, long, friendly comment from the Australian

08:14 guy saying like, this is really just full of holes.

08:17 Don't do this.

08:17 So we see this propagate over and over and over again.

08:22 And ultimately, so many of these risks just fundamentally boil down to the people building the software

08:28 and not being familiar with how something like SQL injection works.

08:31 And that is just purely a competency issue.

08:33 Yeah.

08:33 I guess that part of it is like you kind of hinted at the copy and paste stack overflow

08:37 type of thing.

08:38 Although I suspect it would take a pretty good beating on stack overflow.

08:41 But you know, this copy and paste sort of thing.

08:44 And we have so, our technology and languages and libraries we use changes so quickly that

08:50 people are, I think a lot of people are just scrambling to keep up to make things work,

08:54 much less be secure.

08:55 So that the copy and paste thing is funny.

08:57 In January last year, I was running the workshop we just spoke about in Norway.

09:02 And part of this workshop, there's a module on looking at mobile APIs.

09:06 And one of the guys in my workshop says, all right, look, I want to look at the way the

09:12 app for my Nissan Leaf, or in the US you'd say Nissan, the car.

09:17 So the way my car.

09:19 Yeah.

09:20 So, okay.

09:21 How does my app talk to my car?

09:23 Because he could control features of the car from the app.

09:27 Now, this is not something we have a problem with in Australia, but apparently in some parts

09:31 of the world, it is so cold, you've got to turn your car on before you get in the car,

09:35 you know, like turn the heating on.

09:36 And this is one of the things I have to do in Norway because it's so freaking cold there.

09:39 So he's like, all right, pulls out his app and he's figuring it out.

09:42 And what he discovers is that the only thing that the app needs to know about the car is

09:48 the car's VIN number.

09:49 Now, for folks who may not be familiar with what a VIN number is, first of all, this was

09:53 being used like an API key.

09:55 It was a secret.

09:55 Second of all, it was printed in the windscreen of every car.

09:58 So you could literally walk past a car and it had its API key in the windscreen.

10:03 And it was worse than that too, because they're enumerable.

10:05 So you can just take the, in this case, I think we could take about the last five digits and

10:10 just keep randomizing numbers and finding different cars.

10:13 And based on what he found in the space of literally sort of single digit minutes, we

10:18 discovered that you could control the climate control features of the car.

10:21 You could pull back trip history, battery status, all sorts of things.

10:24 It was just crazy.

10:24 Shouldn't have happened.

10:25 That's crazy.

10:25 And if there's any vulnerabilities, you can, you're like, you can own the car basically,

10:29 right?

10:29 Because you're talking to it.

10:30 Yeah, God knows.

10:32 You know what I mean?

10:32 And when we say you're talking to the car, let's be clear.

10:35 You're talking to, you're basically talking to a web server and not a web server in a car.

10:39 We can come back to the things people put web servers in later.

10:41 You're talking to a web server on the web running an API.

10:44 And then that goes back end over some proprietary GSM network or something to the vehicle itself.

10:49 So there is a proxy in between.

10:51 Yeah, that's not too bad then, I guess.

10:53 Well, so here's where it kind of went worse.

10:55 I disclosed it to Nissan and we had chats on the phone and, you know, they were like, yeah,

11:00 we probably should fix this.

11:01 Yes, you probably should fix this.

11:02 And then time goes by and they're not fixing it.

11:05 And eventually we get to like a month after disclosure and they've stopped replying to messages

11:10 and nothing's happening.

11:11 So I write about it and then suddenly they decide it's important.

11:14 So they take the whole thing offline.

11:16 And it's offline for about six weeks and eventually it comes back online and they've got a new app and everything.

11:21 The funny thing about the app, and this is the relevancy to Stack Overflow and code reuse,

11:26 is that down the bottom of one of the screens there's this really odd text.

11:31 And the text is something like, the spirit of Stack Overflow is developers helping developers.

11:36 And this isn't an app for your car.

11:38 And we're looking at this going, what are you doing?

11:41 Why would you put that in there?

11:42 And then, of course, we found the Stack Overflow post where they've literally copied and pasted the text from the Stack Overflow post without understanding what it does.

11:50 Put it in the app that controls features of your car.

11:52 That's incredible.

11:53 Honestly, folks, if you Google for this, Google like, Nissan Stack Overflow code reuse, and it's just like, this is insane.

12:00 How does this happen?

12:01 I mean, how does this happen at all, let alone in a car or software to control a car?

12:06 Yeah, I feel like the companies that have these types of IoT-like things that are really valuable, you know, not a light bulb, but cars and humans travel in them at high dangerous speeds.

12:18 They should really be careful about this, right?

12:19 Yeah, they probably should.

12:20 They probably should.

12:22 Maybe you should send them an email about that or a bunch.

12:25 Yeah, well, afterwards, they were like, if you find any other stuff, please do send us an email.

12:30 I sent you an email.

12:30 We had phone conversations.

12:32 You were endorsed.

12:33 What happened?

12:33 Anyway.

12:34 Wow.

12:36 So, do you know if any of these car companies run bug bounty type things?

12:39 Yeah, no, Tesla does.

12:40 In fact, Tesla runs a bug bounty through Bugcrowd.

12:43 Bugcrowd is a bug bounty as a service platform started by a mate of mine who's done extremely well moving over to the U.S. and getting funded and doing wonderful things.

12:54 And bug bounties are becoming absolutely massive now.

12:57 It's great to see him doing well, but it's great to see this being a big thing in the industry now as well.

13:02 Yeah, I totally think it's a very positive thing.

13:04 I suspect most people listening know what bug bounties are, but maybe just define it for everyone.

13:08 Yeah, so a bug bounty is effectively acknowledging that maybe the right way to put this is everyone who has anything online is continually getting free penetration tests.

13:18 So, there are always people out there probing away at your things.

13:21 And a bug bounty is a means of saying to people, look, if you find vulnerabilities in our software, if you find things that could be dangerous, like SQL injection, submit them here.

13:31 And there's usually a formal process.

13:34 You know, this is the email address to send it to.

13:36 This is the information we need.

13:37 Here's how to encrypt your communications.

13:39 And then, depending on the vulnerability, you may be incentivized.

13:42 So, you may actually get anything from a T-shirt to a large amount of money.

13:45 And this is a really neat way of recognizing that we do have flaws in software.

13:51 This is the nature of building software.

13:53 And that if someone finds it, it is actually worth something.

13:55 And the sort of value proposition of the bug bounty is that it incentivizes people to report these things responsibly and allows the organization to handle them and fix them before someone actually goes and exploits nasty things.

14:07 And obviously, there's some incentivization for that in a monetary sense.

14:11 And these are becoming really big.

14:13 So, you know, we mentioned Tesla.

14:14 The Pentagon has run a bug bounty.

14:16 Like, these are going really, really mainstream.

14:19 And there's a lot more to it than just some random hackers sitting in a basement on the other side of the world trying to break into your things as well.

14:24 They can be very, very carefully managed programs with well-selected testers.

14:28 Yeah.

14:29 Like, the Pentagon one, you had to kind of interview and be approved to be part of it.

14:33 It wasn't just, now everybody go forth and attack the site, right?

14:36 You couldn't be Australian either or anything else that wasn't American, as I understand it.

14:40 That's unfortunate.

14:42 Well, you know, like, it's the freaking Pentagon.

14:44 Like, I kind of get that.

14:46 And frankly, just the fact that they ran that and it became such a more mainstream thing that entered into people's psyches in places it just wasn't before, I think is a very positive thing.

14:57 Yeah, I totally agree.

14:57 It's a positive thing.

14:58 And this money can be pretty large.

14:59 Like you said, it could be a t-shirt, but it could be $100,000.

15:02 And this incentive has been there recently.

15:04 Like, if there's not a bug bounty, there's probably some other bad actor who's willing to pay $50,000 for a good 0-day, right?

15:11 That's the thing.

15:12 And then we sort of get into this interesting space of who is competing for the dollars of the bugs.

15:16 And look, some people will argue that there are still actors out there that will pay a lot more than what the organizations with the potential vulnerability will.

15:25 But then at least you sort of get to have a little bit of weight on the other side of the scales in a monetary sense.

15:31 And of course, from a legal and an ethical perspective as well, there's always the incentive to try and report things through the formal channels and get money.

15:41 But yeah, look, there'll always be nefarious parties willing to pay for this stuff as well.

15:45 Yeah, at least now there's some option to monetize that and make it part of your living and do the right thing.

15:50 Exactly.

15:51 Yeah.

15:51 This portion of Talk Python To Me has been brought to you by Rollbar.

15:55 One of the frustrating things about being a developer is dealing with errors.

15:59 Relying on users to report errors, digging through log files, trying to debug issues, or getting millions of alerts just flooding your inbox and ruining your day.

16:08 With Rollbar's full-stack error monitoring, you get the context, insight, and control you need to find and fix bugs faster.

16:15 Adding Rollbar to your Python app is as easy as pip install rollbar.

16:19 You can start tracking production errors and deployments in eight minutes or less.

16:23 Are you considering self-hosting tools for security or compliance reasons?

16:27 Then you should really check out Rollbar's compliant SaaS option.

16:31 Get advanced security features and meet compliance without the hassle of self-hosting, including HIPAA, ISO 27001, Privacy Shield, and more.

16:40 They'd love to give you a demo.

16:42 Give Rollbar a try today.

16:43 Go to talkpython.fm/Rollbar and check them out.

16:48 Speaking of looking at breaches and disclosure, you're running a pretty amazing website called Have I Been Pwned that has really grown in terms of awareness for when these breaches happen, right?

17:02 Tell everyone about Have I Been Pwned.

17:04 Yeah.

17:04 Well, Have I Been Pwned has been running almost for four years now.

17:08 I've got to think about a birthday thing to do, actually, because it will be, I think, late November, early December will be the four-year anniversary.

17:14 And I started that after the Adobe data breach.

17:18 So Adobe was about 150 million records from memory.

17:21 And at that time, I'd been doing some analysis across sort of different data breaches, you know, looking for patterns.

17:28 Are people using the same password?

17:29 Well, guess what?

17:30 Yeah, they are.

17:31 Are people appearing in multiple different incidents, you know, like the same person?

17:34 And I was sort of seeing stuff.

17:36 I thought, like, I find this really interesting.

17:38 I reckon other people would find it interesting if they could see where they are exposed as well and particularly see things like, you know, you're exposed in these multiple places.

17:47 So that was sort of the genesis for the project.

17:51 And I built that out with, I think it was a total of about 155 million records.

17:56 So it was basically just about all Adobe when I launched it.

18:00 And I was like, wow, this is massive.

18:01 And I was actually, I built it all out on Microsoft's Azure platform as well.

18:05 And I really wanted to spend time using Azure in anger.

18:08 So, you know, how can I do something that actually uses a lot of storage?

18:13 And how can I use things like the table storage construct instead of relational databases to save money and make it go faster and all the rest of it?

18:20 And that was sort of a bit of a hobby project too.

18:23 And now sort of fast forward to nearly four years later and there's 4.8 billion accounts in there, which I still find to be a bit of an unfathomable number.

18:31 That's really incredible.

18:33 Incredible.

18:34 That's more than half the population of the earth.

18:36 I mean, I realize it's not one-to-one person to account, but still, that's staggering.

18:41 It is crazy.

18:41 And there's just sort of all sorts of metrics about the project that just exceeded any sort of form of expectation.

18:47 So that's one of them.

18:48 The visitor stats, an average day is somewhere between 50,000 and 70,000 people come to the site.

18:54 A big day is seven figures.

18:56 So I think I had about 2.8 million in one day the other day.

18:59 What was the driving factor behind that?

19:01 That's a good question because there's always a trigger, right?

19:04 So it's to sort of deviate from that baseline by an order of multiples.

19:08 In that case, there was this spam bot.

19:11 So it was called the Onliner spambot.

19:12 And a security researcher in France discovered this spam bot where, unfortunately, or fortunately, depending on your perspective, whoever was running the spam bot didn't do a great job of actually securing their data and left 711 million email addresses exposed.

19:27 So he managed to grab all this data and sent it over to me.

19:31 And I went, OK, well, let's load 711 million records in here.

19:34 It's not a breach in the traditional sense, but it is data about individuals redistributed over the Internet.

19:40 Let's load this.

19:42 And of course, a huge number of people were then interested in it.

19:44 And that sort of made a lot of news headlines.

19:47 And this is the thing that really drives the traffic.

19:49 It's news headlines because every time there's a data breach, I see a spike because it's on CNN or the BBC or something that faces the masses and not just the tech audiences.

20:00 So it's sort of really interesting to see how broadly appealing the project has been.

20:05 Yeah, I think it's a great service.

20:06 And I've gotten probably five or six emails from your service saying you've appeared in some breach or other.

20:13 Usually it doesn't freak me out too much because I use 1Password and my passwords are 40 characters long and they're unique per site.

20:21 So I hear a lot from people saying they hate getting email from me.

20:24 So I apologize to everyone that's had an email from me.

20:27 Don't hate the messenger, right?

20:29 Come on.

20:29 Well, exactly, right?

20:31 Yeah, cool.

20:31 So the one that really stands out to me that you were pretty well highlighted in, it actually had some interesting ethical components to it, was the Addison Mashley one.

20:44 Just make it look like you don't know the name.

20:46 That is a great defense.

20:47 Well done.

20:47 Thank you.

20:48 I've heard of it.

20:50 Honestly, darling, I've never heard of this site before.

20:52 I have no idea why my email address is.

20:53 So this is like an adult friend finder, let's have an affair type of thing.

20:57 And it got hacked.

20:59 Because not just people's passwords and possibly reused passwords were leaked, but just the fact that you even existed on the account was a pretty bad data breach.

21:07 You had to even be a little careful about letting people know or searching for that data, right?

21:13 Yeah, well, look, I mean, Ashley Madison, just for context for everyone.

21:16 So this was a data breach that happened in July 2015.

21:19 And hackers said, hackers, we still don't know who it was, said, look, we've got the Ashley Madison data.

21:28 These guys, we don't agree with their business model.

21:31 And OK, many people took an ethical stance on the whole context of this is not like a dating site.

21:37 It's not like an outright adult site.

21:39 It is literally their whole MO was helping people have affairs.

21:43 You know, their strapline used to be life is short, have an affair.

21:46 So they were, I guess, trying to mainstream adultery.

21:49 And obviously, a lot of people took issue with that ethically.

21:53 So whoever it was that got their data, said, look, we got the data.

21:57 These guys have been bad.

21:58 We don't like the business model.

21:59 If they don't shut down in the next month, we're going to dump the data publicly.

22:02 And this was kind of interesting on many levels.

22:06 I mean, we do see threats like this, but often we see threats that are more financially motivated.

22:10 We will see threats that say, you know, give us Bitcoin or we'll dump your data publicly.

22:15 But in this case, they just obviously took an ethical dislike.

22:19 And it was kind of interesting as well, because part of the original reasoning was what Ashley Madison did.

22:24 And this was a real dick move in anyone's books is they said, look, you know, you sign up to this website.

22:30 If you want to remove your data, you've got to pay for this full delete service.

22:36 And from memory, it was about $19 to delete your data.

22:38 So what would happen is there were a lot of people, and I had literally hundreds of conversations with people on the site afterwards.

22:46 And I was learning there are a lot of people who'd maybe have a couple too many red wines one night and go, hey, this might be fun and sign up and then, you know, this wasn't a good idea.

22:55 And then they'd forget about it.

22:57 There are a lot of people in there who were single.

22:59 There are a lot of people who look again, regardless of your ethical position on it, were consenting adults.

23:04 And they decided they wanted to use the site.

23:06 And then if they wanted to get off there, they're having to pay money, which is just it's just a reprehensible move.

23:11 But obviously, they thought they could monetize that.

23:13 But anyway, one of the things the hackers said is, when you pay for your full delete, you're not actually getting deleted.

23:20 And what we discovered after we actually saw the data is paying the money would null out your record in the membership table.

23:28 And then there would be a payment record with a foreign key back to the membership table.

23:32 And the payment record would have your personal data on it.

23:35 So it's like, yes, we removed you from the database.

23:38 By the way, you just created a payment record.

23:40 And the payment record has your personal info on it.
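The "full delete" problem described above can be sketched with an in-memory SQLite database. This is only an illustration: the table and column names (`member`, `payment`, `billing_name`, `billing_email`) and the sample row are hypothetical, not the real schema. The sketch shows how nulling out the membership row still leaves personal data behind in the payment row that points back to it.

```python
import sqlite3

# Hypothetical schema for illustration only; not the actual database layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (id INTEGER PRIMARY KEY, email TEXT, name TEXT);
    CREATE TABLE payment (
        id INTEGER PRIMARY KEY,
        member_id INTEGER REFERENCES member(id),
        billing_name TEXT,
        billing_email TEXT,
        amount REAL
    );
    INSERT INTO member VALUES (1, 'alice@example.com', 'Alice Example');
""")


def paid_delete(conn, member_id):
    """Null out the member row, but record a payment that copies the same data."""
    email, name = conn.execute(
        "SELECT email, name FROM member WHERE id = ?", (member_id,)
    ).fetchone()
    # The payment row is created with the personal details on it...
    conn.execute(
        "INSERT INTO payment (member_id, billing_name, billing_email, amount) "
        "VALUES (?, ?, ?, ?)",
        (member_id, name, email, 19.0),
    )
    # ...and only then is the membership record nulled out.
    conn.execute(
        "UPDATE member SET email = NULL, name = NULL WHERE id = ?", (member_id,)
    )


paid_delete(conn, 1)
member_row = conn.execute("SELECT email, name FROM member WHERE id = 1").fetchone()
leftover = conn.execute(
    "SELECT billing_name, billing_email FROM payment WHERE member_id = 1"
).fetchone()
print(member_row)  # the membership fields are now NULL
print(leftover)    # the personal data is still in the payment table
```

A real deletion would have to scrub or detach every table that holds a copy of the personal data, not just the primary record.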

23:43 So they took an ethical dislike to it.

23:47 And in a way, it was kind of fortunate the way it panned out and that we had a month's notice between saying we're going to dump this and it actually happening.

23:55 And I had time to think about how would I handle this in Have I Been Pwned.

23:59 Because it turned out to be more than 30 million records.

24:00 So it was a very large breach.

24:02 And I kind of went, look, this is going to be valuable data if it does turn up.

24:06 And I want to make it searchable.

24:08 But by the same token, this is the sort of thing where I don't want to be the vector through which, let's say, a rightly jealous wife discovers that her husband has been on the site.

24:18 And incidentally, this was a very, very heavily male-dominated service, which probably comes as no surprise to anyone.

24:24 And the women that were on there, a huge number of them signed up from IP address 127.0.0.1, which is a little bit suspicious.

24:30 That is a tiny bit suspicious, yes.

24:32 Yes.

24:33 And actually, this is one of the things we learned, right?

24:35 They're actually fembots, you know.

24:37 So I always think Austin Powers, every time I hear fembots.

24:40 I don't know if it was exactly like that.

24:42 However, you've got these accounts on there, which are effectively bots, which are trying to engage with men.

24:49 Because the more you can engage with them, the more you can get them to pay and all sorts of shady stuff.

24:53 So anyway, eventually the data does get dumped.

24:56 And by virtue of having had time to think about it, I had decided that I'd introduce the concept of a sensitive breach.

25:03 And a sensitive breach means the data still goes in Have I Been Pwned.

25:06 But you can't publicly, anonymously search for someone else.

25:10 So you've got to actually subscribe to the notification service, which is free.

25:14 But the reason I use that mechanism is because that sends you a verification email with a unique link.

25:19 You click the unique link, and then it says, okay, now I know that you control this email address.

25:23 We'll show you everything.

25:25 So the public ones and the sensitive ones.
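The verification flow described here can be sketched roughly as follows. This is a minimal illustration, not how the real service is implemented: the HMAC-based token, the `SECRET_KEY`, the `example.com` URL, and the function names are all assumptions. The idea is that only someone who receives the email can present a valid token, and only then are sensitive breaches included in the results.

```python
import hashlib
import hmac
import secrets
from urllib.parse import urlencode

# Hypothetical server-side secret; a real service would load this from config.
SECRET_KEY = secrets.token_bytes(32)


def make_token(email: str) -> str:
    """Return a token that only the server can produce for this address."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()


def verification_link(email: str) -> str:
    """Build the unique link that would be emailed to the address owner."""
    # example.com is a placeholder domain, not the real service URL.
    query = urlencode({"email": email, "token": make_token(email)})
    return f"https://example.com/verify?{query}"


def is_verified(email: str, token: str) -> bool:
    """Constant-time check that the presented token matches the address."""
    return hmac.compare_digest(make_token(email), token)


def visible_breaches(breaches, verified: bool):
    """Anonymous searches see only public breaches; verified owners see all."""
    return [b["name"] for b in breaches if verified or not b["sensitive"]]
```

An anonymous search would call `visible_breaches(breaches, verified=False)`, while a click on the emailed link would first pass `is_verified` and then show everything.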

25:27 Pretty legit way to handle it.

25:29 Definitely the way the website handled the breach didn't sound like it went well.

25:34 Let's talk about one more breach before we move on to some developer topics.

25:39 One that you did write about was Disqus, the comment section, which actually is at the bottom of your show page, right at talkpython.fm/136.

25:51 I suspect you could find Disqus right there.

25:53 And you said that these guys, while something happened in terms of a data breach, they handled it right.

26:00 Yeah, and you know, it's at the bottom of my blog as well.

26:02 And it's also my data in the Disqus data breach.

26:06 So I think that the macro picture here is that when data breaches happen, there's a real broad range of reactions from organizations involved.

26:16 And they range from, on the one hand, being extremely difficult to get in touch with, sometimes denying it, often downplaying the severity, sometimes covering it up, like knowing there was a breach, but not telling people because they're worried about reputation damage.

26:30 And doing all the sorts of crappy things that you might expect an organization at the receiving end of one of these things might do.

26:37 And then on the other hand, there's sort of like, this is exactly the way to handle it.

26:41 And Disqus was very much on that right-hand side.

26:44 And there's really only a couple of organizations I've dealt with that have been down there.

26:48 And the things that Disqus did really well, one of them is the speed.

26:52 So everyone would have seen Equifax in the news only last month.

26:56 And Equifax took about five weeks after learning of the data breach to advise everyone.

27:02 They needed to sell their stock.

27:03 Well, geez.

27:04 Allegedly.

27:06 Well, they did sell stock.

27:08 Allegedly because of the breach.

27:10 You've got to be careful.

27:11 Right, right, right, right.

27:12 People get very litigious in America.

27:13 I know this.

27:14 Yes, that's right.

27:16 Actually, I've heard this.

27:17 Fortunately, I don't know this through personal experience.

27:19 But the thing with Disqus is that I got in touch with them.

27:24 And actually, just to be clear as well, the context we've discussed is someone popped up and gave me seven different data breaches.

27:30 And they're things that I had never seen anywhere before.

27:33 It was things like ReverbNation, Kickstarter, Bitly.

27:37 And all of these three had previously disclosed where they'd said, hey, we've been hacked, never seen any data for it.

27:43 And then suddenly they all turn up in this one place and they're all legit.

27:46 And then one of them as well was this Disqus one.

27:49 And I'm trying to find references to a data breach.

27:52 I can't find anything.

27:53 And there's 17 million email addresses in there.

27:55 So fortunately, I had a contact there, someone I'd been chatting to only a couple of months earlier on another topic.

28:01 And I was like, I think I have your data and you probably should know about this.

28:06 And from the point where I sent that first email to when they had made a public statement and had already reset impacted passwords as well, I think it was 23 hours and 43 minutes.

28:21 It was just under a day.

28:23 And it's like, okay.

28:23 That's awesome.

28:24 That is awesome.

28:25 Like you guys have turned this around less than a day.

28:27 And when I spoke to them, so we jumped on the phone as well and had a good chat.

28:31 They were sort of really, I mean, obviously they weren't happy about the situation.

28:35 No one's going to be.

28:36 But they sort of understood that this is the reality of operating online today where these things do happen very regularly.

28:42 They prepared communications, which were transparent, candid, honest.

28:47 They worked with the media as well.

28:50 So one of the things I was sort of impressing on them is I think it's really important to engage with the media because there's going to be stories on this.

28:58 And you can either ensure that those stories are representing your point of view and your version of events or you can say, look, we're too busy or we don't respond to the media and you can let them form their own opinions.

29:07 So they just did all of that right.

29:10 And the only negative feedback I saw out of any of this was people saying, well, how come it took you four years?

29:16 Because apparently it actually happened in late 2013.

29:18 Well, they didn't know.

29:20 And we still don't know.

29:21 Well, I haven't seen any press as to how it happened or why it took this long to know.

29:27 But I guess based on the hand that they were dealt a couple of weeks ago, the way they handled it was just exemplary.

29:33 One of the things that's always kind of in the back of my mind, I've run a number of websites, have a couple of servers, some backend servers.

29:40 I talk to those servers.

29:41 You know, how would I know if I've been hacked?

29:44 Well, as we've just seen, you may not.

29:47 And this is sort of part of the problem.

29:49 We've got to remember as well that the Disqus situation is far from being exceptional.

29:54 So just off the top of my head last year was LinkedIn, Dropbox, MySpace, Last.fm, Tumblr, many others that were in similar boats where they had had breaches years ago and were only just discovering it.

30:06 And incidentally, that big stash of data that had Bitly and Kickstarter and all the other ones in it, there was another one.

30:11 I'm thinking very carefully before I say these words because two of them haven't been disclosed yet.

30:15 But there was another one which was a service called We Heart It, which seems a little bit like Pinterest.

30:19 That was in there.

30:20 And there were two others which we're still waiting on.

30:23 One of them should be sending their message out any moment now, which is millions of accounts again.

30:27 And then a final one, which I'm still trying to get in touch with some people.

30:31 Not everyone replies to an email when you say, hey, you've been breached.

30:34 They might pretend it went to spam and just go dark, right?

30:37 Yeah.

30:37 Well, they won't be able to once I've published the data in Have I Been Pwned.

30:40 But I'd really like to give them the opportunity to control the messaging themselves first.

30:44 But what we've got to remember with all these cases is they happened years ago, very often with entirely different people in the organization as well.

30:52 Very often with different infrastructure, different code bases.

30:56 Very often you're not even going to have logs that go back, say, four years.

31:00 You know, do you really keep web server logs that long?

31:02 A lot of organizations don't.

31:03 So it can be enormously difficult to know when an incident has actually happened.

31:09 And I guess your point about how do you know, one of the things that sort of continues to strike me is that there are all of these incidents that have already happened that we don't know about as the public.

31:20 And then a subset of those, probably a very large subset, where the organization themselves doesn't know about it as well or don't know about it.

31:27 So we're yet to see so much stuff actually come out of the woodwork that we just haven't even begun to conceive of yet.

31:33 Yeah, that's kind of daunting.

31:35 It's sobering, isn't it?

31:36 Yeah, sure.

31:37 So it sounds to me like a lot of the time this becomes something that we're aware of because the data is discovered.

31:44 You're like, oh my gosh, these accounts are all coming from this one place.

31:47 There might be people that use like their email address, like my email address plus LinkedIn at gmail.com.

31:55 And you're like, well, I only use the plus LinkedIn at LinkedIn and it's out in this Pastebin or something, right?

32:00 Well, when people do that, so when they use that sort of aliasing pattern where they have this sort of plus after the alias or when they have their own domain and they create custom aliases for every service.

32:11 Right.

32:11 LinkedIn at my domain.

32:12 Yeah.

32:13 Yeah, right.

32:13 That does actually help.

32:14 In fact, one of the ones I'm going through a disclosure with now, and this is sort of the last out of the seven that was in the stash I just spoke about.

32:21 It wasn't immediately clear where all the data was from.

32:24 It's labeled one service, but I was actually checking with Have I Been Pwned subscribers saying, hey, look, you're in here.

32:31 Have you used this service?

32:32 And a bunch of them were saying, you know, no, I've got no idea what it is.

32:35 And it was only when I started going through looking at things like the aliases on email addresses where it's like, oh, okay, I can actually see this other thing now.

32:43 And I was able to kind of join the dots and go, okay, well, I can see that this is actually from multiple different sources.

32:49 I think it's actually from two different sources.
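As a rough illustration of how aliases can act as breadcrumbs, the sketch below pulls the "plus" tag out of each address and counts how often each tag appears. The helper names `plus_tag` and `tag_counts` and the sample addresses are made up for this example, and it only handles the plus-style alias, not custom per-service addresses on a personal domain.

```python
from collections import Counter


def plus_tag(email):
    """Return the tag after '+' in the local part, or None if there is none."""
    local, _, _domain = email.rpartition("@")
    if not local:
        return None  # no '@' found, so not a usable address
    _, sep, tag = local.partition("+")
    return tag.lower() if sep and tag else None


def tag_counts(emails):
    """Count plus-tags across a dump to hint at which services it came from."""
    return Counter(tag for tag in map(plus_tag, emails) if tag)


sample = [
    "me+linkedin@gmail.com",
    "you+LinkedIn@gmail.com",
    "x+otherservice@gmail.com",
    "plain@gmail.com",
]
print(tag_counts(sample).most_common())
```

If a dump labeled as one service shows a second tag turning up again and again, that is a hint the data came from more than one source.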

32:51 It does seem like at least a little bit of a breadcrumb.

32:53 I don't do it that often, but sometimes I do.

32:56 Yeah.

32:56 No, it can be really useful for that.

32:58 The other thing that I hear people saying is I say, look, I use this aliasing pattern because then I can see when my data is, say, sold or redistributed or something like that.

33:08 I can see where the source was.

33:10 The problem then, of course, is what are you going to do about it?

33:12 Like we know that this happens.

33:14 And if you go back to the organization and say, hey, I signed up on your service and I signed up with my name plus your service name at gmail.com.

33:21 And I've just seen I've just gotten spam trying to sell me Viagra.

33:24 Okay, now what?

33:25 You know, like there's not really anything that's actionable.

33:28 Yeah, I've heard that.

33:30 And actually, I've been on both sides of that story, not with stuff that I'm doing these days, but previous project.

33:35 And, you know, how much weight should you put in that?

33:39 Like on one hand, I feel like, yeah, okay, if somebody really got all the data out of the database, they would have those things that they could email people.

33:44 But it seems like that's a weird thing to do is just spam folks.

33:47 On the other hand, if somebody breaks into that person's email and just harvests every email they see and just start sending to it,

33:54 maybe it's their account that got broken into and just happened to be they kind of loop back.

33:59 Well, you know, this is also one of the great mysteries, right?

34:01 Where there are so many different ways that these things can go down.

34:04 And very, very often, you just simply can't get to the bottom of it.

34:08 I mean, people say to me all the time, like, I'm getting spam.

34:11 Where's it come from?

34:12 I don't know.

34:12 It's like you put your email address out all over the place.

34:14 It could be from anywhere.

34:16 Yeah, that's right.

34:16 Get a good spam filter and just buck it up, I guess.

34:19 Exactly.

34:22 This portion of Talk Python To Me is brought to you by GoCD from ThoughtWorks.

34:26 These are the people that literally invented continuous integration.

34:29 They have a great open source on-prem CI/CD server called GoCD.

34:34 But rather than tell you about the server this week, I want to share a course they created for people who are new to continuous integration.

34:39 Check out their course called Continuous Delivery 101 from GoCD at gocd.org/101.

34:47 This video series covers the history of continuous delivery, concepts, best practices, how to get started, and popular tools.

34:54 You'll gain a holistic view of continuous delivery and a deeper understanding and an appreciation of the critical concepts.

35:01 Be sure to try their course at gocd.org/101 and let them know that you appreciate their sponsorship of Talk Python.

35:08 Speak to the web developers out there.

35:10 What lessons have you learned from running Have I Been Pwned that you'd like to share with that audience?

35:17 There are a lot of things, you know, speaking or sort of thinking about it from a pure dev perspective.

35:22 Going back to what I was saying earlier, one of the main reasons I wanted to do this is because I really wanted to do something in anger on Azure.

35:31 And I've had a heap of fun, to be honest, building this service out on Azure and sort of experiencing the whole cloud scale thing and commoditized pricing and all the other sort of promises of the cloud.

35:44 And there's a heap of learning out of that, both good and bad.

35:46 I mean, some of the good stuff has been things like the ability to have auto scale, you know, so actually provision more infrastructure on evidence of exceeding existing infrastructure resources is fantastic.

35:58 And I've spoken many times at events about how to run a project like this on a coffee budget.

36:04 So how do I run a service with 4.8 billion records and sometimes millions of visitors a day for what you'd spend on coffee?

36:10 And I don't always manage to do that, but I usually get pretty close.

36:14 And things like really managing your scale very carefully have been great.

36:19 Things like choosing the right data storage constructs for your use case have been great as well.

36:25 So particularly in sort of the Microsoft-y world, there's been this traditional view of you're going to store data, you've got to have a SQL database.

36:33 And a SQL database is a behemoth of a thing, right?

36:36 I mean, it is a big, big thing.

36:39 And it's expensive and it's...

36:41 Especially with four point something billion records.

36:43 Well, hey, look, if I got to that point, if I was actually trying to put that data in a relational database like SQL Server, I mean, the cost would just be astronomical.

36:51 But it would also be really unnecessary because the patterns with which the data is used just don't predispose it to needing a relational database.

36:59 So one of the best things I ever did was to use Azure's table storage, which is basically just a key value pair.

37:05 And I just partitioned it in a way that worked really well.

37:09 So those 4.8 billion records are in there.

37:12 You can create a partition and then a row key.

37:15 So my partition keys are the domain.

37:17 So say gmail.com.

37:18 And then the row key is the alias.

37:20 And what that allowed me to do is do super, super, super fast lookups.

37:24 Because when you're searching, you're literally searching by domain and alias.

37:28 So a very specific partition and a very specific row key.

37:31 And it also made it really easy to do entire domain wide reports because I just pulled the entire partition.
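The layout described here, with the domain as the partition key and the alias as the row key, can be modeled with nested dictionaries. This is only an in-memory sketch to show why both access patterns are cheap; it does not use the Azure SDK, and the `BreachTable` class and its method names are invented for illustration.

```python
from collections import defaultdict


class BreachTable:
    """Toy key-value table: partition key = domain, row key = alias."""

    def __init__(self):
        # {domain: {alias: set of breach names}}
        self._partitions = defaultdict(dict)

    @staticmethod
    def _keys(email):
        alias, _, domain = email.strip().lower().rpartition("@")
        if not alias or not domain:
            raise ValueError(f"not an email address: {email!r}")
        return domain, alias

    def add(self, email, breach):
        partition_key, row_key = self._keys(email)
        self._partitions[partition_key].setdefault(row_key, set()).add(breach)

    def lookup(self, email):
        """Point lookup: one specific partition and one specific row key."""
        partition_key, row_key = self._keys(email)
        return sorted(self._partitions.get(partition_key, {}).get(row_key, ()))

    def domain_report(self, domain):
        """Domain-wide report: read every row in a single partition."""
        rows = self._partitions.get(domain.lower(), {})
        return {alias: sorted(breaches) for alias, breaches in rows.items()}


table = BreachTable()
table.add("Alice@Gmail.com", "Adobe")
table.add("alice@gmail.com", "Disqus")
table.add("bob@gmail.com", "Adobe")
print(table.lookup("alice@gmail.com"))
print(table.domain_report("gmail.com"))
```

Neither operation scans the whole data set: a search touches exactly one key pair, and a domain report touches exactly one partition.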

37:37 So that works awesomely.

37:39 And that is still the single best decision I've ever made, I'd say, in terms of the architecture.

37:44 So for those 4.8 billion records that on disk is some tens of gigabytes, they actually don't have good reporting about actual size.

37:52 And it always scales infinitely.

37:54 So it is platform as a service.

37:56 I've never reached any scale capacity on the storage tier.

37:59 And it usually returns records within sort of a single digit millisecond range.

38:04 And that cost me about $40 a month.

38:06 Awesome.

38:06 Yeah, which is just like, how cool is this?

38:08 It just rocks.

38:09 One of the other really, really big things I've learned is I started using Cloudflare for Have I Been Pwned about a year ago.

38:17 And look, I've been using them on things like my blog and a couple of other little projects.

38:22 And it was kind of cool for that in that you get HTTPS for free and a few other little sort of bits and pieces that make things like my blog run a lot better.

38:31 But it made a massive difference to Have I Been Pwned.

38:34 And I originally did it because I was getting DDoSed and that sort of, that wasn't fun.

38:38 And Cloudflare put a stop to that.

38:42 But then it became really, really awesome because you have things like a firewall that you can control programmatically.

38:47 So one of the reasons I put it in place was the API I have, I introduced a rate limit because I was seeing some fairly nefarious behavior to it.

38:56 And I went, oh, look, I'll just put a rate limit in.

38:58 And if you exceed the rate limit, I'll return a 429 too many requests and say you can retry after two seconds.

39:04 And they'll see that and they'll stop.

39:07 But no, they don't.

39:08 They just kept hammering it.
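A rate limit of this kind, returning 429 Too Many Requests with a retry hint, might look something like the sketch below. It is an assumption-laden illustration rather than the real API's logic: the `RateLimiter` class, the one-request-per-two-seconds rule, and keying on a client ID string are all choices made for the example.

```python
import math
import time


class RateLimiter:
    """Allow one request per client every `interval` seconds."""

    def __init__(self, interval=2.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock  # injectable so the behavior can be tested
        self._last_allowed = {}

    def check(self, client_id):
        """Return (status_code, headers) for an incoming request."""
        now = self.clock()
        last = self._last_allowed.get(client_id)
        if last is not None and now - last < self.interval:
            # Too soon: tell the client how many whole seconds to wait.
            wait = math.ceil(self.interval - (now - last))
            return 429, {"Retry-After": str(wait)}
        self._last_allowed[client_id] = now
        return 200, {}
```

Note that this check still runs on the origin server for every request, which is exactly the cost being described: a client that ignores the 429 keeps consuming the same resources that serve legitimate traffic.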

39:10 And one of the pennies that dropped was that when you expose that sort of origin server to the world, you have to deal with everything there.

39:18 So you have to deal with every incoming request on that same infrastructure, which is actually trying to serve legitimate requests.

39:24 And when you put a service like Cloudflare in front, suddenly you have this other layer in front where you can start to programmatically exclude nasty stuff and actually free up the underlying resources to do the things they're meant to do.

39:37 So I've got some great rate limiting implementations.

39:40 I've got a really neat model I've written about before using Azure Functions where even when I see behavior that's slightly nefarious on the website itself, I just drop in a JavaScript challenge rule on the Cloudflare edge node so that if you go to the site, it just makes sure you're in a browser and you can't automate it with an API.

39:56 And then you basically get a 24-hour timeout and then you can try again.

40:00 So it's been awesome for that.

40:02 That has also been super awesome for reducing my cost because I cached the bejesus out of this thing.

40:07 So Cloudflare has got 118 edge nodes as of today around the world and everything from the front page to the FAQs to every single image and JavaScript file and CSS file is served from those edge nodes.

40:20 So the actual traffic that comes through to the site is usually just API requests and a couple of other dynamic things.

40:27 So it's really, really dramatically reduced my costs and the frequency with which I need to scale my infrastructure.

40:33 Oh, that's really cool.

40:34 And Cloudflare is actually in the news recently this week, a couple weeks ago, for announcing basically unlimited DDoS protection.

40:41 So yeah, that's cool.

40:43 It's the world we live in today, isn't it?

40:45 And look, that was sort of their traditional thing, right?

40:48 Like they made their name out of DDoS protection.

40:50 But they do so much more now because they can sit on the wire and intercept that traffic.

40:55 And that spins some people out as well.

40:57 And if you are listening to this and you get spun out by it, I've got a blog post about security absolutism.

41:02 So if you Google my name, security absolutism, you sort of see me try and put things in perspective.

41:07 But it means you can do stuff like all these websites that are now going HTTPS because they're being sort of forced down this route.

41:14 You can go HTTPS for free via Cloudflare within about five minutes.

41:19 And not only that, but they can also do things like add an HSTS header because when you can intercept the traffic, you can add headers.

41:25 They can rewrite HTTP references to HTTPS.

41:29 They can 301 all of your insecure requests over to secure requests.

41:34 And they can do all of this stuff because they're sitting there controlling the traffic.
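For a sense of what the redirect and the HSTS header amount to, here is a small WSGI middleware that does the same two things at the application level. This is a sketch of the idea only, not how Cloudflare does it at the edge; the `https_only` name and the one-year `max_age` default are assumptions for the example.

```python
def https_only(app, max_age=31536000):
    """Wrap a WSGI app: 301 insecure requests and add an HSTS header."""

    def middleware(environ, start_response):
        if environ.get("wsgi.url_scheme") != "https":
            # Rebuild the same URL on HTTPS and redirect permanently.
            host = environ.get("HTTP_HOST", "localhost")
            path = environ.get("PATH_INFO", "/")
            query = environ.get("QUERY_STRING", "")
            location = f"https://{host}{path}" + (f"?{query}" if query else "")
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]

        def start_with_hsts(status, headers, exc_info=None):
            # Tell browsers to use HTTPS for this host for max_age seconds.
            headers = list(headers) + [
                ("Strict-Transport-Security", f"max-age={max_age}")
            ]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_hsts)

    return middleware
```

When a reverse proxy terminates TLS in front of the app, the origin may see every request as plain HTTP, so a check like this would need to look at whatever forwarded-scheme information the proxy provides instead.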

41:38 And that just makes a huge amount of sense for many, many good reasons.

41:41 Yeah, that's cool.

41:42 And you don't have to worry about it.

41:43 Yeah, yeah.

41:43 It's like literally a turnkey thing.

41:45 Yeah, that's cool.

41:46 Yeah, I don't use it, but I've considered adding it.

41:49 And it seems like it might be a good idea.

41:50 I should be quiet.

41:51 Somebody might try to attack my...

41:53 No, you know what?

41:55 And it is like a five-minute job too.

41:57 And people go, yeah, but you can just go to Let's Encrypt and get a certificate.

41:59 And they're right.

42:00 And Let's Encrypt is absolutely awesome.

42:02 But it is just certificates.

42:04 And there is so much more that you can do with a reverse proxy like Cloudflare.

42:08 Like once you actually use it in anger on a large-scale site, you go, wow, how would I ever not?

42:14 You know?

42:14 Right.

42:15 And even if it's a small site or a static blog or something, it still makes an enormous amount of sense because of stuff like the caching.

42:21 Cool.

42:21 Yeah, it definitely sounds like it's worth checking out.

42:25 So one thing I wanted to talk to you about is WannaCry, which was a real sad thing that sort of was a ransomware thing that went around just encrypting all the things.

42:37 And it took out a bunch of places.

42:39 The most notable one, I guess, was the National Health Service in the UK.

42:44 That's right.

42:44 But it also took out things like Maersk, FedEx, a bunch of places, right?

42:49 WannaCry seems to me like a real lesson in just being vigilant and patching all of your stuff and keeping it up to date.

42:57 But there's tons of things that were way out of date.

43:00 What do you think some of the lessons are from WannaCry?

43:02 Well, just one thing on that, actually.

43:03 The Maersk one was NotPetya.

43:05 So NotPetya came, I think it was about a month or six weeks after WannaCry.

43:10 It wasn't that long.

43:11 Right, right.

43:11 There were similar times.

43:12 Yeah, it was similar.

43:13 But, yeah, yeah, there's similar times.

43:15 And look, it's still ransomware.

43:16 Look, you don't want either.

43:18 All right?

43:18 The thing that was really interesting about WannaCry is the sort of sequence of events leading up to that.

43:25 Without having the exact dates in front of me, from memory, WannaCry hit us in May.

43:30 And back in March, we had seen Microsoft say, look, there are some critical patches.

43:34 You really should take these critical patches.

43:36 And they didn't really elaborate why they knew they were so important or so timely.

43:40 But, you know, maybe you should just really just do this right now.

43:43 Exactly.

43:44 And then a month later, we saw this sort of Shadow Brokers dump.

43:49 So this collective that has collected themselves a bunch of zero-days.

43:54 And one of the vulnerabilities in there was this EternalBlue vulnerability, which exploited SMB.

44:00 Which we'd normally use for sort of connecting file systems and sharing information across them.

44:04 And the problem there was that you could remotely connect to a machine with a vulnerable SMB implementation and have remote code exec on it, which is really nasty stuff.

44:13 And, okay, now we're saying, all right, so a month ago, Microsoft said patch your things.

44:17 Now we know why it was important.

44:19 And then another whole month went by.

44:21 And then WannaCry hit.

44:22 So by the time someone got hit with WannaCry that was exploiting EternalBlue, two months ago, they knew it was a big thing.

44:29 A month ago, we knew why it was a big thing.

44:31 And now you still haven't patched your things.

44:34 And really, the lesson out of this was around patching.

44:38 And this was just such a sort of, I guess, poignant example of why it was so important.

44:44 Because this was devastating.

44:45 I mean, it hit particularly the NHS.

44:47 But it hit everything from German trains through to other services and other parts of Europe in particular really, really hard.

44:54 And what sort of stunned me is after that, a lot of the stuff I was writing and talking to people about was this whole patching cycle where there were still people going, I disable Windows updates.

45:05 And there's literally tutorials out there.

45:07 How to disable Windows updates because maybe you'll get a bad one.

45:10 And I would have people sort of justifying turning it off.

45:13 They're like, well, I don't like it because sometimes you go to shut down your PC and it says you've got to wait while updates install.

45:19 And once I was about to get on an airplane and I couldn't close my laptop.

45:22 That's your reason.

45:25 But actually, my favorite one was a bunch of people would say, I keep installing all these stupid updates.

45:30 I've never even had a virus.

45:31 Yeah, but that's why you haven't because you install the updates.

45:36 You know, it's certainly an important part of it.

45:38 So to have that happen and then have NotPetya occur just after that, which again was exploiting a number of exploits.

45:45 But one of them was an unpatched or rather a patched vulnerability, which people had left unpatched and then got exploited.

45:51 Right.

45:51 If they didn't learn from the first two times around, they should have really learned from WannaCry.

45:54 I know.

45:55 Exactly.

45:56 Anyway.

45:57 Yeah.

45:57 The reason I bring this up is not to just talk about these crazy viruses and patching.

46:01 But I think there's a real tension in organizations, and the larger the organization, the greater the diffusion of responsibility in this, which is to say like, look, there's this system that's running our invoices.

46:15 Nobody knows how really to upgrade it.

46:18 And nobody wants to be the one to take the responsibility of patching it because if it goes down, their weekend is toast.

46:24 We're just going to leave it.

46:25 Not my problem.

46:26 Someone else's problem.

46:27 Right.

46:28 How do you think we address that?

46:30 Look, it's a good observation.

46:31 And we've got to be fair here that we don't sort of overly trivialize the complexity that can be involved in actually patching these things.

46:39 I mean, think about the NHS for a moment and think about what hospitals run and some of the systems they have.

46:45 Think about an MRI machine.

46:47 You know, imagine actually trying to patch that thing.

46:50 And look, there's a bunch of them probably sitting out there still running Windows XP.

46:54 You know, you don't just like whack in the DVD and upgrade to Windows 10.

46:57 This is not a simple process.

47:00 So I'm sympathetic to that.

47:01 And I suspect that what happened in cases like the NHS is there's budget constraints.

47:07 In fact, we know there's going to be budget constraints in a hospital.

47:10 There's budget constraints.

47:11 The IT managers had to make a call between where that budget gets spent, what's patched when, how much money they allocate into different areas.

47:19 And it would have been a very hard problem.

47:21 And large enterprises are not exactly just running Windows update automatically across everything either.

47:26 I mean, these things are tested and rolled out through standard operating environments.

47:29 And they're a big thing.

47:31 And I think, to be honest, like the bigger picture here is that when there is a high friction of updates, it makes the uptake very difficult.

47:40 I mean, one of the things that Apple's done really well is they've made it such a low friction process.

47:45 You know, like, hey, iOS 11 landed the other day.

47:47 A thing popped up on my screen and I said yes.

47:49 And I never had to worry that it wouldn't work.

47:51 And if it didn't work, I would have restored from iCloud while I went out and, you know, kicked the ball with the kids or something.

47:56 And I'd come back inside and I'd be done.

47:57 So, yeah, that's really the model that we'd love to move towards where these things are low friction and automated.

48:04 And unfortunately, that's just not the reality with a lot of systems today.

48:09 Yeah, absolutely.

48:10 The more complicated they get, like it could be like a library that's compiled into your code that you've got to upgrade.

48:17 For example, with the Equifax one, right?

48:20 Yeah, the Struts in Equifax.

48:21 That's right.

48:22 And like they had to like recompile and, you know, who knows what got deprecated, what had to have been changed.

48:28 Granted, that doesn't excuse them.

48:29 They really messed that up.

48:31 But still, it's not that.

48:33 It's a little bit hard trying to throw Equifax a bone.

48:36 But look, I mean, that is something where it's like, okay, let's just be objective about it.

48:41 I can see where, when it's Struts, you have to like go and recompile some pretty serious stuff as well.

48:46 It comes back to that point about the friction.

48:48 And when there is this high friction of change, well, yeah, it's going to be hard to get this stuff done in a timely manner.

48:54 Yeah, absolutely.

48:55 Absolutely.

48:55 So we don't have a lot of time left.

48:57 Let's talk a little bit about IoT.

48:58 You wrote an interesting blog post.

49:03 I think it's a blog post called, What It Would Look Like If We Put Warnings on IoT Devices Like We Do on Cigarette Packs.

49:08 Tell us about that.

49:10 Yeah, yeah.

49:12 That was fun.

49:13 So the sort of premise of this is a lot of IoT stuff's got some pretty crazy vulnerabilities in it.

49:18 Now, one of the ones I saw a couple of years ago is that there's these kids' tablets.

49:24 So imagine if Fisher Price made an iPad.

49:27 This is sort of my vision of it.

49:28 So it's plasticky and colorful and all this sort of thing.

49:32 And these were made by a company called VTech, Hong Kong-based toy maker.

49:35 And VTech had a vulnerability that someone exploited, sucked out millions of parents' and children's data.

49:41 And the kids' data included things like their names, their birthdates, their genders, their photos, foreign keys to the parents with the parents' physical addresses as well.

49:50 So it was like from a stalker perspective, it was kind of the worst possible thing you can imagine.

49:56 I know, it was like super, super creepy.

49:58 And, you know, like this was really bad news.

50:01 And someone did break into their systems, but they had some shockingly bad aspects of their security there.

50:08 I mean, stuff like, I wrote a blog post at the time.

50:10 You would log in.

50:11 And when you log in via, they had a little Flash emulator for the tablets.

50:16 Okay, there's another hint.

50:17 Some things are going to be wrong.

50:18 So there's a little Flash emulator.

50:20 It calls an API.

50:21 The API returns a JSON response, which contains the actual SQL query executed in the database.

50:26 And it's just really, really weird stuff like that.

50:29 And it wouldn't surprise me at all if it was just classic SQL injection that the guy got in with.
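
To make the "classic SQL injection" remark concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The `users` table, the function names, and the sample data are all hypothetical and are not taken from VTech's system; the point is only the contrast between splicing input into SQL text and passing it as a parameter.

```python
import sqlite3


def find_user_unsafe(conn, email):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = f"SELECT id, email FROM users WHERE email = '{email}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, email):
    # Parameterized: the value travels separately from the SQL text,
    # so it is always treated as data and never as SQL.
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchall()


def make_demo_db():
    # Hypothetical in-memory table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("parent@example.com",), ("other@example.com",)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "nobody' OR '1'='1"
    print(len(find_user_unsafe(conn, payload)))  # 2: every row leaks
    print(len(find_user_safe(conn, payload)))    # 0: no such email
```

The injected `OR '1'='1'` makes the unsafe query match every row, while the parameterized version simply looks for an email address that happens to contain quote characters and finds nothing.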

50:33 Anyway, they had a bad time out of it.

50:37 And they were in the news only a couple of weeks ago because a class action against them didn't succeed, which frankly, I agree with.

50:44 I think people trying to mount a class action against a company where the data was exposed but contained.

50:51 It was never spread.

50:52 It was never abused.

50:53 It was just a bit of a money grab.

50:54 I think the regulators should be pinging them.

50:56 That's a different story.

50:57 But anyway, so they're in the news.

50:59 And this one story pointed out that their terms and conditions today effectively say you could get hacked.

51:05 Someone could get your data.

51:06 That's your problem, not ours.

51:08 I kind of replied to them or quote tweeted them and said, look, how about you put this on the front of the pack?

51:13 And it was a little bit tongue in cheek, right?

51:15 Because if you put that in the front of the pack, no one's going to buy your freaking tablet.

51:17 And I was thinking about it later.

51:20 I was like, it's almost like it's a dangerous good, you know.

51:22 And in Australia, what we do with dangerous goods like cigarettes is we put great big warning signs on the front of the packs.

51:28 And they're very graphic.

51:29 And they tell you how bad stuff can be when it goes wrong.

51:32 This can kill you in slow and horrible ways.

51:34 Let us list them.

51:35 I know.

51:36 And they're super, super graphic in Australia as well.

51:38 So I thought, all right, well, look, why don't we do this with IoT?

51:41 We'll literally just put these things on the front of these IoT devices like we would a cigarette pack.

51:47 So I just did this blog post with a bunch of mock-ups of what would it look like.

51:51 And it's, you know, there's sort of warnings on everything from the VTech tablets to teddy bears to automated dog feeders.

51:58 And that was a rather popular post.

52:00 Yes.

52:03 Warning.

52:03 You acknowledge and agree that your child's intimate voice recordings may be placed in an unsecure Amazon S3 bucket.

52:09 Oh, CloudPets.

52:10 Oh, my God.

52:11 Oh, yeah.

52:11 That's pretty funny.

52:12 I think it makes a good point.

52:14 And I guess maybe just a final thought on IoT.

52:17 Like, do you think things are going to get better?

52:18 Or is it just going to continue to be a bunch of unpatched badness?

52:21 No, of course they're not going to get better.

52:22 I mean, there's just nothing that predisposes it to getting better.

52:27 When you look at the factors that are driving the growth of IoT, you know, we want to be first to market.

52:31 We want to put internet in things that were never meant to have internet.

52:35 And if you want to know what I mean by that, just Google We-Vibe and we'll leave it at that.

52:39 So there are all sorts of things.

52:42 Wait, wait, wait.

52:42 Do that in an incognito window.

52:44 Look, you know what?

52:45 You'll find news stories on it.

52:47 And let's touch on that in a mature, responsible adult fashion.

52:51 So these are toys for adults and they are internet connected.

52:55 And what I find fascinating from this in a very kind of mature way is that this is digitizing data we never had before.

53:03 So it actually stored usage data.

53:06 And whilst these devices have been around for eons, the concept of actually recording the use, everything from the modes that were used to the times of day, referenced to the identity using it, we never had that before.

53:19 And now we have this new class of data, which is enormously sensitive by virtue of the fact that we've internet connected the things.

53:26 And the reason you'll find them in the news is that they recently got fined up to $10,000 per owner of the device because they were collecting this data without consent.

53:35 And you can imagine for the owners of them, I mean, that must be absolutely gut-wrenching to think that this sort of data about them now exists on a server somewhere.

53:43 Definitely.

53:44 It's quite troubling.

53:46 Yeah.

53:47 So I don't know.

53:48 I think you're probably right.

53:49 I think that there's going to be a lot of trouble with these types of things.

53:52 Just the incentive to keep things updated is not very good.

53:55 I did recently get an electric car and I have a charger for the car that's on the internet, which makes me a little nervous.

54:02 But I logged into it the other day and it had updated itself within the last two weeks.

54:07 So maybe the higher end devices will be a little bit better off.

54:10 Yeah.

54:11 Look, I think the devices that are doing these auto updates are the way we're going, right?

54:15 And some of them do it very well.

54:16 Some of them do it still with requiring user action, but sort of very low friction.

54:21 I hope that's going in the right direction.

54:24 But geez, there's so many things out there where something has gone fundamentally wrong in order to require the update in the first place as well.

54:31 Yeah.

54:31 Yeah, absolutely.

54:32 Yeah.

54:32 I mean, there's the danger that you could brick the device.

54:35 So there's that same type of hesitation to actually put that security patch in like that you have with the big companies, right?

54:42 Like no one wants to brick everything they've sold.

54:44 I know.

54:45 There's also that.

54:48 So, all right, I think we're just, I have so many questions I'd love to chat with you about and maybe someday we'll do a follow up and I can ask the other ones.

54:54 But I do want to give you a chance to talk about your courses because you've written a ton of Pluralsight courses and they're actually really, really valuable, I think.

55:03 So maybe touch on some of the ones you feel are notable for my audience.

55:07 I think the one which is most notable at the moment is the one that's still pinned to my Twitter timeline.

55:13 I do intend to leave it there for a little bit, which is What Every Developer Must Know About HTTPS.

55:18 And I love the HTTPS discussion at the moment because there are so many, so many angles to it.

55:24 And it's such an important time as well.

55:26 So, for example, back in January this year, we saw Chrome and Firefox start to warn anyone if they went to a login form or a credit card form over HTTP, even if it publishes or posts rather to HTTPS.

55:39 And that was an important change.

55:41 We're here recording in October and Chrome 62 has just hit.

55:46 And in 62, they are gradually enabling the feature, which now does the same thing for any page with an input form as soon as you type a character.

55:54 So you can go to like CNN, click on the search link, type any character into the search box, and then suddenly you get a big not secure warning.

56:03 So that makes things really, really interesting.

56:05 Like that's really starting to push HTTPS.

56:07 And the sorts of stuff I talk about in the course are things like, I think a lot of people know, you're meant to make sure that all of your embedded resources on the page are done so over a secure connection.

56:16 Otherwise, you lose your padlock and your green text and your secure in Chrome.

56:21 But there are tricks to help you do this.

56:23 So there are things like there is an upgrade-insecure-requests content security policy.

56:27 So you can add a response header or you can put it in a meta tag.

56:30 And if you accidentally put anything insecurely on the page, then the request automatically gets upgraded to a secure one.

56:36 Oh, that's nice.
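
As a rough illustration of the two options Troy mentions, the sketch below shows the `upgrade-insecure-requests` directive both as a meta tag and as a `Content-Security-Policy` response header. The helper name `with_upgrade_insecure_requests` and the list-of-tuples header format are assumptions made for this example, not part of any particular framework.

```python
CSP_HEADER = "Content-Security-Policy"
DIRECTIVE = "upgrade-insecure-requests"

# Option 1: the meta tag form, placed inside the page's <head>.
CSP_META_TAG = (
    '<meta http-equiv="Content-Security-Policy" '
    'content="upgrade-insecure-requests">'
)


def with_upgrade_insecure_requests(headers):
    """Option 2: return a new (name, value) header list whose CSP
    includes the upgrade-insecure-requests directive.

    Hypothetical helper: an existing policy is extended rather than
    replaced, and a new header is added if none is present.
    """
    result = []
    found = False
    for name, value in headers:
        if name.lower() == CSP_HEADER.lower():
            found = True
            directives = [d.strip() for d in value.split(";") if d.strip()]
            if DIRECTIVE not in directives:
                directives.append(DIRECTIVE)
            value = "; ".join(directives)
        result.append((name, value))
    if not found:
        result.append((CSP_HEADER, DIRECTIVE))
    return result


if __name__ == "__main__":
    print(with_upgrade_insecure_requests([("Content-Type", "text/html")]))
    print(with_upgrade_insecure_requests([(CSP_HEADER, "default-src 'self'")]))
```

With either form in place, a supporting browser rewrites `http://` subresource requests on the page to `https://` before making them.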

56:37 So there are all these neat little tricks like that you can do, which make HTTPS so much more easily accessible.

56:44 And a lot of time when I hear people saying I'm having problems because of this or that or whatever, it's like there are usually things like response headers, the CSP, also HSTS to enforce HTTPS connections for all requests.

56:57 You know, those sorts of things are just fantastic.
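
For the HSTS part, here is a minimal sketch of a WSGI middleware that adds a `Strict-Transport-Security` response header. The class name, the demo app, and the one-year `max-age` with `includeSubDomains` are illustrative choices, not recommendations from the episode. The sketch also assumes `wsgi.url_scheme` reflects the real scheme, which may not hold behind a TLS-terminating reverse proxy.

```python
HSTS_VALUE = "max-age=31536000; includeSubDomains"  # one year, illustrative


class HSTSMiddleware:
    """Hypothetical WSGI middleware that adds HSTS to HTTPS responses."""

    def __init__(self, app, value=HSTS_VALUE):
        self.app = app
        self.value = value

    def __call__(self, environ, start_response):
        def hsts_start_response(status, headers, exc_info=None):
            # Browsers ignore HSTS over plain HTTP, so only send it on HTTPS.
            if environ.get("wsgi.url_scheme") == "https":
                headers = [
                    h for h in headers
                    if h[0].lower() != "strict-transport-security"
                ]
                headers.append(("Strict-Transport-Security", self.value))
            return start_response(status, headers, exc_info)

        return self.app(environ, hsts_start_response)


def demo_app(environ, start_response):
    # Tiny stand-in application used only to exercise the middleware.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


app = HSTSMiddleware(demo_app)
```

Once a browser has seen this header over HTTPS, it refuses to make plain HTTP requests to that host until `max-age` expires, which is why it is worth starting with a short `max-age` while testing.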

56:59 And that's what a lot of the course is about.

57:01 Okay, cool.

57:02 I'll definitely link to that and a couple of other ones in the show notes that I thought were pretty cool.

57:06 You also have a nice article on getting started in ethical hacking as a career.

57:11 So that's cool as well.

57:13 Yeah, I guess we're going to have to leave it there for the topics.

57:15 I always have two questions I ask at the end.

57:17 So let me ask them to you now.

57:19 The one of them is if you're going to write some code, what editor do you open up?

57:23 Usually Visual Studio.

57:24 Okay, right on.

57:25 And now normally I ask about libraries and Python.

57:29 You don't do that much Python.

57:30 So I'll give you a pass on that one.

57:32 I'm going to give you a variation on this.

57:33 So what password manager do you use?

57:36 So I'm like you.

57:37 I use the password manager called 1Password.

57:40 Now, I said the password manager called that because if you just say I use 1Password, people are like, what is wrong with you?

57:45 Well, people should be like, what is wrong with you?

57:46 You're using it everywhere.

57:47 What are you thinking?

57:48 I know.

57:48 I know.

57:49 So look, I still use that.

57:51 I'm a little bit agnostic in so far as I think, frankly, so long as you're using a mainstream professional password manager like that or LastPass.

57:59 Honestly, that's going to make your life so much better in so many ways.

58:02 So use one of those things.

58:04 Yeah.

58:04 It definitely makes your blood pressure stay pretty cool when you find that there's been a password breach.

58:11 Yeah, that was 40 characters random.

58:13 I'm going to reset it.

58:14 Not a big deal.
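
A password manager generates these for you, but as a rough sketch of what "40 characters random" means, Python's standard-library `secrets` module can do the same thing. The alphabet and the function name `generate_password` are assumptions for this example; real sites may restrict which symbols they accept.

```python
import secrets
import string

# Assumed alphabet: letters, digits, and ASCII punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=40):
    """Return a random password drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```

The `secrets` module is used rather than `random` because `random` is not designed for security-sensitive values.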

58:14 That's it.

58:15 Yeah.

58:15 Right on.

58:16 Okay.

58:16 Well, Troy, thank you so much for being on the show.

58:18 Any final call to action for everyone listening?

58:20 No.

58:21 Look, I mean, if you want to learn any more, go to TroyHunt.com or find me on the Twitters as Troy Hunt.

58:26 All right.

58:26 Awesome.

58:26 I'll be sure to put those links in the show notes.

58:28 And thanks for being here.

58:29 Good on you.

58:29 Thanks, mate.

58:30 Yeah.

58:30 Bye.

58:30 This has been another episode of Talk Python To Me.

58:34 Today's guest has been Troy Hunt.

58:36 And this episode has been brought to you by Rollbar and GoCD.

58:40 Rollbar takes the pain out of errors.

58:42 They give you the context and insight you need to quickly locate and fix errors that might have gone unnoticed until your users complain, of course.

58:50 As Talk Python To Me listeners, track a ridiculous number of errors for free at rollbar.com/talkpythontome.

58:57 GoCD is the on-premise, open-source, continuous delivery server.

59:02 Want to improve your deployment workflow but keep your code and builds in-house?

59:06 Check out GoCD at talkpython.fm/gocd and take control over your process.

59:12 Are you or a colleague trying to learn Python?

59:15 Have you tried books and videos that just left you bored by covering topics point by point?

59:19 Well, check out my online course, Python Jumpstart by Building 10 Apps, at talkpython.fm/course to experience a more engaging way to learn Python.

59:28 And if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/Pythonic.

59:36 Be sure to subscribe to the show.

59:38 Open your favorite podcatcher and search for Python.

59:40 We should be right at the top.

59:41 You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.

59:51 This is your host, Michael Kennedy.

59:53 Thanks so much for listening.

59:54 I really appreciate it.

59:55 Now get out there and write some Python code.