10 Python security holes and how to plug them

Episode #168, published Fri, Jul 6, 2018, recorded Thu, Jun 28, 2018


Do you write Python software that uses the network, opens files, or accepts user input? Of course you do! That's what almost all software does. But these actions can let bad actors exploit mistakes and oversights we've made to compromise our systems.

Python is safer than some languages, but there are plenty of issues to be careful about. That's why Anthony Shaw and Anthony Langsworth are joining me to discuss Python security.

Episode Deep Dive

Guests introduction and background

Anthony Shaw and Anthony Langsworth join this episode to discuss the most common security risks in Python and how to address them. Anthony Shaw is a seasoned Python developer who contributes to open source, particularly around infrastructure and security tooling. His background includes working on Libcloud and teaching Python on platforms like Pluralsight. Anthony Langsworth has a strong networking background, using Python for automating and interfacing with network hardware from companies like Cisco and Juniper. He brings a practical developer lens to security, focusing on how attackers exploit common programming mistakes in Python applications.

What to Know If You're New to Python

If you’re new to Python, here are a few quick points to help you follow this discussion on security:

Python is an interpreted, high-level language often praised for its readability. Many security pitfalls stem from using dynamic features incorrectly, such as mishandling user input.
The standard library provides batteries-included modules (e.g., subprocess, pickle, tempfile) which can be risky if used improperly.
Security often involves layering protections (e.g., validating all user inputs, using parameterized queries, limiting privileges).
Don’t be overwhelmed: Implementing even basic practices, like avoiding eval or using safe file operations, goes a long way.

Key points and takeaways

A Security-First Mindset: Security requires inverting the usual developer perspective. Instead of just planning how your code should work, you must also consider how it can be misused or broken by an attacker. Simple steps like reviewing your authentication checks and restricting privileges can greatly reduce your exposure.
- Tools / Links:
  - OWASP Top 10: Widely recognized list of common security pitfalls in web apps.
SQL Injection and Parameterized Queries: Accepting user input and directly concatenating it into SQL statements is a classic mistake. Attackers can escape your intended query string and execute arbitrary SQL commands. Parameterized queries, as offered by libraries like sqlite3 or ORM frameworks, mitigate this risk by separating code from data in queries.
- Tools / Links:
  - SQLite documentation on parameter substitution
  - SQLAlchemy (ORM) handles parameters safely.
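To make the difference concrete, here is a minimal sketch using the standard library's sqlite3 module with an in-memory database. The users table, its rows, and the user_input string are made up for illustration and are not from the episode.

```python
import sqlite3

# Hypothetical in-memory database with two example users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

# Attacker-controlled value that closes the quote and adds its own condition.
user_input = "nobody' OR '1'='1"

# Vulnerable: the input is pasted into the SQL text, so it becomes part of the query.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the ? placeholder sends the input as data, never as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), "rows leaked by the concatenated query")
print(len(safe_rows), "rows returned by the parameterized query")
```

The concatenated query matches every row because the injected OR clause is always true, while the parameterized query simply looks for a user literally named "nobody' OR '1'='1" and finds none.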
Shell or Command Injection Vulnerabilities: Similar to SQL injection, direct string construction passed to system calls (e.g., using subprocess.Popen) allows attackers to append malicious commands. Always use safe modules like shlex for escaping, or pass arguments as lists rather than raw strings.
- Tools / Links:
  - Python docs: shlex.quote
  - Python docs: subprocess best practices
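As a rough sketch of both mitigations, the example below quotes a hostile "filename" with shlex.quote and also passes it to a child process as a list element. The filename and the stand-in "encoder" command are hypothetical; the child process is just the current Python interpreter echoing its argument back.

```python
import shlex
import subprocess
import sys

# Attacker-controlled "filename" that tries to smuggle in a second command.
filename = "video.mp4; echo INJECTED"

# Option 1: if a shell string is unavoidable, quote each untrusted piece.
quoted = shlex.quote(filename)
print("encoder " + quoted)

# Option 2 (usually preferable): pass arguments as a list so no shell parses them.
# The child here is the current interpreter printing back the argument it received.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

In both cases the semicolon stays part of a single argument instead of ending the command. Running with the least privileged account still matters, since quoting only covers this one class of mistake.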
Unsafe Deserialization (Pickle, YAML, XML): Loading untrusted data via pickle, yaml.load, or vulnerable XML parsers can execute arbitrary code. For YAML, prefer yaml.safe_load; for XML, use hardened libraries like defusedxml; and treat any pickle-based data from unknown sources as potentially dangerous.
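The pickle risk is easy to underestimate, so here is a small, deliberately harmless sketch using only the standard library. The Payload class is hypothetical; a real attacker would return something like a shell command from __reduce__ rather than a bit of arithmetic.

```python
import json
import pickle


class Payload:
    """Hypothetical attacker-crafted object."""

    def __reduce__(self):
        # pickle stores "call eval with this argument" and replays it on load.
        # A harmless expression is used here; an attacker would pick something worse.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading does not give back a Payload; it runs the stored call.
result = pickle.loads(blob)
print(result)

# A data-only format such as JSON yields plain dicts, lists, strings, and numbers.
data = json.loads('{"user": "alice"}')
print(data)
```

Because unpickling executes whatever callable the bytes name, validating the result afterwards is too late. For data that crosses a trust boundary, a data-only format is the safer default.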
Timing Attacks and Constant-Time Comparisons: Even if Python code is not prone to buffer overflow, attackers can exploit differences in how long each step takes to deduce passwords or keys. Using Python’s secrets.compare_digest or robust password libraries ensures comparisons happen in constant time, preventing data leaks through timing differences.
- Tools / Links:
  - Python docs: secrets.compare_digest
  - Passlib: Library for secure password hashing in Python.
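A minimal sketch of a constant-time token check follows. The token_matches helper and the example tokens are illustrative names, not something from the episode; the only real API used is secrets.compare_digest.

```python
import secrets


def token_matches(supplied: str, expected: str) -> bool:
    """Compare two secrets without short-circuiting on the first differing byte."""
    # Encoding to bytes lets compare_digest handle non-ASCII input as well.
    return secrets.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


expected_token = "s3cr3t-api-token"  # hypothetical stored secret

print(token_matches("s3cr3t-api-token", expected_token))
print(token_matches("s3cr3t-api-tokeX", expected_token))
```

An ordinary == comparison can return as soon as it hits a mismatch, which is what leaks timing information. For stored passwords, a hashing library such as Passlib is still the better tool; this sketch only covers comparing tokens or digests.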
Assert Statements for Security Checks: Some developers use assert to validate privileged actions (e.g., assert user.is_admin). But assert is removed when Python runs with the -O optimization flag, leaving the check entirely disabled. Security checks should raise explicit exceptions or perform conditional checks instead.
- Tools / Links:
  - Python docs: assert statement
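Here is a short sketch contrasting the two styles. The User class and the delete_account functions are hypothetical stand-ins for whatever privileged action an application performs.

```python
class User:
    """Hypothetical user record for illustration."""

    def __init__(self, name: str, is_admin: bool) -> None:
        self.name = name
        self.is_admin = is_admin


def delete_account_fragile(user: User, target: str) -> str:
    # This line disappears entirely under `python -O`, so anyone could delete accounts.
    assert user.is_admin, "admin only"
    return f"deleted {target}"


def delete_account(user: User, target: str) -> str:
    # An explicit check survives optimization flags and fails loudly.
    if not user.is_admin:
        raise PermissionError(f"{user.name} is not allowed to delete accounts")
    return f"deleted {target}"


print(delete_account(User("root", True), "bob"))
try:
    delete_account(User("eve", False), "bob")
except PermissionError as error:
    print("blocked:", error)
```

The explicit version behaves the same in development and production, which is the property a security check needs.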
Temporary Files and Race Conditions: Using the older mktemp() or creating a temp file insecurely can open race conditions where an attacker injects data between file creation and file access. Instead, use safe methods like tempfile.NamedTemporaryFile or tempfile.mkstemp().
- Tools / Links:
  - Python docs: tempfile module
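The two safer calls can be sketched as follows. The file contents and the .txt suffix are arbitrary choices for the example.

```python
import os
import tempfile

# NamedTemporaryFile creates and opens the file in one step, then deletes it on close.
with tempfile.NamedTemporaryFile(mode="w+", suffix=".txt") as handle:
    handle.write("scratch data")
    handle.flush()
    handle.seek(0)
    content = handle.read()
    auto_path = handle.name

# mkstemp also creates the file atomically, but returns a raw descriptor
# and leaves cleanup to the caller.
fd, manual_path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "w") as writer:
        writer.write("scratch data")
    with open(manual_path) as reader:
        persisted = reader.read()
finally:
    os.remove(manual_path)

print(content, persisted)
```

Both calls avoid the gap that mktemp() leaves between choosing a name and creating the file, which is the window an attacker would use.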
Keeping Dependencies Updated (Supply Chain Security): Many breaches occur because of outdated libraries or hidden vulnerabilities in dependencies of dependencies. Tools like pyup.io help track known security flaws so you can stay patched. This prevents “dependency rot” where pinned versions age without updates and leave you exposed.
- Tools / Links:
  - pyup.io: Automated dependency scanning and update PRs
  - Python docs: virtual environments
Unpatched Python Runtime: Just like frameworks and libraries, your Python interpreter occasionally has security fixes. Running an old version of Python (e.g., 3.6.0 when 3.6.9 is available) can expose you to known CVEs. Keep your Python runtime on the latest patch version to benefit from those fixes.
- Tools / Links:
  - Python Release Notes: Changelogs and CVE fixes
Teaching Yourself Security (Vulnerable-by-Design Projects): Deliberately vulnerable apps like PyGoat and DjanGoat demonstrate security blunders and how attackers exploit them. Walking through them helps you internalize what not to do and how to fix it. This hands-on approach is often more eye-opening than reading code snippets in isolation.

Interesting quotes and stories

"Developers want to build quickly and keep customers happy, but attackers are thinking, ‘How do I break it? How do I bend the rules?’" -- Anthony Shaw

"A lot of the stuff isn't rocket science. It's about not leaving default passwords or ignoring simple security steps." -- Anthony Langsworth

"We often rely on pinned dependencies for stability, but that can hide major security holes if we never update them." -- Anthony Shaw

Key definitions and terms

SQL Injection: A class of vulnerabilities where attackers alter database queries by manipulating user input.
Command (Shell) Injection: Similar to SQL injection but targets system-level commands called by the program.
Pickle: Python’s built-in serialization format that can execute arbitrary code on load if tampered with.
Timing Attack: Exploiting time-based discrepancies (e.g., how fast a comparison runs) to extract sensitive data.
Parameterization: Safely inserting user input into queries or commands by passing parameters separate from code statements.

Learning resources

Here are a few ways to deepen your Python and security knowledge:

Python for Absolute Beginners: A comprehensive foundation in Python’s core syntax and features.
Defensive Security Readings - OWASP: Great resources on web security best practices.
Passlib Documentation: Learn how to properly hash and manage passwords in Python.
Talk Python Training: Offers courses on everything from web frameworks to specialized topics in Python, including security-focused ones.

Overall takeaway

Security in Python is not solely about advanced exploits or obscure vulnerabilities; it often comes down to common oversights like string-building queries, using insecure defaults, or leaving dependencies stale. By layering security (e.g., parameterized queries, safe serialization, environment hardening), you significantly reduce your attack surface. Whether it’s through hands-on practice with tools like PyGoat or systematically keeping your environment patched, staying vigilant and proactive is the single greatest defense against Python security pitfalls.

Links from the show

Anthony Shaw on twitter: @anthonypjshaw
Anthony Langsworth on twitter: @alangsworth

10 common security gotchas in Python and how to avoid them: hackernoon.com

OWASP Top 10: owasp.org
PyGoat: owasp.org
DjanGoat: github.com
Risky Business Podcast: risky.biz

Sponsorship links
Test and code podcast: testandcode.com
Talk Python Training: training.talkpython.fm
Episode #168 deep-dive: talkpython.fm/168
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Episode Transcript

00:00 Do you write Python software that uses a network, opens files, or even accepts user input?

00:04 Of course you do. That's what almost all software does. But these actions can let bad actors exploit

00:10 mistakes and oversights we've made in our code that will allow them to compromise our systems.

00:14 Python is safer than some languages, but there are plenty of issues to be careful of.

00:20 That's why Anthony Shaw and Anthony Langsworth are here to discuss Python security. This is

00:25 Talk Python To Me, episode 168, recorded June 28th, 2018.

00:30 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem,

00:49 and the personalities. This is your host, Michael Kennedy. Follow me on Twitter, where I'm

00:54 @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm, and follow the

00:59 show on Twitter via @talkpython. Anthony Shaw, Anthony Langsworth, welcome to Talk Python.

01:05 Hey, Michael. It's great to be back.

01:06 Hey, Michael. Good to be my first time.

01:08 Yeah, it's good to have you here. So we have a bit of a name conflict here. So Anthony Shaw's

01:14 Ant, and Anthony Langsworth is Anthony for the rest of the show. Hopefully that works for you all out

01:20 there listening. And we're going to cover something that I think is often overlooked in Python,

01:25 the whole security side of things, right? I mean, I feel like, well, there's no buffer overflow issues and

01:31 things like that. So we must be just totally fine, right?

01:34 No. If only that was the case. If only that was the case.

01:38 Exactly. There's actually a ton of interesting vulnerabilities and things that are pretty common,

01:44 I suspect, still in some circumstances that we're going to go through. But before we do,

01:49 I'd like to hear your guys' story really quick. Anthony, how'd you get into programming in Python?

01:54 I mainly got into programming Python through networking programming. One of my responsibilities

01:58 here is interacting with a lot of Cisco and Juniper and other hardware, and Python is what they use.

02:04 Right on. And Ant, you talked about getting started on Libcloud last time, right?

02:09 Yeah, that was me. I was in Seattle and I got sick, and I was stuck in a hotel for the weekend,

02:14 so I learned Python. I never looked back since then.

02:17 Most people say, I got really sick, so I stayed in bed. Other people like you are like,

02:23 oh, I learned a new programming language because I wasn't feeling well.

02:26 I had nowhere to go.

02:27 That's awesome. I totally get it. Let's start first talking about just security in general.

02:33 Anthony, how do you think we should start thinking about security as software developers

02:38 and people that run servers and stuff like that?

02:41 What I tell people is when they start thinking about security, they have to invert how they think.

02:46 Developers are very good at thinking, how do I build something quickly and that's designed well

02:52 and that keeps my customers happy? Security is not like that. Security is how do I break something?

02:59 How do I compromise something? How do I bend the rules?

03:03 So you almost need a tester style thinking.

03:06 So you look at what you've built and think, well, how do I make it do things it's not meant to do?

03:12 I find there's just so much creativity in that.

03:15 It's really hard to think about.

03:17 And unless you see some examples of it in action, it's almost hard to conceptualize some of these things.

03:23 Like I remember the first time I saw the actual code for a buffer overflow exploit.

03:30 Here's how you overflow the buffer.

03:32 Here's how you get it to execute those bits that you've dropped in.

03:34 I'm just like, that is just, that's a different way of thinking, let's say.

03:38 Yeah. I mean, as you say, some of the people who do this, they're really smart and they get really technical.

03:42 But I also want to stress that a lot of this stuff is not rocket science.

03:47 A lot of this is looking at what you do and just trying to keep that mindset.

03:52 Yeah, I think that's a really good point.

03:54 It's easy to get overwhelmed thinking, you know, this stuff is super advanced.

03:59 But actually, 99% of the time, it's like, did you change the password from the default admin password on the database and stuff like that, right?

04:07 That's where people usually start whacking on it, right?

04:10 Exactly. Exactly.

04:11 I mean, most people make a big deal about the super, super complex hacks out there.

04:18 But if you talk to people like Kevin Mitnick and so on, a lot of the stuff they were doing is social engineering, which is just simple, as you say, password stuff or basic configurations.

04:26 A lot of the times with security, if you get the basics right, the simple stuff right, the attackers will just go elsewhere because they're looking for the easiest exploit.

04:35 Yeah, they're just rattling the door to see if it's open, right?

04:38 Exactly.

04:38 Yeah. And how about you? What are your thoughts on the sort of security thinking versus developer thinking?

04:43 Yeah, I mean, I guess that probably the difference between Anthony and I, which is, I guess, confusing because we have the same name.

04:50 Although we sound quite different, hopefully that'll be more obvious to the viewer if we have different accents.

04:55 Is that Anthony spends a lot of his time, I guess, looking at things from the security angle, whereas most of my time has been spent from a development angle.

05:04 And I've been curious about security, I guess, for a long time.

05:10 My first job was doing sort of system support for servers and dealing with hacked servers was a pretty common occurrence.

05:18 And I guess learning how people got into the systems, what they did once they got there.

05:24 And it was kind of shown in the movies, like there's, you know, this person in a dark room with like a green screen and it's all like super technical.

05:31 And they've always got a hoodie on and they're always playing some sort of like metal music in the background.

05:36 But actually, when you're more familiar with, I guess, how systems get compromised, it's just people running scripts that just pinging server after server after server looking for known vulnerabilities.

05:47 And if people haven't configured stuff properly or they've left bad usernames, bad passwords, then they just poke a hole in it and then have their fun, basically.

05:55 I mean, that takes up a lot of people's time and a lot of the security challenges.

05:59 So my kind of curiosity has been, how can I write code in a way that I'm not doing things which the attackers are going to be excited about?

06:07 Like, what mistakes could I be making in my code that they're going to be like, ah, fool, he left this particular vulnerability.

06:14 He didn't read this obscure documentation page.

06:17 And it does come across like that to people.

06:19 Like, it seems hard, which is, I guess, why I put the article together to point out some of the more obvious ones.

06:25 And they're not that complicated.

06:28 Yeah, that's a really good point.

06:29 Most of it isn't that complicated.

06:31 It's just knowing that what is out there.

06:33 You know, me personally, I don't think about it the way that you do, Anthony, in terms of like, how can I break things?

06:40 I think of it more as like a super paranoid, forgetful person, which I'm fairly forgetful in general life as it is, but not paranoid.

06:49 But I'm always thinking like, oh my gosh, what have I forgotten?

06:52 What is the thing that is just wide open that I forgot?

06:56 Did I forget to turn on the firewall that blocks the database port publicly?

07:00 Did I forget to, you know, validate this one thing?

07:04 I think, you know, a while ago there was one admin page on my site that didn't check whether this person was an admin or not.

07:12 And if you could come up with this random URL, you could have found it.

07:15 I'm like, oh my gosh, I can't believe I forgot that.

07:17 So that's sort of my feeling is like a constant uneasiness.

07:20 I know there's thousands of people attacking this site constantly, but throughout, you know, they sort of continuously, I guess.

07:27 And it's like, well, when will they find the thing I forgot?

07:30 It's an ego driven security as it's called.

07:32 But that's cool.

07:33 That's cool.

07:34 If it makes you more secure, then go for it, please.

07:37 Yeah, I think it probably does.

07:38 I mean, I'm always double checking stuff and I think it does really, really help.

07:42 So there's a lot of these things you can address.

07:44 And I think it's probably time to go through this list.

07:48 So when Brian and I spoke about this on Python Bytes just yesterday, actually it came out today, but we recorded it yesterday.

07:56 And he was like, well, the ones that were obvious, they're not even here.

08:00 I mean, some of them are, but some of them are like Python, you know, there's the eval, eval user input.

08:05 Like, don't do that, right?

08:06 So I think you have a really interesting list that you put together here.

08:10 Some of which will be people like, oh yeah, I've heard of that.

08:13 Others like, really?

08:14 I didn't know that was a problem.

08:16 And then they'll frantically pause the podcast and they'll run away and go look at their systems.

08:20 So you want to take us through the topics and we can sort of spend a little time on each one?

08:25 Yeah, sure.

08:26 So I guess I left out some really obvious ones.

08:30 I mean, eval is probably the worst one.

08:32 Like you wouldn't just accept input from the internet and pass it through eval, I'd hope.

08:36 You say that.

08:38 You say that.

08:39 I'm sure it's happening all over the place.

08:40 It's easier to sort that way when you just pass the lambda on the command line.

08:44 Yeah, the PHP equivalent is the reason that you're getting all those entries in your logs looking for wp-admin.

08:51 Anyway, so I guess the number one is input injection, which is just common across languages, across frameworks.

09:00 It's nothing specific to Python.

09:05 It's basically noticing that you're creating SQL queries and you're taking a parameter and you're injecting it into a string literal.

09:13 And what they can do is they can escape the quotes you put around the argument or the filter or whatever it was.

09:20 And they can basically run arbitrary SQL queries on your table.

09:24 So SQL injection, I think, is pretty well known.

09:26 It's pretty easy to avoid.

09:28 There are libraries for doing sensible escaping.

09:32 There are libraries for basically parameterizing your SQL queries, which is really the way it should be done.

09:38 But time and time again, that's probably seen as one of the most common things in web applications.

09:43 And it's just one of those things that won't die, right?

09:45 Like it's just there's still popular blog posts that have examples of SQL injection attacks all over.

09:52 And people just read until they get it to work.

09:54 They don't go to the comments and say, don't do this.

09:55 Don't do this.

09:56 Yeah, because it's like when you learn SQL or if you learn how to create SQL queries, it doesn't like there are so many tutorials out there that actually show you the wrong way to do it.

10:07 So this is why it keeps happening.

10:08 Yeah, this is why it keeps happening because people are like, this is how you write a SQL query.

10:12 And then, OK, you want to filter by a particular user or a particular ID or whatever.

10:17 Here's how you write a where clause.

10:18 And, oh, let's take that as a parameter.

10:20 Yeah, yeah, because it's simpler, right?

10:22 People are learning the wrong way to do it every day.

10:24 And we're having to unteach them that kind of practice.

10:26 Luckily, there's a cartoon to remind us.

10:29 XKCD.

10:31 Yeah.

10:32 Yeah.

10:32 Are you familiar with little Bobby tables?

10:34 Yeah, absolutely.

10:34 And I've seen the picture.

10:36 There's a photograph as well as of a car where someone's put on their license plate.

10:40 They've actually tried to put SQL injection on their license plate.

10:43 Get all those license plate readers.

10:45 That was brilliant.

10:47 Yeah.

10:47 So the license plate readers would crash.

10:49 That's the intention anyway.

10:50 And then I guess the other version of that is command injection, where you have a script which needs to call a local process on the machine,

10:58 like an encoder or a specific executable or something like that.

11:03 I mean, Python has a few ways of doing that in the standard library.

11:07 There's subprocess.

11:08 There's popen.

11:10 There's in the OS module.

11:14 There's another way.

11:15 And there's many ways that similar to SQL injection, you can take an argument from somewhere and people can escape that argument and use it to run arbitrary commands on the server.

11:25 So that's command injection, or shell injection as it's called.

11:28 That sounds a little bit bad.

11:29 And that one actually was surprising to me.

11:31 I wasn't super familiar with that.

11:32 But if you're going to do popen and pass a string as the command to run, you could easily do dollar sign, dollar sign, next command or semicolon or whatever the separator your shell uses.

11:43 Because you can issue commands that have multiple operations one after another in a single line, right?

11:50 Yeah, so the most common example is somebody writing a, basically calling an executable and then giving it an argument.

11:56 Let's say you want to create a thumbnail or encode an MP4 or something.

12:00 Then you'd call a local process which does the encoding and you'd give it an argument which is the name of the file you want to encode.

12:07 So that's straightforward, except that if instead of a file name, they close the quote and then they do a semicolon or an ampersand, ampersand, and then they can put in any other command they like.

12:18 How could that be wrong?

12:19 How could it go wrong?

12:20 And it's funny because I shared this post and I saw some of the discussion forums, people arguing back saying, oh no, that's not a problem.

12:27 Why, how could that possibly be a vulnerability?

12:29 And even if you kind of like make it really obvious, people don't look at it like an attacker would look at it.

12:35 They look at it like, oh, who would misuse that?

12:38 How could that happen?

12:38 Exactly.

12:39 That's flexibility.

12:39 It's good.

12:40 It means good things.

12:41 Yeah, and I think it's really interesting to have these two contrasts, not just SQL injection.

12:45 And I was going to add too that there's, in addition to the escaping and stuff that Ant is talking about, there's additional things you can do here to protect yourself.

12:55 So, for example, if you can run as the least privileged account you can.

13:00 So, don't run as root or administrator, for example.

13:03 So, if you do happen to have a bug in your application where someone can do command injection, you're going to minimize what it can do.

13:11 So, there's multiple levels of ways you can protect yourself here as well.

13:15 I think that's a really good point for a lot of this stuff, right?

13:18 It's sort of security as layers, not the one big silver bullet, right?

13:23 Yes, very much so.

13:24 Yeah.

13:24 The Python way around that is, well, first of all, avoid doing this if possible.

13:29 I mean, there's many other ways of doing spawning processes and doing them correctly.

13:35 But if you have to, if you need to escape the input, then use the shlex module, which is built into the standard library, and it has a utility function for escaping shell commands.

13:45 Yeah.

13:45 What I really like about your article, which I don't know if we've said it explicitly, it's 10 common security gotchas in Python and how to avoid them.

13:52 And we'll put that in the show notes, of course.

13:54 And for many of these, you have these, oh, did you know there's this other module you can just use that fixes this problem?

14:02 So, that's, it's really nice that you have sort of some way to deal with this, not just like to put paranoia into people.

14:08 This episode of Talk Python To Me is brought to you by Test and Code and the Python Bytes podcasts.

14:15 If you love podcasts in Python, and you're listening to mine, so you must, right?

14:19 Well, don't miss out on these two other great podcasts.

14:23 If you're in a hurry and just want the headlines, get the rapid fire Python Bytes podcast that I co-host with Brian Okken over at pythonbytes.fm.

14:30 Brian also hosts his own show, Test and Code, at testandcode.com, where he explores topics like test-driven development, continuous integration, code coverage, and pytest, of course, as well as agile development, mentoring, public speaking, and lots more in an opinionated mix of interviews and solo episodes.

14:47 Keep up with the latest news and become a better developer by listening to Python Bytes and Test and Code.

14:52 The next group of attacks are all related to deserializing or serializing data.

15:00 And this is a pretty common vector where, let's say you're writing an application that reads configuration from somewhere.

15:08 And that could be in YAML or JSON or XML, or it's receiving a message from somewhere, or you're basically using it as a way of communicating between different processes.

15:19 There's different ways you can do that in Python.

15:20 You could use XML, you could use JSON, you could use YAML, or you could use pickle files.

15:25 So all four of those have massive gaping holes in them, which you could drive a bus through.

15:30 So I kind of explained how you could abuse each one of them.

15:35 I guess people might assume that XML is the safest.

15:39 Yeah, it seems sort of old school, but XML can get complicated on the edges, you know, like weird namespaces and referring back to itself and other sorts of bizarre things that nobody ever does.

15:50 Yeah.

15:51 So the one I kind of explained was the billion, what's called the billion laughs exploit.

15:58 It's funny.

15:58 And it's, you know, look, if you're going to get like taken down, at least it should be a funny sort of hack, right?

16:04 Come on.

16:04 Got to appreciate it.

16:06 So this is if you wanted to, if you received an XML payload from somewhere and an attacker has access to put their own XML in there, then basically XML has this thing called entity expansion.

16:19 I guess the most similar way to think of it is a zip file.

16:24 And what you could do, imagine you created a text file and it just contained spaces, lots and lots and lots of spaces, like let's say a million spaces.

16:34 Now, if you compress that into zip, it compresses down to nothing because the way compression algorithms work.

16:40 But when you extract it, it's like, oh, it's one character.

16:43 Yeah, exactly.

16:44 So it can condense down to pretty much nothing.

16:47 Let's say you compress that into a zip file and then you made 10 of those, you copied it and you pasted it 10 times.

16:53 And then you copied those 10 zip files and then added them to a zip file and then did that again and again and again.

16:58 Now, what happens is if you extracted that zip file, it will go and sort of recursively extract the zips.

17:06 And it can go from something like a one kilobyte zip file to gigabytes or potentially even worse.

17:13 So the XML version, that's called a zip bomb.

17:17 The XML version is a similar idea where in XML, you can have basically you can sort of reference things.

17:24 And so in entity expansion, it can kind of reference something which references something.

17:29 So it's similar to the zip bomb where you kind of create this tree and then it basically exponentially expands.

17:35 So when you're trying to deserialize this special message in XML, it can take up gigabytes of RAM.

17:40 So that is basically the exploit is that it just takes your server offline.

17:44 And here it is fitting on one screen with word wrapping.

17:48 Yeah.

17:48 And it's like eight lines of XML.

17:50 Yeah.

17:50 It sort of teaches you that exponential stuff and like factorial type things.

17:54 They get big fast.

17:55 Yeah, exactly.

17:56 So the XML one's pretty bad.

17:57 There's some other pretty well-known ones with XML.

18:01 So if you actually look at the standard library documentation for Python, and this is how I came across this.

18:06 It does say in a big red box that there are known security challenges with XML.

18:10 And you shouldn't deserialize arbitrary XML documents.

18:15 Unless you do like SOAP web services, then what could go wrong?

18:18 Exactly.

18:18 So I guess the alternative is you can use a package called defusedxml, which is a third party package.

18:26 But it was actually written by Christian Heimes, who's one of the core developers on Python.

18:30 He's a security expert at Red Hat.

18:32 So he kind of put together, I guess, a safer version of the XML standard library module.

18:37 And it's a drop-in replacement.

18:38 That's awesome.

18:39 Does it just like not support these self-referential things and some of the other dangerous bits?

18:44 It's like a limited XML?

18:46 Stuff that it would be really unlikely you'd have to use anyway.

18:48 Yeah.

18:48 Anthony, as a network person, how do you think about these issues?

18:52 Like, do you scan for these sorts of things?

18:55 They keep me up at night?

18:57 Look, there's a whole class of exploits with XML.

19:00 There's exploits where you can embed schema information into there and do weird stuff with schema.

19:06 I generally think that if you are going to handle XML, you generally whitelist what you accept.

19:12 You define your own schema and say, if it doesn't match that schema first, I'm not going to parse it.

19:18 I'm only going to parse minimal pieces of it.

19:20 There's also, you can use what's called an XML external entity attack where you reference to a schema or an inclusion from somewhere else.

19:29 And that somewhere else is relative to your server.

19:32 So by getting you to upload a piece of XML, they can try to get your server to load something from your internal network and potentially display it back to the attacker.

19:41 So XML can be used as a reconnaissance tool as well.

19:45 It's not to say that XML is bad, but you have to follow the advice that Ant is giving you now to use it properly and use it well.

19:52 Yeah.

19:52 Stay away from the dangerous bits unless you really need them.

19:55 And it's rare these days, I think, that you need them.

19:58 It seems like, you know, early 2000s, XML was supposed to be the answer to all the questions that involved any form of connectivity, right?

20:06 Like we have all these fancy things with XML.

20:10 We had all the hype around SOAP services.

20:14 You have XSLT going wild being like the way to generate all sorts of stuff.

20:18 And I feel like we've kind of moved on, but there's still plenty of XML processing around.

20:23 XML has its place.

20:24 It's a great metadata standard with a lot of bells and whistles attached.

20:29 The problem is 90% of people don't need those bells and whistles, but it is what it is.

20:34 Yeah, I guess I shouldn't knock XML.

20:36 I mean, it is the main bit of traffic on some of my websites with the podcast RSS feed.

20:41 That thing gets hammered.

20:43 So I guess I should love XML, but that's the simple version.

20:46 So another thing that you put in here that I thought was really interesting is people relying on assert to actually do runtime control.

20:58 This one is pretty obscure, actually, and not that well known.

21:02 And that is that there's effectively an optimizer.

21:06 Well, there's a series of optimizations you can do in Python.

21:09 And you can run Python in optimized mode, which does.

21:13 I mean, if you look at the history of Python, basically what it tries to do is traverse the syntax tree and kind of make assumptions about how it should be executed.

21:22 So that if you're doing it in production, it almost removes commands out of your code and skips them or looks at them differently.

21:30 So this is things like crushing constants together, loop unfolding.

21:35 And there's a whole bunch of optimizations.

21:37 One of those optimizations is to ignore assert statements because really they should be used for debugging.

21:43 So if you're used to other programming languages, sometimes you can basically create a debug assembly or a runtime assembly, where one has debug symbols in it

21:53 and the other is designed for production. Python basically has that, but it's by adding an extra flag when you execute the Python file.

22:01 So it skips certain commands depending on whether it's running in optimized mode or not.

22:06 So if you use an assert statement to do things like checking that the password is right or checking that the user has the right amount of privileges or whatever, then basically it all runs fine in unit tests and integration tests because you never use optimized mode because you always want the assert statements.

22:24 And then when you actually deploy it to production, someone says, oh, Python is quicker, like 10%, 20% quicker if we run it in optimized mode.

22:32 They switch the flag on and all of your security catches and everything like that just disappear.

22:39 Yeah.

22:39 I mean, they could be totally sort of silent as well, right?

22:42 If you were doing like containers, you could depend upon a Docker container and the base image could change in a way.

22:47 They're like, yeah, we made it a little bit better.

22:48 It's faster.

22:49 Oh, great.

22:50 I love faster.

22:51 Let's do that.

22:51 Wait, why can people be on this page again?

22:54 This is terrible.

22:55 Yeah, that's pretty scary.

22:56 So the fix here is don't do that.

22:59 It's like, doctor, it hurts when I raise my arm.

23:01 Well, don't do that.

23:02 Keep your arm down, basically.

23:04 Like just use a if, you know, not admin, raise exception, permission error, or something like that, right?

23:10 Yeah, typed exceptions.

23:11 Yeah, exactly.

23:13 Which is how it's supposed to be done anyway.
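
The behavior described here can be reproduced without restarting the interpreter. The sketch below compiles the same function twice, once the way plain `python` would and once the way `python -O` would, using the `optimize` argument of the built-in `compile`. The function names and the "account deleted" operation are invented for illustration.

```python
SOURCE = '''
def delete_account(user_is_admin):
    assert user_is_admin, "admin only"   # security check done with assert
    return "account deleted"
'''


def build(optimize):
    """Compile SOURCE the way `python` (0) or `python -O` (1) would."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace["delete_account"]


def delete_account_checked(user_is_admin):
    """Same operation, but the check survives optimized mode."""
    if not user_is_admin:
        raise PermissionError("admin only")
    return "account deleted"


normal = build(optimize=0)
optimized = build(optimize=1)

try:
    normal(False)
except AssertionError:
    print("normal mode: blocked")

# In optimized mode the assert statement is compiled away entirely.
print("optimized mode:", optimized(False))

try:
    delete_account_checked(False)
except PermissionError:
    print("explicit check: blocked in either mode")
```

The explicit `if` plus a typed exception is ordinary code, so no interpreter flag can remove it.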

23:14 Okay.

23:15 Another one that's pretty interesting is not as common for the web because of the latency, but it's pretty interesting.

23:24 Timing attacks.

23:24 Tell us about those.

23:25 Anthony, do you want to explain this one?

23:26 So timing attacks, there's two really, two little buckets this falls into.

23:31 One, if you're doing any form of security sensitive operation, like let's say, for example, you're checking a password is valid.

23:38 The way a lot of code works is it might say, well, it will say, well, if the password is too short or too long, then don't bother checking the password, just fail out straight away.

23:49 And an attacker can use this to work out.

23:51 Well, what is a valid password to decrease the passwords it needs to check?

23:55 I see.

23:56 So if you're going to do like a dictionary attack and you realize, well, it's zero milliseconds for anything under seven characters, we'll start there.

24:04 Exactly.

24:05 Exactly.

24:06 So if you're doing particularly some cryptography, the idea is it's meant to have an approximately equal amount of time, irrespective of what input you give it.

24:16 There's also another set of attacks around time of check, time of use.

24:20 So if you're doing certain checks and then doing stuff that depends on those checks, make sure those checks are around the same time in code as they are that you're doing it.

24:32 Because otherwise there is a gap between when you do that check, that security check, for example, and when you do that task or that operation that requires a result of that check, that can be exploited too.

24:43 So there's kind of two areas here.
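
The second bucket, the gap between a check and the action that depends on it, is easiest to see with files. The following is a small sketch with hypothetical function names: the first version checks and then acts as two separate steps, while the second asks the operating system to do both at once and handles the failure.

```python
import os
import tempfile


def create_report_racy(path, text):
    """Check, then act: another process can create `path` in between."""
    if os.path.exists(path):          # time of check
        return False
    with open(path, "w") as handle:   # time of use
        handle.write(text)
    return True


def create_report_atomic(path, text):
    """Let the OS do check-and-create as one step."""
    try:
        # Mode "x" fails if the file already exists, so there is no gap.
        with open(path, "x") as handle:
            handle.write(text)
    except FileExistsError:
        return False
    return True


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "report.txt")
    print(create_report_atomic(target, "first"))   # True
    print(create_report_atomic(target, "second"))  # False, file untouched
```

The general pattern is to attempt the operation and handle the error, rather than asking permission first and trusting that nothing changes in between.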

24:45 That's pretty interesting.

24:46 I hadn't really thought of the sort of information leakage around that.

24:51 So I want to give a shout out to a Python package that I think just is really amazing.

24:55 You talked about password checking and stuff; it's called Passlib.

24:58 Are either of you familiar with Passlib?

25:00 No, I'm not.

25:00 Yeah.

25:01 So Passlib is really awesome because, you know, one of the bits of advice is to get a random salt and mix that in when you generate the password, right?

25:11 Or when you hash and store the password.

25:13 And then also make this computationally expensive to guess.

25:15 So Passlib is like, there's like one function of like encrypt and one function like verify.

25:21 And it will, you give it a plain text password and it will take that and it will fold it like 150,000 times with custom salt for that particular user.

25:31 And then it can check it.

25:32 But it takes like, you know, 0.2 seconds to determine whether it's right or wrong.

25:37 It doesn't even just check the length.

25:39 So it's a really nice way to sort of add a few more layers there.
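
The salt-and-repeat idea can be sketched with the standard library alone. This is not Passlib's API, only an illustration of the same ingredients: a random per-user salt, a deliberately expensive key-derivation function (PBKDF2 here), and a comparison that does not leak timing. The iteration count and function names are assumptions for the example; a maintained library such as Passlib is still the better choice for real systems.

```python
import hashlib
import hmac
import os

ITERATIONS = 150_000  # assumed work factor, echoing the figure mentioned above


def hash_password(password):
    """Return (salt, digest) for storage; the salt is unique per user."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Redo the expensive hash, then compare without leaking timing."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, expected_digest)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))
print(verify_password("guess", salt, digest))
```

Because every guess costs the same full derivation, a wrong password of any length takes about as long to reject as a nearly right one.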

25:42 Yeah.

25:43 So the other one I recommended was part of the standard library, but it was only added in 3.6.

25:48 And that's a module called secrets.

25:50 And it has a function called compare_digest where you can basically give it two values and it says, are they equal or not?

25:57 So that's where you would say if the actual password equals equals the password the user entered.

26:03 And statements like that are the type that can be left open to timing attacks.

26:07 So compare_digest basically makes it impossible to do that.

26:10 Yeah, that's cool.
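
As a minimal sketch of the difference, the two functions below check a supplied token against a stored one. The token value and function names are made up for the example; in practice the values compared would usually be hashes or API tokens, not plain-text passwords.

```python
import secrets

STORED_TOKEN = "3f9a1c0e5b7d"  # made-up secret for the example


def token_matches_naive(supplied):
    # == can return as soon as one character differs, so the response
    # time hints at how much of the guess was right.
    return supplied == STORED_TOKEN


def token_matches(supplied):
    # compare_digest is designed to take time that does not depend on
    # where the two values differ.
    return secrets.compare_digest(supplied, STORED_TOKEN)


print(token_matches("3f9a1c0e5b7d"))
print(token_matches("3f9a00000000"))
```

Both functions return the same answers; only the second avoids telling an attacker how close a guess was.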

26:11 So one piece of advice that I have around if you're doing anything like password management or any cryptography stuff, generally speaking, don't write it yourself.

26:20 Use a library off the shelf that other people have used and vouched for because it's very easy to, if you do crypto wrong, it can be worse than having no crypto.

26:32 Because for all the false assurances it will give you.

26:36 Right.

26:36 You think it's fine, but, you know, I don't understand.

26:38 I Base64-encoded the password.

26:40 They can't get that back out of the database.

26:42 Come on.

26:43 Of course not.

26:44 Of course.

26:45 Yeah.

26:46 That Passlib one that I talked about, by the way, they have like a list of all the supported algorithms and the ones they recommend.

26:51 And they're constantly checking them.

26:52 So I think it's bcrypt and SHA-512 right now that they're using.

26:55 It'll be bcrypt and probably PBKDF2.

26:58 They're the two ones that people tend to recommend these days.

27:00 So, yes.

27:01 Yeah, nice.

27:02 So one that I think is kind of scary because it's an exact opposite or in contrast to what makes Python awesome, right?

27:12 Like, you know, import antigravity.

27:15 It sort of captures the joy of Python where you just import anything and it's just there and it just works.

27:21 But it's possible that you could have installed something bad, right?

27:26 Yeah, absolutely.

27:27 And I think not just install something bad, but the way that Python imports work.

27:33 They're really flexible.

27:34 You can override all sorts of things, including system like almost language keywords.

27:40 So you can override the print function or the assert statement or you can do all sorts of crazy things in imports.

27:47 So I think one of the challenges with Python is that if you have a package somewhere in your site packages, basically an import path that does something malicious, it can be very difficult to detect that.

27:59 And because your site packages is almost like a tier.

28:03 So you've got like a system level one and then you might have a virtual environment one.

28:07 And then maybe there's another one that you use as well where you might import stuff from somewhere else in your Python path.

28:13 Then when you import something, you never actually know unless you go and check what it's imported.
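
One way to "go and check" is to ask the import system where a name would be loaded from before importing it. The sketch below uses `importlib.util.find_spec` from the standard library; the helper name `module_origin` and the deliberately missing module name are just for illustration.

```python
import importlib.util


def module_origin(name):
    """Return where `import name` would load from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin  # a file path, or a marker such as "built-in"


for name in ("json", "sys", "no_such_module_hopefully"):
    print(name, "->", module_origin(name))
```

If a standard library name resolves to a path inside your project or an unexpected site-packages directory, something is shadowing it and deserves a closer look.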

28:18 Yeah.

28:19 And it could do really bad stuff like, you know, just while you're talking, I was thinking, OK, so I write a package.

28:24 The import statement is effectively execute __init__.py.

28:29 Maybe in that section, I'll see if I could go and find SQLAlchemy or some database connection.

28:35 And then I'll monkey patch that over for the connect so I can grab your connection string and then pass it along.

28:41 So you think your app is still working or, I mean, there's all sorts of nefarious stuff you could do, right?

28:45 Yeah, exactly.

28:45 And I think it's just people put a lot of trust in a pip install statement in terms of not only what package they installed, but what dependencies that package had.

28:56 And what dependencies their dependencies had and what ended up getting installed and whether any of those have actually polluted your site packages.

29:03 So basically, if you're importing even a standard library module, like, has something else overloaded that?

29:11 And is that import basically changing your global namespace?

30:12 That sort of hints at a really scary scenario because I'm pretty careful what I install, but I don't go and check the dependencies of everything that it depends upon before I use it.

30:25 I'm like, yeah, this looks like a reputable package.

30:27 I'll go ahead and use it.

30:29 But it hints at almost like a supply chain challenge in that.

30:34 Imagine I find some not very popular package that something popular is based upon.

30:40 And then I go do a PR that sneaks in some extra sneaky dependency.

30:46 That is actually the problem that people don't realize.

30:48 And so when that gets pushed out to be updated, you update your thing.

30:52 All of a sudden, you're indirectly pulling in this thing because some guy was like, yeah, it looks like a good PR.

30:57 I'm busy, but we'll take it anyway.

30:59 Go ahead.

30:59 Yeah, it certainly can be.

31:00 So, well, there isn't really one.

31:02 Again, like the doctor.

31:04 Don't do that.

31:05 It hurts when I do this.

31:06 Don't do that.

31:06 Well, you do reference a service that I'm a fan of.

31:11 So tell people about it.

31:12 Yeah, so pyup.io basically has a sort of a database of packages and versions of those packages which have known vulnerabilities.

31:21 So this links to one of the other things I put in the article, which was not patching your dependencies, which is a pretty common pattern where people say, okay, I'm going to get all my dependencies and then I'm going to do pip freeze or something.

31:34 And I'm going to put that into my requirements.txt file.

31:37 So I've basically pinned which versions I know work, and no one ever touches that file.

31:44 So like that's probably one of the scariest things, as you said, here's all the versions.

31:48 These are the ones we tested with.

31:50 But no one goes and checks each package's or each dependency's website and sees the notification that says, oh, yeah, sorry.

31:57 There's a massive security hole in version 2.9.6.

32:01 Please update to 2.9.7.

32:03 And it's the dependency, maybe the dependency of your dependencies thing.

32:07 Like even if you were crazy diligent and you went and checked all the stuff you installed, that's not even enough.

32:12 Yeah, exactly.

32:13 So pyup.io basically has a database, a freely available database where you can look up bad versions of packages that exist on PyPI.

32:22 And they have a service where they can actually go and scan your requirements.txt if it's on a GitHub repository.

32:29 Or they have hooks where you can use it with other services like GitLab.

32:33 And if they basically say that there are newer versions available or this is a bad version, then they'll give you a notification.

32:39 So I guess that's one of the better ways of approaching the problem.

32:43 And this is probably the point in my article that got the most negative feedback from people.

32:48 They said, oh, of course, you have to pin dependencies.

32:50 Like, you know, you can't just willy nilly install the latest version.

32:55 But pip does have version ranges.

32:58 There's a PEP for that.

33:00 So you can say, you know, I want this minor version.

33:03 So I want version 2.8 up to 2.9.

33:07 And then if they release a 2.8.1 or 2.8.2, then you'll automatically install the right version.

33:12 And if you hope that people use semantic versioning, the API itself and the functionality shouldn't have changed.

33:18 And of course, you've got good test coverage anyway.

33:20 So you'll catch it.
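
In a requirements file, a range like the one described is written as a version specifier, for example `somepackage>=2.8,<2.9` (the package name is hypothetical). pip evaluates these specifiers itself following PEP 440; the sketch below is only a simplified illustration of what such a range accepts, and it handles plain dotted numbers, not the full version syntax.

```python
def parse_version(text):
    """Turn '2.8.1' into (2, 8, 1). Real PEP 440 versions can be richer."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, minimum, below):
    """True if minimum <= version < below, comparing numerically."""
    return parse_version(minimum) <= parse_version(version) < parse_version(below)


# What a requirement such as "somepackage>=2.8,<2.9" would accept.
for candidate in ("2.7.9", "2.8", "2.8.2", "2.9.0"):
    print(candidate, in_range(candidate, "2.8", "2.9"))
```

Patch releases such as 2.8.2 fall inside the range and get picked up on the next install, while 2.9.0 stays out until you widen the range on purpose.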

33:21 That's a really good point.

33:22 The problem is pip freeze doesn't do that.

33:25 Right.

33:26 And that becomes a ton of work on your side of things.

33:28 I use pyup for my stuff.

33:30 Like so the training website, the various podcast websites and a few other things.

33:33 I have it hooked up and I will get weekly pull requests.

33:37 These three packages have changed and they could be like deep dependencies.

33:42 Here's their change log.

33:43 Here's the new version.

33:44 And there's just a PR which patches the frozen packages in my requirements.txt.

33:49 It's really nice.

33:50 Anthony, how do you think about this?

33:52 Yeah.

33:52 So one thing I was going to add is that you can never predict when a package that you thought was good may turn bad.

33:59 People might write a package and think, yeah, this version is great.

34:02 And then someone somewhere finds an issue with it.

34:04 So the key here is a regular monitoring of what you're doing and a regular patch cycle.

34:10 Because you said you never know when something's going to happen.

34:13 So you need to make sure you're looking for it.

34:14 And that turnaround time is as short as possible.

34:17 So the focus here is more on your operations.

34:20 How do you manage your system as opposed to necessarily how you program it?

34:23 But nevertheless, that's important.

34:25 Yeah.

34:25 I think another step there is you have to proactively be applying these changes.

34:31 It's one thing to notice there's a problem.

34:33 It's another to notice that PyUp sent me a PR that says, oh, you're depending on this version of, you know, a JSON parser or something.

34:41 And it has this vulnerability.

34:42 You need to fix it.

34:43 It's another to go, yeah, and now I'm patching the server.

34:47 You think, look at, what was it, Equifax?

34:50 Right?

34:51 They were hacked with a known vulnerability.

34:54 It was, everybody said, this is ultra bad.

34:56 You need to drop what you're doing, skip lunch and go fix it.

34:59 And it's just, well, we work at this big company.

35:02 We can't be bothered to change this because if it breaks, this is not my site.

35:06 But if I break it trying to fix it, I'm losing my weekend.

35:09 So I'll let someone else deal with that on Monday when they get in.

35:11 And then, you know, 143 million people lose all their credit and personal banking details, me included.

35:17 Ouch.

35:18 Yeah.

35:19 But I mean, what are you going to do, right?

35:20 I think there's like another layer that goes on top of this.

35:23 Like you have to act.

35:24 Yes.

35:25 I mean, it's a patch management aspect of your server maintenance or system maintenance.

35:30 I mean, as Ant said, there's tools inside Python, but you need to use them and you need to be using them regularly.

35:36 You need to have that, you know, I do need to fix it now.

35:39 So how do I, how do I decrease that turnaround time to as small as possible?

35:43 Yeah.

35:44 Yeah.

35:45 And what do you think about speed to actually deploying the fix?

35:48 Like, are there like continuous delivery bits of magic that you people would be willing to rely upon?

35:54 Like it says massive security vulnerability.

35:56 We will let it automatically publish this if the tests pass.

36:00 I'm not sure.

36:00 I mean, if your test passed, it still needs to go through some sort of UAT.

36:04 It depends, I guess, on the, it depends on the project and depends on what it's for.

36:08 Like if it's a, if it's an internal web application, then, you know, it's unlikely, but you also want to make sure that thing gets patched and updated as quickly as possible.

36:18 The cost of downtime is one thing, probably smaller than the cost of attack or the cost of compromise.

36:25 So I guess it's kind of looking at it from that angle.

36:28 Another thing I guess I want to talk about is that the way that you as a developer talk to operations, assuming you, you have an operations team that you're talking to, and that's not just another hat you have to wear.

36:40 And the way you talk about what versions of the dependencies your, your package depends on and how brittle that is.

36:47 Cause something I've heard a lot is people say, no, no, don't touch my dependencies.

36:51 Cause I know that they work and it was probably going to break the code.

36:54 So from an ops, you know, angle, they're like, well, we're going to look after all the system.

37:01 And then we're going to look after all the stuff that's, you know, dev have told us not to touch that and to leave it alone.

37:06 Otherwise it's all going to go really bad.

37:08 And if, if they touch it and they try and patch something, then the dev team's going to be like, oh, well, you know, you've, you've been changing the code and you've changed the dependencies and we didn't ask you to do that.

37:17 And that's why it's breaking.

37:18 So I guess it's, it's a conversation between the two teams or if you've got DevOps and then having a conversation about how you try and automate some of that stuff.

37:26 Yeah.

37:27 The more automation, the better.

37:28 So the next one that you have on your article was surprising to me.

37:31 I didn't really think about it, but I guess it makes sense.

37:34 It's temporary files.

37:35 Yeah.

37:36 I'm lucky enough not to have to use those very much in most of the stuff I write these days, but they're actually very common vulnerabilities.

37:43 So apparently there's a mktemp function that'll generate a temp file.

37:46 I didn't even know about this.

37:47 I don't use temp files very much either, I guess.

37:49 Yeah.

37:50 It's a, it's an old one.

37:51 And I came across this when I was doing building the course for moving from Python two to Python three.

37:56 Cause one of the recommendations is that you don't use the Python two functions for creating temporary files because they have known security issues in them.

38:04 So kind of digging into that a bit deeper.

38:06 So there's a tempfile module, which is the one that you should use.

38:10 And there's a function called mkstemp, which you should use to generate temporary files.

38:15 Is that for secure?

38:16 I'm pretty sure S stands for secure.

38:18 Yeah.

38:18 Nice.

38:20 Okay.

38:20 So the problem is there could be timing attacks or if you hit the time, just right, like you could create the temp file and then you could swap out the data before they read it or other bad things like that.

38:30 Right?

38:30 Exactly.

38:31 So the reason the old one is bad is because it will create the file first and then it will open and load the file.

38:38 So what can happen is someone can sit there scanning the file directory, the temporary file directory.

38:44 An attacker can basically sit there looking at it constantly.

38:47 And as soon as they see a file pop up, so the file handle is being created, then they can go and dump a load of data in there.

38:55 And then you open it and then it's all changed.

38:58 Like basically it's a way of them injecting data into your application.

39:01 Yeah.

39:01 That sounds bad.
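
A minimal sketch of the safer pattern follows. `tempfile.mkstemp` creates the file and opens it in a single step and returns the open file descriptor together with the path, so there is no window between creating and opening. The helper name, prefix, and suffix are arbitrary choices for the example.

```python
import os
import tempfile


def write_scratch_file(data):
    """Create a temp file securely and return its path."""
    # mkstemp creates AND opens the file in one step and hands back the
    # open descriptor, so nobody can slip in between create and open.
    fd, path = tempfile.mkstemp(prefix="demo-", suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write(data)
    return path  # the caller is responsible for deleting it


path = write_scratch_file("hello")
try:
    with open(path) as handle:
        print(handle.read())
finally:
    os.remove(path)
```

When the file does not need to outlive the current block, `tempfile.TemporaryFile` or `tempfile.NamedTemporaryFile` used as a context manager also handles the cleanup for you.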

39:02 So another one, you kind of touched on this before is parsing other file types can be really bad.

39:09 And I kind of riffed on XML being bad because it was old and complicated, but YAML is kind of new and awesome.

39:16 That probably is bad, right?

39:17 You know, this YAML one is so really hardly anybody knows that you can do this because when you look at the example, you think, why would they allow that to happen?

39:28 Wait, hold on, hold on.

39:29 I think exactly like maybe you shouldn't tell anybody about this because you're going to tell people they could do it as much as to protect themselves against it.

39:36 Exactly.

39:37 No, I'm just kidding.

39:38 Go ahead.

39:38 Tell everyone.

39:39 So I guess the most popular package for reading and writing YAML files is called PyYAML, and it has a function called yaml.load.

39:50 And if you actually read the documentation, it says in big letters, warning, yaml.load is not safe, essentially, because you can run any Python function by putting special characters in the YAML file.

40:05 Yeah, like !!python/object/apply:os.system, you know, some form of Python code or calling some sort of subprocess or all sorts of bad things.

40:17 Like that's incredible that you can do that, that you can just say, shell execute this in my YAML file.

40:22 So you can basically put a special, it's like two exclamation points and then a special path and then the name of the function in the standard library that you want to execute and the parameters.

40:34 And then if you call yaml.load and the YAML text or the YAML file has that special syntax in it, it will just execute that locally.

40:43 So you can imagine like if you had a web application where you were asking people to provide some simple YAML input to describe something or you were loading something to configure something.

40:54 I mean, an example that I used in there was, you know, I use YAML a lot with Ansible, which is like a network automation tool.

41:01 And they had a, they had the vulnerability.

41:03 They had the same vulnerability because in a product called Ansible Vault, they were loading it using yaml.load.

41:09 And people are like, oh, awesome.

41:11 I can just like pop a shell on this box instead of providing a password.

41:16 That seems amazing.

41:18 Anthony, do you use Ansible or work with YAML and worry about these things?

41:22 Yes.

41:23 I mean, this is a perfect example of something that really, for those of your listeners who are familiar with JavaScript, people used to run eval to load in JSON, but that changed to JSON.parse because eval was inherently unsafe.

41:38 And this is an example of a function, which is just, it just goes on the, on the, on the do not use list for anything that is serious, right?

41:45 You just don't use it.

41:46 If there's no equivalent on it, then, then we look at something, something else because there's an, unless we go and rewrite it, there's, there's no workaround.

41:54 It's basically like giving people like just dollar prompt access, like type here.

41:59 What do you want to do on the server?

42:00 Exactly.

42:01 Exactly.

42:01 To be honest with you, this is one of those things.

42:03 When I read something like this, the first thing in my head is what were they thinking?

42:07 I am with you.

42:08 So first.

42:09 Yeah.

42:10 So, and tell us the fix.

42:11 What is your fix here?

42:12 Okay.

42:12 So there's two fixes and I only found out about the second one yesterday and I think it might have something to do with this article.

42:18 The first fix is that you don't use load, use safe_load, which.

42:24 Of course.

42:25 Of course.

42:26 So in the same way that you have to use mkstemp instead of mktemp, you should use, however, I guess, explicitly saying you want to do things safely is a bit of an anti-pattern.

42:37 So I did see there was a pull request which has been merged into PyYAML to make the safe version the default.

42:44 And you have to explicitly ask for the unsafe version.

42:47 However, I think that's only been merged in the most recent version of 4.1.

42:53 And then if you ignore all of our prior advice about updating your pip dependencies, then you're never going to see that anyway.

42:59 These do interplay, don't they?
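
Here is a small sketch of the first fix. PyYAML is a third-party package, so the import is guarded to keep the example loadable where it is missing; the payload string and the helper name `load_settings` are illustrative only. `yaml.safe_load` builds plain data and refuses tags that request arbitrary Python objects.

```python
try:
    import yaml  # PyYAML, a third-party package (pip install pyyaml)
except ImportError:
    yaml = None  # lets the sketch load even where PyYAML is missing

UNTRUSTED = "!!python/object/apply:os.system ['echo this should never run']"


def load_settings(text):
    """Parse YAML that may come from an untrusted source."""
    if yaml is None:
        raise RuntimeError("PyYAML is not installed")
    # safe_load only builds plain data (dicts, lists, strings, numbers)
    # and raises on tags that ask for arbitrary Python objects.
    return yaml.safe_load(text)


if yaml is not None:
    print(load_settings("name: demo\nretries: 3"))
    try:
        load_settings(UNTRUSTED)
    except yaml.YAMLError as err:
        print("rejected:", type(err).__name__)
```

The malicious document is rejected with an error instead of being executed, which is the behavior you want for anything a user can upload.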

43:01 So I was definitely thinking of that API design sort of experience and like, wait, load and then safe_load is what you're really supposed to do some of the time.

43:10 Like, this does not help people fall into the pit of success, right?

43:16 Like, to me, it seems like if you don't know what you're doing, it should default or just lead you down the path of, well, do the safe one.

43:24 And if they need the crazy, you know, run this shell thing, like, make them hunt for that.

43:29 Don't make that the default until they know better, right?

43:31 Sounds like at least they've gotten that working now.

43:34 Let go.

43:35 Yeah, exactly.

43:35 And I mean, the advice that I would give to anybody who is designing their own APIs is security by default.

43:41 What you do, the most obvious way of doing things, the standard way of doing things should be the secure way.

43:47 If you have features that are potentially unsafe, by all means, expose them, but make the developer or the user specifically ask for it so they can accept and manage the risk.

43:59 Yeah, that's perfect.
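
As one possible shape for a secure-by-default API, the hypothetical function below parses a value from text. The obvious call uses `ast.literal_eval`, which accepts only literals, and the risky path through `eval` exists but has to be requested by name with a keyword argument.

```python
import ast


def parse_value(text, *, allow_code=False):
    """Parse a Python literal; evaluating real code must be requested."""
    if allow_code:
        # Deliberately awkward to reach: the caller accepts the risk.
        return eval(text)
    # literal_eval accepts only literals (numbers, strings, lists, dicts...)
    return ast.literal_eval(text)


print(parse_value("[1, 2, 3]"))

try:
    parse_value("__import__('os').getcwd()")
except ValueError:
    print("refused to run code by default")
```

Someone who never reads the documentation gets the safe behavior, and the unsafe call stands out in code review because of the explicit `allow_code=True`.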

44:01 One of the ones that I felt is kind of obvious on this list is obvious because I've been around for a while, I guess, is this concept of pickling.

44:10 And pickling is the act of taking more or less a binary version of a Python object graph, putting it on disk, and then rehydrating it back into objects, right?

44:21 Yeah, and it's typically used in sort of inter-process or inter-Python version communication or dumping an object somewhere and then loading it again somewhere else.

44:31 And so it's kind of built into a standard library.

44:34 It's been around for a while.

44:36 And before things like JSON kind of became really popular, I guess it's one of the preferred ways of communicating or sharing data between Python processes.

44:46 The problem is that it has a number of fairly well-documented security vulnerabilities.

44:51 I mean, if you're loading up a pickle file, you can basically put things in the pickle file, which cause it to run any Python process.

45:00 Like, if you basically create a new class and then you declare something called __reduce__, then the contents of the __reduce__ method, if you return back a tuple, you can basically, the tuple that you return can be the name of a standard library function and its arguments.

45:16 So basically, that's, again, where you do os.system or subprocess.Popen, and you basically pass it all the arguments.

45:23 So instead of loading this nice pickle file and you continuing with your day, basically, you load a pickle file and it just runs commands on your local machine.

45:31 Yeah, that's not amazing.
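
The mechanism can be shown with a harmless stand-in. In the sketch below, the made-up class `Sneaky` returns a callable and its arguments from `__reduce__`, and `pickle.loads` calls it while loading. `os.getcwd` is used so that nothing damaging happens; an attacker would substitute a function that runs commands.

```python
import json
import os
import pickle


class Sneaky:
    """Looks like data, but tells pickle to call a function on load."""

    def __reduce__(self):
        # Harmless stand-in. An attacker would return something like
        # (os.system, ("some shell command",)) instead.
        return (os.getcwd, ())


payload = pickle.dumps(Sneaky())
result = pickle.loads(payload)   # os.getcwd() runs during loading
print(type(result).__name__)     # not a Sneaky instance at all

# For plain data, a format that cannot carry code is a safer default.
wire = json.dumps({"user": "demo", "roles": ["reader"]})
print(json.loads(wire))
```

Note that the loading side never needs the `Sneaky` class at all: the payload only names the function to call, which is why inspecting your own code does not reveal the problem.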

45:33 So, again, one of these sort of don't do this.

45:37 It's extra hard to check, too, because the thing you're delivered is a binary blob.

45:42 So how do you know whether it's safe or not?

45:44 You can't even look at it.

45:45 It's not even like the YAML file where you're like, wait, wait, wait, what is that double, double, double exclamation mark Python thing?

45:50 That doesn't look right.

45:51 Right.

45:52 This is just, you know, one, zero, one, zero, zero, one.

45:54 Seems okay.

45:55 Yeah.

45:56 So there's not really a workaround for this other than don't use pickle or don't try and unpickle things from an untrusted source.

46:04 But then I don't even like saying that because it's that kind of assumption which attackers exploit, combining attacks together.

46:13 But Anthony knows a lot more about that than I do.

46:15 Yeah, it's chaining them, right?

46:16 It's like you use one to get one layer in and then another and then another.

46:20 Yeah.

46:20 Ant's got a very good point.

46:22 The way a lot of attackers compromise systems is they won't go after the top tier, the most important, the most visible ones.

46:30 They'll try to look at maybe the second or third tier systems, ones that people may not have paid as much attention to or made as much effort securing.

46:40 And if they can compromise those, they will use those as a foothold to go after things that people perceive as important.

46:48 So if I compromise an app and it's communicating via pickles to other apps, for example, that can be how the exploit occurs.

46:57 So it's good to keep in mind that every system you write has to consider security, as Ant is saying, and follow Ant's advice on looking at these 10 topics.

47:06 Yeah, I mean, maybe this is some wimpy little app that's just like some reporting thing, but you could use it to get behind the firewall or maybe you could pickle some bad stuff into a shared Redis cache that then the other system might read.

47:18 And then, you know, you just level up step by step, right?

47:21 The so-called lateral movement.

47:22 It is being recorded, you're right.

47:23 Exactly, exactly.

47:24 And look, three, four, five hops for the compromises is not unusual.

47:28 And when you see it going, you realize how good some of these attackers actually are.

47:33 But I digress.

47:34 That's right.

47:36 That's right.

47:36 So I guess the last two kind of blend together a little bit in about not patching your Python runtime, sort of the main one, what you get if you just go to the open a shell and just type Python 3 or something, or your dependencies as well.

47:51 And I heard that you don't really need to patch your Python runtime because there's zero days that sometimes appear in like your OS.

47:58 So you can just like, this is not worth bothering with, right?

48:00 What was that crazy comment about?

48:04 I can't remember.

48:04 There was a comment on those lines.

48:07 They were like, oh, but why do we bother?

48:09 Why do we have to patch Python?

48:10 Because they're probably just going to hack us anyway.

48:11 So what's the point?

48:12 If only.

48:13 If only it worked that way.

48:14 I think that's, I mean, we touched on that at the beginning a little bit.

48:17 And that is sort of a sense of despair that people have.

48:20 You're like, look, I could do all these things that I'm supposed to do.

48:23 But as soon as there's some sort of zero day or 0-day in my system, you know, all bets are off.

48:28 I'm not going to be able to defend against that.

48:31 I don't know what zero day there is in, you know, NGINX unit.

48:36 And so forget it.

48:37 I'm just not going to worry about security.

48:39 And I can totally understand how people feel that way.

48:41 But there was some interesting analysis and reports saying that even state-sponsored actors, hackers and stuff, primarily use super boring stuff, like all the things we've already been talking about, right?

48:54 Like SQL injection and other boring unpatched things.

48:57 I mean, zero days, yes, they happen.

48:59 But as a proportion of total security incidents, they're actually relatively small, particularly for the well-known, well-trusted packages and products that are out there.

49:11 I think this is one of those cases where people need to get their own house in order before they throw stones, if you know what I mean.

49:17 Yeah, yeah.

49:19 But I mean, once again, I mean, most of this stuff is relatively simple once you understand what you need to do.

49:25 So it's just a question of putting time and effort into it.

49:28 That's all.

49:28 I think there's a difference between the NSA or some other country is trying to literally get into your thing or I'm on the Internet and people are rattling the doors to see if they're open.

49:39 And they're not going to spend a full week trying to get into my system because they know there's thousands that just have SQL injection or some other random thing, right?

49:47 Yeah, and look, to be honest with you, if the NSA really, really wants to get into your network, the version of the Python package you're using is probably the least of your problems at the moment.

49:56 So...

49:58 Yes, exactly.

49:59 Exactly.

50:00 Plenty scary, but we're not going to think too much about that, I guess.

50:03 So what you're telling me is I need to install Python 3.7 because it just came out.

50:08 Things like that.

50:09 Yeah, I'd actually say, this is probably controversial, but I wouldn't install Python 3.7 until 3.7.0.1 comes out.

50:17 Generally, I'd...

50:19 Yeah, yeah, sure.

50:19 But maybe patch, like if there were 3.6.6 or 3.6.7, whatever the next one is.

50:26 3.6.6, yeah.

50:27 I'd go with 3.6.6.

50:28 But it's part of that staying on top of things.

50:30 I wish PyUp actually had a check to say your Python version is 3.6.6 and it should be 3.6.7 or something like that.
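
The kind of check wished for here can be sketched in a few lines. The `MINIMUM_PATCH` table and `patch_status` function are hypothetical; the minimum patch level per release series is a policy you would maintain yourself, not something Python reports.

```python
import sys

# Hypothetical policy: lowest acceptable patch release per (major, minor) series.
MINIMUM_PATCH = {(3, 6): 7}


def patch_status(version=sys.version_info, minimums=MINIMUM_PATCH):
    """Compare a (major, minor, micro) version against the policy table."""
    major, minor, micro = version[:3]
    required = minimums.get((major, minor))
    if required is None:
        return "unknown series"
    if micro >= required:
        return "ok"
    return f"upgrade to {major}.{minor}.{required}"
```

Called with no arguments it checks the running interpreter, so it could run at application start-up or in a CI job.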

50:38 There is another tool, which is open source as well, called InSpec.

50:41 And it's made by the same people that brought you Chef.

50:46 So it's built in Ruby, but it does actually have a bunch of hooks for Python applications.

50:51 And I know that it does do virtual environments and also checking versions of packages in production.

50:58 So you can say, make sure that none of our servers in production have this version of Python.

51:03 Make sure that they don't have these versions of these packages as well.

51:06 And you can do that in InSpec by basically declaratively saying, here are our security rules.

51:11 And it will go and scan through your installations or your servers for you and actually give you back a report.

51:17 So InSpec, I guess, would be the way to do it or a tool like that.
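
InSpec itself is written in Ruby and has its own rule language, which is not reproduced here. As a rough Python-only illustration of the same idea, the sketch below lists installed distributions and flags any that appear on a deny list. It assumes Python 3.8 or later for `importlib.metadata`; the `BANNED` table, its package name, and both function names are made up for the example.

```python
from importlib import metadata

# Hypothetical deny list: package name -> versions that must not be deployed.
BANNED = {"examplepkg": {"1.0.0", "1.0.1"}}


def installed_versions():
    """Map lower-cased distribution names to their installed versions."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            found[name.lower()] = dist.version
    return found


def find_banned(installed, banned=BANNED):
    """Return sorted (name, version) pairs that match the deny list."""
    return sorted(
        (name, version)
        for name, version in installed.items()
        if version in banned.get(name, ())
    )
```

Running `find_banned(installed_versions())` inside each virtual environment gives a small report of the same flavor, though without the remote scanning a dedicated tool provides.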

51:21 Yeah, that makes a lot of sense.

51:22 All right.

51:23 Well, I think hopefully people learned a lot from those 10 items.

51:28 I mean, there's certainly a couple of them that I had heard of, but others were new to me.

51:32 And it's always good to be aware of these things.

51:34 So thanks for putting that together and both of you for sharing it here.

51:37 So Anthony, how do I sell this to my, you know, I work at a startup.

51:42 The management just got VC funding.

51:45 We're supposed to do growth, growth, growth.

51:47 We're trying to figure out new features so that we get this thing to get out.

51:51 How do we get ourselves to worry about this?

51:53 So, well, the first thing is to understand any legal or contractual obligations that you have.

51:58 So if you're in Europe, for example, you have to follow the GDPR, the new General Data Protection

52:04 Regulation that's there.

52:05 If you're handling credit cards, for example, you've got PCI DSS, the Payment Card Industry

52:11 Data Security Standard.

52:12 These are the non-negotiable things you have to do.

52:15 The second thing is if you have any security executives or leaders in your organization,

52:21 a CISO, or if they have corporate security policies you need to follow.

52:25 Those are two things.

52:27 But beyond that, one is just look at the newspaper.

52:29 Reputational and financial damage is real.

52:33 It happens.

52:34 We've all seen or heard of companies that have been hacked and thinking, gee, I wish I weren't

52:39 them.

52:40 I wish I was not them.

52:41 So don't.

52:42 Don't be them.

52:44 Exactly.

52:46 Well, and also I think it comes to do with culture a little bit, right?

52:49 Like if the culture is do not change this application, do not touch it, because if you break it, you're

52:56 going to take down some part of our business and it's going to be so bad, then

53:00 that thing ossifies.

53:01 And it's basically like, no, it's kryptonite.

53:04 Nobody wants to touch it.

53:05 I'm going to stay away from that.

53:06 Yeah, it probably has got some security issues, but I'm not touching it because I will be

53:10 punished to no end if I'm the one who breaks it.

53:12 I can see that happening a lot.

53:14 Yeah, it is.

53:14 And this comes down to your general culture around, as Ant said, test coverage, refactoring.

53:20 If you have stuff that you can't touch, it's going to cause you security problems.

53:25 If it's stuff that you can touch, you have a fighting chance of solving them.

53:29 So that comes down, once again, to the term that's used here: risk management.

53:33 Is your management happy to accept the risk that you're going to have an app that will

53:37 eventually cause you a problem?

53:38 Right.

53:39 It's sort of the consequence of failure there.

53:40 I mean, look at WannaCry and Windows XP on a lot of these things.

53:45 There's so much damage from that, right?

53:48 Yes.

53:48 And particularly if you're in a small shop, if you're in a startup, reputation is big early

53:54 on.

53:55 So you need to make sure that you can maintain that trust that you're going to need from your

53:59 first few customers.

54:00 Yeah.

54:01 And you probably have an advantage in the sense that you don't have big data centers with legacy

54:06 machines laying around.

54:07 You're probably on the cloud using modern tools because you just started.

54:10 All right.

54:11 Keeping that upgrade train running is probably easier.

54:14 Exactly.

54:15 Exactly.

54:15 It's all greenfield.

54:16 You don't have to keep older stuff running.

54:18 Exactly.

54:19 Exactly.

54:20 Yeah.

54:20 All right.

54:21 We're getting pretty short on time and I'll be respectful of both of your times, but maybe

54:25 you could tell us really quick some places to learn more about security.

54:28 Yeah.

54:29 I guess one, two projects I want to call out.

54:31 So there's a number of projects with the same kind of name that's related to goats.

54:36 I'm not sure where the goat name came from originally, but basically there's a project called

54:41 WebGoat, which was originally done as a how not to do things, open source project to basically

54:47 include all of the top 10 most common vulnerabilities.

54:52 And I can't remember what language it was written in, probably Perl, but there's a Python version

54:56 called PyGoat and there's a Django specific one called DjanGoat, which I guess we'll have

55:02 in the show notes.

55:05 But if you actually look at that, the DjanGoat one's really cool.

55:09 It's got all the stuff you should never do with Django projects like leaving debug enabled,

55:13 for example.

55:14 But basically how to do the, all the kind of vulnerabilities we've talked about as well

55:19 as the other ones in a project.

55:21 So it's kind of like a how not to do things project.

55:23 And I think it's good if you want to learn how things are done.

55:27 And also it has instructions about how to break it, how to use those vulnerabilities.

55:31 So I would start there.
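
Leaving debug enabled, mentioned above as something never to do in a Django project, is often avoided by making debug opt-in rather than hard-coded. The following is a minimal sketch of what might go in a settings module; the `env_flag` helper and the `DJANGO_DEBUG` variable name are assumptions for illustration, not Django conventions.

```python
import os


def env_flag(name, default=False, environ=os.environ):
    """Read a boolean flag from the environment; unset means the default."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# In settings.py: debug stays off unless a developer explicitly turns it on.
# "DJANGO_DEBUG" is a hypothetical variable name chosen for this example.
DEBUG = env_flag("DJANGO_DEBUG")
```

With this shape, a production server that never sets the variable cannot end up with debug pages exposed by accident.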

55:33 And then if you know them, then you can start doing things like doing scans through your source

55:39 code for those particular types of vulnerabilities or those functions and seeing if you can find

55:44 bugs in other places.
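
Scanning source code for particular functions, as suggested above, can be sketched with the standard library `ast` module. The `RISKY_CALLS` set is a small illustrative selection rather than a complete list, and `find_risky_calls` is a hypothetical helper; it only matches calls written with these exact names, so aliased imports would slip past it.

```python
import ast

# Illustrative, incomplete set of call names worth a second look.
RISKY_CALLS = {"eval", "exec", "pickle.load", "pickle.loads", "os.system", "yaml.load"}


def dotted_name(node):
    """Return 'a.b.c' for simple name/attribute chains, else None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def find_risky_calls(source):
    """Return sorted (line number, call name) pairs found in the source text."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in RISKY_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


sample = "import pickle\nx = eval(user_input)\ny = pickle.loads(blob)\nz = len(blob)\n"
hits = find_risky_calls(sample)
```

A hit is a prompt for review, not proof of a vulnerability; the question is always whether untrusted input can reach the call.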

55:45 Yeah, that makes a lot of sense.

55:46 I really like this goat series.

55:49 That's pretty interesting.

55:50 One thing I want to give a quick shout out to that's in your neighborhood is a podcast

55:55 called Risky Business.

55:56 Have you heard of it?

55:56 I listen every week.

55:57 Yeah, I love it.

55:58 It's fabulous, man.

55:59 Those guys do such a good job.

56:00 It's like headline news in security and tech programming security.

56:05 It's really great.

56:06 And just listening to that, you learn a lot sort of by osmosis, I think.

56:10 All right.

56:11 Well, I think we're just out of time.

56:13 We could go on and on and maybe we'll have to do another episode on this.

56:16 But that was really helpful and insightful.

56:18 So thank you both.

56:19 Since there's two of you, I'm going to ask two questions at the end, but the same one.

56:23 We'll just do the notable PyPI package.

56:26 So is there a package, maybe related to security that people maybe haven't heard of,

56:30 but you want to throw out there to let people know about it?

56:33 Anthony, you want to go first?

56:34 I had a couple, but it's one of those things where, rather than a package, I think people need to take a step back and realize that security is a mindset here, in that there's no package that's going to make your app secure.

56:48 So I'm going to twist the question a little bit here.

56:51 It's about keeping abreast of what's going on and keeping current and continuing to learn.

56:57 And it's an area that can be fascinating, can be really interesting.

57:00 Yeah.

57:01 Excellent.

57:01 Ant?

57:02 Yeah.

57:02 This one hasn't actually been released yet, but Brian and I are working on a new pytest plugin called pytest Requests.

57:08 And by the time this episode goes to air, it'll be out and it'll be wonderful.

57:12 So I'd recommend you check it out.

57:13 Oh, that sounds really fun.

57:14 That's awesome.

57:15 And also, I want to give a shout out to your Python 3.7 course.

57:18 You just did a new course on Python 3.7.

57:21 I thought it was great.

57:22 Thanks.

57:22 Yeah, that just came out just in time for the actual release of 3.7, which is good.

57:26 Yeah, you beat the window by like a week or something.

57:29 Maybe just tell people super quickly where they can get it and what it's about.

57:32 So the course is on Pluralsight.

57:34 If you have access to Pluralsight or you want to sign up for a trial, you should be able to watch the course.

57:38 It's less than an hour.

57:40 And I basically go through all the major new features in 3.7, how to use them, how they work, how to configure them, as well as some of the other benefits.

57:49 So it'll take an hour of your time and you'll be a Python 3.7 expert by the end of it.

57:54 But you should wait till Python 3.7.1 to actually get going, maybe?

57:58 Probably, but that'll be weeks away, I reckon.

58:00 I reckon, yeah.

58:01 I already installed 3.7, but I have 3.6 on my machine just in case I need it.

58:04 All right, awesome.

58:05 Well, yeah, I enjoyed it.

58:06 So good to let people know about that.

58:08 All right, both of you, final call to action.

58:10 People, maybe they're thinking about the software that they have.

58:13 What's the first thing to do to start to address this?

58:16 Anthony?

58:16 The first thing is to take a step back and think, well, what are the most important things that my app is protecting?

58:22 What are the most important business processes or data that's there?

58:25 And to think, how can I protect it?

58:28 How should I protect it?

58:29 Is what I'm doing what I need to do?

58:32 All right, excellent.

58:32 Anthony, anything to throw in on that?

58:34 Yeah, I'd say just don't be intimidated by security in general and just start simple and be willing to learn and also be willing to learn from people who have a lot more experience, of which there are plenty of people out there.

58:46 Yeah, it's definitely a different way of thinking and it's pretty fascinating.

58:50 All right, well, thank you both for being on the show and sharing all this stuff.

58:53 It was great.

58:53 Thanks, Mike.

58:54 Thank you, Mike.

58:54 Great to be here.

58:55 Yeah, bye-bye.

58:55 This has been another episode of Talk Python To Me.

58:59 Our guests on this episode have been Anthony Shaw and Anthony Langsworth.

59:03 And this episode has been brought to you by the Test and Code podcast as well as Talk Python Training.

59:09 Keep up with the latest news and become a better developer by listening to Python Bytes at pythonbytes.fm and Test and Code at testandcode.com.

59:17 Want to level up your Python?

59:19 If you're just getting started, try my Python jumpstart by building 10 apps or our brand new 100 days of code in Python.

59:26 And if you're interested in more than one course, be sure to check out the Everything Bundle.

59:30 It's like a subscription that never expires.

59:32 Be sure to subscribe to the show.

59:34 Open your favorite podcatcher and search for Python.

59:36 We should be right at the top.

59:37 You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.

59:47 This is your host, Michael Kennedy.

59:49 Thanks so much for listening.

59:50 I really appreciate it.

59:51 Now get out there and write some Python code.

59:53 I'll see you next time.