Transcript for Episode #37:
Python Cybersecurity and Penetration Testing
How secure is your application? Do you know the main vulnerabilities that most apps suffer from? How would you even start to answer these questions? On this episode of Talk Python To Me, Justin Seitz is here to tell us all about it. This is episode number 37, recorded December 2nd 2015.
Now, before I play the theme music, I have a little something special for you. This week only, instead of Developers Developers Developers, we have "Secrets from the Future" by MC Frontalot. A great song about the futility of computer security over time. You can catch the full song at the end of the episode.
Welcome to Talk Python to Me, a weekly podcast on Python: the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython.
This episode is brought to you by Hired and Codeship. Thank them for supporting the show on twitter via @hired_hq and @codeship
Hi everyone. Thanks for listening today. Let me introduce Justin.
Justin Seitz is a respected cyber security expert who has trained and consulted with Fortune 500s, law enforcement agencies, and governments around the world. He is the author of two Python books that were translated into 7 languages. He has helped teach tens of thousands of people how to write code to automate computer hacking and OSINT tasks. In October 2014, he presented a unique method for tracking ISIS supporters on Twitter.
2:07 Justin, welcome to the show.
2:07 Thank you very much for having me.
2:11 Yeah, I'm pretty excited to talk about this whole world of computer security and breaking software and understanding more of what vulnerabilities are in your software. So I'm just happy you are on the show to talk about that.
2:21 That's great, normally you have builders and now you have a breaker, so that's awesome.
2:25 Yeah, that's right, normally we have the builders on here but I think it's super important to see that side of the story, right? Like if you build a website and you put it out there, I kind of feel like it's safe, but is it safe? I don't know. You should understand who the people trying to break into your system are and how that even happens, right? So I think that is going to be really valuable to builders in addition to everyone else.
2:54 Cool, so we are going to talk a lot about that, but let's get started with where you got into programming in Python, what's your story?
3:00 So, how I got into programming in Python was a good buddy of mine, Dave Falon. I'll never forget him peering over my shoulder when we worked together at a startup at one point, and I was doing everything in PHP and he kind of said, you know, "it's really lame that you are using PHP to do all this stuff, you should really look into Python". So I did, and I'm one of those old dogs new tricks kind of guys. Truth be told, I'm not the strongest developer. I had the pleasure of working in a couple of different companies with some really top notch developers who just kind of blew my mind on a daily basis, and I knew that I was never going to be like that, but I found with Python that I kind of went from zero to actually knowing what I was doing very quickly. And kind of around this time, as I was spending time in hacker forums and reverse engineering forums and stuff, you know, it was kind of strange, but Python seemed to almost become the de facto language for people to start using in the hacking community. So between Dave kind of talking me into learning it and the hacking community beginning to adopt it as really the language we were all going to standardize ourselves on for the most part, that was really what kickstarted my journey into Python coding.
4:23 I think that's a way a lot of people get started in Python is it's kind of the easy path to get started but unlike a lot of other easy paths it doesn't seem to have a real strong 4:33 right, like you can build rich high end systems but you can also get started easily, that's kind of unique to this whole ecosystem, right.
4:43 Yeah, I totally agree, I mean I've seen some of the most, the craziest systems built completely in pure Python and I've seen some of the most beautifully simple scripts that do amazing stuff that are ten lines long, which is great because I think ten years ago there was always the argument of you know performance and compiled languages versus things like .Net when it was kind of going through its renaissance period and I think we are at the point where we are kind of like you know, unless you are processing billions of transactions a second which I bet you there are Python installations out there that are doing that, we are ok, everybody has kind of accepted that there is many ways to skin these cats and Python is just a great way to literally go from zero to sixty very very quickly.
5:28 Yeah, I definitely agree. So, that's kind of how you got into Python, that's really interesting, but you took a sort of different path right, you got into sort of analyzing systems and checking them for vulnerabilities and offensive security and all that kind of stuff, that's a pretty different path than, "I'm going to start building a website and charge to build people's homepages" or whatever, right; tell me the story there?
5:52 Sure, yeah. So I actually did spend a period of time being a web developer. Again, hence why I was into PHP, but you know, the big thing for me was that I was at the startup that was amazingly good, a fantastic engineering team, that kind of looked at talent and said, "you know, you are good at this particular job, do you want to do it?" And for me, I got into quality assurance and totally by accident, I was originally hired on there to fix printers believe it or not, but this was one of these really progressive kind of funky startups and very quickly I was leading the QA team which was very small and soon it turned out that I was really good at breaking software.
6:34 Now, I had spent a number of years kind of in and out of the hacking scene, and doing research on my own, but never really took it really seriously, never really took it like something that I wanted to do as a career. I didn't even know that it was actually a career at the time. So as I got further and further along in this QA stuff they realized that they should actually get Justin spending all of his time breaking stuff, because I seemed to have this kind of weird ability to find the bugs that nobody else would find, and also, because I was into reverse engineering, I could assist the development staff in tracking down particularly nasty bugs that they couldn't figure out other ways.
7:15 So I basically eventually became just a breaker, so they brought in someone to run the QA team and I was able to step aside and just simply focus on that. And around this time, probably in 2006, 2007, I became more and more active on reverse engineering forums and started sharing code and kind of networking with people. It was around this time that I also decided, "hey I think I actually want to write a book" because I was writing some tools in Python specifically for reverse engineering.
7:48 And then Immunity, where I spent 7 years, sponsored a competition, I believe in 2007, where I was writing a plugin for what was called Immunity Debugger, which is a debugger specifically designed for reverse engineering, primarily geared towards exploit development. So I ended up writing a plugin for that, of course in Python, and I won that competition, and shortly after that Immunity hired me on in 2008, and from that point forward I was doing all kinds of development work, since their products were all written in Python.
8:23 I was working on a penetration testing product there, called "Canvas", and I was also doing a lot of consulting and other work, and that's kind of what carried me down that path. So I've been very fortunate that I've had a number of employers that allowed me a bit of free rein and allowed me to kind of chase the stuff that I found interesting, so I've been really fortunate over the past ten or fifteen years to have that.
8:51 It's really great when you get to pursue what you are super interested in, right, it's almost like you get paid to be on vacation or to do a hobby or something, right?
8:59 Yeah, absolutely. Absolutely.
9:02 Yeah, it's great. So you talked about your books- the first one you wrote was "Gray Hat Python", is that right?
9:06 That's correct, yeah.
9:08 Yeah, so can you tell us kind of what topics you covered in there, and what's the story of that book?
9:14 So, "Gray Hat Python" was definitely more heavily geared towards lower level reverse engineering and exploit development and also looking at building tools to assist you in identifying vulnerabilities. So in the security world a lot of us employ a technique called fuzzing, which just basically means generating random or semi-random inputs for a piece of software to process, so if you think of a traditional server written in C that kind of takes 9:43 and dissects this proprietary protocol.
9:47 What we would do is we would write fuzzers that would basically try to break how that protocol is parsed by that software, in the hopes that we would find a vulnerability. So "Gray Hat Python" kind of takes you through how to build some tools to assist on the back end, which means trapping bugs or using an automated kind of debugging system to track bugs, all the way up to building the fuzzers and building some of the other tools to help you find bugs. So it was definitely more of a low level book but it leveraged Python all the way through to build tools to assist you.
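Editor's sketch: the fuzzing loop described above can be illustrated in a few lines of Python. This is not code from the book. The `parse_record` target, its one-byte length prefix, and its deliberate bug are all hypothetical, invented so that the fuzzer has something to find.

```python
import random


def parse_record(data: bytes):
    """Hypothetical parser: the first byte is a length, the rest is the payload."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    # Deliberate bug for the demo: trusts the length byte without checking
    # that the payload really is that long.
    last_byte = payload[length - 1]
    return payload, last_byte


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one to three random positions of a valid input with random bytes."""
    data = bytearray(seed)
    for _ in range(rng.randint(1, 3)):
        data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def fuzz(target, seed: bytes, iterations: int = 1000, rng_seed: int = 0):
    """Feed mutated inputs to target and record every unexpected exception."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        case = mutate(seed, rng)
        try:
            target(case)
        except ValueError:
            pass  # a clean rejection is the parser doing its job
        except Exception as exc:
            crashes.append((case, type(exc).__name__))
    return crashes


crashes = fuzz(parse_record, b"\x04ABCD")
print(f"{len(crashes)} crashing inputs found")
```

Against a real C server the "crash" would be a segfault caught by a debugger rather than a Python exception, which is the back-end tooling Justin mentions.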
10:23 Oh that's really cool. So is that like looking for buffer overflows and SQL injection attacks and things like that? Or other stuff as well?
10:32 Yeah, exactly, so I mean, ten years ago and still somewhat today, but things have changed a bit. Ten years ago we were definitely looking for memory corruption bugs, which would be buffer overflows, heap overflows, and you know, there were many other bugs, but you are right, most of us in the community that are writing tools, we are building stuff too that is looking for SQL injection bugs or looking for cross site scripting vulnerabilities. So much the same as we would be focused on fuzzing software, we also build tools that would fuzz web applications as well.
11:06 I suspect a lot of the listeners know what buffer overflows are, and what SQL injection vulnerabilities are, but there is probably a decent number of people who don't- could you maybe just talk about those two terms, those are probably the two big super bad problems you can introduce into your code, right?
11:22 Sure. So, a buffer overflow is really where you are kind of saving more data into a spot in memory than it can handle, so if you think of a string in memory, we can treat it like a bucket, so this bucket can hold a maximum of 50 letters, or if you want to treat it like water, it could be 50 liters of water. So typically, what you want to do when you are a programmer and you are using a language like C is that you want to ensure that you can never have even 51 liters of water or 51 letters in that bucket.
11:58 So, what happens in a buffer overflow situation is that we are able to literally kind of overflow the bucket, and depending on how we overflow that bucket we can actually then control how your program executes from there. So it's a very common vulnerability, but some of it is definitely starting to go away because things like Visual Studio, the tool chains, are starting to build in protections in an attempt to deal with those programming flaws, and they are also trying to prevent you from using functions like strcpy or memcpy in unsafe ways.
12:38 So we are starting to get away from it, but that's kind of the general feeling or general explanation of how a buffer overflow looks. Now, for a SQL injection vulnerability, we are not so much concerned with kind of shoveling too much data in, but if you have ever written SQL code in like a PHP application or even in Python and you concatenate strings together, for example, so you have your select statement and you say, where Id = then you have your quote, and plus and then some piece of input from the user.
13:12 Now, what we can do is we can substitute in a quote or single quote or potentially other characters that can actually allow us to control how that SQL statement is executed. So by injecting our own SQL, that means that we could potentially extract data. Maybe you are doing a select against the products table, but when we send in our injection code, if we are successful in getting it in, potentially we could then begin mapping all of the tables in the database, or we can begin extracting data not from the products table but from the users table where we could grab user names and passwords, or in some cases you can even begin executing commands directly on the operating system straight from that little SQL injection vulnerability.
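Editor's sketch: Python itself is memory safe, so the bucket analogy can only be modeled, not reproduced. The toy below assumes an invented "stack frame" laid out as an 8-byte buffer followed by a 4-byte marker standing in for a return address. The layout and the function names are hypothetical; the point is only to show what an unchecked copy does to whatever sits next to the buffer.

```python
BUF_SIZE = 8  # the bucket holds 8 bytes


def make_frame() -> bytearray:
    # Toy stack frame: an 8-byte buffer, then a 4-byte "return address".
    return bytearray(b"\x00" * BUF_SIZE + b"SAFE")


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    # Behaves like strcpy: copies every byte and never asks how big the bucket is.
    for i, byte in enumerate(data):
        frame[i] = byte


def checked_copy(frame: bytearray, data: bytes) -> None:
    # The fix: refuse anything that does not fit in the bucket.
    if len(data) > BUF_SIZE:
        raise ValueError("input larger than buffer")
    frame[:len(data)] = data


frame = make_frame()
unchecked_copy(frame, b"A" * BUF_SIZE + b"EVIL")
print(bytes(frame[BUF_SIZE:]))  # the marker after the buffer has been replaced
```

In C the overwritten bytes could be a real saved return address, which is how an attacker ends up controlling where the program executes next.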
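Editor's sketch: the string-concatenation problem described above can be reproduced with Python's built-in `sqlite3` module. The table names, columns, and sample rows are made up for illustration; the payload pulls a row from the users table through a query that was only ever meant to read products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id TEXT, name TEXT);
    CREATE TABLE users (username TEXT, password TEXT);
    INSERT INTO products VALUES ('1', 'widget');
    INSERT INTO users VALUES ('admin', 's3cret');
""")


def find_product_unsafe(product_id: str):
    # Vulnerable: user input is concatenated straight into the SQL text.
    query = "SELECT name FROM products WHERE id = '" + product_id + "'"
    return conn.execute(query).fetchall()


def find_product_safe(product_id: str):
    # Parameterized: the driver passes the value separately from the SQL text.
    return conn.execute(
        "SELECT name FROM products WHERE id = ?", (product_id,)
    ).fetchall()


# The single quote closes the string literal, UNION appends a second query,
# and the trailing -- comments out the leftover quote.
payload = "x' UNION SELECT username || ':' || password FROM users --"

print(find_product_unsafe(payload))  # leaks a row from the users table
print(find_product_safe(payload))    # treated as a literal id, matches nothing
```

The parameterized version is the same mechanism that ORMs rely on under the hood.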
13:57 Yeah, that might be like the text box for your password.
14:03 That's the command line to the remote box, right, which is less good, when it's used that way I think.
14:11 Yeah, that's right. And I think you know it all boils down to just input sanitization problems, right, so again, a lot of platforms are starting to get better and toolchains are getting better at forcing programmers to write code in a certain way, and then on top of it you know, a number of frameworks are trying to make it so that these kinds of classes of vulnerabilities are going to 14:33
14:35 Yeah, that's really nice that the systems and the compilers are taking care of it somewhat that helps, right. As well the ORMs, right, so like SQLAlchemy or other high level ORMs that don't accept string SQL definitely help mitigate that. Have you Googled, have you seen the XKCD Exploits of a Mom Little Bobby Tables?
15:02 Oh yes.
15:04 For those of you who don't know what a SQL injection attack is make sure you take the time to Google for "little bobby tables" and you'll get the XKCD Exploits of a Mom. I'll put it in the link in the show notes, I won't say anymore, I'll let you check it out.
15:18 That's great, yeah.
15:20 Did you really name your son that- yes. So, I mentioned the two vulnerabilities that are like well known to me because I you know, take account for them when I write web apps and stuff but, what else is out there that are sort of on that scale that we should be aware of as developers to like just know that we should make sure we don't do that?
17:02 Now, again these are all things that if you Google for like the OWASP top 10, these are all things you are going to be looking for. But typically, in my experience as someone who spent a lot of time hacking into systems, a lot of our big wins where we were able to really compromise the applications didn't necessarily involve some of these classes of attacks. It might be something as simple as not validating that a user account should have access to a particular set of data.
17:31 So if you and I both use the same system and I am user ID 1 and you are user ID 2, and there is a set of documents in the system where you are assigned maybe the first ten documents and I am assigned the last ten, in a lot of cases what we found is that they are not properly checking and validating that I should only be allowed to access particular documents, so now I am able to access all of your sensitive information, in some cases just by incrementing one number, by walking through all of the various document IDs.
18:05 So is this an architectural flaw- yes, is this an input sanitization flaw, which are the most common or previously most common- no. So it's a bit more 18:14 because you as a developer, as you are paying attention to escaping all the input and double checking your SQL queries and all that stuff, some of these more architectural flaws are a little bit more subtle.
18:28 Yeah, so interesting. So for example, if I've got a relational database with a primary key integer that is auto incrementing for all of my resources in my web app, and I have a user account, it's very likely I can enumerate all of that type of data, so I might be /user/271, well it looks like I can just try a bunch of numbers between 1 and 10,000 and look for users and see what I can see about them, right, or documents or whatever, yeah?
18:59 Absolutely. And you know, it sounds completely simple but it has worked in a number of cases. So you know, this is where again things like using GUIDs, so very big long unique numbers that are randomized, are really helpful, because then it becomes very difficult for me the attacker to begin enumerating GUIDs because they are tremendously big. It's not just a simple integer, so when you are passing information around the web app, instead of user ID 1 you should really reference that user by a GUID that's really big and unique, because it makes it tough for an attacker to do some of those enumeration techniques.
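Editor's sketch: the missing ownership check described above looks roughly like this. The `DOCUMENTS` store, the `Forbidden` exception, and both function names are hypothetical stand-ins for whatever a real web framework and database would provide.

```python
# Hypothetical in-memory document store, keyed by document ID.
DOCUMENTS = {
    1: {"owner_id": 1, "body": "user 1's private notes"},
    2: {"owner_id": 2, "body": "user 2's private notes"},
}


class Forbidden(Exception):
    """Raised when a user asks for a document they do not own."""


def get_document_unsafe(current_user_id: int, doc_id: int) -> str:
    # Flawed: any logged-in user can read any document by changing the ID.
    return DOCUMENTS[doc_id]["body"]


def get_document(current_user_id: int, doc_id: int) -> str:
    # Checks ownership on every request, not just that the caller is logged in.
    doc = DOCUMENTS.get(doc_id)
    if doc is None or doc["owner_id"] != current_user_id:
        # Same error for "missing" and "not yours", so IDs cannot be probed.
        raise Forbidden(f"user {current_user_id} may not read document {doc_id}")
    return doc["body"]


print(get_document_unsafe(current_user_id=1, doc_id=2))  # reads user 2's data
```

No amount of input escaping catches this, because every request is well formed; the flaw is that the authorization decision is never made.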
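Editor's sketch: in Python, the large randomized identifiers Justin recommends are available from the standard `uuid` module. The `USERS_BY_PUBLIC_ID` mapping and the helper names below are hypothetical; a real application would store the identifier in a database column.

```python
import uuid

# Hypothetical lookup table from public identifier to internal record.
USERS_BY_PUBLIC_ID = {}


def new_public_id() -> str:
    # uuid4() is built from random bits, so consecutive IDs are not guessable
    # the way auto-incrementing integers are.
    return str(uuid.uuid4())


def register_user(name: str) -> str:
    public_id = new_public_id()
    USERS_BY_PUBLIC_ID[public_id] = {"name": name}
    return public_id


first = register_user("alice")
second = register_user("bob")
print(f"/user/{first}")
print(f"/user/{second}")
```

Random identifiers make enumeration impractical, but they complement rather than replace the per-request ownership check.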
19:42 Yeah, that's great advice.
This episode is brought to you by Hired. Hired is a two-sided, curated marketplace that connects the world's knowledge workers to the best opportunities.
Each offer you receive has salary and equity presented right up front and you can view the offers to accept or reject them before you even talk to the company. Typically, candidates receive 5 or more offers in just the first week and there are no obligations, ever.
Sounds pretty awesome, doesn't it? Well did I mention the signing bonus? Everyone who accepts a job from Hired gets a $2,000 signing bonus. And, as Talk Python listeners, it gets way sweeter! Use the link hired.com/talkpythontome and Hired will double the signing bonus to $4,000!
Opportunity is knocking, visit hired.com/talkpythontome and answer the call.
20:44 Ok, so what else was in the "Gray Hat Python"?
20:57 So that was basically- we've kind of run the gamut for Gray Hat Python and it was really heavily focused on the reverse engineering and exploit writing stuff.
21:09 So that sounds like it's focused on kind of the application level? But there is the whole sort of the infrastructure, the way apps are put together, you know, the network, those types of things that maybe you didn't talk about in that book, right?
21:25 That's right, so I didn't talk a whole lot about that in that book, but that's where I decided to write the second book, which was "Black Hat Python", which is a more traditional penetration test view of writing tools. So getting people to write tools that interact on the network, so just fundamentally understanding how you write a client and server in Python is actually going to help you understand how to write tools to do network attacks. So I teach people how to do that, and then I also teach them how to use some more powerful libraries in Python like 21:56 that allows you to execute more complex attacks and allows you to do things like packet sniffing, it allows you to kind of analyze some of the data you capture in tools like Wireshark.
22:09 I also spend time teaching people how to write tools to attack web applications, so whether that's unique kinds of brute forcers or using something like Burp Suite, which is a popular web application hacking tool that a lot of people use, so I teach them how to write plugins for Burp Suite, and then later on in the book I start to move into more and more offensive techniques, so I teach people how to write a Trojan or a virus that leverages GitHub for command and control, so that means that this virus doesn't actually communicate to you, it communicates only to GitHub, which in most corporate environments will bypass all of the firewalls because most corporate environments allow people to go to GitHub-
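Editor's sketch: the "client and server in Python" fundamentals mentioned above fit in a short script using only the standard `socket` and `threading` modules. This is not the book's code. The loopback address, the single-connection server, and the `ACK: ` reply format are choices made purely for the demo.

```python
import socket
import threading


def serve_once(server_sock: socket.socket) -> None:
    """Accept one connection, read one message, and answer it."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"ACK: " + data)


def run_demo(message: bytes) -> bytes:
    # Server side: bind to loopback and let the OS choose a free port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    # Client side: connect, send, and wait for the reply.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply


print(run_demo(b"hello"))
```

Everything more elaborate, from port scanners to proxies, is a variation on these same `bind`, `accept`, `connect`, `send`, and `recv` calls.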
22:56 Right, GitHub is fine, it's HTTP, it's outbound, how could that be wrong?
23:01 Exactly, well it's actually HTTPS which is even better, because then a lot of the inline antivirus products are blind when it's an SSL connection, so they can't actually inspect any of the traffic that is going by. So you have this HTTPS, this encrypted session to GitHub, and then basically this Trojan is designed to retrieve its commands from GitHub. Also, if the Trojan does not have a library, say like win32, you can push that library to a GitHub repo and your Trojan will try to import it; it actually hooks into the import mechanism so that it reaches out to GitHub for all of the imports that it can't resolve locally, so it'll retrieve them over the network and import them that way. And then after it executes a task, like say it takes a screenshot of the target system, it then uploads the results back to the GitHub repo.
23:54 So techniques like that, which I really wanted to show people, that #1, writing these tools in Python is amazingly simple, and when you sit back and realize you just wrote a Trojan that bypasses pretty much every firewall and antivirus product out there, in like 100 lines of Python or less, it's pretty neat. But also as a way to help people understand from the network perspective how simple it is for attackers to write tools like this and how we need to get better at detecting them. So I start to get more offensive there, and then kind of the tail end of the book is where I teach people something which is happening more and more commonly, where attackers are trying to get into host systems that host a number of virtual machines.
24:43 So I've seen people who are kind of paranoid so they only will perform like their web browsing inside the virtual machine, right. And so in the last part of the book I teach you how to use a forensics framework called "Volatility" that is pure Python, how to use this forensics framework to actually analyze the RAM of a running virtual machine and then inject code into it so that we can compromise the virtual machine, which would allow us to then kind of climb inside it and see what the user is up to inside of that machine. So it covers kind of a wide sweeping range from the network to web applications to Trojans and kind of offensive forensics, but it's also a very short book, so I give you the code, I give you the explanation and the why as to what we are doing, and there is really no fluff outside of that. It's really about developing that Python muscle memory.
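Editor's sketch: the import-mechanism hook mentioned above is a standard Python feature (`sys.meta_path` finders), not something specific to malware. The example below is deliberately harmless and does no networking: an in-memory dictionary, `FETCHED_SOURCES`, stands in for wherever the source text would come from, and the module name `demo_fetched_mod` is invented.

```python
import importlib.abc
import importlib.util
import sys

# Stand-in for source code obtained from somewhere other than the local disk.
FETCHED_SOURCES = {
    "demo_fetched_mod": "def greet():\n    return 'hello from a fetched module'\n",
}


class DictLoader(importlib.abc.Loader):
    """Executes module source held in a string."""

    def __init__(self, source: str):
        self.source = source

    def create_module(self, spec):
        return None  # use Python's default module object

    def exec_module(self, module):
        exec(self.source, module.__dict__)


class DictFinder(importlib.abc.MetaPathFinder):
    """Answers import requests for names present in FETCHED_SOURCES."""

    def find_spec(self, fullname, path, target=None):
        source = FETCHED_SOURCES.get(fullname)
        if source is None:
            return None  # not ours, let the import fail normally
        return importlib.util.spec_from_loader(fullname, DictLoader(source))


# Appended last, so it is only consulted for names nothing else can resolve.
sys.meta_path.append(DictFinder())

import demo_fetched_mod

print(demo_fetched_mod.greet())
```

For defenders, the takeaway is that code executed by a Python process does not have to exist as a file on disk.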
25:41 Yeah, so that has scared me, to use my computer. But I think it was really interesting, some of the stuff that you did in that book I think is really neat, like for example you talk about if you understand how to use raw sockets in Python that will take you a really long way, right?
26:00 Yeah, absolutely. And again, in that module, by learning how to use raw sockets and for example learning how to take something that comes off of a raw socket and turn it into an actual IP structure like you would have done in C 20 years ago, you are learning a ton of great concepts. You are learning about the network, you are learning how to use ctypes to create structures in memory, and you are learning about some of the more fundamental pieces of networking, which is how packets are actually built from the ground up. And you are learning it in this really easy way, like it is really accessible, it's not like C or C++, which I still don't understand why people like to code in.
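Editor's sketch: turning raw bytes into an IP header structure with `ctypes` can look like the following. This is not the book's listing. The field names are the editor's, the sample header is hand-built rather than captured from a raw socket (which usually needs administrator rights), and the version and header-length nibbles are split with bit operations instead of bitfields.

```python
import ctypes
import socket


class IPHeader(ctypes.Structure):
    # Fixed 20-byte IPv4 header, fields in wire order.
    _fields_ = [
        ("version_ihl", ctypes.c_ubyte),    # high nibble: version, low: header length
        ("tos", ctypes.c_ubyte),
        ("total_length", ctypes.c_ushort),
        ("ident", ctypes.c_ushort),
        ("flags_offset", ctypes.c_ushort),
        ("ttl", ctypes.c_ubyte),
        ("protocol", ctypes.c_ubyte),
        ("checksum", ctypes.c_ushort),
        ("src", ctypes.c_ubyte * 4),
        ("dst", ctypes.c_ubyte * 4),
    ]


def parse_ip_header(raw: bytes) -> dict:
    """Overlay the structure on the first 20 bytes of a packet."""
    hdr = IPHeader.from_buffer_copy(raw[:ctypes.sizeof(IPHeader)])
    return {
        "version": hdr.version_ihl >> 4,
        "header_bytes": (hdr.version_ihl & 0x0F) * 4,
        # Multi-byte fields arrive in network byte order.
        "total_length": socket.ntohs(hdr.total_length),
        "ttl": hdr.ttl,
        "protocol": hdr.protocol,
        "src": socket.inet_ntoa(bytes(hdr.src)),
        "dst": socket.inet_ntoa(bytes(hdr.dst)),
    }


# Hand-built example header: TCP packet from 192.168.0.1 to 192.168.0.199.
sample = bytes.fromhex("4500003c1c4640004006b1e6c0a80001c0a800c7")
print(parse_ip_header(sample))
```

With a live sniffer, `sample` would instead be the buffer returned by `recvfrom` on a raw socket.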
26:43 Yeah, it's definitely accessible, right, like a lot of the code samples are like 20 lines of Python.
26:49 That's right. Yeah and it's really- again, I really want people to be able to write in it and then sit back and say, "ok, what if I did this?" and just go out and start doing it. So give them the fundamentals give them the capability but don't lead them down the entire path, I really like when people email me and say, "yo, I took the example in chapter 3 and I did this with it, what do you think?" That means that people appreciate that style of writing.
27:19 Yeah, and it's really great. You talked a little bit about the malware type of stuff. You said you had some experience actually using Python to like understand some piece of malware. So like, suppose I find some suspicious file on my computer- what can I do to understand whether that is just some random binary or if it is a severe problem?
27:43 So there are a number of tools and frameworks out there and again, you know, things like I mentioned previously, Volatility is very quickly becoming one of the big tools that forensic and malware people use to examine what a piece of malware is doing to your machine, and what artifacts it is leaving behind and what it is modifying inside the memory of your machine, which is really critical. But there are a number of other things that you can do. For example, most modern malware is looking at how to defend itself against you, so it doesn't particularly want anybody to reverse engineer it, because if it can guard itself then it prevents people from developing defenses against it.
28:30 So a number of years ago actually myself and a guy by the name of Neil the Hippy Killer built a framework called "Muffy", it was a Python framework that ran inside of Immunity Debugger and it was designed to actually completely remove the protections, or a number of protections, that malware would have in place that would prevent you from analyzing it. So this is all an automated and scriptable framework built on top of Immunity Debugger. For example, a lot of malware wants to know "am I being debugged?" So, am I currently being run under a debugger, and so our framework would actually reach into the malware and begin to undo those checks. And it had multiple ways of doing that.
29:15 Another thing that malware would do, for example, is that it will walk the list of running processes on the system, looking for antivirus products, looking for debugging products, and so what Muffy would do is again, it would go in there and basically start removing things from the list, or it could actually patch out the malware's ability to check for those processes. So aside from some of those big ones, and again, I didn't spend most of my career being a malware analyst, I do some now, but the big thing to me was that with all of these tools like debuggers and even things such as IDA Pro having Python built in, if you are seeing the same thing in malware sample after malware sample after malware sample, instead of spending five hours undoing some protection every time, you spend 5 hours once writing code to automatically do it for you. And then that's fixed for you, kind of for life, you can kind of deploy that code whenever you need it, and Python is wonderful for that.
30:26 So you built up like a set of libraries that perform these functions, take down the debugger defenses, take down the antivirus protection, and just chain them together and go after uncloak 30:36 it so then you can understand it, yeah?
30:41 Yeah, that is exactly it. And then, there are other cases too, where you might be analyzing a piece of malware that implements some very simple, like, XOR encryption, and maybe it's got some special little routine that it does. So lots of times what we'll do, since we are always dealing in assembly code, is we'll look at the assembly and say ok, they have this decryption function here, that's got maybe 10 or 20 assembly instructions, and you convert that directly into Python, and then any string or any piece of data that comes across that network, we can begin actually processing it directly in Python rather than having to let the malware run through the decryption routine itself.
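Editor's sketch: a repeating-key XOR routine of the kind described above takes only a few lines once ported to Python. The key and the sample string are invented; a real sample's key and any extra twists would come from reading its disassembly.

```python
from itertools import cycle


def xor_decode(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# XOR is its own inverse, so the same function encodes and decodes.
key = b"\x5a\xa5"  # hypothetical key recovered from the sample
obfuscated = xor_decode(b"example-c2-hostname", key)
print(obfuscated)
print(xor_decode(obfuscated, key))
```

Once the routine lives in Python, captured traffic or embedded strings can be decoded in bulk without ever running the malware.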
31:23 It's been a long time since I have had some kind of virus or malware that I know of, on any of my machines, but last time I remember that I did have one, the way I found out was very bizarre, I had a firewall like- what was it called- one of the original firewalls you can put on windows XP and-
31:45 ZoneAlarm or something?
31:45 Yes, thank you, ZoneAlarm. And I rebooted my computer at work and it said no pat 31:50 to act to the server on your network, and I was, "oh that can't be good". And it looks like no 32:00 but you can bet that it wasn't right. A lot of our computers at this office were letting no 32:08 as a server, it was not good. So my question was, you know, there were antivirus tools we installed, and they said oh we removed the problem. If something like this happens do you think it's ever safe to use your computer again or does it just require like a format straight away?
32:26 I don't know, it's really tough to say. You know, the amazing thing about the security community is that it always seems like every year we want to one-up ourselves. So you know, it used to be, yeah you get an infection, just remove it, and then people are like, I don't know, you know actually they figured out how to persist in the BIOS, so whatever, and then it's like ok, well maybe let's format, well format actually doesn't solve the whole BIOS problem. Ok, maybe it's format and reflash the BIOS, and then guys started infecting the hard drive controller. So they are actually on the chip that controls the hard drive, how do you get rid of that? So it's one of those things that I think depends on the strain, and when I say strain, really what that means is that most antivirus products are looking at the hash and they are saying, "hey this is bad". So if you get infected by a known kind of variant, and in most cases you can just go read the report of what that malware actually does, if there has never been evidence that the malware actually downloads and installs a rootkit or some other low level tool, then I think yeah, a full kind of hard drive format is going to do the trick for you, but in some cases that is not going to be enough. It's one of those things, I don't remember the last time I personally have been infected with something, but I'm on OSX and one of my good friends Russel Nolan just did a great presentation on OSX malware and how he kind of hunted it using kind of big datasets and Python, oddly enough using pandas. You can check that out, it was at the conference called Countermeasure, so you can check, the talks will be posted. Some of the stuff that Russel was finding was pretty impressive stuff that they are writing for OSX as well.
34:30 Yes, so what you're telling me is that even a format of the computer is not enough, I need to smash it.
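Editor's sketch: the hash lookup that Justin attributes to antivirus products can be imitated with `hashlib`. The "known bad" set here is seeded from a made-up byte string, not a real signature database, and real products combine many more detection techniques than this.

```python
import hashlib

# Hypothetical signature set: SHA-256 digests of samples already analyzed.
KNOWN_BAD_SHA256 = {
    hashlib.sha256(b"pretend malware sample").hexdigest(),
}


def is_known_bad(data: bytes) -> bool:
    """Return True only if this exact byte sequence has been seen before."""
    return hashlib.sha256(data).hexdigest() in KNOWN_BAD_SHA256


print(is_known_bad(b"pretend malware sample"))   # exact match with a known variant
print(is_known_bad(b"pretend malware sample!"))  # one extra byte changes the hash
```

This is why a "known variant" matters in the answer above: an exact match points to an existing analysis report, while a slightly altered sample matches nothing.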
34:36 Yeah I would totally smash it in the back yard, turn the hose on and go and buy a new one.
34:44 It gets expensive as possible for yourself because then it will totally make you like way more vigilant in the future.
34:52 That's right, the next time I am definitely not opening that document with the cat videos.
34:57 Yeah, that was from me.
This episode is brought to you by Codeship. Codeship has launched organizations: create teams, set permissions for specific team members, and improve collaboration in your continuous delivery workflow. Maintain centralized control over your organization's projects and teams with Codeship's new organizations plan.
And as Talk Python listeners, you can save 20% off any premium plan for the next 3 months. Just use the code TALKPYTHON.
Check them out at codeship.com and tell them "thanks" for supporting the show on Twitter where they are at @codeship.
35:49 So, another thing that you are into is something that you said was called Open Source Intelligence. And I am guessing this is not like GPL license intelligence?
35:58 No, that is right. So Open Source Intelligence is kind of like, it's a general term for gathering information from open sources, so non-classified sources, not involving you know spies on the ground and not involving satellites in space, but what can we gather from sources like the news, social media, even things like mobile applications, what kind of intelligence can we gather in general?
36:27 So that's kind of something that in the security community you use all the time, because when you are modeling a particular target for a penetration test you want to learn everything there is to know about that target, and especially when it comes to social engineering and phishing attacks, being able to perform open source intelligence. For example, if I wanted to attack you, I would want to figure out where is your Facebook page, where is your Twitter page, what do you have on LinkedIn, can I find out information about your hobbies, your kids, all this stuff.
37:00 And then basically I would model you as a target and I am going to watch for things that seem to kind of emotionally register with you so that when I write you an email or I send you a twitter direct message or you know, I'm communicating with you in some way that includes a link meaning I want you to click on this link, that I am communicating to you in a way that you are going to definitely click on that link. So Open Source Intelligence plays a huge role in that. Among other areas.
37:30 Sure. So make it feel familiar and then it is much more likely to get that first step into the whole social side of things, right?
37:38 Yeah, that's right, and I mean that's the specific use case for OSINT in the security community, but it's really used in a whole bunch of other ways. You know, if there is a riot in the city, police forces are using OSINT to take a look at what is going on: what are they talking about, are there people gathering in a particular location? Same thing when we had the Paris attacks here a couple of weeks ago; a lot of it is open source information. You can go to bellingcat.com, for example, and they have a detailed analysis of one of the Paris attackers and the information they found out about him only through open source means. So it is kind of this amazing hammer that you can hit many different nails with.
38:22 Interesting. And speaking of nails, you said you had actually used this technique to find extreme ISIS supporters.
38:29 On Twitter, yes, that's right, so, last year I did a presentation at a conference where I used Python because I can't really program in much else to be honest, so I used Python-
38:42 Why would you want to?
38:42 Yeah, why would you want to, right? What I did was look at how to identify ISIS supporters on Twitter. I've been doing some of this stuff and some of this research on the side for a number of years, probably long before it was in vogue; there are lots of people doing it now. But basically the question I had was, how do I do this when I can't speak or read Arabic, right? This is a big deal, because as you know this is a terrorist group that has people from all walks of life who speak all kinds of different languages.
39:21 Text analysis, with sentiment analysis to go with it, has always kind of been the sexy thing people do when they are analyzing Twitter networks. And for me, what I did instead was I said, "you know what, actually, I think images are the way to go, because images don't require language, right?" So what I set out to do was use Python along with OpenCV, which is a computer vision platform with Python bindings, and I built a classifier that would detect that black flag of ISIS.
39:53 So it was quite common for people who supported ISIS or were actually part of the group to use that black flag in the profile picture on Twitter or to use it in imagery like propaganda videos for example, not uncommon when you have a video of some Syrian army tank blowing up that you see the black flag in the top right hand corner of the video. So this classifier's job is just to find that black flag.
40:21 So then on top of it I wrote Python to interact with the Twitter API, so basically I could just point this thing anywhere. Part of it as well was asking the question of, like, the six degrees of Kevin Bacon: I wanted to know how far away the nearest terrorist was in my social network. So I literally just pointed this tool at my Twitter account, and it basically went through all of my friends and followers looking for the black flag, and then it went through all of their friends and followers, and as you can see this grows out exponentially, until it started finding that black flag in propaganda or in profile pictures.
41:01 And so this actually worked really well for me, because in a very short period of time I was able to build up a database with 2,000 or 3,000 extremist accounts. Now, the trick was that this was actually semi-automatic, because if you have ever used OpenCV before to do image detection or just logo detection stuff, and you are not a computer vision expert, which I definitely am not, you are going to run into this high rate of false positives. So, there were cases where it would pick up a black cat and say, hey, that's an ISIS supporter.
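The crawl described here is essentially a breadth-first search over the friends and followers graph. The following is only a minimal sketch of that idea, not the tool from the talk: the function name, the `get_neighbors` and `is_match` callbacks, and the toy graph are all hypothetical stand-ins, and a real version would call the Twitter API and an image classifier instead of looking things up in a dictionary.

```python
from collections import deque


def degrees_to_nearest_match(start, get_neighbors, is_match, max_depth=3):
    """Breadth-first search outward from `start`.

    Returns (account, hops) for the closest account where is_match() is
    true, or None if nothing is found within max_depth hops.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        account, hops = queue.popleft()
        if hops > 0 and is_match(account):
            return account, hops
        if hops == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in get_neighbors(account):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


# Toy in-memory graph standing in for friends/followers lookups (made up).
TOY_GRAPH = {
    "me": ["alice", "bob"],
    "alice": ["carol"],
    "bob": ["carol", "dave"],
    "carol": ["flagged_account"],
    "dave": [],
    "flagged_account": [],
}
# Stand-in for "the classifier found the flag in this account's images".
FLAGGED = {"flagged_account"}

result = degrees_to_nearest_match(
    "me",
    get_neighbors=lambda account: TOY_GRAPH.get(account, []),
    is_match=lambda account: account in FLAGGED,
)
print(result)
```

Because breadth-first search visits accounts in order of distance, the first match it returns is also the nearest one, which is what a "Kevin Bacon number" needs. The depth limit matters in practice because the number of accounts grows very quickly with each hop.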
41:39 It could have been an evil cat.
41:41 Yeah, it could have totally been an evil cat. So what I did was I actually used Python to solve the semi-automatic problem too. After it was done crawling everything, let's say it had, you know, a few thousand images, and there were maybe a few hundred that might be garbage, so what I wanted to do was filter through them really quickly by hand. So I used wxPython and I wrote a little game.
42:04 And all this game did was pull in all of the images from the directory where I stored them, and then I could hit space bar if it was an ISIS supporter, enter if it was not. So I could cycle through all the images very quickly, kind of playing duck, duck, goose. And amazingly enough, it sounds like a lot, like, oh man, you did this with thousands of images, and I'm like, yeah, but it took like ten minutes. Because, you know, it becomes this very quick game that you play, and it is very fast to cycle through all of them.
42:34 So I used Python to kind of help me deal with that. Now, you know, any computer vision experts who are listening to this already have their head in their hands, like, oh man, I can't believe you did that, but it worked for me and it was fast. And then, on top of that, the tail end of my presentation is really about, again, using Python to push all of this data into Elasticsearch, just because the Elasticsearch bindings for Python are beautiful. It's like one line of code: you can take a dictionary and shovel it into a database. For those of us who have been around the block long enough, that was one of the most eye-opening, amazing things I have ever seen.
43:18 Like, you import this thing and you do es.index and literally you are done. There is no schema or design, there is nothing else you have to do, so I thought Elasticsearch was just amazingly wonderful. And it was actually a friend of mine, Chris, who had said you've got to check out Elasticsearch, it is totally easy to get data into, not so easy to get data out of, which was totally true. But then I was able to do some interesting stuff where I could look at, you know, the geotagging of tweets and I could see where the concentrations of supporters were, and I could begin to do analysis like, hey, what was the most popular cellphone they use to tweet with, for example. It was a great use of Python and Open Source Intelligence and it was really well received.
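The "most popular cellphone" question can be illustrated with a small counting sketch. This is not the analysis from the talk, which ran against Elasticsearch; it works on an in-memory list instead. It assumes each tweet is a dictionary with a `source` field holding the client name wrapped in an HTML link, and the function name and sample records are made up for illustration.

```python
import re
from collections import Counter


def client_counts(tweets):
    """Count tweets per client name, taken from each tweet's 'source' field."""
    counts = Counter()
    for tweet in tweets:
        source = tweet.get("source", "")
        # Drop any HTML tags wrapped around the client name.
        name = re.sub(r"<[^>]+>", "", source).strip()
        if name:
            counts[name] += 1
    return counts


# Made-up sample records; real tweets carry many more fields.
SAMPLE_TWEETS = [
    {"source": '<a href="http://example.com/android">Twitter for Android</a>'},
    {"source": '<a href="http://example.com/iphone">Twitter for iPhone</a>'},
    {"source": '<a href="http://example.com/android">Twitter for Android</a>'},
    {"source": ""},
]

counts = client_counts(SAMPLE_TWEETS)
print(counts.most_common(1))
```

`Counter.most_common` returns the clients ordered by tweet count, so the first entry answers the question for whatever set of tweets is passed in.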
44:06 Yeah, it sounds really, really interesting. I am sure it was. What was the number, your index, like your Kevin Bacon number?
44:13 It was really low.
44:14 I'm sure.
44:15 It was like 3, I believe, 3 or less than 3. Now, it was kind of a biased sample, because I follow a number of counterterrorism researchers, and a number of terrorists like to follow counterterrorism researchers so they know what they are saying, right?
44:32 Of course, of course. It's a little self selecting but still, right.
44:35 It was, but it's actually, it was shocking because I did pick other accounts and it was very- I didn't know what the answer was going to be which is always the exciting thing about research when you actually set out to- and you truly had no idea what the answer is going to be, but it was very low, it was always like 3 or sub 3 anywhere I ran it so that was kind of interesting to me.
44:59 That is interesting. It doesn't really surprise anybody. It doesn't make you feel warm and fuzzy either I suppose?
45:05 No, not really.
45:09 So, you have a cool course on sort of automating open source intelligence and kind of taking people through a lot of the techniques you are kind of employing there right?
47:46 Yeah, it's really cool, and that's a like an asynchronous type of course, you sign up and you can take it from anywhere online, more or less, right?
47:56 Yeah, that is right. And it's just driven by videos and then written material and code samples, and then you have skill tests where I give you problems to go and solve with Python, and you have to submit them to me for grading. And then once a month I run student sessions where I hop online for an hour with whatever students can make it, and I usually try to teach something that is not in the course. So last month I actually taught people how to connect Python to the Tor network so that you can scrape web pages inside of Tor, for example.
48:31 That's really cool. I'll be sure to put a link to your course in the show notes.
48:36 Awesome, thank you.
48:37 Yeah, you bet. So we have time for a few more questions. Let's see, so, you must have over the years seen a lot of crazy stuff. What's the most unusual or entertaining thing that you kind of run across in this whole space?
48:53 That is a very good question. So when I was doing some of this ISIS research I found a Twitter account that initially showed up as an extremist, and he was actually a satirist. He would literally write some of the most convincing tweets, and he would take, for example, images that ISIS would use to instill fear and then he would make them kind of hilarious. And so I found this account, and as I am reading through, there are these jihadists who are not very happy with him. They are trying to get him kicked off of Twitter, but Twitter won't really kick him off, and they are threatening him, and he is responding back with pictures of goats and other stuff, you know.
49:36 So I thought it was great, like, I thought this person has got guts, and is completely counteracting their message. I mean, nobody was really paying attention to his account, which is unfortunate. I think if we had more people paying attention to that guy's account than we did paying attention to the ISIS guys, we would be winning. But it was really hilarious, because this guy was like a never-ending source of entertainment for me that I could go back and check on.
50:02 It seems like a really nice breath of fresh air with all that sort of negativity out there, to just turn it around and like here, we put a cat picture on top of your tank or something.
50:14 Yeah, exactly. It's pretty funny.
50:16 Yeah, how funny. One thing I wanted to ask you about, because as a programmer I have one view of the world, and I know a lot of non-programmers, so I see their view, but as a computer security type person you may have a different perspective: computer hacking and cyber security in the popular media?
50:39 Right. You are already laughing. Yeah, I'm just thinking of, you know, some quote like, "I'm going to write a VB script that is going to track down the IP address..." What are you saying, right?
50:51 I mean, that's the thing, right? You look at the original Hackers movie, and, you know, Sneakers was probably more realistic than people give it credit for, more so than a lot of other stuff. For the most part, in popular media, pretty much 99% of it is garbage. And then within the last year we had the Mr. Robot series come out, which was a complete game changer. They really fundamentally get what it's about, and part of that is they actually have a guy on their staff, his name is Michael Bazzell. He is a very popular guy in the Open Source Intelligence world and he is kind of the main technical guy behind it, so he is the one who is driving a lot of the technical and hacker type stuff, and I can personally say that Michael is a very smart guy, he knows what he is talking about.
51:44 And so this is the whole key to me: having someone like that who is like, you know what, we are not going to put in a bunch of BS with 3D cubes and people hacking on touch screens and, like, whatever, virtual reality, because that's not how hackers work. It's mundane, and it's through the terminal, you know, for the most part. So for me that was like, "oh, finally, somebody is actually covering this properly". But I can tell you that with most hackers you would not want to look over their shoulder while they work, because it really is mind-numbingly mundane stuff, picking through thousands of lines of code looking for a bug. You can do that for two weeks before you hit that one place in the code where, you know, "oh man, right there is exactly what I'm looking for". And then it gets exciting. But it can totally be the most mundane work ever, and that's just not a good TV show.
52:48 No it's not. I think you are totally right about Mr Robot, I love that series and I think I have just the final episodes to watch still, I'll put the trailer in the show notes so that people can check it out. They are talking about Tor, VPNs, there is Linux, there is the command line, the previous show I just had the PyCharm guys on, there is like segments of the show where they are working in the PyCharm, like this is a really good show, it's obviously fiction and it's on the outer edge of believable fiction but at the same time, it's not based in like funky 3D cubes that like mean nothing, right?
53:24 Yeah, exactly.
53:27 Very cool, very cool. One other quick question in the sort of non-fictional space but kind of popular culture: there have been, it seems like, increasingly many security breaches. Target, Home Depot, just one after another. Are things becoming less secure, more secure? What is your general feeling when you are out on the internet, fear or generally OK?
53:55 I really don't- I'm not that full of fear, that's for sure. But I used to joke when I had to do press interviews, like, OK, you know, it's December, actually this time of year would be perfect, because they would call us up and say, "what are your predictions for 2016?" And I would say whatever happened in 2015 is just going to happen again. Maybe bigger, maybe smaller, just copy whatever I told you last year and use it again. And sadly, that's really where we are at, right, whether it's Target, whether it's Ashley Madison or whatever it is. Securing your data is an incredibly difficult thing to do. And for me, I was always breaking stuff, not necessarily fixing and defending stuff, and the defenders have an incredibly difficult job.
54:43 So, for me, I don't think things are getting better or worse. I think there are parts of the underlying security infrastructure that are getting better, and I think there are parts of the philosophy of security that are getting worse. Bring your own device, for example: BYOD is one of the perfect examples of the worst idea ever, never ever let anybody do it, but people are still doing it. Oh, you want to bring your laptop from home and connect it to the corporate network, what's the worst that could happen? So to me there are these opposing forces at times, where we are getting better on the technology front, I think, but on the philosophy front I think we have a ways to go. But again, it's very tough. I mean, there is going to be no shortage of database dumps in 2016, like we saw in 2015. I don't think that is going to change.
55:32 That's really great answer, thanks. I have two questions before you get out of here. The first one is if you are going to write some Python code, what editor do you open up?
55:43 Hands down, Wing IDE. I have been using it for, I don't even know how many years, a long time. Same with all of my students: when you sign up for one of my courses you get Wing IDE Pro as part of the course. I standardize all of my videos on it, everything I do is in Wing IDE, and anytime someone asks me what they should use, 100% Wing IDE. The big thing for me is the debugging capability, which is just out of this world. I love it, they have a great team there, they have an accessible support staff, and I don't even remember the last time I actually had to file a ticket with them. So hands down it's Wing IDE. That being said, I know you had the PyCharm guys on here, and people speak very highly of PyCharm, but for me the inertia of trying a different IDE when I need to be really productive every day is just too much for me to give it a fair shake. But I hear lots of good stuff about it.
56:40 I've used Wing IDE a little bit, not a lot, but I'm definitely a fan of the IDE side of the story, so yeah, I like to hear that, cool.
56:48 Final question, what's your favorite PyPI package or library out there?
56:53 Oh man, OK. I mean, Requests is probably the one I use the most, which is just awesome. But the other day I found a library called dateutil, and maybe the entire internet knows about dateutil already, but dateutil allows you to feed it any kind of date string, like any format, and it basically gives you back a datetime object, which is amazing. You don't have to use format strings, you don't have to use any crazy conversions or string splitting to clean it up, it just does it.
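As a quick illustration of the library mentioned here, the sketch below feeds a few differently formatted strings to `dateutil.parser.parse`. It assumes the third-party `python-dateutil` package is installed, and the sample strings are made up for the example.

```python
from datetime import datetime

from dateutil import parser  # third-party: pip install python-dateutil

# The same calendar date written three different ways (made-up samples).
samples = ["2015-12-02", "2 December 2015", "Dec 2, 2015 3:30 PM"]

# parse() works out the layout of each string without a format string.
parsed = [parser.parse(text) for text in samples]

for text, value in zip(samples, parsed):
    print(f"{text!r} -> {value.isoformat()}")
```

Each result is a regular `datetime.datetime` object, so it can be compared, sorted, or formatted like any other. Ambiguous numeric dates such as 01/02/2015 depend on the parser's day-first and year-first settings, so those are worth checking explicitly.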
57:28 That's awesome. Yeah, I hate working with dates, like in pretty much any language it always seems to be painful and so that sounds really cool, I'm going to check it out. Awesome. Justin, this has been a fascinating look inside of a world that most of us don't really look at that often. So, thank you for sharing the story.
57:51 Thank you very much for having me on, this has been great.
57:55 Yeah, you bet. And I'll make sure all the cool stuff we talked about is in the show notes. So, talk to you later, thanks again.
58:00 Fantastic. Thanks Michael.
This has been another episode of Talk Python To Me.
Today's guest was Justin Seitz and this episode has been sponsored by Hired and Codeship. Thank you guys for supporting the show!
Hired wants to help you find your next big thing. Visit hired.com/talkpythontome to get 5 or more offers with salary and equity right up front and a special listener signing bonus of $4,000 USD.
Codeship wants you to ALWAYS KEEP SHIPPING. Check them out at codeship.com and thank them on twitter via @codeship. Don't forget the discount code for listeners, it's easy: TALKPYTHON
You can find the links from the show at talkpython.fm/episodes/show/37
Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes and direct RSS feeds in the footer on the website.
This week's theme music was "Secrets from the Future" by MC Frontalot. He has at least 4 excellent albums in the genre he invented, called nerdcore. Check him out at frontalot.com. His song "Zero Day" is also a perfect match for this episode.
So thanks for listening. Here's the full song, "Secrets from the future". Enjoy and see you next time!