#128: Pythonic Networks with NAPALM Transcript

Recorded on Monday, Jul 17, 2017.

00:00 Michael Kennedy: When you think of networks, you probably think of physical things, routers, switches, firewalls, and that kind of stuff. But increasingly, network engineers are managing massive networks that are better managed via software than by admin applications. On this episode, you'll meet David Barroso, who created NAPALM, a vendor-neutral cross platform open source project that provides a unified API to network devices. This is Talk Python To Me, Episode 128, recorded July 17, 2017. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython. This episode is brought to you by Linode and Rollbar. That's right, welcome to Linode who has joined Talk Python To Me as a major sponsor. Be sure to check out what both of them are offering during their segments. It really helps support the show. David, welcome to Talk Python.

01:16 David Barroso: Thank you for having me here.

01:17 Michael Kennedy: It's been a long time coming that we've talked about Python and all sorts of things but no networking really, right? And so, you're here to help us fix that problem and talk about Python and network programming.

01:28 David Barroso: Yeah, here I am.

01:31 Michael Kennedy: Yeah, it's going to be great. We're going to talk about your project, NAPALM. You're doing a bunch of exciting stuff there. It seems like it's still going strong. It's been around for a few years. But before we get into all that, let's start with your story. How'd you get in to programming, networking, Python, that sort of stuff?

01:46 David Barroso: My background is network engineering, but I haven't always been a network engineer. I started like many, many years ago. More like systems engineer, so just dealing with Linux, Windows, Apache, Nginx, whatever. Then I was just doing anything, Bash, Perl, Python, whatever you had to use to do the job. Then at some point, I had to move some data I had on a Wikipage into something that I could actually query. So I was just investigating what the best tools for that would be. I came into some page that was actually talking about Django and I thought that that was kind of like the perfect fit. I could actually say that Django was what really introduced me into Python and drove me to learn more about it. Then just like regular evolution, just career path, I just landed more into the networking side of things which made me more focused on the network stuff rather than on systems or actual Python or something, but Python has been always there just trying to help me get the job done somehow.

03:00 Michael Kennedy: I see, that's really cool. So you've done a lot of work with these systems engineering, with configuring services like Nginx and things like that, and just like Python has always played an important role in sort of making that work, huh?

03:11 David Barroso: Yeah, I remember using CFEngine like more than a decade ago and yeah, doing a lot of Perl. Like back in the day was mostly Perl. But luckily, it's not that much Perl nowadays, it's Python.

03:23 Michael Kennedy: Yeah, I definitely think there's less Perl in the world than there used to be, that's good. What are you doing day to day? You're doing network stuff for your main job, right?

03:32 David Barroso: Yes. I'm a network systems engineer at Fastly. I spend most of my day both automating the network and building a control plane where we can actually integrate with applications. The network is not only automated, but it's somehow directed by the application. The application can say like, oh yeah, I want to send this packet via this path and I want to send this other packet via this other path. I'm working on these two areas.

04:01 Michael Kennedy: That's pretty major stuff. I'm not sure everyone knows what Fastly is, so maybe give them the quick elevator pitch.

04:07 David Barroso: We're a CDN. The cool thing about Fastly is that we expose all of our services and configuration by an API, so you can actually integrate with us. It's not like the traditional CDN where you just send your objects and they cache them until you can send a new object and then they cache them as well. I want the CDN to be able to respond in this way if the HTTP headers are like this, and you can actually do that in real time just by configuring it via the API. That's kind of, if you are a web developer, you should check it out.

04:47 Michael Kennedy: Yeah, that sounds really cool. You guys probably handle a little bit of network traffic, huh?

04:51 David Barroso: Yeah, yeah, we do a lot. I don't recall the numbers exactly, but I think it's about five million requests per second and a few terabytes of bandwidth. So yeah, it's quite a lot of traffic.

05:07 Michael Kennedy: Yeah, that's crazy. You think about a number of request per second for like a RESTful API or something like that, that might just be a couple of bytes response of JSON or something, but typically CDNs serve images and videos and files, right? Like that's a crazy amount of traffic.

05:24 David Barroso: Yeah, we do all kinds of traffic like video, music, images, JavaScript, even API. We have a few APIs behind us and that's because of the integration that I was mentioning before. But that's not really my area of expertise.

05:39 Michael Kennedy: No, no, but it's all built upon a network, which is super cool. Traditionally, people might configure networks by logging into like a Cisco router or switch or something like that and maybe using the CLI there or using some tool. But working with networks these days is becoming more and more a programming type of experience, right?

06:03 David Barroso: Yeah, I think that the network is following the same evolution that the systems world experienced like 10, 15 years ago. Like before you would just edit your apache.conf and the virtual host files and all that kind of thing by hand. Then CFEngine came along, Puppet. The network is kind of having that transition nowadays. Like people were used to going through the console, SSH, or maybe some tools that used SNMP to try to configure certain things. But yeah, nowadays it's tools like Puppet, Ansible, SaltStack, StackStorm that are starting to come to the network as well, which is interesting and cool.

06:47 Michael Kennedy: Yeah, that's great. That sounds really exciting. How is that affecting the whole ecosystem of network engineers? Are people excited about this change? Are they like, ugh, my experience is no longer relevant, it's not fun anymore? What's the general feel, do you think?

07:05 David Barroso: You will find both. Some people are like, oh yeah, this sounds cool, a different way of doing things. I can just automate this and focus on more interesting stuff. While other people are probably more scared because, I don't know, maybe they are older engineers who don't really have the time to pick up the new tools or the new programming languages, or they might be a bit more averse to change. I think yeah, you will find all sorts of people having all sorts of opinions.

07:36 Michael Kennedy: Sure, of course. One of the terms that I hear frequently around this space is software defined networks or SDNs. What's a SDN?

07:45 David Barroso: That's mostly marketing. Vendors keep pitching SDN, but SDN doesn't really mean anything. The original term was coined by some people at Stanford that were working with OpenFlow, which was like a super low level API to try to program this feature, but that didn't translate well into hardware. So it never took off. OVS in the Linux stack uses OpenFlow for example plus some things, but that's software based, which is fine because you don't have the hardware limitations to install all those flows. But yeah, in the hardware world, the real SDN never took off because of hardware limitations, and now it's just used as a marketing term by many, many vendors.

08:33 Michael Kennedy: Okay, of course we have software defined networks. There is this whole level of automation though. There is the ability to run scripts against Cisco routers or various other types of routers. That's kind of where NAPALM came in, right? Previously there was, for each vendor, each type of device you're going to work with, there was like a totally unique way of working with it, right?

08:58 David Barroso: Yeah, like each vendor has its own API. If you log into an IOS device by Cisco, for example, and just start typing commands, the commands are going to be completely different from a Juniper device, for example. The CLIs, they're completely different. They return different things to different commands. They may even have different interfaces to actually connect to them programmatically. Like for example, IOS until very recently didn't have anything, so you would use things like Netmiko, which is a Python library that works on top of Paramiko. It's basically just SSH. Juniper has NETCONF, which is actually a network protocol to configure devices. Some other devices may have like a REST API. There are just many ways. When you have to do something as simple as, I want to connect to my network and I want to retrieve the IP addresses that you have configured on the interfaces, you have to start adding a lot of boilerplate like, okay, if device equals IOS, do this. If device equals this other vendor, do this other thing. So yeah, it used to be kind of a nightmare. That's pretty much what I tried to solve with NAPALM.
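
The per-vendor branching David describes might look roughly like the sketch below. The two vendor functions are hypothetical stand-ins that return canned data (a real script would open an SSH or NETCONF session), and the output formats are invented for illustration.

```python
# Hypothetical stand-ins for vendor-specific calls. They return canned data
# here; real code would talk to the device over SSH, NETCONF, or REST.
def ios_show_ip_interfaces(host):
    return "GigabitEthernet0/1 10.0.0.1/24"  # pretend CLI text to be parsed


def junos_get_interfaces(host):
    return {"ge-0/0/0": ["10.0.1.1/24"]}  # pretend structured reply


def get_ip_addresses(host, vendor):
    """Return {interface: [addresses]} by branching on the vendor name."""
    if vendor == "ios":
        # Screen scraping: split the text output into name and address.
        name, address = ios_show_ip_interfaces(host).split()
        return {name: [address]}
    elif vendor == "junos":
        return junos_get_interfaces(host)
    raise ValueError(f"unsupported vendor: {vendor}")
```

Every new vendor adds another branch to every such function, which is the boilerplate being complained about here.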

10:22 Michael Kennedy: Okay, so NAPALM is an acronym, right? What does it stand for?

10:25 David Barroso: The acronym actually has an interesting story. It was reverse engineered. I first came up with the acronym and then I tried to find some meaning for it. Before, it was all these if-else statements that I mentioned. Like if vendor, this, if vendor, that. I was writing some script, I was super pissed. Like I mean, this is just stupid. This is just insane. I want just to light this on fire. The name came out of that and then I started just thinking, okay, now let's try to make some meaning out of it. So I came up with this Network Automation and Programmability Abstraction Layer with Multivendor support. Yeah, the name was actually reverse engineered.

11:09 Michael Kennedy: Yeah, that's pretty cool. You have the important letters. You got the N, the A, the P, and then you got to get creative about the A, L, M, right?

11:18 David Barroso: Yeah. I wanted something with fire basically because that's actually my feeling at that moment.

11:24 Michael Kennedy: You're like, all right, we've got to burn this down, this is like messed up, right?

11:27 David Barroso: Yeah.

11:28 Michael Kennedy: I've heard that your co-creator of this library, was her name Elsa? I might be remembering it but...

11:34 David Barroso: Elisa Jasinska.

11:35 Michael Kennedy: Elisa, that's right, yes. You guys started this together, right?

11:39 David Barroso: Yeah. I started working on this when I was at Spotify and then I was just talking with her after a conference. She was like, oh, that's super cool. I actually have exactly the same problem. So then I just went back to my mind and like, okay, yeah, I've been talking with people and it turns out that there's some interest on collaborating on this. So I just open sourced it and started working with her. Then other people came in. Now, it's actually a pretty big community right now.

12:07 Michael Kennedy: Yeah, it's going really strong now. I had heard there were even some places where people were using more or less like screen scraping to get configuration out of some of these tools and some of these systems 'cause it was just so hard to do, right?

12:21 David Barroso: Yeah, there is a lot of screen scraping in devices like IOS for example by Cisco. Those devices traditionally don't have any interface to interact with them, so it was always like, yeah, SSH to the device and then just type this command, look for the prompt, now let's try to parse this thing, and there is still a lot of that to be honest. We do a lot of screen scraping with NAPALM because if they don't provide the tools, we have to fix it ourselves, but at least we fix it. When we are trying to solve the problem, like meaning, what you are trying to solve at your job, you don't have to care about parsing because we also fixed that problem.

13:01 Michael Kennedy: Right, so maybe something like that happens deep down when you call a function for a particular device but nobody knows and nobody cares.

13:09 David Barroso: Yeah, I hope they care, so they connect their vendors or the vendors will return some JSON, XML, I don't care, just some structured data.

13:16 Michael Kennedy: A lot of what you guys do with NAPALM is not write these vendor specific communications per se, right? You might find libraries that are out there and just integrate them into NAPALM, right?

13:29 David Barroso: Yeah, what we try to do is just provide an abstraction layer, which means that if there is a library already that can talk with the device, we'll just use it. Like for example, for Juniper, they maintain a library to work with those devices, so we just use that one. That means that we only have to care about transforming data, for example, and providing like common behaviors. Like, this method behaves in the same way regardless if it's Juniper or Cisco. When you run this other method, the data that you return, it comes back normalized. We don't really bother in dealing with the transport and connecting to the device. Someone else is caring about that, we just provide the abstraction.
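
A minimal sketch of that abstraction-layer idea, assuming two made-up driver classes: each wraps a different raw data source but returns the same normalized shape from the same method name. The class names, the registry, and the return format are illustrative assumptions, not NAPALM's actual classes or schema.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Common interface every vendor driver implements (illustrative only)."""

    @abstractmethod
    def get_interfaces_ip(self):
        """Return {interface: {address: {"prefix_length": int}}}."""


class FakeIOSDriver(Driver):
    # Canned CLI-style text standing in for scraped output.
    RAW = "GigabitEthernet0/1 10.0.0.1/24\nLoopback0 192.0.2.1/32"

    def get_interfaces_ip(self):
        result = {}
        for line in self.RAW.splitlines():
            name, cidr = line.split()
            address, prefix = cidr.split("/")
            result.setdefault(name, {})[address] = {"prefix_length": int(prefix)}
        return result


class FakeJunosDriver(Driver):
    # Canned structured data standing in for a vendor library's reply.
    RAW = {"ge-0/0/0": [("10.0.1.1", 24)]}

    def get_interfaces_ip(self):
        return {
            name: {addr: {"prefix_length": plen} for addr, plen in addrs}
            for name, addrs in self.RAW.items()
        }


# Hypothetical registry: callers pick a driver by name and never branch again.
DRIVERS = {"ios": FakeIOSDriver, "junos": FakeJunosDriver}


def get_driver(name):
    return DRIVERS[name]
```

Calling code then looks the same for every platform: `get_driver("junos")().get_interfaces_ip()`.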

14:12 Michael Kennedy: Yeah, that's really cool. So you basically just do like an adapter to make it all look the same, the various libraries that are out there. What are some of the major libraries, Python libraries that people or that maybe you're built upon?

14:23 David Barroso: The familiar ones, I would say that it's the Juniper library I mentioned which is called PyEZ, that's Py E-Z. The other one is Netmiko. That's quite popular actually. It's built on Paramiko and it's specifically to be able to interact with network devices. Because each one has different idiosyncrasies. One has this, I don't know, like the new line, it's using this code which is super old from the early Unix days. This other one changes the prompts, so you have to take into account. So yeah, it builds on top of Paramiko but it knows how to, the tiny details of each platform.

15:06 Michael Kennedy: Yeah, that sounds like something you don't want to write yourself, that you would just like to use.

15:11 David Barroso: Yeah.

15:13 Michael Kennedy: Before we get into how it works, what hardware vendors do you support? The list is getting long, right? It used to be pretty short at the beginning.

15:20 David Barroso: Yeah, originally it was just the core vendors, because those are actually the ones that Elisa and I have in our networks, but nowadays it's, so from Cisco, it's IOS, IOS-XR and NX-OS, because why have only one operating system when you can have three?

15:38 Michael Kennedy: Right, of course, why not?

15:41 David Barroso: It's Juniper. It's Arista. It's Fortinet. It's, who else? Mikrotik. There's people working on Brocade as well. There is Palo Alto as well. There are probably a few more, but I don't recall them right now.

15:58 Michael Kennedy: Yeah, of course. It's pretty comprehensive these days, huh?

16:01 David Barroso: Yeah, it's quite extensive and people keep working on adding more and more.

16:05 Michael Kennedy: One of the really important building blocks was these libraries that we talked about, the actual communication with the various devices. The other one is Ansible and you actually added SaltStack and StackStorm as well.

16:19 David Barroso: Yeah, we try to provide just the library. We try not to have opinions on how people should be doing things. We try to just provide the basic Python library so people can either integrate with their own framework or just write their own script. Then three major tools that we integrate with are Ansible, SaltStack, and StackStorm.

16:43 Michael Kennedy: That's really cool. Let's just pick Ansible for example, but it would be similar to others, can you maybe describe to me how we would use NAPALM and Ansible? Let's have some kind of goal. Let's suppose you want to set up a load balancer, a couple of web frontends, and maybe a database server and some caching tiers and try to build that network altogether so that only the right pieces can see each other, and things like that. How would that work?

17:15 David Barroso: The way it would work is that you would have template for your services, right? But now with Ansible, you would have another template to map how this new service maps into the network as well. That may be like a new VLAN, a new IP address somewhere else. You would just write another template as you did for Nginx and, I don't know, MySQL or whatever you got to use in your network. When you just compile this template with the data and you get like an actual output, the only thing you have to do is tell NAPALM to use that configuration file and apply it into the device. Then you can do two things. Either you apply it straight into the device or you just get a diff that you can actually peer review or something like that.

18:01 Michael Kennedy: That's a pretty nice experience that you can go to your network and reach out to all the devices. You'd set up your Ansible scripts and one for the load balancer, one for the web tier maybe, things like that, and you'd set them up, and you could say, go query them the way they are now. Figure out the changes that you would push to them and then generate a text diff of what we're about to do to it, right?

18:25 David Barroso: Yeah, I mean, the only difference between managing your web server and your network with Ansible and NAPALM is going to be that instead of reloading the service, you're going to be applying the configuration with NAPALM into the device. But the rest should be exactly the same workflow. Just the template model, the data coming from your backend or YAML file.
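
The template-then-diff workflow can be sketched with the standard library alone. This assumes a toy VLAN template and made-up data; with Ansible the rendering would normally be done by Jinja2 templates, and the diff would come from the device rather than from `difflib`.

```python
import difflib
from string import Template

# Toy template: one stanza per VLAN (syntax is illustrative, not vendor-exact).
VLAN_TEMPLATE = Template("vlan $vlan_id\n name $name\n")


def render_config(vlans):
    """Compile the template with data, as a playbook would."""
    return "".join(
        VLAN_TEMPLATE.substitute(vlan_id=v["id"], name=v["name"]) for v in vlans
    )


def config_diff(running, candidate):
    """Produce a reviewable text diff between two configurations."""
    return "".join(
        difflib.unified_diff(
            running.splitlines(keepends=True),
            candidate.splitlines(keepends=True),
            fromfile="running",
            tofile="candidate",
        )
    )


running = render_config([{"id": 10, "name": "web"}])
candidate = render_config([{"id": 10, "name": "web"}, {"id": 20, "name": "db"}])
diff = config_diff(running, candidate)
```

The resulting `diff` string is what would go to peer review before anything is applied.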

18:47 Michael Kennedy: Yeah, it's really nice. That probably makes it pretty easy to store these in like GitHub or somewhere like that and have like a history of the changes that you've applied to your devices.

18:57 David Barroso: Yeah, that's pretty much what everybody is doing. Just as you would with any other template, in your GitHub or whatever you are using.

19:05 Michael Kennedy: One thing you guys talked about is you push the entire configuration for the device over to it, but somehow the device knows to actually only apply the delta. How does that work?

19:17 David Barroso: Yeah, you can do both things. You can either apply just a snippet of configuration. Like if you just want to configure a VLAN, you can choose to configure that single VLAN, or you can apply the entire configuration. The problem with applying snippets of configuration is that the way that devices work, you don't have a single configuration file where you just apply it on the device and then reload the service, and only what's on that file is what's going to be applied. Actually, on a network device, you tell the network device how to do things. It would be like going to a Linux box and configuring the network using ip route commands: ip route add this interface, ip route add this route here. You're actually telling the device what to do and how to do it. If you're merging snippets of configuration, you have to be aware that if you want to remove a VLAN, for example, you have to tell it to remove this VLAN. It's not good enough to just send the list of VLANs that you want. That's why, and this is kind of my personal opinion, I like instead just somehow compiling the entire configuration of the device, sending it to the device, and telling the device, now reload the service. Which is exactly how I would manage Nginx and MySQL or something. Just like, here's my config file, now reload. I don't care what you had before, just do this.
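
The difference between merging a snippet and replacing the whole configuration can be illustrated with plain sets of VLAN IDs. This is a simplified model, not how any device stores its configuration, and the `no vlan` command syntax is only an example.

```python
def merge(running_vlans, snippet_vlans):
    """Merging only adds: anything already on the device stays there."""
    return running_vlans | snippet_vlans


def replace(running_vlans, desired_vlans):
    """Replacing makes the device match the desired state exactly."""
    return set(desired_vlans)


running = {10, 20, 30}
desired = {10, 20}  # VLAN 30 should go away

after_merge = merge(running, desired)      # VLAN 30 is still configured
after_replace = replace(running, desired)  # VLAN 30 is gone

# With merge you must work out the removals yourself and say so explicitly.
removal_commands = [f"no vlan {v}" for v in sorted(running - desired)]
```

Sending only the list of wanted VLANs removes nothing under merge, which is why explicit removal commands are needed there.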

20:40 Michael Kennedy: All right. I want you to be like this, now, be that way.

20:42 David Barroso: Yeah. That works well with most devices, but for NAPALM to use this feature, it has to be supported natively. Most of the devices nowadays support it, like IOS. Arista supports it, Juniper. The major vendors support this feature so you can actually send the entire config file and reload the service. With big air quotes on the "reload the service".

21:07 Michael Kennedy: Yeah, yeah. This portion of Talk Python To Me is brought to you by Linode. Are you looking for bulletproof hosting that is fast, simple, and incredibly affordable? Look past that bookstore and check out Linode at talkpython.fm/linode. That's L-I-N-O-D-E. Plans start at just $5 a month for a dedicated server with a gig of RAM. They have 10 data centers across the globe, so no matter where you are, there's a data center near you. Whether you want to run your Python web app, host a private Git server or even a file server, you'll get native SSDs on all the machines, a 40 gigabit network, 24/7 friendly support even on holidays, and a seven-day money back guarantee. Want a dedicated server for free for the next four months? Use the coupon code python17 at talkpython.fm/linode. One of the consequences of the way NAPALM works is that if somebody goes and manually starts changing the network settings or something, right? They SSH in there and they make a few changes and then just log out and they don't save that anywhere, right? NAPALM will basically wipe that away, won't it?

22:17 David Barroso: If you're using the replace operation that I just described, yeah, that's exactly how it would work. It's just like, yeah, I just created this VLAN by hand and you didn't save it in your template or in your database or something, yeah, it's going to be completely wiped out. If you're using the other method where you are just applying snippets of configuration, then unless you're explicitly removing the configuration, NAPALM won't remove it because NAPALM won't even know that the configuration is in there.

22:48 Michael Kennedy: Right, unless that snippet has exactly to do with that part that was changed, they might never interact, right?

22:53 David Barroso: Yeah. Some people are combining both manual with automated operation, so they prefer to use this method where you just tell the device, apply this snippet of configuration. So, it depends a bit on what you are trying to accomplish and what your network looks like. For some people, the network is so full of snowflakes that actually automating every tiny bit is literally impossible.

23:19 Michael Kennedy: Yeah, of course. I can imagine places like that. It never was really built all at once. It just kind of grew this way.

23:27 David Barroso: Yeah, organic design.

23:29 Michael Kennedy: Yes, of course. Is there a way to go and reverse? Like, I know I could come up with a bunch of Ansible scripts and I could generate a network out of it, but is there a way to go to a network and say, now generate me the set of Ansible scripts that would reverse engineer it so that you could sort of start from an existing network, come up with these scripts, and then going forward, go in reverse, use NAPALM to work with it.

23:55 David Barroso: What I had done to migrate from this snowflake environment to a fully automated environment is I just copy the configuration file, I mean, I just copy the configuration of the device and just put it into a static file. I just deploy that static file with NAPALM. The only thing I'm doing is just override the configuration that is already there. The configuration is just completely static. Then I start just taking bits of the configuration that is just defined statically into templates. That way I can just start migrating bit by bit, like okay, now I have automated the VLAN. So I can just remove the VLAN from the static configuration file and use a template for that. The only thing I have to do is append one file into the other with Ansible. Now, I'm going to automate like the interface generation. So I just take that away and automate that bit. You can start doing this step by step and just keeping your configuration statically defined somewhere else.
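
One step of that migration might be sketched as follows, assuming a tiny made-up configuration. The VLAN stanza is moved out of the static text into a template, and the check at the end confirms the assembled result still equals the original, so deploying it changes nothing on the device.

```python
from string import Template

# Starting point: the device configuration copied verbatim (made-up example).
ORIGINAL = "hostname sw1\ninterface Ethernet1\n description uplink\nvlan 10\n name web\n"

# After migrating VLANs: the static file no longer contains the VLAN stanza...
STATIC_PART = "hostname sw1\ninterface Ethernet1\n description uplink\n"

# ...because that part is now generated from data.
VLAN_TEMPLATE = Template("vlan $vlan_id\n name $name\n")


def build_config(static_part, vlans):
    """Append the templated section to what is still statically defined."""
    templated = "".join(
        VLAN_TEMPLATE.substitute(vlan_id=vlan_id, name=name)
        for vlan_id, name in vlans
    )
    return static_part + templated


migrated = build_config(STATIC_PART, [(10, "web")])
is_noop = migrated == ORIGINAL  # safe to deploy if nothing changed
```

Repeating this for interfaces, routing, and so on shrinks the static part one section at a time.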

24:54 Michael Kennedy: Yeah, that's a pretty good idea. Just copy it over and go, this is how it is, and then we're going to pull out the little pieces that change every now and then that we want to automate, and everything else, we could just forget it, leave it. That makes a lot of sense. NAPALM is written entirely in Python, right? If I can trust GitHub's little measure of what languages are involved in this project, it's like all Python.

25:17 David Barroso: Yeah, it's all Python.

25:18 Michael Kennedy: Yeah, that's really cool. You don't have too many performance considerations or anything where you might bring in Cython or other types of things, right? It's more just communicate over the network to the device, and the device has to do the heavy lifting?

25:31 David Barroso: Yeah, maybe you're talking about something that you're interacting with, I mean, you're interacting with the device over the network and you're probably using some API or method to interact with the device which is not extremely fast. It's not like you're doing mathematical computation. I don't think that performance is a concern. No one has brought that up at least.

25:59 Michael Kennedy: Yeah, and if it is, it's probably not your performance issue that you can work with, right? It's like, this device is just slow when I do this to it.

26:06 David Barroso: Yeah, I mean, most of the devices like you type, I don't know, show interface or something, maybe the device has like 100 interfaces, so the device takes like one or two seconds to actually return all the data. It's like, we could optimize 10 milliseconds out of two seconds. So, okay.

26:22 Michael Kennedy: Yeah, of course. Yeah, I guess the real check would be to make sure you're using the latest APIs and libraries from the various vendors. Like if you were previously doing some weird screen scraping thing and now they have a JSON API, switch to that, right?

26:36 David Barroso: Yeah, that's actually the main challenge we have. Trying to use the latest APIs usually doesn't go that well, because users could have networks running operating systems from, I don't know, the early 2000s. I mean, I have seen a lot of people with, yeah, my device has 10 years of uptime. Like, maybe you should count that in security advisories instead of in years.

27:03 Michael Kennedy: That's really interesting. I'm paranoid about security issues and things like that on my servers and on things like Nginx and stuff. If there's an update, like that thing is, even if it causes a reboot, it's getting applied because it's on the internet, that's such a bad thing. To think that device is up for 10 years, that probably needs some attention.

27:29 David Barroso: Well, the thing is that for most legacy networks, reloading a device is super hard because you might bring down a lot of systems, because everything uses the network.

27:41 Michael Kennedy: It's the very foundation. It's like taking away memory or the CPU or something, basically.

27:46 David Barroso: Exactly. Most of the networks are actually completely isolated from the internet, so the only way of actually trying to exploit like some bug would be by first getting inside your network by just hacking the VPN or the bastion that you might have in your network. There's certain degree of isolation in the network as well, so upgrading is not usually.

28:10 Michael Kennedy: It's not as critical as like a front end web server that's taking traffic off the internet.

28:14 David Barroso: Yeah.

28:15 Michael Kennedy: It's one thing to pass the traffic along, it's another to be the endpoint to like send it into your executing code. Okay, interesting. One thing I wanted to ask you about is there's a lot of cloud computing and other sort of programmable automatable systems like virtual private servers, things like Linode or DigitalOcean, and now we've got EC2 and we've got Azure and Heroku, how does this whole network automation story fit with that? Is there any place for say NAPALM in AWS or is this really I have my own data center, I want to manage that or I would go do something with AWS?

28:55 David Barroso: The main focus of NAPALM is actually like physical infrastructures. We don't really have any hooks with any of the other cloud companies, like none of them. Yeah, we mostly operate on switches, routers, firewalls, all these kind of things.

29:12 Michael Kennedy: Yeah, you rarely have that low level of access in any of those places.

29:17 David Barroso: Yeah, I mean, there could be like maybe Yarn. I'm not a user myself of the public cloud, so I'm not sure how they work, but there might be some room for like, I don't know, maybe integrating with how, with their firewalls and load balancers that, yeah, I don't know. I haven't heard of anyone even considering something like this.

29:39 Michael Kennedy: Sure, there are APIs for the load balancers and APIs for the firewalls, but, yeah, I'm not sure. I'm not sure it makes sense, but maybe it makes sense somehow, right? But maybe not the same. Maybe say, I want to connect these three machines and I want to have this in a virtual private network and what not, like, that possibly could be a thing, but I'm not sure if it really makes sense to mix these tools.

30:00 David Barroso: I will say if someone comes up with something after hearing this podcast.

30:06 Michael Kennedy: Yeah, exactly. Maybe you'll get a pull request. What about Docker? If I'm using Docker or like I've got a bunch of them running together, managed like by Kubernetes, but say it's on my own network.

30:17 David Barroso: If it's on your own network, I'm pretty sure that you could somehow orchestrate configuring the network somehow with NAPALM. I don't know the details of Kubernetes but I know of people doing exactly that with OpenStack. So, their provisioning system just uses NAPALM to orchestrate the physical network with the rest of the, I don't know how it's called, Neutron, I think it is, the network plugin for OpenStack.

30:46 Michael Kennedy: Interesting. What's the story between NAPALM and OpenStack? OpenStack being like...

30:51 David Barroso: I don't know the details, to be honest. I know that someone was working on just, like, they have their, they're using OpenStack to just deploy their virtual infrastructure, and one of the pieces is actually configuring the network. Someone had just written a plugin which I don't know if it was just in production actually, to just like configuring like for example, if I'm deploying this service, I need to configure this BGP session in here or this VLAN in there. So it was just a matter of doing that with NAPALM.

31:25 Michael Kennedy: There probably is some opportunities for some kind of integration there, but I don't do enough with OpenStack to honestly...

31:32 David Barroso: I'm with you there.

31:34 Michael Kennedy: All right, so can you maybe give us like, the general workflow of working with NAPALM? We started, break down the devices into the different services, right? Like, here's a firewall, here's a web server, things like that, right?

31:49 David Barroso: It depends on what you're doing, actually, because we try to just provide the API as generic as possible so you can write your own workflow. We have two sets of methods, for example. One set of methods just interacts with the device configuration. So you can say like, okay, I want to deploy this configuration or I want to replace the existing configuration with this one. This is the method that I mentioned before where you just wipe everything out. Now I have this configuration loaded. I want to either discard it, commit it, or just get a diff back. These are just basic primitives, so you can actually build your own workflow. You can just integrate it with, I don't know, with Jenkins for example to stage the change and get a diff back, open a pull request and have someone validate it, for example. That's a common workflow. The other set of methods is getting data out of the network. Like, I want to get the interfaces of my devices or I want to get the BGP neighbors on my network, stuff like that. You can actually have that in your Jenkins job, like after the change has been made. Just get the data out and verify that the interface that you're configuring is actually configured as you want it. Yeah, we don't really have a built-in workflow, but we try to just have the basic primitives so people can write their own workflows and just integrate us as they want.
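
The load, diff, then commit-or-discard primitives can be sketched against an in-memory stand-in. The method names below mirror the ones NAPALM documents (`load_replace_candidate`, `compare_config`, `commit_config`, `discard_config`), but `FakeDevice` and `stage_change` are hypothetical helpers written for this illustration, and no real device or NAPALM install is involved.

```python
import difflib


class FakeDevice:
    """In-memory stand-in for a device driver (hypothetical, for illustration)."""

    def __init__(self, running):
        self.running = running
        self.candidate = None

    def load_replace_candidate(self, config):
        self.candidate = config  # stage the full configuration

    def compare_config(self):
        if self.candidate is None:
            return ""
        return "\n".join(
            difflib.unified_diff(
                self.running.splitlines(),
                self.candidate.splitlines(),
                "running",
                "candidate",
                lineterm="",
            )
        )

    def commit_config(self):
        if self.candidate is not None:
            self.running, self.candidate = self.candidate, None

    def discard_config(self):
        self.candidate = None


def stage_change(device, config, approve):
    """Load a candidate, show the diff to a reviewer, then commit or discard."""
    device.load_replace_candidate(config=config)
    diff = device.compare_config()
    if diff and approve(diff):
        device.commit_config()
    else:
        device.discard_config()
    return diff
```

In a CI job, `approve` could be the step that posts the diff for review and waits for sign-off; here it is any callable that returns True or False.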

33:19 Michael Kennedy: Yeah, it's pretty agnostic to the way that the vendors want you to work. You can come up with your own workflow and because it'll talk to all the devices the same, that more or less should work, right?

33:30 David Barroso: Yeah. For example, I like more the workflow of just generating the entire config and sending it to the device because that way I know that what I told the device is going to be in there, rather than having to tell the device how to do things. Like, yeah, I don't know how to do. Just get this done. But other people prefer more like another sort of operation where they have like a self-service portal and they just click somewhere like, okay, create this, delete that. Yeah, it depends on your preference, I guess.

34:00 Michael Kennedy: Yeah, and probably how you're using your network and how often it changes and what not. You mentioned Jenkins. What's the story with continuous integration? What are you seeing people do with NAPALM and CI?

34:12 David Barroso: I would say that the common thing that people do with CI and NAPALM is mostly automated deployment. Like for example, I just changed some data in a YAML file. That triggers a Jenkins job that just runs an Ansible Playbook, for example. Then the Ansible Playbook connects to the device, loads the new candidate configuration and retrieves back a diff. The diff just gets posted into some PR sort of tool. It could be Gerrit or maybe another GitHub repo. That's a common workflow. Another workflow I have seen is something weird. I don't remember who did this. But we have a framework within NAPALM where you can define certain rules. Like for example, I want to verify that all my BGP neighbors are sending me at least five prefixes. We have a framework to describe things like that, for example. Or I want to check that my interfaces have no errors. Just a YAML file where you can just describe certain data and validate it. I've seen some people that, when they deploy something on their application, run this sort of validation tool to verify that the network is working fine. Like for example, we started this backup. We want to check that we're not having an interface error at the same time. I've seen people now connecting completely unrelated pieces and verifying stuff with CI as well.
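
To make the two rules mentioned here concrete (every BGP neighbor sends at least five prefixes; no interface has errors), here is a minimal plain-Python sketch. It is not NAPALM's YAML-based validation framework. It assumes input shaped like the output of NAPALM's `get_bgp_neighbors` and `get_interfaces_counters` getters, trimmed to the fields used, and the function names and sample data are made up for illustration.

```python
def peers_below_minimum(bgp_neighbors: dict, minimum: int) -> list:
    """Return (vrf, peer, family) for every peer sending fewer than `minimum` prefixes."""
    failures = []
    for vrf, data in bgp_neighbors.items():
        for peer, details in data["peers"].items():
            for family, counters in details["address_family"].items():
                if counters["received_prefixes"] < minimum:
                    failures.append((vrf, peer, family))
    return failures


def interfaces_with_errors(counters: dict) -> list:
    """Return the sorted names of interfaces with any rx or tx errors."""
    return sorted(
        name for name, c in counters.items() if c["rx_errors"] or c["tx_errors"]
    )


# Sample data, assumed to mirror the getters' structure (only the fields used here).
SAMPLE_BGP = {
    "global": {
        "peers": {
            "192.0.2.1": {"address_family": {"ipv4": {"received_prefixes": 12}}},
            "192.0.2.2": {"address_family": {"ipv4": {"received_prefixes": 3}}},
        }
    }
}
SAMPLE_COUNTERS = {
    "Ethernet1": {"rx_errors": 0, "tx_errors": 0},
    "Ethernet2": {"rx_errors": 7, "tx_errors": 0},
}

print(peers_below_minimum(SAMPLE_BGP, minimum=5))
print(interfaces_with_errors(SAMPLE_COUNTERS))
```

A CI job could fail the build whenever either list is non-empty.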

35:40 Michael Kennedy: Okay, and things like, I made a change to the network but I don't expect the number of endpoints to change, so let's make sure that that is still a constant number or something like that, right?

35:51 David Barroso: Yeah, it was a kind of weird workflow. They were trying to explain to me, I was like, okay, that sounds both confusing and cool. I didn't really understand it very well.

36:03 Michael Kennedy: Yeah, interesting. This portion of Talk Python To Me is being brought to you by Rollbar. One of the frustrating things about being a developer is dealing with the errors. Ugh, relying on users to report errors, digging through log files, trying to debug issues, or getting millions of alerts just flooding your inbox and ruining your day. With Rollbar's full stack error monitoring, you get the context, insight, and control you need to find and fix bugs faster. Adding Rollbar to your Python app is as easy as pip install rollbar. You can start tracking production errors and deployments in eight minutes or less. Are you considering self-hosting tools for security or compliance reasons? Then you should really check out Rollbar's compliant SaaS option. Get advanced security features and meet compliance without the hassle of self-hosting, including HIPAA, ISO 27001, Privacy Shield, and more. They'd love to give you a demo. Give Rollbar a try today. Go to talkpython.fm/rollbar and check them out. So to test this out, do you actually have to have the devices, like a Juniper device or a Cisco device, or is there like the equivalent of virtual machines for these network devices?

37:10 David Barroso: Most platforms nowadays have a VM equivalent, so we can use that for testing. But we rely a lot on mocks. Like for example, most of the code is actually just normalizing data by just connecting to a device, retrieving the interface information, and normalizing it into a common model. Those things are easy to mock. You just need the original data, a few use cases, and just test that the parser normalizer works fine. So yeah, we can get away a lot with just mocks. But yeah, there are certain things where you need a VM or something.

37:54 Michael Kennedy: Yeah, that makes sense. So, run some particular command on some test lab setup and just capture that text and go, now tell it that it gave you this.

38:03 David Barroso: Yeah, yeah.

38:04 Michael Kennedy: With the mock layer in your CI.
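
The mocking approach discussed above can be sketched roughly as follows: capture the text of a command once, replay it through a mocked connection, and check that the parser normalizes it into a common model. Everything here is illustrative. The captured output, the `show interfaces` command, the `send_command` method, and the normalized field names are assumptions for the example, not taken from NAPALM's test suite.

```python
import re
from unittest.mock import Mock

# Hypothetical text, as it might be captured once from a lab device.
CAPTURED_OUTPUT = """\
Ethernet1 is up, line protocol is up
  MTU 1500 bytes
Ethernet2 is administratively down, line protocol is down
  MTU 9000 bytes
"""

HEADER = re.compile(
    r"^(\S+) is (administratively down|up|down), line protocol is (up|down)$"
)
MTU = re.compile(r"^\s+MTU (\d+) bytes$")


def parse_interfaces(raw: str) -> dict:
    """Normalize raw CLI text into {name: {"is_enabled", "is_up", "mtu"}}."""
    interfaces = {}
    current = None
    for line in raw.splitlines():
        header = HEADER.match(line)
        if header:
            name, admin, oper = header.groups()
            current = interfaces[name] = {
                "is_enabled": admin != "administratively down",
                "is_up": oper == "up",
                "mtu": None,
            }
            continue
        mtu = MTU.match(line)
        if mtu and current is not None:
            current["mtu"] = int(mtu.group(1))
    return interfaces


def get_interfaces(connection) -> dict:
    """Fetch and normalize interface data over an already-open connection."""
    return parse_interfaces(connection.send_command("show interfaces"))


# The mock replays the captured text instead of talking to a device.
connection = Mock()
connection.send_command.return_value = CAPTURED_OUTPUT
print(get_interfaces(connection))
```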

38:07 David Barroso: The thing that you don't want to mock is the configuration management bit because that's so critical. If you mess something up in there, like you might be causing an outage. So, that, we try to always, always test with VMs. We want to play it safe there.

38:24 Michael Kennedy: Is it stressful to work on this project, just given that people might be running this against huge networks, either your own at Fastly or someone else's? Like, okay, I really have to get this right because it could have real consequences if I mess it up.

38:40 David Barroso: No, I don't think it's stressful. I don't think so. The primitives that we provide with NAPALM to configure the network, it's actually five or six methods. Those ones that are written, they are written. You don't really have to build it. I mean, it's a bit delicate when you're writing them for the first time. But then, once that's written, it's just a matter of, yeah, just keep testing.

39:07 Michael Kennedy: I see what you're saying. It's kind of low level enough that the part that makes it tricky or not quite right is actually with the person using NAPALM, not NAPALM itself.

39:18 David Barroso: I mean, obviously, you're integrating into something as low level as ..., you have to do your homework and actually do some integration testing yourself and make sure that the exact version that you're running on the network, it actually works well with NAPALM. Like maybe you have this super tiny bug that we have on the operating system or the network, I mean, that we haven't encountered before. That's just, yeah, basic integration testing areas.

39:44 Michael Kennedy: Do you get requests from people to add support for particular devices that you don't have access to?

39:51 David Barroso: Oh yeah, all the time.

39:53 Michael Kennedy: How do you deal with evaluating those PRs?

39:55 David Barroso: For the configuration management part, we always ask to have proof that it's working. That could be as simple as the contributor just creating an SSH user for me so I can actually validate the test in person while they are running the test on their machine, or I might be running the test on my machine. But the thing is that if I don't have access to the device, at least for the configuration bit, I need to be able to test it the first time myself. Because then once it's tested and it's working in there, it will probably barely change. What retrieves data from the device, that's easy. That's just a matter of mocking the output. So yeah, that's not really a worry.

40:36 Michael Kennedy: Yeah, it just seems like when you work with all these different devices, it's tricky 'cause you probably don't have access to every single variation of them, right?

40:43 David Barroso: Yeah, correct. We just try, at least the first time, to validate, not take only the word of the contributor.

40:51 Michael Kennedy: Could you do like screen sharing and just watch them?

40:54 David Barroso: Yeah, I mean, anything that works for them. If they don't want to share the screen, they can just give me access to the device, I can just pull their branch and test it myself. That's fine.

41:05 Michael Kennedy: Cool. I just wanted to highlight a video presentation that you did at NANOG. What was the conference? I might be thinking wrong.

41:15 David Barroso: Yeah, NANOG.

41:16 Michael Kennedy: In NANOG. I thought that was really helpful. It's a little bit dated, right? That was I think from 2015, but it still shows a lot of the workflow and how people might do it. So if they're really interested, I'll be sure to put that video in the shownotes so people can check it out.

41:29 David Barroso: Yeah, that's actually the video that we recorded when we went live with the project like, Here we are.

41:35 Michael Kennedy: Yeah, here you are a couple years later. I remember there were just a few devices you guys supported and now there's a whole bunch.

41:42 David Barroso: Yeah.

41:43 Michael Kennedy: So, you want to tell us about some of the major deployments or users of NAPALM?

41:49 David Barroso: I don't really have an official list. I never thought about that. But I was actually going through the contributors and just trying to see where they work. This list I'm going to mention now is based on trying to track where people work. I don't know if they're actually using it on their networks. It might be that the contributor is just, I don't know, doing it in their own spare time. I don't know, just a disclaimer here. I have seen contributions from people working at places like some trading firm, IMC, Nike, Netflix, obviously Spotify, which is where everything started, eBay.

42:29 Michael Kennedy: Yeah, you were at Spotify when you started working on this, right?

42:32 David Barroso: Yeah, right. Yeah, I was working at Spotify when I started the project.

42:36 Michael Kennedy: eBay is a big user of Python, so that makes a lot of sense.

42:38 David Barroso: Yeah, I know, I have seen some contributions from people working there. Again, I'm talking based on where I have seen they work and I don't really know what they're doing. There are a few ISPs like LINX and DE-CIX, which is German, you probably know who they are. At Fastly, we use it as well. And another big contributor is Cloudflare, which is another CDN. There are a few contributors. They seem to be working in interesting places.

43:08 Michael Kennedy: Yeah, that's great. Are you looking for more contributors? People always ask me if there are good places to get started in open source that are kind of new. They want to check this out. Are you looking for contributors and what kind, I guess.

43:19 David Barroso: Anyone could help actually. There's a lot of work to do both in, I don't know, something as basic as documentation. That's kind of like, even if you don't know Python or you're starting with Python, you can always help. Then also, something you have to bear in mind is that this is mostly built by network engineers which means that it might not be the best Python. So even if you don't know anything about networking but you just want to help a project by just reviewing the code and proposing improvements or something like that, yeah, NAPALM would be a great fit for that because we are mostly network engineers. Yeah, we don't know what we are doing. We're just trying to solve problems.

44:03 Michael Kennedy: And we made it work, that's great. I asked you earlier about how you deal with the hardware that you don't know about, but if somebody's got some new piece of hardware and they want to integrate it into NAPALM, what are all the steps that they got to go through for that to work?

44:16 David Barroso: A general vendor which actually has happened already, it's the same workflow as if you were just a regular contributor. You just come to us proposing the idea and that's just so we know that you are working on that because someone might already be working on that. Just trying to avoid duplicating effort. But I don't know, we don't really have a formal structure. There are certain ways that you have to integrate with the project, but that's just part of how the project is kind of designed. It's not really like formalities like, oh yeah, you have to support at least these three methods. And if they are not supported, it's a completely no go. No, we don't really have a strict rule like that. If it's useful, it's useful even if it's just a couple of methods. Someone else might just take it up and improve it.

45:06 Michael Kennedy: Right, it's just getting it started might be a big deal. Okay, that's great. What's the future of NAPALM? What are you guys planning to add? Where is it going?

45:16 David Barroso: Now, we just launched a new library where we try to normalize syslog events, for example. This is completely new. It was the first release. It was launched like maybe a couple of weeks ago. In the same way that every device has its own CLI and output from different calls, yeah, in the same way that that happens, they also send different syslog events with different data and everything. So now we are trying to be able to normalize that so you can just start this service. Use it as a syslog endpoint. Just normalize the data and send it somewhere else. Like whatever, Logstash or whatever you're using. I don't know, we're trying now to integrate as well with OpenConfig. OpenConfig is an industry effort to normalize the data the devices return, for example. When we started, OpenConfig didn't exist yet, so we designed our own data models, to call them in some way. Now we are trying to start integrating with these data models.
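
As a loose illustration of what normalizing syslog events means (this is not the code or the event schema of the library mentioned, which the episode does not describe in detail), the sketch below maps two differently formatted link-state messages onto one common structure. The vendor labels, the sample messages, and the `INTERFACE_STATE_CHANGE` event name are assumptions made up for this example.

```python
import re
from typing import Optional

# One pattern per message style; the "vendor_a"/"vendor_b" labels are placeholders.
PATTERNS = {
    "vendor_a": re.compile(
        r"%LINK-3-UPDOWN: Interface (?P<interface>[^,]+), "
        r"changed state to (?P<state>up|down)"
    ),
    "vendor_b": re.compile(
        r"SNMP_TRAP_LINK_(?P<state>UP|DOWN): .*ifName (?P<interface>\S+)"
    ),
}


def normalize(message: str) -> Optional[dict]:
    """Map a raw syslog message to a vendor-neutral event, or None if unknown."""
    for vendor, pattern in PATTERNS.items():
        match = pattern.search(message)
        if match:
            return {
                "event": "INTERFACE_STATE_CHANGE",  # hypothetical event name
                "interface": match.group("interface"),
                "state": match.group("state").lower(),
                "source_format": vendor,
            }
    return None


samples = [
    "%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down",
    "SNMP_TRAP_LINK_DOWN: ifIndex 502, ifAdminStatus up(1), "
    "ifOperStatus down(2), ifName ge-0/0/0",
]
for raw in samples:
    print(normalize(raw))
```

Downstream consumers such as Logstash would then only need to understand the one normalized shape.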

46:29 Michael Kennedy: Yeah, so if there's going to be a standard around all the data, then you guys definitely want to be part of it.

46:34 David Barroso: Yeah, so now we're trying to integrate with that instead of just completely designing everything ourselves. If someone is solving that problem for me, why not?

46:42 Michael Kennedy: If people want to get in touch, you have both a Slack channel and a mailing list, right?

46:46 David Barroso: Yeah, we are on Slack. The Slack organization is called Network to Code. We can probably add a link in the shownotes to the page to register. There is also a mailing list, although it's not very active, to be honest. Most people prefer Slack nowadays. It looks like email is slowly dying. We actually launched today a web page to keep all the news posted there and links to the Slack channel and everything. That's actually like...

47:15 Michael Kennedy: Wow, that's great. Just today, huh?

47:17 David Barroso: Yeah, just, like a couple of hours ago.

47:18 Michael Kennedy: Nice, be sure to put that in the shownotes that we're sharing so we can make sure everyone has that.

47:24 David Barroso: Yeah, because before we had just the GitHub. GitHub is not great if you want to keep people up to date, like, oh yeah, we just launched this or we have fixed this or make sure that you are updating your requirements and stuff like that. Yeah, we've built this web tool.

47:41 Michael Kennedy: Yeah, nice. What's the Python3 versus Python2 story? Do you support both or one or the other?

47:49 David Barroso: Yeah, we actually support both. Kirk, who is one of the main contributors and happens to be also the creator of Netmiko, the tool that I mentioned before, he really likes Python3 so he worked very, very hard and he got all the code working there.

48:04 Michael Kennedy: That's great. Very well, I see you don't have to worry in a couple of years when Python2 gets phased out. That's good. All right, so, NAPALM really sounds like a cool project. If you're doing network automation, that seems really excellent. I wanted to ask you, while I have you here, a few more general questions just about networking type things. What is the future for networking? What does it look like for network engineers, you think? More stuff like what you're doing with NAPALM or is it changing? What do you think?

48:35 David Barroso: Even though I know what the question is, everybody keeps asking that. I don't know. I mean, the way I see it is somehow the network engineer will transition into an SRE type of role where you're not just a CLI jockey that knows all the ins and outs of a network device, all the knobs. The network is becoming simpler and simpler because people keep just building bigger and bigger, so there's less room for a snowflake, so I see more like yeah, network engineers transitioning to an SRE type of role.

49:08 Michael Kennedy: Do you think things like this data standard you talked about and NAPALM itself where you don't actually have to know the details as specifically about the devices. Do you think it's more automation programmability less specific knowledge about the individual devices?

49:25 David Barroso: The thing is that networks traditionally were full of knobs. Like, you go and you configure OSPF and then you enable this super obscure option in here, then this other knob in that other place. When your network is not so small, you keep just trying to optimize everything. But now, when you're just building things at a bigger scale, like there is not that much room for those kinds of things. Like if you need to build something fast, regardless of if you're doing it manually or in an automated way, you need to start standardizing things. My guess is that eventually things will happen. I mean, it's the same that happened with other services. Like before you had like, I don't know, 20 instances of MySQL and each one was configured differently, trying to optimize every detail. Nowadays, you have like a MySQL and they all look the same. They don't really care about anything, like, yeah, whatever.

50:21 Michael Kennedy: Yeah, it's more about trying to manage 100 things than it is about tweaking the little bits. 'Cause if you can manage 100, you could just add a few more and get better performance or whatever.

50:30 David Barroso: Yeah, with cloud, it's cheap, right? You just have more compute.

50:33 Michael Kennedy: That's right. The other thing I wanted to ask you about was IoT devices. We saw recently one of the, it was the biggest denial of service attack. The denial of service attack actually came from a bunch of hacked web cams or something like this, a bunch of these IoT things. It seems to me like having all these crappy unsecured non-updated devices on networks is a problem that's going to be more and more of a problem as things go. From a network perspective, what can be done about these things?

51:06 David Barroso: The problem is where those IoT devices are located. Are they located behind a corporate network or are they located behind like someone's home?

51:16 Michael Kennedy: Probably someone's home, connected to a crappy firewall or a crappy NAT router that hasn't been updated in five years.

51:22 David Barroso: Yeah, then you have a problem because there is no actual way of automating that box that your provider gave you and didn't even provide you the user and all those, right? Yeah, I guess there is little hope even there.

51:36 Michael Kennedy: Not a whole lot of hope. Do you have any hope for it? What do you think is going to happen?

51:43 David Barroso: I don't know. I hope that at some point, the people building those devices actually take some responsibility and pride, I guess, in what they are building. They start building them properly. I mean, that's exactly the only thing we can hope for because you don't know where they are going to be. If they are behind a corporate network, you can always secure the perimeter somehow. But I mean if it's at your home, I don't know.

52:07 Michael Kennedy: Yeah, it's kind of tough. Even if you put them on isolated networks like my router will let me create multiple like a guest network, and another one, I can put it on there and it won't get to my stuff. But still, it becomes a denial of service thing on the internet which is not great.

52:21 David Barroso: Yeah, and you're tech savvy, right? I mean, if my parents buy like this Philips bulb that can do a million things, they have no idea about computers. They just bought it with the remote control that it comes with and it just works.

52:36 Michael Kennedy: Yeah, it seems to me that the silly cheap light bulbs and other things like this, are going to have an interesting fix in the future and networking will probably somehow be part of it. But I don't know what it is yet. All right, well, thanks for those questions. I think we should wrap it up. We're getting kind of at the end of the show. A couple of questions for you. If you're going to write some Python code, what editor do you use?

52:56 David Barroso: VI, always.

52:57 Michael Kennedy: VI, awesome. I saw you using that at your demo there in the video, that's great. And notable PyPI packages? Obviously, we can pip install NAPALM, right?

53:08 David Barroso: Yeah, correct.

53:09 Michael Kennedy: There's a good one. Any others?

53:12 David Barroso: I would say Hammock which builds on top of Requests, and Netmiko. Those are probably the two I would mention.

53:19 Michael Kennedy: Okay, I don't know about Hammock. That's cool, what does it do?

53:22 David Barroso: It's mostly a wrapper on top of Requests that you can start doing like, you create your instance and then you do like myapi.api.v1.host.get. Then the URL just builds itself, so it's kind of like...

53:37 Michael Kennedy: I see.

53:38 David Barroso: That's the use.
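
The idea described here, where chained attribute access builds up a URL, can be sketched in a few lines of plain Python. This is an illustration of the technique only, not Hammock's implementation or API; the `UrlBuilder` class and the example base URL are hypothetical, and no HTTP request is made.

```python
class UrlBuilder:
    """Build a URL path one attribute access at a time."""

    def __init__(self, base: str):
        self._url = base.rstrip("/")

    def __getattr__(self, name: str) -> "UrlBuilder":
        # Only called for attributes that do not exist on the instance.
        # Private names are rejected so copying and pickling do not recurse.
        if name.startswith("_"):
            raise AttributeError(name)
        return UrlBuilder(f"{self._url}/{name}")

    def __str__(self) -> str:
        return self._url


myapi = UrlBuilder("https://api.example.com/")
print(myapi.api.v1.host)
```

A wrapper of this kind would then hand the finished URL to an HTTP library such as Requests when a terminal method is called.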

53:39 Michael Kennedy: It uses like the dynamic nature of types themselves to express the URL. That's really cool, I got to check that out. All right. This is very nice. Thanks for being on the show. Any final call to action if people want to get started? How do they do it?

53:53 David Barroso: Well, if people want to get started, they can always visit our new webpage where you can find links to our GitHub repo and to the documentation as well. We have instructions on how to use VMs, so you can actually start poking at NAPALM without having to deal with the actual infrastructure. So yeah, our website is probably the best place to begin with.

54:14 Michael Kennedy: Okay, excellent. David, thank you so much for being on the show. It was great to talk about networks with you.

54:19 David Barroso: Thank you for having me here. It was a lot of fun.

54:21 Michael Kennedy: You bet. This has been another episode of Talk Python To Me. Today's guest was David Barroso. This episode has been brought to you by Linode and Rollbar. Linode is bulletproof hosting for whatever you're building with Python. Get your four months free at talkpython.fm/linode. Just use the code python17. Rollbar takes the pain out of errors. They give you the context and insight you need to quickly locate and fix errors that might have gone unnoticed until your users complain, of course. As Talk Python To Me listeners, track a ridiculous number of errors for free at rollbar.com/talkpythontome. Are you or a colleague trying to learn Python? Have you tried books and videos that just left you bored by covering topics point by point? Check out my online course Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python. If you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic. Be sure to subscribe to the show. Open your favorite pod catcher and search for Python, we should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm. Are you looking for a way to support the show? Taking and recommending one of our courses is really the best way. But if that's not for you, you can become a patron. Visit patreon.com/mkennedy for details. You can give as little as $1 an episode. This is your host Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.
