Transcript for Episode #229:
Building advanced Pythonic interviews with docassemble
0:00 Michael Kennedy: On this episode, we dive into Python for lawyers and a special tool for conducting legal interviews. Imagine you have to collect details for 20,000 participants in a class-action lawsuit. Docassemble, a sweet Python web app, can do it for you with ease. Now you may be thinking, I'm not a lawyer, so this isn't for me. Hang on for a sec. Docassemble is actually a general purpose tool. If you ever have done anything like run a survey on somewhere like SurveyMonkey or created a Google Form to gather a bunch of information, you could do something way more advanced with Docassemble and control the workflow with Python in a really creative and unique way. Join me as I talk with Jonathan Pyle, creator and maintainer of Docassemble. This is Talk Python To Me, Episode 229, recorded August 27th, 2019. Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem and the personalities. This is your host, Michael Kennedy. Follow me on Twitter where I'm @MKennedy. Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @TalkPython. This episode is brought to you by Linode and Datadog. Be sure to check out their offers during their segments. It really helps support the show. Hey folks, before we get to the interview, I have some exciting news. We've teamed up with Humble Bundle to launch a great bundle of Python educational goodness. For a couple of weeks, you can get three of our courses along with great content from Real Python, PyBites and many others for as little as just $1. If you've been on the fence about trying one of our courses, here's a chance to get three of them along with a bunch of other great stuff. Just visit talkpython.fm/hb2019. That's hb2019. And be sure to check it out before time runs out. Now let's get to that interview. Jonathan, welcome to Talk Python To Me.
2:00 Jonathan Pyle: Thanks for having me.
2:01 Michael Kennedy: Yeah, it's great to have you here. We're going to cover a topic that we have I believe yet to cover really at all and that is Python for lawyers or writing code that helps practicing lawyers do their work. That should be a lot of fun to talk about.
2:15 Jonathan Pyle: Yeah, the intersection of law and coding is not very big but there actually are a fair number of people doing it.
2:21 Michael Kennedy: Yeah, there are definitely some people doing it and it's like I was saying before we hit record, that a good friend of mine is a lawyer who does a lot of Python. And so I know that there's some areas where there's really interesting stuff happening but it's maybe not as easy to apply programming to the problems as say like chemistry or something like that, right? But I think there's still a lot of really interesting things going on there and I'm looking forward to diving into them with you. But before we get to those, let's start with your story. How did you get into programming in Python?
2:51 Jonathan Pyle: I got into programming longer ago than I can even remember. My dad had an IBM PC with a monochrome monitor, green screen, and I was just magnetically attracted to the thing. It had like BASIC in ROM BIOS and so I'd just go over and type programs into it which I found by reading like Compute Magazine and Byte Magazine, which in the 80s you could buy in the supermarket, and they had source code listings and you could type them in.
3:17 Michael Kennedy: Do you remember when you would go to like book stores and stuff and there was a whole computer section of these magazines and these books and like computer shop, for building computers out of parts and yeah, it was a different time right?
3:30 Jonathan Pyle: Yeah, I think the books were just kind of more interesting back then. A lot of real do-it-yourself stuff.
3:35 Michael Kennedy: Yeah, for sure. Okay, so you started typing these in, like a fair number of folks have to kind of get started programming. Like okay, well it says if I type this I get a little game or a little something on my computer because there was no internet, right? This human transcription was how programs got transferred.
3:53 Jonathan Pyle: And what was great in BASIC is you could do interactive stuff very easily. There's this command called input which got you a line of input, and that's kind of hard to do in a lot of languages these days because interfaces are more complicated. So I started with BASIC, but then as I got older I taught myself C and Assembly language and tried to do some real low-level stuff on the IBM PC. But then I went off to college, majored in physics and then went to law school and became a lawyer. But I never really got away from computer programming 'cause wherever I went, I just found applications for it because people were doing stupid things and I thought I can just write some code that gets around this problem. So I was like using OCR text and converting it into spreadsheets using regular expressions in Perl and saved a client like $500,000 by doing that. It's kind of weird. So I just sort of randomly found myself coding a good percentage of my time as a lawyer, just because I could do it. And so I've been using Linux and Perl and like just random scripts for about 20 years. But I didn't start learning Python until I decided to start with Docassemble, which was about four years ago.
5:02 Michael Kennedy: Yeah, Docassemble is a really cool project but before we get into any of that, you talked about converting OCR text into spreadsheets using regular expressions. That's not super easy to do but it feels like you can kind of piece together some of these libraries and these tools to make it happen. And it's one of these examples in this rant that I'm always on, which is that we don't necessarily need 10 times more programmers. But if all the people out there with their specialty could have a little bit of programming skill, they could solve major problems in their area, like law or biology or whatever, if they could say, you know what, this problem actually is pretty easily solved with a little programming if you just knew what to do, right? It sounds like that's a great example of it.
5:45 Jonathan Pyle: Yeah, and I think just because of my background I was the only person in the entire firm who knew what a regular expression was and even knew that there was a solution to this problem 'cause everybody else would just be like oh, we need to hire some low-cost labor in Vietnam to just type everything in manually. And I was like seriously? There's got to be a cheaper way.
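To make the OCR-to-spreadsheet idea concrete, here is a minimal sketch in Python (Jonathan used Perl). It assumes a made-up line layout of date, description, and dollar amount; the pattern, field names, and sample text are all hypothetical and not taken from his actual project.

```python
import csv
import io
import re

# Hypothetical layout: "MM/DD/YYYY  free-text description  $1,234.56"
LINE_RE = re.compile(
    r"^\s*(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<desc>.+?)\s+\$(?P<amount>[\d,]+\.\d{2})\s*$"
)


def ocr_to_rows(text):
    """Return (date, description, amount) tuples for lines matching the pattern."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:  # lines that don't match (page headers, OCR noise) are skipped
            rows.append((m["date"], m["desc"], m["amount"].replace(",", "")))
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text that a spreadsheet program can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "description", "amount"])
    writer.writerows(rows)
    return buf.getvalue()


# Made-up OCR output, including two lines of noise.
SAMPLE = """\
Page 3 of 17
01/05/2009   Wire transfer to vendor   $12,500.00
01/07/2009   Office supplies           $83.19
~~ illegible ~~
"""

if __name__ == "__main__":
    print(rows_to_csv(ocr_to_rows(SAMPLE)))
```

Real OCR output is messier than this, so in practice the skipped lines would probably be logged for manual review rather than silently dropped.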
6:06 Michael Kennedy: Well, and more reliable and faster, right? So suppose you're like we actually have this firm in Vietnam and these folks are really good, we can get this done. If it's that much text that you're talking about, does it really make sense to wait the month for them to type it in and then you're always a little worried maybe about like a mistake slipping in, right?
6:25 Jonathan Pyle: Yeah, but I think the way that the legal field is adapted is that they kind of are not so worried about mistakes. They kind of understand that they're to be expected. But yeah, there are a lot of people who want to hold you accountable for every single OCR being absolutely 100% correct. And that results in a lot of waste of money.
6:45 Michael Kennedy: Yeah, I can imagine. All right, so that's the background and whatnot. How about day to day, what do you do? Are you still working as a lawyer now?
6:53 Jonathan Pyle: I'm still working as a lawyer. I don't work at law firms anymore, I've been in the nonprofit legal services world for about 10 years. So I work at this nonprofit called Philadelphia Legal Assistance which provides free legal services to low income people in civil matters. So nothing criminal but things like child custody, how to get welfare benefits, how to get out of mortgage foreclosure. And I have a job there in management and administration. I'm responsible for compliance with government regulations but because of my computer background, I actually have sort of changed the way that we do business here and so I automate a lot of the stuff that would otherwise be done manually. And there are a lot of grants available for tech work in the legal aid space to help low income people. So I've worked on a number of grants, trying to bring computer programming and data analysis into the legal aid world so that we can better advocate for low-income people as a whole.
7:50 Michael Kennedy: That sounds like a really good use of programming and legal skills. Now I don't know very much about nonprofit legal services, like I certainly understand how for-profit legal agencies work, right? But just give me a real quick sense of how businesses like that even run, for folks who are not in the legal field and don't know about it?
8:13 Jonathan Pyle: The biggest thing is government funding and in the United States, the Legal Services Corporation is a federal government agency that hands out money to every county in the country. So the low-income people have some law firm that they can go to and get help for free. And they've been a big supporter of technology. In fact I think the legal aid nonprofit world is more technologically forward-thinking than the big law firms that have all the money in the world. Because we understand that it has benefits to helping low-income people get answers to their legal questions. So we've been pretty innovative in the legal aid nonprofit world.
8:51 Michael Kennedy: That's cool, of course thinking about it, right, the large legal firms, their incentives are not necessarily aligned with automating and massively speeding up some of these actions, right? Like if I can push a button and get an answer in a couple of milliseconds, they can't bill for that, right?
9:13 Jonathan Pyle: I mean I think they're, very slowly, there are some price pressures on the big firms. Clients say I'm not willing to pay for legal research because I know that this can be done with computers and AI. And so I think they're eventually going to come along but yeah, they don't have any big incentive to save the clients much money.
9:31 Michael Kennedy: Once they get the job, right, like I can definitely see that in the competition like hey, we could do this for half price of those guys because we're not going to charge you for this thing, we automate it but yeah. I guess it's a mixed bag, interesting. So let's talk about this project that you created called Docassemble, which is a really nice project. What is it and what is it used for? Let's start there.
9:55 Jonathan Pyle: So it's a free and open source platform for developing guided interviews. And if you don't know what a guided interview is, think about TurboTax. It's something that asks you one question at a time, it can get very detailed, it can be very long but at the end of it you get, say a tax return or in other contexts, you might get a legal document that you can file in court or you could even get just like legal advice at the end, like legal information or be directed to a resource or do an application. There are a variety of guided interview software packages out there but none of them were free and open source or did what I want. So I created Docassemble because it would serve my purposes and it'd be something that we could develop and crowdsource through the free and open source software movement.
10:40 Michael Kennedy: Yeah, it's really nice and it's a beautiful-looking site. I don't know too much about the legal websites but I know like some of these gather your signature sites and stuff, you go to them and they just look so bad and so old-school and sketchy. You're like I really don't know if I want to put personal information into this thing. So it's nice that this is a good-looking web app that inspires confidence I guess.
11:21 Michael Kennedy: Well, part of that programmer's skill, right? Yeah, so if people want to experience this and get a sense for what it's about, I guess the first thing is you can drop over at docassemble.org, right? And then there's a, at the very bottom you can run a demo and then it just goes through and asks you questions. Like what is your name? And then you enter it in, what is your location? And then possibly based on what state you're in it might ask you different questions, right? It can kind of flow different questions together based on your answers, right?
11:51 Jonathan Pyle: Yeah, and while you can do some of that stuff in like SurveyMonkey or Google Forms, this is kind of like the very advanced version of that where you might have incredibly complicated logic that would be so very difficult to manage if you are trying to like hand code endpoints in a Flask app for example. You'd just have so many endpoints, it would be hard to keep track of them. But it's sort of a system for abstracting away the logic and making it easy and maintainable to go from what the law is or what the domain knowledge is to an interview that gathers information.
12:28 Michael Kennedy: Yeah, and I guess it's worth pointing out and you do on your website or the GitHub repo, I can't remember but I read somewhere from your documentation that this is developed in the context of a practicing lawyer, but it is not specific to law, right?
12:42 Jonathan Pyle: Yeah, I think one of my first users who found it on GitHub was in France and they were using it to diagnose problems in mechanical equipment. So anything that's amenable to asking one question at a time where you don't want to have to like hand code all of those screens, you could use this system for.
13:03 Michael Kennedy: Yeah, I can see this almost like tech support even, right? Like in that context. Does the machine turn on? Yes or no. Does smoke come from it? Yes or no. First turn it off, okay? Now what's the next question? Yeah, that's pretty interesting actually and it definitely seems more flexible than SurveyMonkey, and you know all those things are commercial services that are SaaS and you can just take it the way it is or leave, right? And this is obviously written in Python, something you can download from GitHub and customize.
13:54 Michael Kennedy: Yeah, that's super cool. And it's a Python web app based on Bootstrap, so that probably means that all the fancy nice Bootstrap themes that you can find over at like WrapBootstrap or Start Bootstrap or probably a bunch of others I don't know about, you go find all these either super cheap like $10, $20 themes or even free and open source ones. You can probably plug those in and get the look and feel you want, right?
14:18 Jonathan Pyle: Yeah, that's kind of why I picked Bootstrap because it's widely used and it's themable, and there are a lot of different options if you don't like the standard look and feel.
14:28 Michael Kennedy: Cool, so I definitely want to dig into the technical side of things but maybe just another quick question or two, the high level to set the stage is, so you're working, helping out these folks at the legal nonprofit. How do you use this in your day to day job?
14:42 Jonathan Pyle: Well, I'm so busy maintaining the platform and working my day job that I don't really have that much time to deploy stuff. But I have been working on a very complicated interview that asks all the questions necessary to help somebody file for bankruptcy. And that's primarily being done by a nonprofit called Upsolve, but they're one of our sub-grantees. So if you check out upsolve.org and see what they've done, they've kind of democratized Chapter 7 bankruptcy for the nation. Whereas before you would have to hire a lawyer for $2,000 or try to find a pro bono lawyer to do it for free, which is very...
15:20 Michael Kennedy: Which sounds horrible when you're literally filing for bankruptcy. You're in a like a financially bad place and then you got to go pay to dig out from the hole, right? Which is rough.
15:30 Jonathan Pyle: Yeah, it costs a lot of money to be poor. You can go on Upsolve's site and you can go through a very long guided interview that's using Docassemble and it gathers all the information necessary for an 80-page bankruptcy petition. And then they have a lot of other custom code after that, but the questionnaire that they do is based on Docassemble. So I do get to work on some of that during the day. But I also use the system in legal aid to do things like gather retainer agreements from clients. I can send them a link, they can click on it with their smartphone, sign their name with their finger and the signature goes into the document. So stuff like that is pretty useful.
16:06 Michael Kennedy: That's really cool and it seems like it really should be used a little bit more even outside of the legal space 'cause it seems quite interesting. So maybe one of the things that we could talk about is just why Python. It's not like you are afraid of other languages, right? You obviously did a bunch of C and Assembly language, right? That's a pretty hardcore language. So why did you choose Python for it?
16:32 Jonathan Pyle: Well, when I started this system, I was sort of a Perl hacker and I loved Perl. But my idea for the system was I want to make this into this high level language so that you can basically code the law and that a lawyer could sit down with minimal knowledge of computer programming, like just to do if-else statements that set true/false variables for example. I mean that is not rocket science and I wanted them to be able to understand code and read it and work with it. Maybe they would get help from somebody to clean up the syntax. So I was looking around for something that was very clean and readable. Perl is great but it has so many punctuation marks whereas Python is like so neat and clean. And also I saw these books in the bookstore where it was like teach Python to your kid and like integrate Python with Minecraft if you're 12 years old. So I thought well if this language is good enough for six-year-olds, then it's good enough for a lawyer who has an advanced degree. So I thought that would be a good general-purpose programming language to base this on and I definitely wanted a general-purpose programming language. A lot of people argue with me and they say oh if you're encoding legal rules you really should have a declarative programming language. But the problem is that all those declarative programming languages were developed in academia. They don't really get used much in the real world and they don't have loads and loads of packages that you can just install if you want to integrate it with Slack for example.
18:03 Michael Kennedy: Give us some examples of the declarative languages you were considering or people were suggesting.
18:07 Jonathan Pyle: Oh, I don't even remember them because I didn't give them too much of a thought. But every one that I, like first of all it was not that easy to read because it would use like weird Greek notation or whatever. I also read an article where they tried to do some of this stuff in the 80s using one of these declarative languages and they found that the attorneys just first wrote out procedural code and then converted their procedural ideas into this declarative syntax. So I thought well maybe this whole declarative stuff is really just something that's academically interesting, but the way that people think is more aligned with the way that general-purpose programming languages that are procedural actually work.
18:49 Michael Kennedy: Yeah, I can imagine. So Python has that joke I guess, I don't know if you've seen it, says it has like a little paper or a file or something with some pseudo code. It says how do you convert the pseudo code into Python? You put .py on the end of the file. Python is one of the languages that's closer to the pseudo code in the way that people might like sketch it out in words. Not flowcharts but like little words, right, or statements. So it's pretty nice as opposed to, I don't know, Java or C# or, well now we go create a class and you put the public static main void in here to get started. Like whoa whoa, what is all this?
19:29 Jonathan Pyle: Yeah, and I remember when I was a teenager, I was reading a lot of Donald Knuth who did TeX and stuff and he had a whole book on literate programming and I was really inspired by that. Like can we make programming as much like English as possible? And I think Python really does that to a big extent.
19:45 Michael Kennedy: Yeah, for sure.
20:14 Jonathan Pyle: Yeah, it's not horrible but...
20:17 Michael Kennedy: No, no...
20:17 Jonathan Pyle: But Python just has so many advantages.
20:19 Michael Kennedy: Yeah, especially when your goal is to write, create a way for folks who are not programmers to write simple logic into it, right? This portion of Talk Python To Me is brought to you by Linode. Are you looking for hosting that's fast, simple and incredibly affordable? Well, look past that bookstore and check out Linode at talkpython.fm/linode. That's L-I-N-O-D-E. Plans start at just $5 a month for a dedicated server with a gig of RAM. They have 10 data centers across the globe so no matter where you are or where your users are, there's a data center for you. Whether you want to run a Python web app, host a private Git server or just a file server, you'll get native SSDs on all the machines, a newly upgraded 200-gigabit network, 24/7 friendly support even on holidays and a seven day money-back guarantee. Need a little help with your infrastructure? They even offer professional services to help you with architecture, migrations and more. Do you want a dedicated server for free for the next four months? Just visit talkpython.fm/linode. One of the things that I thought was interesting was your use of YAML and Markdown. Now Markdown I kind of expected, like that's not super surprising. But if you want to create one of these interviews, you might want to ask the question like what is your name and what is your age? Oh, and the age has to be an integer and things like that, it has to be a number. So maybe talk a little bit about how you're using YAML to let people create these interviews in flows?
21:46 Jonathan Pyle: Well, I picked YAML in part for the same reasons I picked Python because it was machine readable and human readable at the same time. Like rather than use JSON as a way to structure things like lists and dictionaries, I thought it made sense to use YAML just because it had the minimum of punctuation. And also attorneys are used to doing outlines 'cause when they go to law school, that's how you study is by creating an outline of the subject. And I just thought YAML looks so friendly because it's just like bullet points made of hyphens. So that's why I settled upon that. I needed something that wasn't just code, it was more of a data structure.
22:25 Michael Kennedy: Yeah, it looks really nice and clean and I can certainly see if you give like a little template, an example to somebody they're like oh yeah.
22:31 Jonathan Pyle: Yeah, I try to teach by example.
22:33 Michael Kennedy: Yeah, that's definitely a good way to do it. Yeah, so you can have like a drop-down with the list and really 'cause that's pretty easy, right? And it's actually not that different than Markdown, like the dash dash dash item in YAML is also, would work in Markdown as well. That's pretty interesting that they're kind of similar.
22:48 Jonathan Pyle: Yeah, and I chose Markdown just because I didn't want a lot of HTML characters and also it can convert into so many different forms. So Markdown is used sometimes for documents that get turned into PDF and it's also used for stuff that appears on the screen. So it's a very flexible way to format stuff.
23:07 Michael Kennedy: Yeah, and the thing that I like about Markdown, the reason I use it a lot, is you can use other formats that are maybe richer, like HTML fragments and stuff. But if the slightest thing goes wrong with it, everything is wrecked, right? It's so bad. And you've also got the potential problem of user input that could be malicious, right? If you're accepting this definition from someone else, right? But if it's Markdown it's pretty safe. Yeah, now another thing that you do that I think is pretty interesting is you can define these questions like what is your favorite number? And it defines a variable like best_number potentially and has a datatype and so on. But then you can write Python code that has conditionals, right? We talked about, so if I said I was in Oregon versus a different state, it might ask me a different question because the rules in Oregon are different than they are in Pennsylvania for example, right?
24:02 Jonathan Pyle: Right.
24:03 Michael Kennedy: And so you've got this little example here that says something like if user.is_citizen or user.is_legal_permanent_resident, user.is_eligible = True, else user.is_eligible = False. Now that alone will actually sort of trigger some of the questions that get shown or the flow in which the questions are asked. And so that sounds like magic. How is this simple Python conditional, little tests like that that I'm writing, actually controlling the flow and the questions?
24:35 Jonathan Pyle: Yeah, so the core logic engine of Docassemble is that it tries to evaluate some Python code and then it traps any name errors. So if you go along and you write some Python code and you refer to a variable that hasn't been defined yet in the name space of the interview answers, then it triggers a NameError which is just like core Python stuff.
25:00 Michael Kennedy: Right, if I say user.is_citizen and the user doesn't have a .is_citizen, obviously you might get an error, or something like that, right?
25:08 Jonathan Pyle: Yeah, that would be an AttributeError, but if you referred to just a name that is not defined, I trap that...
25:13 Michael Kennedy: If user itself is not defined for example.
25:15 Jonathan Pyle: Yeah, if user's not defined, I wrote code that then takes that variable name and then goes and looks for a question in the YAML file that offers to define that variable. And then once it sets that variable, which might take asking the user a question, it evaluates the code again from the start.
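The loop Jonathan describes (run the logic, trap the NameError, find a question that defines the missing name, ask it, start over) can be sketched roughly as follows. This is a toy illustration, not Docassemble's actual code: `run_interview`, `LOGIC`, `QUESTIONS`, and the `ask` callback are hypothetical names, a plain dictionary stands in for the YAML file, and plain variable names are used instead of `user.` attributes because attributes need the extra machinery he mentions later.

```python
import re

# The "law", written as ordinary Python. Plain names are used so that
# undefined ones raise NameError.
LOGIC = """
if user_is_citizen or user_is_legal_permanent_resident:
    is_eligible = True
else:
    is_eligible = False
"""

# Stand-in for the YAML file: each variable name maps to the question
# that offers to define it.
QUESTIONS = {
    "user_is_citizen": "Are you a U.S. citizen?",
    "user_is_legal_permanent_resident": "Are you a legal permanent resident?",
}


def run_interview(logic, questions, ask):
    """Re-run `logic` from the top until it finishes without a NameError.

    ask -- callable(prompt) -> answer; stands in for showing a screen.
    Returns the final answers and the order in which variables were asked.
    """
    answers = {}  # the namespace of interview answers
    asked = []
    while True:
        try:
            exec(logic, {}, answers)
            return answers, asked
        except NameError as err:
            match = re.search(r"name '(\w+)' is not defined", str(err))
            if match is None or match.group(1) not in questions:
                raise  # no question offers to define this variable
            missing = match.group(1)
            answers[missing] = ask(questions[missing])
            asked.append(missing)


if __name__ == "__main__":
    scripted = {
        "Are you a U.S. citizen?": False,
        "Are you a legal permanent resident?": True,
    }
    print(run_interview(LOGIC, QUESTIONS, scripted.__getitem__))
```

In a web app each "ask" would end the request and wait for the next page load, which is why the logic is re-evaluated from the top every time rather than resumed mid-way.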
25:36 Michael Kennedy: That's funny, so it like runs it and goes, oh, we crashed on user, guess we have to ask the user question. Then you have the data value set for the user and you rerun the Python code again, you get a little farther, you're like, is_eligible is an AttributeError, we got to figure out if it is_eligible or whatever, right?
25:54 Jonathan Pyle: Yeah, so every time the screen loads, it reevaluates everything from the top, which is moderately inefficient, but with computers it's very fast. And so at the end of it, if it gets all the way through then you're done with the interview. You have all the information you need, you've gone through the logical paths that you need to go through. And the nice thing about Python is that if you write user.is_citizen or user.is_eligible_alien and user.is_citizen is true, it'll stop at user.is_citizen and it won't even try to evaluate the second part. So it won't trigger any name errors or any other errors on the part after the or. So therefore the interview can be parsimonious about what questions it asks of the user. The user is only asked for information that is logically necessary, and that's all done sort of by tapping into the way that Python is parsed and evaluated.
26:53 Michael Kennedy: Yeah, interesting. Because Python short-circuits the or, you might not have to ask them both are you a citizen and are you a legal resident. It's two separate questions. Only if they say no, I'm not a citizen do you ask them about the residency?
27:06 Jonathan Pyle: Yeah, exactly.
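The short-circuit behavior is easy to see in isolation. In this small sketch, `make_asker` and `eligible` are hypothetical helpers: each call to `ask` stands in for a question screen and records that the question was shown.

```python
def make_asker(scripted_answers):
    """Return an `ask` function plus the list it uses to record questions."""
    asked = []

    def ask(name):
        asked.append(name)  # remember that this question was shown
        return scripted_answers[name]

    return ask, asked


def eligible(ask):
    # `or` evaluates its right-hand side only when the left side is falsy,
    # so the second question is asked only if the first answer is "no".
    return ask("is_citizen") or ask("is_legal_permanent_resident")


if __name__ == "__main__":
    ask, asked = make_asker({"is_citizen": True,
                             "is_legal_permanent_resident": False})
    print(eligible(ask), asked)
```

A citizen sees one question; a non-citizen sees two. Reordering the operands of `or` changes which question comes first, which is one way the author's logic ends up shaping the interview flow.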
27:07 Michael Kennedy: Huh, all right, so this is pretty interesting. This is not how normal Python works, but this is a pretty creative and, I would say in this context, really positive way to build an extensible way to script out this flow. That's definitely better than a flowchart drag-and-drop back end or something.
27:26 Jonathan Pyle: I still have a lot of people who would prefer that I created a flowchart GUI interface for them, but I just find with any service that offers a no code solution, what that means is the easy stuff is easy and the moderately difficult stuff is nearly impossible. Whereas in code the easy stuff is kind of hard and the moderately difficult stuff is a little bit harder and really really complex stuff is doable. And I'd rather have the latter.
27:55 Michael Kennedy: Yeah, of course and that makes a lot of sense because if you just want to ask like three questions and whatever, SurveyMonkey or something like that or a Google Form or, you name it, right?
28:06 Jonathan Pyle: I like the idea of attorneys just being able to concentrate on specifying the law in if-then-else statements and not having to worry about interview flow. They don't need to really think about like what question was asked when. They can just concentrate on what they're good at which is envisioning the law and just writing it out and let the computer do the work of figuring out what questions to ask in order.
28:32 Michael Kennedy: It's really creative that you just rerun it over and over until it stops crashing. I think that's pretty creative and I've not seen anything like that before.
28:41 Jonathan Pyle: And I was giving the example of a NameError where you refer to a name that doesn't exist, but then in order to do attributes and indexes, indices, I had to create a new object that would raise special exceptions because, and this is one of the limitations of Python, Python objects are not self-aware. They don't know their own name, they're just kind of like a value that has pointers to them somewhere in the core. So I had to sort of give each object an inherent identity. So that confuses things but it all works.
29:16 Michael Kennedy: Yeah, I know, it sounds like it works pretty well. So you define these interviews in YAML files and then you define the flow by like stating the law or your desired sort of flow of the interview in Python, and then Docassemble just pieces it all together, makes it real.
29:32 Jonathan Pyle: Yeah, you just write some logic where the end point is presenting some final screen to the user. And Docassemble just kind of uses dependency satisfaction to ask all the appropriate questions in order to get there.
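One way to picture an object with "an inherent identity" is a class whose instances are told their own name at creation and use it to report exactly which attribute is missing. The sketch below is an assumption about the general approach, not Docassemble's real classes; `NamedObject` and `UndefinedVariable` are hypothetical names.

```python
class UndefinedVariable(Exception):
    """Hypothetical exception carrying the full dotted name that is missing."""

    def __init__(self, variable):
        super().__init__(variable)
        self.variable = variable


class NamedObject:
    """An object that is told its own name when it is created."""

    def __init__(self, instance_name):
        self.instance_name = instance_name

    def __getattr__(self, attr):
        # Python calls __getattr__ only after normal attribute lookup fails.
        if attr.startswith("__"):
            raise AttributeError(attr)  # keep copy/pickle protocols well-behaved
        # Read the name from __dict__ directly to avoid recursing into
        # __getattr__ if instance_name itself has not been set yet.
        name = self.__dict__.get("instance_name", "<unnamed>")
        raise UndefinedVariable(f"{name}.{attr}")


if __name__ == "__main__":
    user = NamedObject("user")
    try:
        user.is_citizen
    except UndefinedVariable as err:
        print("need a question that defines:", err.variable)
```

An interview engine could catch `UndefinedVariable` the same way it catches NameError and look for a question that defines `user.is_citizen`. A plain AttributeError would only carry the attribute name, not which variable the object was bound to.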
29:38 Michael Kennedy: Of course, this wouldn't work with some kind of compiled language, right? With a compiled language you'd have to try to compile and then run it and it would have to have all the elements available at compilation time, not just the ones that it's trying to crash into as it makes its way down the branches.
30:02 Jonathan Pyle: Well, there are people smarter than me who've built things like that, I don't know how to pronounce it, the Jinja2 templating system.
30:09 Michael Kennedy: Yep, that's right.
30:11 Jonathan Pyle: I think what they do is they actually like parse out all of the variables that are used and then create these like stand-in objects for them. I think there are some ways to sort of compile everything and then do the logic on it later, but I've just used the sort of exception trapping system.
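The "parse out the variables up front" alternative can be sketched with Python's standard `ast` module. Jinja2 offers a helper along these lines (`jinja2.meta.find_undeclared_variables`), but the code below is only an analogous illustration on plain Python source, not a description of Jinja2's internals; `names_needed` is a hypothetical function.

```python
import ast
import builtins


def names_needed(source):
    """Return the set of names that `source` reads but never assigns."""
    tree = ast.parse(source)
    names = [n for n in ast.walk(tree) if isinstance(n, ast.Name)]
    assigned = {n.id for n in names if isinstance(n.ctx, ast.Store)}
    return {
        n.id
        for n in names
        if isinstance(n.ctx, ast.Load)
        and n.id not in assigned
        and not hasattr(builtins, n.id)  # ignore len, print, and so on
    }


# The same kind of rule used elsewhere in this episode, with plain names.
LOGIC = """
if user_is_citizen or user_is_legal_permanent_resident:
    is_eligible = True
else:
    is_eligible = False
"""

if __name__ == "__main__":
    print(names_needed(LOGIC))
```

Note the trade-off: a static scan reports both variables even when short-circuiting would make the second one unnecessary at run time, whereas the exception-trapping approach only ever discovers the names it actually reaches.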
30:29 Michael Kennedy: Which I thought was interesting. So on the website which I said it really presents things pretty nicely for this open source project, you have a bunch of features and I think it might be worth going through those features and then like digging into the technology behind it as a way to see more of the various libraries and technologies that are at work here.
30:48 Jonathan Pyle: Sure, sounds good.
30:48 Michael Kennedy: Yeah, so let's just start at the top one. Says you have a WYSIWYG, what you see is what you get, editor and you can compose your templates as a Word document using a Word add-in to get started. So how does that work? We talked about YAML being the definition of these things but then what's happening here?
31:07 Jonathan Pyle: Yeah, a lot of lawyers like to create Microsoft Word documents and very helpfully there is a Python package called python-docx-template that somebody developed. It uses two other packages, one is python-docx which is kind of a utility for writing Microsoft Word files and the other is Jinja2 which I was just talking about. And it just kind of mashes them together by using Jinja2 on XML because Microsoft Word files are actually XML inside of a zip file. And so it created the system where you could do Jinja2 on a Microsoft Word file. So I also figured out that Microsoft had a pretty neat tool for putting an add-in into Microsoft Word, both the online version and the desktop version, that ran in a little sidebar. So I was able to create a sidebar for Microsoft Word that had the variables in your interview and you could just click them to insert them into your Word document. So that's how I was able to do some nice Microsoft Word templating and to make it as WYSIWYG as possible. I'm actually not a fan of WYSIWYG. I put that out on the website because other people are. I would prefer that everybody use Markdown. I have another document assembly tool that doesn't use Microsoft Word files that just converts Markdown to PDF. And that's what I like better because it uses Pandoc on the back end which in turn uses LaTeX. I'm a big LaTeX fan.
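To make the "XML inside of a zip file" point concrete, here is a minimal standard-library sketch. It is not how python-docx-template works internally: a plain regular expression stands in for Jinja2, `render_docx` and `PLACEHOLDER` are hypothetical names, and it assumes each `{{ name }}` placeholder sits intact in `word/document.xml` (in real Word files a placeholder can be split across several XML runs).

```python
import io
import re
import zipfile
from xml.sax.saxutils import escape

# Matches simple placeholders such as {{ client_name }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render_docx(template_bytes: bytes, context: dict) -> bytes:
    """Copy a .docx-style zip, filling placeholders in word/document.xml."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as src:
        with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                data = src.read(item.filename)
                if item.filename == "word/document.xml":
                    xml = data.decode("utf-8")
                    # Escape values so characters like & stay valid XML.
                    xml = PLACEHOLDER.sub(
                        lambda m: escape(str(context[m.group(1)])), xml
                    )
                    data = xml.encode("utf-8")
                dst.writestr(item.filename, data)
    return output.getvalue()
```

Every other member of the archive is copied through unchanged, which is why the approach can leave the rest of the Word file alone.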
32:45 Jonathan Pyle: Yeah, the canvas I think it's transmitted as a PNG file encoded in a URL. It's like a Base64 conversion and it turns, it's like a data URL. And so you just transmit that in a POST request up to the server. It's really pretty simple.
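On the server side, turning such a data URL back into PNG bytes might look like the sketch below. The function name is hypothetical and the exact form Docassemble receives is an assumption; only the `data:image/png;base64,` prefix and the PNG file signature are standard.

```python
import base64

# The eight bytes that begin every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def decode_png_data_url(data_url: str) -> bytes:
    """Return the raw PNG bytes carried by a base64 data URL."""
    header, _, payload = data_url.partition(",")
    if header != "data:image/png;base64":
        raise ValueError("expected a base64-encoded PNG data URL")
    image = base64.b64decode(payload)
    if not image.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG file")
    return image
```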
33:02 Michael Kennedy: Yeah, I've never tried that but that sounds totally simple. You also have live chat. So if you were hosting an interview, like you're the person receiving all the answers, right? You can assist users in real time with even screen sharing and remote screen control. What's up with this one?
34:27 Michael Kennedy: I see, so you're like this is literally the DOM that they are looking at, right?
34:32 Jonathan Pyle: Yeah, basically. It's inside of a little iframe. I also figured out it was fairly easy for then the operator to seize control of the user's browser just by sending over some events over the WebSockets. So as the operator you can click a button and control the other user's interview and type stuff into their text boxes or click on their buttons. They can see you doing this in real time. It sounds pretty advanced but it was actually pretty simple to implement.
34:59 Michael Kennedy: Wow, it sounds really cool and I haven't heard of too many things that are like at this level. I've definitely seen these little chat programs and stuff with the operators and whatnot, have some experience with that but that sounds like a really cool feature. I honestly didn't expect that to be in here.
35:14 Jonathan Pyle: Yeah, I thought it was a cool feature too but like nobody is using it. I have no idea why. Sometimes you create something and you think you're going to get massive users and nobody actually cares.
35:23 Michael Kennedy: Well yeah, that is always the challenge, isn't it? I suspect if I had to guess, right, like this is super cool that you can help people this way. But it's also challenging to have a person sitting there that can always help. I don't know, it's not a lot of fun to be a chat operator to be honest. I've been one temporarily.
35:41 Jonathan Pyle: But I do think we need to get out of the paradigm of either it's a 100% robotic service or it's a 100% human service with the human touch. There has to be some middle ground where a human gets involved when necessary but otherwise they're using a web app. So I think it's good to have close contact with your users at least for part of the time of your app development, 'cause then you really see what their pain points are in the real world.
36:09 Michael Kennedy: Yeah, that's for sure and you know some of the stuff that you're doing is, it's super important what the right answer is, right? It's not like well, I clicked this thing and this like spreadsheet app that you build and it didn't quite do what I wanted, right? This is are you eligible for bankruptcy or something, right?
36:25 Jonathan Pyle: Yeah, and you don't want your users to commit perjury when they're talking to a court about what their property is for example. It's important to get this stuff right.
36:35 Michael Kennedy: Yeah, exactly and so I think the stakes are pretty high here. So that's pretty awesome that this is a feature. This portion of Talk Python To Me was brought to you by Datadog. Get insights into your Python applications and infrastructure with Datadog's fully integrated platform. Debug and optimize your code by tracing requests across web servers, databases and services in your environment. Then correlate and pivot between distributed request traces, metrics and logs to troubleshoot issues without switching tools or contexts. Get started today with a 14-day trial and Datadog will send you a free T-shirt. Visit talkpython.fm/datadog for more details. Kind of along those lines you also have SMS and email but I guess that's pretty much to be expected, but maybe tell us about that real quick.
37:22 Jonathan Pyle: Yeah, isn't there some adage that every piece of software bloats until it can send and receive email? So yeah, my software does. Sending email is not that difficult although I found a way to do it with Mailgun which uses HTTP. Because SMTP is just so slow. And then the text messaging was also super easy. You just get a Twilio API key and sending messages is very easy. And then I have another feature using email that nobody uses where you can actually run a mail server on your server and you can sort of email into your interview. So if you have an interview session, you can mail documents to it if you want. And so I programmed the mail server to intercept those messages and then sort of make changes in the appropriate interview session.
38:13 Michael Kennedy: Yeah, that's cool. I've done that before as well. You can either go to the site and log in and answer this or type it in or whatever or you just reply to this email, right? And then it just folds it back into the database as if you had done that. Yeah, that's a different level than just sending email but it is cool and it works. Now, you talked earlier about the WebSockets and the live share and Flask and all that, let's talk a little bit about the hosting. We have Redis, we have Flask. We have some kind of database, I suspect. This sounds kind of out of the realm of standard lawyer technical Linux capabilities.
38:47 Jonathan Pyle: Yeah, and unfortunately it's not a pure Python package. The way I distribute it is through Docker because it has so many non-Python dependencies and services that need to be running. And you can script the orchestration of that with Docker instead of giving people complicated instructions to run, and you can get it nicely containerized. So there isn't much you can do with it as a pure Python package. Although I tried to abstract it away so that you could sort of use the core logic engine in Python by itself. But yeah, Docker has been extremely useful because I don't think I could have gotten anybody to use it unless installation was a one-line bash command or something.
39:30 Michael Kennedy: Yeah, that's cool. Docker is really interesting in this regard, like a lot of times it solves these problems but it also, it has its kind of its own complexity. Like Docker never feels super simple to me. Okay, well I can start this but then how do I make sure it's running or how do I like update a new version. As a beginner, those things always feel pretty challenging.
39:51 Jonathan Pyle: Yeah, that's one of the big challenges of distributing software that gets updated is you have to take into account all the existing users. And so like with something like the SQL back-end it was really helpful that I used SQLAlchemy, I don't know how to pronounce it. And it also has this sort of add-on feature called Alembic which gives you this method for upgrading your SQL if you wanted to add a column for example.
40:18 Michael Kennedy: Right, yeah, SQLAlchemy is awesome in that it lets you write simple Python classes that map to your database, so they even create the tables and the indexes and the relationships. But boy, if you change anything it hates it, right.
40:30 Jonathan Pyle: Well, I think with Alembic, it kind of adjusts for that in a pretty elegant way. So I haven't had problems.
40:36 Michael Kennedy: Right, but if you don't have Alembic, right, as soon as you make these changes you need to go and create the migrations and then automatically apply them, like, the next time you start the app and all that kind of stuff, right?
40:46 Jonathan Pyle: Yeah, so I actually spent the whole last week taking a vacation and working on upgrading from Debian Stretch to Debian Buster and Python 3.5 to 3.6, and migrating the web server from Apache to Nginx or however you pronounce it. All that Docker stuff does take a lot of time 'cause you have to get it just right 'cause you don't want some user to be like oh my system crashed, what do I do?
41:13 Michael Kennedy: Yeah, what's nice though about Docker is you build the base images, you figure out how to create a Debian server with Nginx set up correctly, it's good, right? You don't have to think about it. Now it's good, it's all set up, right? And so you just kind of build it a layer at a time but yeah, the real challenge I see is migrating that over time. So what do you tell folks if you say well we're going to give them a new version, what do you say that they should do?
41:38 Jonathan Pyle: There's two ways that you can upgrade. One is by just clicking a button that does a Python upgrade, that just runs pip, and gives you the new Python packages. And it installs in a virtual environment and restarts the services that use Python. So that is pretty painless and doesn't have a lot of errors associated with it. But sometimes you need to upgrade all of the backend stuff and the way that you would do that is by basically stopping and removing your Docker container entirely and then running a new one with the new Docker image. But then the problem is well what about all your users' data? And so you have to have systems for using Docker volumes or the recommended thing is using cloud services like S3 or Azure Blob storage. And so I have these complicated systems where every time you shut down the server, it backs up all the information to the cloud and then when you start it up again, it restores from the cloud.
42:34 Michael Kennedy: Oh, that's nice. And so that is... An automatic backup which is probably something that's also challenging for folks.
42:41 Jonathan Pyle: Oh yeah, yeah, and so I've got a Cron job running on this Docker machine that does backups. So people have run into problems. They tend to do crazy things that you would never anticipate but yeah, it does work pretty reliably to back up to the cloud that way. And the nice thing about having everything sort of cloud-based is that it was very simple to make it scalable. It's not a big deal to add another web application to your cluster. You can bombard your Docassemble system with loads and loads of requests and you're really only limited by the speed of your SQL server or your Redis server.
43:19 Michael Kennedy: Yeah, so do you have like one Docker container for Redis, one for SQL, and then like separate web front ends you can fire up more of them, or something like this?
43:27 Jonathan Pyle: Yeah, you can do it that way or you can use like a hosted solution from your cloud provider for your Redis or your SQL which I actually recommend 'cause they have nice backup systems built in.
43:40 Michael Kennedy: Yeah, that's good, just point it at the thing. And they already know how to make sure it's safe and fails over and whatnot.
43:46 Jonathan Pyle: Yeah, but the problem I have is that people who really have no experience with system administration are trying to teach themselves Amazon Web Services and like multi server systems. They just get confused. The problem with making things easy for people is that you get all these curious people who don't have enough experience to sort out problems. So like they run into an error and I say well, get yourself a command line and they're like how do I do that? ssh to your machine, how do I do that?
44:14 Michael Kennedy: I typed ssh, it doesn't work. Yeah, then you go down a very long and windy path.
44:27 Jonathan Pyle: They're like can you teach me how to do this and I'm like, I learned how to do this over a period of 20 years. How am I going to teach you?
44:34 Michael Kennedy: Yeah, I mean that's part of the trick of, like, being a programmer or in technology, people look at you and they're like you know all these things and they think that means you're either super smart or you have some amazing way to learn and you're like no, you went through the same painful steps but you just layered on these skills one at a time. You're like yeah, ssh used to be hard but I figured out how to get the keys registered and, like, everything. That's not a problem, it's gone onto autopilot now, I'm on to the next problem like database migrations or whatever.
44:59 Jonathan Pyle: Exactly, yeah.
45:00 Michael Kennedy: Yeah, it's tough to communicate all that. I don't even really know how to do that all at once.
45:05 Jonathan Pyle: Yeah, and they expect it to work so perfectly and I'm just like don't you understand what benefit you're getting from Docker? It's amazing, we didn't have this 20 years ago. We used to do it the hard way.
45:16 Michael Kennedy: Yeah, let's keep going down the thing I think that's interesting here. So you have multiple language support. So like I could conduct the interview so the person could say I prefer Spanish or I prefer English and then they may get their questions in different languages, right?
45:31 Jonathan Pyle: Yeah, I had a lot of features to help with translations and multiple languages. You can have like multiple YAML files. One for Spanish questions and one for English questions and use them all in the same sort of logical interview. You can also write everything in English and then generate an Excel spreadsheet that has all the text in it. And then send that spreadsheet to a translator and they translate the English into some other language and then you load that spreadsheet into the system, and it will substitute the English with the other language.
46:05 Michael Kennedy: Yeah, that's cool.
46:06 Jonathan Pyle: So people are using it for, like, five-language interviews. And I think using the Excel spreadsheet is a nice medium because that's what translators are comfortable with.
46:16 Michael Kennedy: Yeah, sure. So another one that's interesting is extensibility. Obviously this is where it gets to more of the developer side of things, right? You can use the power of Python to extend the capabilities, maybe give us some examples so we know what's going on here. 'Cause you also have APIs and integrating with third-party apps, maybe talk about those two at the same time 'cause they kind of seem the same but not exactly.
46:38 Jonathan Pyle: Yeah, I tout the extensibility just because everybody wants to do their own idea, and I can't anticipate all the features that they're going to need. So I just give them the power of Python and then they can install a package to do whatever they want. So some people wanted to do integrations with Google Sheets so I looked up and found that there is a package for that on PyPI, and I can show them how to do that and how to set it up. And so people have integrated all sorts of things just by importing the package into their system.
47:10 Michael Kennedy: Yeah, really nice. And then the APIs are for integrating with third-party stuff?
47:14 Jonathan Pyle: There are a number of things that I have like a GitHub integration. There's this authoring system where you can write your own YAML right in the web browser. And I have a GitHub button that then runs git on the backend and does pushes and commits and stuff like that.
47:30 Michael Kennedy: Yeah, cool.
47:31 Jonathan Pyle: And for the login system, I'm using the built-in Flask username and password system but a lot of people want to have social logins or Auth0. So I have some APIs that integrate with that. People also like writing the stuff in the web browser but they also want it saved to their desktop and so I have an integration with Google Drive, so that you can press a button and sync to your Google Drive and then run your interview files that you've just synced. So there are a lot of different ways that Docassemble talks to other applications that people like to use.
48:06 Michael Kennedy: Yeah, cool, another one kind of related to that is you say you can package your interviews and use GitHub and PyPI to share your work with the Docassemble user community. So can you create like extension packages or something like that, and then people can add them as a dependency of the app?
48:22 Jonathan Pyle: Yeah, so the way that I have structured the system is you write your YAML but you can also write modules like .py files. And the way you package and distribute things is using just the plain old Python packaging system. And it will create the Python package for you and there's a button to upload it to PyPI as well as one for GitHub. I'm just sort of using the exact same software distribution system that already exists but the YAML files and other files are just under a data folder in your Python package folder system.
49:00 Michael Kennedy: Yeah, okay.
49:01 Jonathan Pyle: So it's great, I didn't have to invent my own package distribution system. I'm just like do what everybody else does on GitHub and PyPI.
49:08 Michael Kennedy: Yeah, awesome. It also has support for background tasks. Even when people are not interacting with the website, it could be running stuff in the background. Is that using Redis or how's that happening?
49:19 Jonathan Pyle: That mostly refers to Celery. So Celery is a distributed task queue system in Python which is amazing because the problem with web applications is you have to do everything quickly or else the browser is going to time out. The user's going to be sick of looking at a spinner and they'll stop the connection or something. But if you use Celery you can have long-running code execute in a separate process and then save its results to the place where you have your interview answers stored. And the other great thing is they queue up in there. So I have some cool Celery stuff like where, if you upload a PDF file, it makes a PNG image out of every single page and then sort of in parallel OCRs them for you using Celery and all of its queueing magic.
50:09 Michael Kennedy: Yeah, grab a PNG of the page and throw it up there and say when you get a chance, OCR this and store it here or something like that, right?
50:15 Jonathan Pyle: It's really useful because sometimes people have really long documents that they need to assemble. And if it takes 30 seconds to do that, you want to be entertaining the user while that happens and so you put it into a background task and maybe have the user answer some other questions. And then you just check oh, is the task ready? And then if it's ready then you get the document. So yeah, that's been a real lifesaver.
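The "start it, keep the user busy, then check whether it is ready" pattern can be sketched with the standard library alone. Here `concurrent.futures` stands in for Celery so the example runs without a broker; with Celery the same check would typically go through the task result's `ready()` and `get()` methods. The function names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def assemble_document(pages: int) -> str:
    """Stand-in for slow work such as assembling a long document."""
    time.sleep(0.01 * pages)
    return f"document with {pages} pages"

def interview_with_background_task(pages: int) -> str:
    with ThreadPoolExecutor(max_workers=1) as executor:
        task = executor.submit(assemble_document, pages)  # returns immediately
        while not task.done():
            # A real interview would ask the user other questions here.
            time.sleep(0.005)
        return task.result()
```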
50:40 Michael Kennedy: Nice, I guess maybe the last one to touch on here is the secure bit with server-side encryption and document redaction and things like that. Maybe using the Let's Encrypt or something else along those lines. Want to talk about those things?
50:54 Jonathan Pyle: Yeah, a big concern of lawyers is that they're getting client information, they're getting personally identifiable information and they want some reassurance that whatever software they're using is not going to reveal those personal details to the world. So there are a number of features that I use to try to increase security, one of which is I have Let's Encrypt built into the deployment system. So all you need to do is give it your email address and set up your DNS properly and it'll do Let's Encrypt for you and it'll renew your Let's Encrypt certificates. It also does server-side encryption. And the way interview answers are stored, I'm letting the interview answers just be a Python namespace which is just a dictionary and I pickle that using the pickle package. And then I encrypt it with the user's password. Every time they contact the server, they send this sort of secret password and I use that to encrypt and decrypt. So that password is never stored on the server and so I can have encryption of the pickled, serialized data structure right there in the SQL database. And then other features too like multi-factor authentication for login. That was very easy to implement with the various apps and SMS messaging. And redaction, like I figured out a way to replace text with, like, blocks of black ink or whatever. So there are a lot of different ways that people can feel secure. I haven't figured out the whole encryption of files on the file system problem yet, so if you upload a file that's not encrypted server side. But everything else is.
52:34 Michael Kennedy: You know the project is open-source and you probably would accept a pull request that would add that feature or something like that, right?
52:41 Jonathan Pyle: Oh, absolutely.
52:42 Michael Kennedy: Speaking of which, are you looking for contributors to this project? Are other people already working on it with you? What's the story there?
52:49 Jonathan Pyle: But what I found is that there's no magic to open-source and crowdsourcing. Maybe there is in other areas but there are a lot of people who are using the system. Some of which are programmers themselves but it's pretty rare that people contribute something substantive to the code. So I still found that even though I never took any CS classes and I just do this on the nights and weekends, I'm still doing 99.9% of the coding. Just hasn't magically happened that other people contribute. So if there is somebody who wants to really dig into this and contribute, I would totally welcome that. Although I think because I've had 99% control, I kind of have a strong feeling of authorship. And so I might have strong opinions about what gets brought into the system.
53:37 Michael Kennedy: Yeah sure, I mean one of the things that people do sometimes that can be frustrating is they see something and they're like oh, I should add this feature or fix this thing and they'll create a pull request and do all the work and then submit it, but the maintainer will say well that doesn't fit with my view of this or whatever. You're like but I did all this work, right? So maybe people could open, like, an issue in GitHub and say hey I'm considering this feature, here's what I'm thinking about doing to add it for you, would you be interested or would you hate this idea of having it in this project?
54:06 Jonathan Pyle: I love that, I guess if they, people talk about it. We also have a very active Slack channel where we discuss these things.
54:12 Michael Kennedy: Yeah, that might be also a decent place. But GitHub is nice because Slack is super transient, right? There might be five people that all have this idea that would say yeah that's great or actually I see it this way or right, but if it's in Slack it's gone if you weren't there.
54:25 Jonathan Pyle: Yeah, and so I use both Slack and GitHub for that sort of stuff. But I'd be grateful if there were more Python programmers out there who wanted to try their hand at adding something to this system or just using it and contributing bug reports is also really helpful.
54:41 Michael Kennedy: Right, for sure. So one of the things that I noticed, that jumped out to me when I was going through the demo example, right? I was going through and answering the various questions. And there's a button up at the top, I suspect this is not in the real one but in the demo one it is where you can press and say show me the source and it'll actually pull up and show you the YAML file and the various other pieces and like some of the performance analysis stuff which is pretty cool. People can go check that out if they are looking for help. But another thing that I think is nice is you have this readability score I guess. Not readability of the app but like how easy is this to read? Like the Flesch reading ease or the Flesch Kincaid grade level? Things like that. That sounds pretty helpful if you're trying to have questions that people want to answer correctly. You want to keep that probably as simple as possible, right?
55:35 Jonathan Pyle: Yeah, I think that's what a lot of people don't understand is that really the hard part about developing guided interviews is getting the language right and being able to be precise, but also use plain English. And lawyers have a tendency to just go on for too long and use big words, but if you, so I added that tool to sort of tell you what the grade level of the language you're using on that question was in the hopes that people would go for a sixth grade reading level and keep working on their language until it got to that level.
56:08 Michael Kennedy: Yeah, that's cool.
56:09 Jonathan Pyle: That's thanks to this great package called textstat. So it's one of those things where because of the magic of the community I was able to integrate it very quickly.
56:18 Michael Kennedy: Yeah, that's cool and that's the main reason I wanted to ask about it, like how do you generate that in Python?
56:24 Jonathan Pyle: Yeah, it's that textstat package.
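For a sense of what a grade-level score measures, here is a hand-rolled approximation of the Flesch-Kincaid grade formula. It is not textstat's implementation: the syllable counter is a crude vowel-group heuristic and the function names are hypothetical, so the numbers will differ from what the package reports.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

if __name__ == "__main__":
    plain = "Do you rent your home? Tell us how much you pay."
    legal = ("The aforementioned lessee acknowledges contractual obligations "
             "notwithstanding subsequent modifications.")
    print(flesch_kincaid_grade(plain), flesch_kincaid_grade(legal))
```

Short sentences made of short words score low, which is the direction an interview author would want to push each question.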
56:25 Michael Kennedy: Okay, that's really cool. I can see lots of uses. Maybe you don't show that to the user but in your app you're like actually, the CMS or whatever you're building maybe it wants to have that kind of analysis in it. That's cool, yeah. Awesome, well Jonathan I think that pretty much covers it. I think this is a nice project and if you're out there listening and you need to conduct surveys or interviews that are slightly better, more controllable than like the standard SaaS products. This seems like a pretty good option.
56:53 Jonathan Pyle: Well, thanks for having me.
56:54 Michael Kennedy: Yeah, you bet. Now before we get out of here though I've got the two questions at the end of the show that I always ask you. So let me ask those two now. If you work on Docassemble, you're going to write some Python code, what editor do you use?
57:05 Jonathan Pyle: Emacs, I've always used Emacs. I don't think I'm ever going to do anything else. I basically live in Emacs. I also use Org-mode to manage my life and track my time.
57:13 Michael Kennedy: Okay, yeah, so you run the Emacs operating system basically?
57:17 Jonathan Pyle: Yeah, somewhat, yeah.
57:19 Michael Kennedy: Cool, and then you've talked a lot about cool and interesting and unique Python packages here. But maybe if you want to give a shout out to any additional packages that you think are great for people to know about.
57:31 Jonathan Pyle: Well, one that I really encourage people to use is a software called Lettuce which is a Python version of another package called Cucumber. So the idea of Lettuce is it's a testing platform that uses behavior driven design, I think it's called, where you can express your tests in plain English and then it uses the Selenium package to do web browser automation or some other type of automation to then carry out those tests. So when I write an interview I also in tandem write a test script which is human readable using this Lettuce package. And the Selenium package for web browser automation is the best thing ever. I've done web browser automation with lots of other tools but Selenium is amazing.
58:16 Michael Kennedy: Yeah, those are both really interesting, I love 'em.
58:18 Jonathan Pyle: Yeah, that's all the packages I can think of.
58:20 Michael Kennedy: Yeah, great. So final call to action. People want to get involved with conducting these interviews using Docassemble, what do you tell them?
58:27 Jonathan Pyle: I think they can check out the website and join our Slack channel and get involved in the Docassemble community. We also have annual conferences now called Docacon which take place in the summer every year. And the other thing, what I would like Python developers to do is get jobs at law firms and then sort of infiltrate and then find ways to automate what they're doing because there aren't enough people with programming skills in the legal field as a whole. And so I think as a result we're kind of behind the times.
58:59 Michael Kennedy: Yeah, that's good advice and it sounds like Docassemble is coming along strong. It's pretty wild that you already have a conference about it so very cool. Well, congrats on the project and thanks for being on the show.
59:09 Jonathan Pyle: Thank you.
59:10 Michael Kennedy: You bet, bye. This has been another episode of Talk Python To Me. Our guest in this episode was Jonathan Pyle and it's been brought to you by Linode and Datadog. Linode is your go-to hosting for whatever you're building with Python. Get four months free at talkpython.fm/linode. That's L-I-N-O-D-E. Datadog gives you visibility into the whole system running your code. Visit talkpython.fm/datadog and see what you've been missing. They'll throw in a free T-shirt. Want to level up your Python? If you're just getting started, try my Python Jump Start by Building 10 Apps course. Or if you're looking for something more advanced, check out our new Async course that digs into all the different types of Async programming you can do in Python. And of course if you're interested in more than one of these be sure to check out our everything bundle. It's like a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher and search for Python, we should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play and the direct RSS feed at /rss on talkpython.fm. This is your host, Michael Kennedy. Thanks so much for listening, I really appreciate it. Now get out there and write some Python code.