Building advanced Pythonic interviews with docassemble

Episode #229, published Thu, Sep 12, 2019, recorded Tue, Aug 27, 2019

On this episode, we dive into Python for lawyers and a special tool for conducting legal interviews. Imagine you have to collect details for 20,000 participants in a class-action lawsuit. docassemble, a sweet Python web app, can do it for you with ease.

Now, you may be thinking, I'm not a lawyer so this isn't for me. Hang on for a sec. docassemble is actually a general-purpose tool. If you've ever done anything with a site like SurveyMonkey or Google Forms, you could do something more advanced with docassemble.

Join me as I talk with Jonathan Pyle, creator and maintainer of docassemble.

Episode Deep Dive

Guest Introduction and Background

Jonathan Pyle is a practicing lawyer at a nonprofit legal services organization and the creator of DocAssemble. He has a long history of doing programming “on the side,” starting with BASIC on a monochrome IBM PC, moving into Perl, and eventually adopting Python. Jonathan built DocAssemble to solve the significant challenge of collecting structured legal information from clients in an accessible, powerful way—and to do so in a general-purpose manner.

What to Know If You’re New to Python

If you're coming to this discussion without extensive Python experience, focus on these points:

  • Python’s clean syntax makes writing “logical rules” for interviews simpler compared to many other languages.
  • The conversation references common Python web patterns (Flask, Celery, Docker), but don’t worry if you haven’t used them yet—understanding the broader concepts is enough.
  • A key takeaway is that Python’s readability helps non-programmers (like lawyers) translate complex processes (like legal rules) directly into code.
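To make the last point concrete, here is a minimal sketch of how a rule might read once written in Python. The rule itself, the function name `is_eligible`, and the dollar amounts are all invented for illustration; they are not a real legal standard and do not come from the episode.

```python
# Hypothetical eligibility rule, invented for illustration only.
# The thresholds below are assumptions, not a real legal standard.

def is_eligible(household_size: int, monthly_income: float) -> bool:
    """Return True if income is at or below an assumed limit for the household."""
    base_limit = 1500.0        # assumed limit for a one-person household
    per_extra_person = 500.0   # assumed increase for each additional person
    limit = base_limit + per_extra_person * (household_size - 1)

    # The kind of plain if/else that sets a true/false variable.
    if monthly_income <= limit:
        eligible = True
    else:
        eligible = False
    return eligible


if __name__ == "__main__":
    print(is_eligible(household_size=3, monthly_income=2400.0))
```

The if/else could be collapsed into a single `return monthly_income <= limit`, but the longer form is closer to how a domain expert might first write the rule down.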

Key Points and Takeaways

  1. DocAssemble: A General-Purpose Guided Interview Framework
     DocAssemble is a free and open source platform primarily designed to collect complex data in a “TurboTax-style” interview flow. Although Jonathan’s initial focus was on legal documents, DocAssemble can apply to any structured interview scenario: surveys, mechanical diagnostics, and more. It leverages Python logic under the hood to dynamically decide which questions to present.
  2. Why Lawyers Need Better Automation
     The legal profession often relies on manual data entry, time-consuming processes, and repetitive tasks. Jonathan explained how simple scripts and regular expressions often saved his organizations huge amounts of time and money. DocAssemble fits into this landscape by automating data collection and even generating fully populated legal documents, such as bankruptcy petitions.
    • Links / Tools:
      • Upsolve.org – Example nonprofit using DocAssemble for bankruptcy filings
  3. Python’s Readability for Non-Programmers
     Jonathan noted that Python’s “pseudocode-like” syntax is an essential ingredient for lawyers or other domain experts to begin automating tasks themselves. Many alternative “declarative” languages exist, but Python’s large ecosystem, plus its clean style, make it a practical choice.
  4. Interview Logic via Name Errors and Dynamic Evaluation
     DocAssemble cleverly uses Python’s NameError or missing attributes to determine what question to ask next. Whenever the code references an undefined variable, DocAssemble triggers a user prompt to define that piece of data. This short-circuit approach leads to highly flexible branching logic.
  5. Beyond Legal: Surveys and Advanced Flows
     Although DocAssemble originated for legal interviews, it’s truly general-purpose. Jonathan gave examples of using it for mechanical diagnosis and even advanced research surveys. Anywhere a user needs to answer a lot of potentially branching questions, DocAssemble can manage the workflow.
  6. WYSIWYG and Document Assembly
     The platform provides two main ways to generate final documents: (1) Markdown/Pandoc-based PDF generation and (2) MS Word templates processed via Jinja2 templating. Users can even install a Word add-in that lets them insert variables directly into the document, bridging the gap between typical Word usage and dynamic data generation.
  7. Docker, Redis, and Flask Under the Hood
     DocAssemble is distributed through Docker images due to its multiple service dependencies (Flask web server, Redis for caching/socket management, Celery for background tasks). Users can scale their interview system by running more containers, offloading data storage to services like AWS S3, and upgrading easily with containerized deployment.
  8. Chat Support and Screen-Sharing
     A unique feature is the live chat and remote screen-control interface built with WebSockets. This allows a support agent or lawyer to see a user’s interview screen (via HTML diffs, not full pixel streams) and even take over to help fill out fields in real time, all from the browser.
  9. Security, Encryption, and Privacy
     Because sensitive personal details are often handled, DocAssemble integrates Let’s Encrypt for HTTPS certificates, offers server-side encryption of interview data, and can incorporate multi-factor authentication and password-based data encryption. Jonathan emphasized that lawyers feel more comfortable with these added layers of security.
  10. Testing and Behavior-Driven Development
     Jonathan uses the “Lettuce” framework (a Python BDD tool similar to Cucumber) plus Selenium to test guided interviews. This approach allows domain experts to write human-readable test steps (“Given I’m on the homepage… When I click…”), bridging the gap between pure coding and subject-matter verification.
  11. Open Source Contributions and Community
     DocAssemble is open source, but Jonathan found that while many organizations adopt it, few users actively contribute code. He encourages more experienced Python developers to get involved, help with pull requests, and shape the tool’s future, especially if they’re curious about the intersection of law and technology.
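The NameError idea from the list above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not DocAssemble's actual implementation: `run_interview`, `LOGIC`, and `QUESTIONS` are hypothetical names, the sketch handles only undefined variables (not missing attributes), and it simply re-runs the whole logic block after each answer.

```python
# Toy sketch of "ask a question whenever the logic hits an undefined name".
# All names here are hypothetical; DocAssemble itself is far more elaborate.

LOGIC = """
if has_income == "yes":
    summary = f"{user_name} reports income of {monthly_income}"
else:
    summary = f"{user_name} reports no income"
"""

QUESTIONS = {
    "has_income": "Do you have income? (yes/no) ",
    "user_name": "What is your name? ",
    "monthly_income": "What is your monthly income? ",
}


def run_interview(logic, questions, ask=input):
    """Re-run `logic` until it completes, asking for each undefined variable."""
    namespace = {}
    while True:
        try:
            exec(logic, namespace)
            break
        except NameError as err:
            # `err.name` exists on Python 3.10+; otherwise parse the message.
            missing = getattr(err, "name", None) or str(err).split("'")[1]
            if missing not in questions:
                raise  # a genuine bug in the logic, not a missing answer
            namespace[missing] = ask(questions[missing])
    # exec() adds "__builtins__" to the namespace; hide it from the result.
    return {k: v for k, v in namespace.items() if k != "__builtins__"}


if __name__ == "__main__":
    # Scripted answers stand in for a real user so the demo is non-interactive.
    scripted = {
        QUESTIONS["has_income"]: "yes",
        QUESTIONS["user_name"]: "Ada",
        QUESTIONS["monthly_income"]: "1200",
    }
    result = run_interview(LOGIC, QUESTIONS, ask=lambda prompt: scripted[prompt])
    print(result["summary"])
```

Only the questions that the logic actually reaches get asked: if the answer to `has_income` is "no", `monthly_income` is never requested. That is the branching behavior the key point describes.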

Interesting Quotes and Stories

"DocAssemble is actually a general-purpose tool. If you ever have done anything like run a survey on somewhere like SurveyMonkey or created a Google Form to gather a bunch of information, you could do something way more advanced with DocAssemble." — Michael Kennedy

"I got into programming longer than I can even remember. My dad had an IBM PC with a monochrome monitor green screen, and I was just magnetically attracted to the thing." — Jonathan Pyle

"The intersection of law and coding is not very big, but there actually are a fair number of people doing it." — Jonathan Pyle

Key Definitions and Terms

  • Guided Interview: A step-by-step questionnaire that uses prior user input to direct future questions (like TurboTax for legal or survey data).
  • Behavior-Driven Development (BDD): A software development process that encourages collaboration between developers, QA, and non-technical participants by writing scenarios in human-readable language.
  • Pickle: A Python module that serializes and deserializes objects to/from byte streams.
  • Docker: A containerization platform that packages applications with all dependencies for consistency across environments.
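As a quick illustration of the Pickle definition above, the sketch below round-trips a small dictionary of interview-style answers through a byte stream. The contents of `state` are made up for the example, and nothing here is meant to show how DocAssemble itself stores data.

```python
import pickle
from datetime import date

# Made-up interview state; the field names are illustrative assumptions.
state = {
    "user_name": "Ada",
    "answers": {"has_income": False, "children": ["Bo", "Cy"]},
    "started": date(2019, 8, 27),
}

blob = pickle.dumps(state)      # serialize the object to bytes
restored = pickle.loads(blob)   # rebuild an equal, independent copy

if __name__ == "__main__":
    print(type(blob).__name__, restored == state, restored is state)
```

One caution that applies to any use of the module: unpickling can execute arbitrary code, so only load pickled data that comes from a trusted source.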

Overall Takeaway

DocAssemble shows the power of Python’s readability and flexibility in a space not normally associated with complex software solutions: the legal industry. Jonathan’s work illustrates how an accessible programming language, combined with solid tooling, can revolutionize data collection and document generation. Whether you’re in law, finance, or simply need a next-level survey tool, DocAssemble offers a fast, open, and extensible path to building sophisticated, interview-driven solutions.

Docassemble: docassemble.org
Python-docx-template: docxtpl.readthedocs.io
Pandoc: pandoc.org
Mako: makotemplates.org
Celery: celeryproject.org
textstat: pypi.org
Flask-SocketIO: flask-socketio.readthedocs.io
SQLAlchemy: sqlalchemy.org
Alembic: pypi.org
pattern.en: clips.uantwerpen.be
Lettuce: lettuce.it
docassemble on Twitter: @docassemble
Episode transcripts: talkpython.fm

Episode Transcript

00:00 On this episode, we dive into Python for Lawyers and a special tool for conducting

00:04 legal interviews. Imagine you have to collect details for 20,000 participants in a class

00:09 action lawsuit. DocAssemble, a sweet Python web app, can do it for you with ease. Now,

00:14 you may be thinking, I'm not a lawyer, so this isn't for me. Hang on for a sec.

00:17 DocAssemble is actually a general purpose tool. If you ever have done anything like run a survey

00:24 on somewhere like SurveyMonkey or created a Google Form to gather a bunch of information,

00:29 you could do something way more advanced with DocAssemble and control the workflow with Python

00:34 in a really creative and unique way. Join me as I talk with Jonathan Pyle, creator and maintainer

00:39 of DocAssemble. This is Talk Python to Me, episode 229, recorded August 27, 2019.

00:58 Welcome to Talk Python to Me, a weekly podcast on Python, the language, the libraries, the

01:03 ecosystem and the personalities. This is your host, Michael Kennedy. Follow me on Twitter

01:08 where I'm @mkennedy. Keep up with the show and listen to past episodes at talkpython.fm

01:13 and follow the show on Twitter via @talkpython. This episode is brought to you by Linode and

01:18 Datadog. Be sure to check out their offers during their segments. It really helps support the show.

01:23 Hey folks, before we get to the interview, I have some exciting news. We've teamed up with

01:28 Humble Bundle to launch a great bundle of Python educational goodness. For a couple weeks, you can

01:34 get three of our courses along with great content from Real Python, PyBites and many others for as

01:39 little as just $1. If you've been on the fence about trying one of our courses, here's a chance to get

01:44 three of them along with a bunch of other great stuff. Just visit talkpython.fm/HB2019.

01:51 That's HB2019. And be sure to check it out before time runs out. Now let's get to that interview.

01:57 Jonathan, welcome to Talk Python to Me.

02:00 Thanks for having me.

02:01 Yeah, it's great to have you here. We're going to cover a topic that we have, I believe, yet to cover

02:06 really at all. And that is Python for lawyers or writing code that helps practicing lawyers do their

02:14 work. That should be a lot of fun to talk about.

02:15 Yeah, the intersection of law and coding is not very big, but there actually are a fair number of

02:20 people doing it.

02:21 Yeah, there are definitely some people doing it. And it's like I was saying before we hit record,

02:25 that a good friend of mine is a lawyer who does a lot of Python. And so I know that there's some

02:31 areas where there's really interesting stuff happening, but it's maybe not as easy to apply

02:37 programming to the problems as say, like chemistry or something like that, right? But it's, I think

02:43 there's still a lot of really interesting things going on there. And I'm looking forward to diving

02:45 into them with you. But before we get to those, let's start with your story. How'd you get into

02:50 programming in Python?

02:51 I got into programming longer than I can even remember. My dad had an IBM PC with a monochrome

02:58 monitor green screen and I was just magnetically attracted to the thing. It had like BASIC in

03:04 ROM BIOS. And so I would just go over and type programs into it, which I found by reading like

03:11 Compute! magazine and Byte magazine, which in the eighties you could buy in the supermarket and they

03:15 had source code listings and you could type them in.

03:17 Do you remember when you would go to like bookstores and stuff and there was a whole

03:21 computer section of these magazines and these books and like Computer Shopper for building

03:26 computers out of parts and yeah, it was a different time, right?

03:29 Yeah. I think the books were just kind of more interesting back then.

03:32 And a lot of real do-it-yourself stuff.

03:35 Yeah, for sure. Okay. So you started typing these in like a fair number of folks have to kind of

03:41 get started programming. Like, okay, well, it says if I type this, like I get a little game or a little

03:46 something on my program, on my computer, because there was no internet, right? Like

03:49 this human transcription was how programs got transferred.

03:53 And it was great in BASIC because you could do interactive stuff very easily. There's this

03:57 command called input, which got you a line of input. And that's kind of hard to do in a lot

04:02 of languages these days because the interfaces are more complicated. So I started with BASIC,

04:06 but then as I got older, I taught myself C and assembly language and tried to do some real low-level

04:13 stuff on the, on the IBM PC. But then I went off to college, majored in physics, and then went to law

04:18 school and became a lawyer. But I never really got away from computer programming because wherever I

04:23 went, I just found applications for it because people were doing stupid things. And I thought,

04:28 hey, just write some code that gets around this problem. So I was like using OCR text and converting

04:35 it into spreadsheets using regular expressions in Perl and saved a client like $500,000 by doing that.

04:42 It's kind of weird. So, so I just sort of randomly found myself coding, you know, a good percentage

04:48 of my time as a lawyer just because I could do it. And so I've been using Linux and Perl and like

04:54 just random scripts for about 20 years, but I didn't start learning Python until I decided to start with

04:59 DocAssemble, which was about four years ago.

05:02 Yeah. DocAssemble is a really cool project, but before we get into any of that, you know, you talked

05:07 about converting OCR text into spreadsheets using regular expressions. You know, that's not

05:12 super easy to do, but it feels like you can kind of piece together some of these libraries and these

05:16 tools to make it happen. And it's one of these examples of, you know, this rant that I'm always

05:21 on is that we don't necessarily need 10 times more programmers, but all the people out there with

05:27 their specialty, they could have a little bit of programming skill. They could solve like major

05:32 problems in their area like law or biology or whatever. If you just say, you know what, this problem

05:38 actually is really easily solved or pretty easily solved with a little programming if you just knew

05:43 what to do, right? It sounds like that's a great example of it.

05:45 Yeah. And I think just because of my background, I was the only person in the entire firm who knew what

05:51 a regular expression was and even knew that there was a solution to this problem because everybody else

05:56 would just be like, oh, we need to hire some low cost labor in Vietnam to just type everything in

06:02 manually. And I was like, seriously, there's got to be a cheaper way.

06:06 Well, and more reliable and faster, right? Like, so suppose like you're like, you know,

06:11 we actually have this firm in Vietnam and these folks are really good. We can get this done. If it's that

06:16 much text that you're talking about, does it really make sense to wait the month for them to type it in?

06:21 And then you're always a little worried maybe about like a mistake slipping in, right?

06:25 Yeah. But I think the way that the legal field has adapted is that they kind of are not so

06:31 worried about mistakes. They kind of understand that they are to be expected. But yeah, there are a lot

06:38 of people who want to hold you accountable for every single OCR being absolutely 100% correct. And that

06:43 results in a lot of waste of money.

06:45 Yeah, I can imagine. All right. So that's the background and whatnot. How about day to day? What

06:51 do you do? Are you still working as a lawyer now?

06:53 I'm still working as a lawyer. I don't work at law firms anymore. I've been in the nonprofit legal

06:59 services world for about 10 years. So I work at this nonprofit called Philadelphia Legal Assistance,

07:05 which provides free legal services to low income people in civil matters. So nothing criminal,

07:10 but things like child custody, how to get welfare benefits, how to get out of mortgage foreclosure.

07:15 And I have a job there in management and administration. I'm responsible for compliance with

07:20 government regulations. But because of my computer background, I actually have sort of changed the

07:27 way that we do business here. And so I automate a lot of the stuff that would otherwise be done manually.

07:31 And there are a lot of grants available for tech work in the legal aid space to help low income people.

07:38 And so I've worked on a number of grants trying to bring computer programming and data analysis into

07:44 the legal aid world so that we can better advocate for low income people as a whole.

07:49 That sounds like a really good use of programming and legal skills. Now, I don't know very much about

07:56 nonprofit legal services. I certainly understand how for-profit legal agencies work, right?

08:05 But just give me a real quick sense of, you know, like how do businesses like that even run for folks who are not in the legal field and know about it?

08:12 The biggest thing is government funding. And in the United States, the Legal Services Corporation is a

08:18 federal government agency that hands out money to every county in the country. So the low income people have some

08:26 law firm that they can go to and get help for free.

08:29 And they've been a big supporter of technology. In fact, they, I think the legal aid nonprofit world is more

08:35 technologically forward thinking than the big law firms that have all the money in the world,

08:40 because we understand that it has benefits to helping low income people get answers to their legal questions.

08:47 Yeah.

08:47 So we've been pretty innovative in the, in the legal aid nonprofit world.

08:51 That's cool. Of course, you know, thinking about it, right? The, the large legal firms,

08:56 their incentives are not necessarily aligned with automating and massively speeding up some of these

09:05 actions, right? Like if I can push a button and get an answer in a couple of milliseconds,

09:10 they can't bill for that. Right.

09:12 I mean, I think they're very slowly. There are some price pressures on those, on the big firms.

09:18 And clients say, I'm not willing to pay for legal research because I know that this can be done with

09:24 computers and AI. And so I think they're eventually going to come along, but yeah,

09:28 they don't have any big incentive to save the clients much money.

09:31 Once they get the job, right? Like I can definitely see that in the competition. Like,

09:35 Hey, we could do this for half price of those guys because we're not going to charge you for this

09:38 thing. We automated, but yeah, it's, I guess it's a mixed bag. Interesting. So let's talk about

09:44 this project that you created called DocAssemble, which it's a really nice project. What is it? And

09:53 what is it used for? Let's start there.

09:54 So it's a free and open source platform for developing guided interviews. And if you don't

10:00 know what a guided interview is, think about TurboTax. It's something that asks you one question at a

10:05 time. It can get very detailed. It can be very long, but at the end of it, you get say a tax return,

10:12 or in other contexts, you might get a legal document that you can file in court,

10:16 or you could even get just like legal advice at the end, like legal information or be directed to

10:21 a resource or do an application. There are a variety of guided interview software packages out there,

10:27 but none of them were free and open source or did what I want. So, so I created DocAssemble because

10:32 it would serve my purposes and it'd be something that we could develop and crowdsource through the free

10:39 open source software movement. Yeah. It's, it's really nice. And it's a beautiful looking site.

10:43 I don't know too much about the legal websites, but I know like some of these gather your signature

10:51 sites and stuff, you go to them and they just look so bad and so old school and sketchy. You're like,

10:58 I really don't know if I want to put personal information into this thing. So it's nice that

11:02 this is a good looking web app that, you know, inspires confidence, I guess.

11:06 Yeah. I don't quite understand why people pay $30 a month for some service that gathers the

11:12 signature. Cause I figured out how to do that with a canvas element in JavaScript. It was really easy.

11:19 I don't really, I don't really know what the big deal is.

11:21 Well, see that's part of that programmer skill, right? Like, yeah.

11:24 Yeah. So if people want to experience this and get a sense for what it's about, I guess the first

11:30 thing is you can drop over at docassemble.org, right? And then there's a, at the very bottom,

11:35 you can run a demo and then it just goes through and ask you questions. Like, what is your name?

11:39 And then you hit it in you, what is your location? And then possibly based on what state you're in,

11:45 it might ask you different questions, right? It can kind of flow different questions together

11:49 based on your answers, right? Yeah. And while you can do some of that stuff in like SurveyMonkey

11:55 or Google Forms, this is kind of like the very advanced version of that, where you might have

12:01 incredibly complicated logic that would be so very difficult to manage if you were trying to

12:06 like hand code endpoints in a Flask app, for example, you'd just have so many endpoints and be

12:13 hard to keep track of them. But it's sort of a system for abstracting away that the logic and making

12:19 it easy and maintainable to go from what the law is or what the domain knowledge is to an interview

12:26 that gathers information. Yeah. And I guess it's worth pointing out and you do on your website or

12:31 the GitHub repo, I can't remember about, I read somewhere from your documentation that this is

12:36 developed in the context of a practicing lawyer, but it is not specific to law, right?

12:42 Yeah. I think one of my first users who found it on GitHub was in France and they were using it to

12:50 diagnose problems in mechanical equipment. So, you know, anything that's amenable to asking one

12:56 question at a time where you don't want to have to like hand code all of those screens, you could use

13:02 this system for. Yeah. I can see this almost like tech support even, right? Like in that context,

13:07 does the machine turn on? Yes or no? Yeah. Does it, does smoke come from it? Yes or no?

13:12 First turn it off. Okay. Now what's the next question? Yeah. That's, that's pretty interesting

13:16 actually. And it definitely seems more flexible than SurveyMonkey and you know, all those things are

13:21 commercial services that are SaaS and you can just take it the way it is or leave, right? And this is

13:27 obviously written in Python, something you can download from GitHub and customize.

13:32 And people do like to customize. So a lot of people don't like my standard Bootstrap front end. And so

13:38 they, they write their own CSS. So there's really no limit to your ability to customize if you know

13:45 how to write JavaScript or know how to write Python. That's one of the nice things about open source

13:50 software is that like, I have no problem with ultimate extensibility. Yeah. That's super cool.

13:55 And it's, you know, a Python web app based on Bootstrap. So that probably means that all the

14:01 fancy, nice Bootstrap themes that you can find over at like WrapBootstrap or Start Bootstrap, or,

14:06 you know, probably a bunch of others that you don't know about. You can go find all these either

14:10 super cheap, like 10, $20 themes, or even free and open source ones. You can probably plug those in

14:16 and get the look and feel you want, right? Yeah. And that's kind of why I picked Bootstrap

14:20 because it's widely used and it's themable. And there are a lot of different options if you don't

14:26 like the standard look and feel. Cool. So I definitely want to dig into the technical side

14:30 of things, but maybe just another quick question or two of the high level to set the stages. So

14:35 you're working, helping out these folks at the legal nonprofit. How do you use this in your day-to-day

14:42 job? Well, I'm so busy maintaining the platform and working my day job that I don't really have that

14:48 much time to deploy stuff, but I have been working on a very complicated interview that asks all the

14:54 questions necessary to help somebody file for bankruptcy. And that's primarily being done by

14:59 a nonprofit called Upsolve, but they're one of our sub grantees. So if you check out Upsolve.org and see

15:06 what they've done, they've kind of democratized Chapter 7 bankruptcy for the nation. Whereas before you

15:13 would have to hire a lawyer for $2,000 or try to find a pro bono lawyer to do it for free,

15:18 which is very difficult.

15:19 Which sounds horrible when you're literally filing for bankruptcy. You're in a financially bad place and

15:24 then you got to go pay to dig out from the hole, right? Which is rough.

15:29 Yeah. It costs a lot of money to be poor. You can go on Upsolve site and you can go through a very long

15:35 guided interview that's using DocAssemble and it gathers all the information necessary for an 80 page

15:41 bankruptcy petition. And then they have a lot of other custom code after that, but the questionnaire

15:46 that they do is based on DocAssemble. So I do get to work on some of that during the day,

15:51 but I also use the system in legal aid to do things like gather retainer agreements from clients. I can

15:57 send them a link. They can click on it with their smartphone, sign their name with their finger and

16:02 the signature goes into the document. So stuff like that is pretty useful.

16:06 That's really cool. And I'm, it seems like it really should be used a little bit more even

16:11 outside of the legal space. Cause it seems, seems quite interesting. So maybe one of the things that

16:17 we could talk about is just why Python. It's not like you were afraid of other languages, right? You

16:23 obviously did a bunch of C and assembly language, right? Like that's a pretty hardcore language.

16:29 So why do you choose Python for it?

16:31 Well, when I started the system, I was sort of a Perl hacker and I loved Perl, but my idea for the

16:38 system was I want to make this into this high level language so that you can basically code the law and

16:46 that a lawyer could sit down with minimal knowledge of computer programming, like just to do if else

16:52 statements that set true false variables, for example. I mean, that is not rocket science. And I wanted them

16:57 to be able to understand code and read it and work with it. Maybe they would get help from somebody to

17:03 clean up the syntax. But, so I was looking around for something that was very clean and readable.

17:08 Perl is great, but it has so many punctuation marks, whereas Python is like so neat and clean.

17:15 And also, I saw these books in the bookstore where it was like teach Python to your kid and

17:21 like integrate Python with Minecraft if you're, you know, 12 years old.

17:25 Yeah.

17:25 So I thought, well, if this language is good enough for six year olds, then, you know,

17:30 it's good enough for a lawyer who has an advanced degree. So I thought that would be a good

17:35 general purpose programming language to base this on. And I definitely wanted a general purpose

17:40 programming language. A lot of people argue with me and they say, oh, if you're encoding legal rules,

17:45 you really should have a declarative programming language. But the problem is that all those

17:50 declarative programming languages were developed in academia. They don't really get used much

17:55 in the real world and they don't have loads and loads of packages that you can just install if you

18:00 want to integrate it with Slack, for example.

18:02 Give us some examples of the declarative languages you're considering or people were suggesting.

18:07 Oh, I don't even remember them because I didn't give them too much of a thought, but everyone

18:12 that I found it, like, first of all, it was not that easy to read because it would use like weird

18:18 Greek notation or whatever.

18:20 Yeah.

18:20 I also read an article where they had tried to do some of the stuff in the eighties using one

18:25 of these declarative languages. And they found that the attorneys just like first wrote out procedural

18:31 code and then converted their procedural ideas into this declarative syntax. So I thought,

18:37 well, maybe this whole declarative stuff is really just something that's academically

18:42 interesting, but the way that people think is more aligned with the way the general purpose

18:46 programming languages that are procedural actually work.

18:49 Yeah.

18:49 Yeah.

18:49 And Python has a really...

18:51 But there's a big debate.

18:51 Yeah.

18:52 I can imagine. So Python has that joke, I guess. I don't know if you've seen it. It says it has

18:59 like a little paper or file or something with some, a pseudocode. It says, how do you convert

19:04 the pseudocode into Python? Like they put dot PY on the end of the file.

19:08 Yeah.

19:09 Python is one of the languages that's closer to the pseudocode in the way that people might

19:14 like sketch it out in words, not flow charts, but like little words, right? Or statements.

19:19 So it's pretty nice as opposed to, I don't know, Java or C# where you've like, well,

19:24 now we go create a class and you put the public static main void in here to get started. Like,

19:28 whoa, what is all this?

19:29 Yeah. And I remember when I was a teenager, I was reading a lot of Donald Knuth who did

19:33 LaTeX and stuff. And he had a whole book on literate programming. And I was really inspired

19:38 by that. Like, can we make programming as much like English as possible? And I think Python

19:43 really does that to a big extent.

19:44 Yeah, for sure.

19:45 Some people want me to use JavaScript and JavaScript is very nice, but like iterating

19:51 through an object or a dictionary, it takes like all this nonsense in JavaScript.

19:57 Definitely takes nonsense.

19:58 Whereas in Python, it's so simple.

19:59 I think JavaScript is nice in that it is super executable in lots of places, right? Like

20:06 we all have browsers, you can run it, but that doesn't necessarily mean it's a nice language.

20:10 If you take away its execution story, right?

20:14 Yeah. It's not horrible, but...

20:16 No, no, no.

20:17 There's certainly things that are worse.

20:18 But Python just has so many advantages.

20:19 Yeah. Especially when your goal is to write, create a way for folks who are not programmers

20:25 to write simple logic into it, right?

20:29 This portion of Talk Python To Me is brought to you by Linode. Are you looking for hosting

20:34 that's fast, simple, and incredibly affordable? Well, look past that bookstore and check out

20:39 Linode at talkpython.fm/Linode. That's L-I-N-O-D-E. Plans start at just $5 a month for

20:45 a dedicated server with a gig of RAM. They have 10 data centers across the globe. So no matter

20:50 where you are or where your users are, there's a data center for you. Whether you want to run

20:54 a Python web app, host a private Git server, or just a file server, you'll get native SSDs

21:00 on all the machines, a newly upgraded 200 gigabit network, 24-7 friendly support, even on holidays,

21:06 and a seven-day money-back guarantee. Need a little help with your infrastructure?

21:09 They even offer professional services to help you with architecture, migrations, and more.

21:14 Do you want a dedicated server for free for the next four months? Just visit talkpython.fm/Linode.

21:20 One of the things that I thought was interesting was your use of YAML and Markdown. Now, Markdown,

21:28 I kind of expected, like that's not super surprising. But if you want to create one of these interviews,

21:33 you might want to ask the question like, what is your name and what is your age? Oh, and the age has

21:38 to be an integer and things like that. It has to be a number. So maybe talk a little bit about how you're

21:43 using YAML to let people create these interview flows. Well, I picked YAML in part for the same

21:49 reasons I picked Python, because it was machine-readable and human-readable at the same time.

21:55 Like rather than use JSON as a way to structure things like lists and dictionaries, I thought it

22:01 made sense to use YAML just because it had the minimum of punctuation. And also attorneys are used

22:07 to doing outlines because when they go to law school, that's how you study is by creating an outline

22:13 of the subject. And I just thought YAML looks so friendly because it's just like bullet points

22:18 made of hyphens. So that's why I settled upon that. I needed something that wasn't just code. It was more

22:23 of a data structure. Yeah, it looks really nice and clean. And I can certainly see if you give like a

22:29 little template example to somebody, they're like, oh yeah. Yeah, I try to teach by example.

22:33 Yeah, that's definitely a good way to do it. Yeah. So you can have like a dropdown with the list and

22:38 really, because that's pretty easy, right? And it's actually not that different than markdown,

22:42 like the dash, dash, dash item. And YAML also would work in markdown as well. That's pretty

22:47 interesting that they're kind of similar. Yeah. And I chose markdown just because I didn't want a lot

22:52 of HTML characters. And also it can convert into so many different forms. So markdown is used sometimes

22:59 for documents that get turned into PDF, but it's also used for stuff that appears on the screen.

23:04 So it's a very flexible way to format stuff. Yeah. And the thing that I like about markdown,

23:09 like the reason I use it a lot is you can use other formats that are maybe richer, like

23:13 HTML fragments and stuff. But if the slightest thing goes wrong with it, everything is wrecked,

23:20 right? Yeah. It's so bad. And you've also got the potential problem of user input that could be

23:27 malicious, right? If you're accepting this like definition from someone else, right? But if it's

23:32 markdown, it's pretty safe. Yeah. Now, another thing that you do that I think is pretty interesting is

23:37 you can define these questions like, what is your favorite number? And it defines a variable like

23:44 best_number potentially. And it has a data type and so on. But then you can write Python code

23:50 that has conditionals, right? We talked about, so if I said I was in Oregon versus a different state,

23:56 it might ask me a different question because the rules in Oregon are different than they are in

24:00 Pennsylvania, for example, right? Right. And so you've got this little example here that says

24:06 something like, if user.is_citizen or user.is_legal_permanent_resident, user.is_eligible equals

24:12 True. Else, user.is_eligible equals False. Now, that alone will actually sort of trigger some of the

24:20 questions that get shown or a flow in which the questions are asked. And so that sounds like magic.

24:27 How is this simple Python conditional and little tests like that that I'm writing actually controlling

24:33 the flow and the questions? Yeah. So the core logic engine of DocAssemble is that it tries to evaluate

24:40 some Python code and then it traps any name errors. So if you go along and you write some Python code and

24:48 you refer to a variable that hasn't been defined yet in the namespace of the interview answers,

24:55 then it triggers a name error, which is just like core Python stuff.

25:00 Right. If I say user.is_citizen and the user doesn't have an .is_citizen, obviously,

25:05 like you might get an error or something like that, right?

25:07 Yeah. That would be an attribute error. But if you refer to just a name that's not defined,

25:11 like I trap that. If user itself is not defined, for example.

25:15 Yeah. If user is not defined. Then I wrote code that then takes that variable name and then goes

25:21 and looks for a question in that YAML file that offers to define that variable. So it's sort of like,

25:28 and then it goes in, once it sets that variable, which might take a question to the user,

25:33 then it evaluates it again from the start.
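
The loop described here can be sketched in a few lines of plain Python. This is only an illustration of the idea, not DocAssemble's actual implementation: `QUESTIONS`, `LOGIC`, `missing_name`, and `run_interview` are hypothetical names, a dict stands in for the YAML file, and flat variable names are used so that an ordinary `NameError` is what gets trapped.

```python
import re

# Hypothetical question bank: variable name -> question text.
# In the real tool these would come from the YAML file.
QUESTIONS = {
    "is_citizen": "Are you a citizen?",
    "is_permanent_resident": "Are you a legal permanent resident?",
}

# The rule, written as ordinary Python with no mention of question order.
LOGIC = """
if is_citizen or is_permanent_resident:
    is_eligible = True
else:
    is_eligible = False
"""

def missing_name(err):
    """Extract the undefined variable name from a NameError."""
    name = getattr(err, "name", None)  # attribute exists on Python 3.10+
    if name:
        return name
    return re.search(r"name '(\w+)' is not defined", str(err)).group(1)

def run_interview(logic, ask):
    """Re-run `logic` from the top until it finishes without a NameError."""
    answers = {}
    asked = []
    while True:
        try:
            # exec() on trusted text only; never on untrusted input.
            exec(logic, {}, answers)
            return answers, asked
        except NameError as err:
            name = missing_name(err)
            if name not in QUESTIONS:
                raise  # no question offers to define this variable
            asked.append(name)
            answers[name] = ask(QUESTIONS[name])

# Scripted answers stand in for a real user.
scripted = {
    "Are you a citizen?": False,
    "Are you a legal permanent resident?": True,
}
answers, asked = run_interview(LOGIC, scripted.__getitem__)
print(asked, answers["is_eligible"])
```

Each pass through the loop defines exactly one more variable, so the logic gets a little farther every time until it runs to completion.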

25:36 That's funny. So it like runs it and it goes, oh, we stopped, we crashed on user.

25:40 We bet we have to ask the user question. Then you have the data value set for the user and you ask it,

25:45 you like rerun the Python code again, you get a little farther, you're like, oh,

25:49 yeah, is_eligible is an attribute error. We got to figure out if we got an is_eligible or whatever,

25:53 right?

25:54 Yeah. So it's like every time the screen loads, it reevaluates everything from the top,

25:58 you know, which is moderately inefficient, but with computers, you know, it's very fast.

26:03 It's yeah. Yeah. Yeah. And so at the end of it, like it, once, if it gets all the way through,

26:07 then you're done with the interview. You have all the information you need. You've,

26:11 you've gone through the logical paths that you need to go through. And the nice thing about Python is

26:16 that with if user.is_citizen or user.is_eligible_alien, you know, it'll stop at user.is_citizen when that is true

26:27 and it won't even try to evaluate the second part. So it won't trigger any name errors or any other

26:35 errors on the part after the or. So therefore the interview can be parsimonious about what it asks,

26:41 what questions it asks of the user. So the user is only asked for information that is logically

26:46 necessary. And that's all done sort of by tapping into the way that Python is parsed and evaluated.

26:52 Yeah. Interesting. Because Python short circuits the or, you might not have to ask them both,

26:58 are you a citizen and are you a legal resident? And it's two separate questions. Like only if they say,

27:04 no, I'm not a citizen, do you ask them about the residency?

27:06 Yeah, exactly.
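
The short-circuit behavior is easy to check directly. In this small sketch, `ask` is a hypothetical helper that records each question it is called with and returns a canned answer.

```python
asked = []

def ask(question, scripted_answer):
    """Record that a question was asked and return a canned answer."""
    asked.append(question)
    return scripted_answer

# `or` stops at the first truthy operand, so the second ask() never runs.
eligible = ask("Are you a citizen?", True) or ask("Are you a legal resident?", False)

print(asked)
```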

27:07 Huh. All right. So this is pretty interesting. Like this is not how normal Python works,

27:12 but this is a pretty creative. And I would say in this context, really a positive way to build like

27:19 an extensible way to script out this flow. Like that's definitely better than, you know, flowchart,

27:24 draggy droppy backend or something.

27:26 I still have a lot of people who would prefer that I created a flowchart GUI interface for them. But

27:32 I just find with any service that offers a no code solution, what that means is the easy stuff is easy

27:40 and the moderately difficult stuff is nearly impossible. Whereas with code, the easy stuff is

27:46 kind of hard and the moderately difficult stuff is a little bit harder and really, really complex stuff

27:52 is doable. And so I'd rather have the latter.

27:55 Yeah, of course. And that makes a lot of sense because if you just want to ask like three questions

27:59 and whatever, it's like SurveyMonkey or something like that, right?

28:02 Or a Google form or, you know, you name it, right?

28:05 I like the idea of attorneys just being able to concentrate on specifying the law in if-then-else

28:13 statements and not having to worry about interview flow. Like they don't need to really think about

28:20 like what question is asked when. They can just concentrate on what they're good at, which is

28:25 envisioning the law and just writing it out and let the computer do the work of figuring out what

28:30 questions to ask in the order.

28:31 Yeah, it's really creative. You just rerun it over and over until it stops crashing.

28:36 Yeah.

28:36 You just like work your way through. I think that's pretty creative and I've not seen anything like

28:40 that before.

28:41 And I was giving the example of a name error where you refer to a name that doesn't exist,

28:45 but then in order to do attributes and indexes, indices, I had to create a new object that would

28:53 raise special exceptions because Python variables, and this is one of the limitations of Python,

28:58 like Python objects are not self-aware. Like they don't know their own name. They're just kind of

29:03 like a value that has pointers to them somewhere in the core. So I had to sort of give each object

29:11 an inherent identity. And so that confuses things, but it all works.
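
One way to picture an object that "knows its own name" is sketched below. The class and exception names (`NamedObject`, `UndefinedVariable`) are hypothetical and are not taken from DocAssemble's source; the point is only that the object is told its name when it is created and reports the full dotted name when an undefined attribute is touched.

```python
class UndefinedVariable(Exception):
    """Raised when an attribute has not been defined yet (hypothetical)."""

    def __init__(self, full_name):
        super().__init__(f"{full_name} is not defined")
        self.full_name = full_name


class NamedObject:
    """An object that is told its own variable name when it is created."""

    def __init__(self, instance_name):
        self.instance_name = instance_name

    def __getattr__(self, attr):
        # __getattr__ only runs when normal attribute lookup has failed.
        if attr.startswith("__"):
            raise AttributeError(attr)  # leave special methods alone
        raise UndefinedVariable(f"{self.instance_name}.{attr}")


user = NamedObject("user")
try:
    user.is_citizen
except UndefinedVariable as err:
    needed = err.full_name  # the engine would now look for a matching question

user.is_citizen = True  # once the answer is stored, lookup succeeds normally
```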

29:16 Yeah. Yeah, I know. It sounds like it works pretty well.

29:19 So you define these interviews in YAML files and then you define the flow by like stating the law or

29:25 your desired sort of flow of the interview in Python. And then DocAssemble just pieces it all

29:31 together and makes it real.

29:32 Yeah. And at the end, you just write some logic that where the endpoint is presenting some final

29:39 screen to the user. And DocAssemble just kind of uses dependency satisfaction to ask all the

29:45 appropriate questions in order to get there.

29:47 Yeah. And of course this wouldn't work with some kind of compiled language, right? Like a compiled

29:52 language would have to try to compile and then run it. And it would have to have all the elements

29:56 available at compilation time, not just the ones that it's trying to crash into as it makes its way

30:02 on the branches.

30:02 Well, there are people smarter than me who've built things like the, I don't know how to pronounce

30:07 it, the Jinja2 templating system. I think what they do is they actually like parse out all of the

30:13 variables that are used and then create these like stand-in objects for them. I think there are some

30:20 ways to sort of compile everything and then do the logic on it later. But I've just used the sort of

30:25 exception trapping system.

30:27 Yeah. Yeah. It's interesting. So on the website, which I said, it really presents things pretty

30:33 nicely for this open source project. You have a bunch of features and I think it might be worth going

30:38 through those features and then like digging into the technology behind it as a way to see more

30:42 of the various libraries and technologies that are at work here.

30:47 Sure.

30:48 Sound good?

30:48 Yeah. So let's just like start at the top one. It says you have a WYSIWYG, what you see is

30:53 what you get, editor and you can compose your templates as a Word document using a Word add-in

30:59 to get started. So how does that work? Like what do you, we talked about YAML being the definition

31:05 of these things, but then what's happening here?

31:07 Yeah. A lot of lawyers like to create Microsoft Word documents and very helpfully, there is a Python

31:14 package called python-docx-template that somebody developed. It uses two other packages. One is python-docx,

31:21 which is kind of a utility for writing Microsoft Word files. And the other is Jinja2, which I was

31:25 just talking about and it just kind of mashes them together by using Jinja2 on XML because Microsoft

31:33 Word files are actually XML inside of a zip file. And so it created a system where you could do Jinja2

31:39 on a Microsoft Word file. And so I also figured out that Microsoft had a pretty neat tool for putting an

31:47 add-in into Microsoft Word, both the online version and the desktop version that ran in a little sidebar.

31:54 So I was able to create a sidebar for Microsoft Word that had the variables in your interview and you

32:00 could just click them to insert them into your Word document. So that's how I was able to do some nice

32:06 Microsoft Word templating and to make it as WYSIWYG as possible. I'm actually not a fan of WYSIWYG. I put

32:12 that on the website because other people are. I would prefer that everybody used Markdown. I have another

32:17 document assembly tool that doesn't use Microsoft Word files that just converts Markdown to PDF.

32:22 And that's what I like better because it uses Pandoc on the backend, which in turn uses LaTeX.

32:28 I'm a big LaTeX fan. So yeah. Okay. Yeah. Sounds, sounds very interesting. We talked already about

32:34 gathering signatures. That's one of the things that it does. And it sounds like you're gathering those

32:38 with an HTML5 canvas in JavaScript. And then what you convert those to images or something like that?

32:44 Yeah. The canvas, I think it's transmitted as a PNG file encoded in a URL. It's like a base64

32:54 conversion and it turns, it's like a data URL. And so you just transmit that in a post request

33:00 up to the server. It's a, it's really pretty simple.
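
On the server side, unpacking such a data URL takes only the standard library. This is a rough sketch, not DocAssemble's code: `decode_data_url` is a hypothetical helper, and the sample payload is just the PNG file signature plus a few filler bytes rather than a real image.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file

def decode_data_url(data_url):
    """Split a base64 data URL into its media type and raw bytes."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)

# Build a stand-in for what the browser would send in the POST body.
fake_png = PNG_SIGNATURE + b"not a real image"
data_url = "data:image/png;base64," + base64.b64encode(fake_png).decode("ascii")

media_type, raw = decode_data_url(data_url)
looks_like_png = raw.startswith(PNG_SIGNATURE)
```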

33:02 Yeah. I've never tried that, but that sounds totally simple. You also have live chat. So if you're hosting

33:09 an interview, like you're the person receiving all the answers, right? You can assist users in real time

33:16 with even screen sharing and remote screen control. What's up with this one?

33:20 That was a lot of work. I figured out how to use WebSockets technology. There's a great package

33:27 called Flask-SocketIO that enables you to use this sort of WebSocket event-driven communications

33:35 protocol within sort of the Flask paradigm. And so I created this very responsive, quick chat system

33:46 where you can chat back and forth with somebody who's like works for your company, who helps out

33:51 users. And so you can get chat messages from users in real time facilitated by like Redis and this

33:58 socket IO system. And one of the cool things I figured out I was able to do with that is to have this

34:03 sort of pseudo screen share where I would store the HTML in Redis and then pull it down and let the

34:13 operator look at it. And it would be refreshed like on triggers in JavaScript. And so,

34:19 so you can just sit there as the operator and you can watch your users using the system

34:23 and it's not transmitting pixels. It's just transmitting HTML.

34:27 I see. So you're like, this is literally the DOM that they are looking at, right?

34:32 Yeah, basically it's inside of a little iframe. Yeah. I also figured out it was fairly easy for then

34:37 the, the operator to seize control of the user's browser just by sending over some events over the

34:45 WebSockets. So as the operator, you can click a button and control the other user's interview and

34:50 type stuff into their text boxes or click on their buttons. They can see you doing this in real time.

34:56 Yeah. It sounds pretty advanced, but it was actually pretty simple to implement.

34:59 No, it sounds really cool. And I haven't heard of too many things that are like at this level,

35:04 you know, I've definitely seen those little chat programs and stuff.

35:07 Yeah.

35:07 With the operators and whatnot, have some experience with that, but that sounds like a really cool

35:12 feature. I honestly didn't expect that to be in here.

35:14 Yeah. I thought it was a cool feature too, but like nobody is using it. I have no idea why.

35:18 Sometimes you create something and you think you're, you're going to get massive use and then nobody actually cares.

35:23 Well, yeah, that is always the challenge. And I suspect, you know, if I had to guess,

35:27 right, like this is super cool that you can help people this way, but you know, it's also

35:32 challenging to have a person sitting there that can always help. And I don't know, it's,

35:36 it's not a lot of fun to be a chat operator, to be honest. I've been one temporarily.

35:41 But I do think we need to get out of the paradigm of like, either it's a hundred percent,

35:46 a robotic service, or it's a hundred percent human service with the human touch. Like there has to be

35:52 some middle ground where a human gets involved when necessary, but otherwise they're using a web app.

35:59 So I think it's, it's good to have close contact with your users, at least for part of the time of

36:05 your app development, because then you, you really see what their pain points are in the real world.

36:09 Yeah, that's for sure. And you know, some of the stuff that you're doing is,

36:12 it's super important what the right answer is, right? It's not like, well, I clicked this thing

36:17 in this like spreadsheet app that you built and it didn't quite do what I wanted, right? This is,

36:21 you know, are you eligible for bankruptcy or something, right?

36:25 Yeah. And you know, you don't want your users to commit perjury when they're

36:28 talking to a court about what their property is, for example. So yeah, it's important to get the stuff right.

36:35 Yeah, exactly. And so I think the stakes are pretty high here. So that's pretty awesome.

36:38 This is a feature.

36:41 This portion of Talk Python to Me was brought to you by Datadog. Get insights into your Python

36:47 applications and infrastructures with Datadog's fully integrated platform. Debug and optimize your

36:52 code by tracing requests across web servers, databases and services in your environment. Then

36:57 correlate and pivot between distributed request traces, metrics and logs to troubleshoot issues

37:03 without switching tools or contexts. Get started today with a 14 day trial and Datadog will send you

37:08 a free t-shirt. Visit talkpython.fm/Datadog for more details.

37:13 Kind of along those lines, you also have SMS and email, which I guess that's pretty much to be

37:20 expected, but maybe tell us about that real quick.

37:22 Yeah. Isn't there some adage that every piece of software bloats until it can send and receive email?

37:28 Pretty much.

37:29 Pretty much.

37:29 So yeah, so my software does, you know, sending email is not that difficult, although I

37:34 found a way to do it with Mailgun, which uses HTTP because SMTP is just so slow. And then the text

37:43 messaging was also super easy. You know, you just get a Twilio API key and it's sending messages is very

37:49 easy. And then I have another feature using email that nobody uses where you can actually run a mail

37:56 server on your server and you can sort of email into your interview. So if you have an interview

38:02 session, you can, if you want, you can mail documents to it. And so I programmed the mail server to intercept

38:08 those messages and then sort of make changes in the appropriate interview session.

38:13 Yeah, that's cool. I've done that before as well. Like you can either go to the site and log in and

38:18 answer this or type it in or whatever, or you just reply to this email, right? And then it just folds

38:22 it back into the database as if you had done that. Yeah, that's a different level than just sending an

38:27 email, but it is cool and it works.

38:29 Yep.

38:29 Now you talked earlier about the web sockets and the live share and Flask and all that. Let's talk a

38:35 little bit about the hosting. We have Redis, we have Flask, we have some kind of database,

38:39 I suspect. This sounds kind of out of the realm of standard lawyer, technical Linux capabilities.

38:47 Yeah. And unfortunately it's, it's like not a pure Python package. The way I distribute it is

38:53 through Docker because it has so many, it has so many non-Python dependencies and services that need to

38:59 be running. And it's just, you can script the orchestration of that with Docker instead of giving

39:05 people complicated instructions to run and you can get it nicely containerized. So there isn't much

39:11 you can do with it as a pure Python package. Although I tried to abstract it away so that you could sort

39:16 of use the core logic engine in Python by itself. But yeah, Docker has been extremely useful

39:22 because I don't think I could have gotten anybody to use it unless installation was, you know, one line

39:28 of bash command or something.

39:30 Yeah. That's cool. You know, Docker is really interesting in this regard. Like a lot of times it solves these

39:35 problems, but it also, it has its kind of its own complexity. Like Docker never feels super simple to me.

39:42 Like, okay, well I can start this, but then how do I make sure it's running or how do I like update a new

39:47 version as a beginner? You know, there's those things always feel pretty challenging.

39:51 Yeah. One of the big challenges of distributing software that gets updated is that

39:56 you have to take into account all the existing users. And so like with something like the SQL

40:01 backend, it was really helpful that I used SQLAlchemy. I don't know how to pronounce it.

40:07 Yeah, that's right.

40:07 And it also has this sort of add-on feature called Alembic, which gives you this method for upgrading

40:14 your SQL. If you wanted to add a column, for example.

40:18 Right. Yeah. SQLAlchemy is awesome in that it lets you write simple Python classes that map to your

40:22 database. It'll even create the tables and the indexes and the relationships. But boy,

40:26 if you change anything, it hates it. Right.

40:30 Well, I think with Alembic, it's kind of adjusts for that in a pretty elegant way. So I haven't had

40:35 problems.

40:36 Right. But if you don't have Alembic, right, your app crashes as soon as you make the changes. So you

40:40 need to go and create the migrations and then automatically apply them like the next time you

40:44 start the app and all that kind of stuff. Right.

40:46 Yeah. So I actually spent the whole last week taking a vacation and working on upgrading from

40:52 Debian stretch to Debian buster and Python 3.5 to 3.6 and migrating the web server from Apache to

41:01 Nginx or however you pronounce it. Yeah.

41:03 And all that Docker stuff does take a lot of time because you have to get it just right because you

41:08 don't want some user to be like, oh, my system crashed. What do I do?

41:13 Yeah. Well, what's nice, though, about Docker is you build the base images like you figure out how

41:16 to create a Debian server with Nginx set up correctly. It's good. Right. You don't have to

41:21 think about it. It's now it's good. It's all set up. Right. And so you just kind of build it later

41:25 at a time. But yeah, it's the real challenge I see is migrating that over over time. So what do

41:32 you tell folks if you say, well, we're going to give them a new version? What do you say that they

41:37 should do? There's two ways that you can upgrade. One is by just clicking a button

41:42 that does a Python upgrade that just runs pip and gives you Python packages and installs in a virtual

41:50 environment and restarts the services that use Python. So that is pretty painless and doesn't

41:56 have a lot of errors associated with it. But sometimes you need to upgrade all of the

42:00 backend stuff. And the way that you would do that is by basically stopping and removing your Docker

42:09 container entirely and then running a new one with the new Docker image. But then the problem is, well,

42:15 what about all your users data? And so you have to have systems for using Docker volumes or the

42:21 recommended thing is using cloud services like S3 or Azure blob storage. And so I have these complicated

42:27 systems where every time you shut down the server, it backs up all the information to the cloud. And then

42:33 when you start it up again, it restores from the cloud.

42:35 Oh, that's nice.

42:36 And so that automatic backup, which is probably something that's also challenging for folks.

42:40 Oh, yeah. Yeah. And so I've got like a cron job running on this Docker machine that does backups.

42:46 So people have run into problems. They tend to do crazy things that you would never anticipate.

42:51 But yeah, it does work pretty reliably to back up to the cloud that way. And the nice thing about

42:57 having everything sort of cloud-based is that it was very simple to make it scalable. Like it's

43:03 not a big deal to add another web application to your cluster.

43:08 And, you know, you can bombard your DocAssemble system with loads and loads of requests. And you're really

43:15 only limited by the speed of your SQL server or your Redis server.

43:19 Yeah. So do you have like one Docker container for Redis, one for SQL, and then like separate web front

43:24 ends, you can fire up more of them or something like this?

43:27 Yeah, you can do it that way. Or you can use like a hosted solution from your cloud provider for your

43:34 Redis or your SQL, which I actually recommend because they have nice backup systems built in.

43:40 Yeah, that's good. Just point into that thing. And they already know how to make sure it's safe and

43:44 fails over and whatnot.

43:46 Yeah. But the problem I have is that people who really have no experience with system

43:50 administration are trying to teach themselves Amazon web services and like multi-server systems.

43:56 They just get confused. The problem with making things easy for people is that you get all these

44:02 curious people who don't have enough experience to sort out problems. So like they run into an error

44:08 and I say, well, get yourself a command line. And they're like, how do I do that?

44:11 Yeah.

44:12 SSH to your machine. How do I do that?

44:14 I typed SSH. It doesn't work.

44:16 Yeah. Then you go down a very long and windy path.

44:25 I'm like, can you teach me how to do this? And I'm like, I learned how to do this over a period of 20 years.

44:32 How am I going to teach you?

44:33 Yeah. Well, I mean, that's part of the trick of like being a programmer or in technologies.

44:37 People look at you and they're like, you know, all these things. I think that means you're

44:41 either super smart or you have some super amazing way to learn them. Like, no, you went through the

44:45 same painful steps, but you just like layered on these skills one at a time. You're like,

44:49 yeah, SSH used to be hard, but I figured out how to get the keys registered and like everything.

44:54 That's not a problem. It's gone on to autopilot. Now I'm on to the next problem,

44:57 like database migrations or whatever. Exactly. Yeah. Yeah. It's tough to communicate all that.

45:02 I don't even really know how to do that all at once.

45:05 Yeah. And they expect it to work so perfectly. And I'm just like, don't you understand what

45:09 benefit you're getting from Docker? It's amazing. We didn't have this 20 years ago.

45:15 We used to do it the hard way. Yeah. So let's go, let's keep going down the thing. I think that's

45:19 interesting here. So you have multiple language support. So like I could conduct the interview

45:25 so the person could say, I prefer Spanish or I prefer English. And then they may get their

45:29 questions in different languages. Right. Yeah. I have a lot of features to help with translations and

45:36 multiple languages. You can have like multiple YAML files, one for Spanish questions and one for

45:42 English questions and use them all in the same sort of logical interview. You can also write

45:47 everything in English and then generate an Excel spreadsheet that has all the text in it and then

45:53 send that spreadsheet to a translator and they translate the English into some other language.

45:58 And then you load that spreadsheet into the system and it will substitute the English with the other

46:04 language. Yeah. That's cool.
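
The substitution step in that workflow amounts to a lookup table. The sketch below assumes a two-column sheet and uses a CSV string in place of the Excel file so that it needs only the standard library; the column names and the helpers `load_translations` and `translate` are hypothetical.

```python
import csv
import io

# Hypothetical translation sheet; a CSV string stands in for the Excel file.
SHEET = """english,spanish
What is your name?,¿Cómo se llama?
What is your age?,¿Cuántos años tiene?
"""

def load_translations(sheet_text, source="english", target="spanish"):
    """Build a lookup table from source-language text to its translation."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    return {row[source]: row[target] for row in reader}

def translate(text, table):
    """Return the translation, falling back to the original text."""
    return table.get(text, text)

table = load_translations(SHEET)
print(translate("What is your name?", table))
```

Falling back to the English text means an interview still works while a translation is only partly finished.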

46:06 So people are using it for some, for like five language interviews. And I think using the Excel

46:11 spreadsheet is a nice medium because that's what translators are comfortable with.

46:16 Yeah, sure. So another one that's interesting is extensibility. Obviously this is where it gets to

46:21 more of the developer side of things, right? You can use the power of Python to extend the capabilities.

46:25 Maybe give us some examples so we know what's going on here. Cause you also have APIs and

46:31 integrating with third party apps. Maybe talk about those two at the same time. Cause they kind of seem

46:36 the same, but not exactly. Yeah. I tout the extensibility just because everybody wants to do their own

46:42 idea and I can't anticipate all the features that they're going to need. So I just give them the power of

46:49 Python and then they can install a package to do whatever they want. So some people wanted to do

46:54 integrations with Google Sheets. So I looked up and found that there is, you know, a package for that

47:00 on PyPI and I can show them how to do that and how to set it up. And so, so people have integrated all

47:07 sorts of things just by importing the package into their system. Yeah. Really nice. And then APIs for

47:13 integrate with third party stuff. There are a number of things that I have like a GitHub integration.

47:16 There's this authoring system where you can write your own YAML right in the web browser.

47:22 And I have a GitHub button that then runs git on the backend and does pushes and commits and stuff like that.

47:30 Yeah. Cool. And for the login system, I'm using the built-in Flask username and password system,

47:36 but a lot of people want to have social logins or Auth0. So I have some APIs that integrate with that.

47:44 People also want to, they like writing the stuff in the web browser, but they also want it

47:49 saved to their desktop. And so I have an integration with Google Drive so that you can press a button

47:55 and sync to your Google Drive and then run your interview files that you just synced.

48:00 So there are a lot of different ways that DocAssemble talks to other applications that people like to use.

48:06 Yeah. Cool. Another one kind of related to that is you say you can package your interviews and use

48:11 GitHub and PyPI to share your work with the DocAssemble user community.

48:15 So can you create like extension packages or something like that? And then people can add

48:21 them as a dependency of the app?

48:22 Yeah. So the way that I have structured the system is you write your YAML, but you can also write

48:30 modules like .py files. And the way you package and distribute things is using just the plain old

48:37 Python packaging system. And it will create the Python package for you. And there's a button to upload it

48:43 to PyPI as well as one for GitHub. I'm just sort of using the exact same software distribution system

48:51 that already exists. But the YAML files and other files are just under a data folder in your Python

48:58 package folder system.

49:00 Yeah. Okay.

49:00 So it's great. I didn't have to invent my own package distribution system. I'm just like, do what

49:06 everybody else does on GitHub and PyPI.

49:08 Yeah. Awesome. It also has support for background tasks. Like even when people are not interacting with

49:13 the website, it could be running stuff in the background. Is that using Redis or how's that

49:19 happening?

49:19 That mostly refers to Celery. So Celery is a distributed task queue system in Python, which is

49:28 amazing because the problem with web applications is you have to do everything quickly or else the

49:33 browser is going to time out. The user is going to be sick of looking at a spinner and stop the

49:38 connection or something. But if you use Celery, you can have long running code execute in a separate

49:45 process and then save its results to the place where you have your interview answer stored.

49:51 And the other great thing is they queue up in there. So I have some cool Celery stuff like where if you

49:57 upload a PDF file, it makes a PNG image out of every single page and then sort of in parallel,

50:03 OCRs them for you using Celery and all of its queuing magic.

50:09 Yeah. That's cool. Just grab a PNG of the page and throw it up there and say,

50:13 I mean, when you get a chance, OCR this and store it here or something like that, right?

50:15 It's really useful because sometimes people have really long documents that they need to

50:20 assemble. And if it takes 30 seconds to do that, you want to be entertaining the user while that

50:27 happens. And so you put it into a background task and maybe have the user answer some other questions.

50:32 And then you just check, oh, is the task ready? And then if it's ready, then you get the document.

50:37 Yeah.

50:38 So yeah, that's been a real lifesaver.
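
The background-task pattern described above (start a slow job, keep the user busy, check whether the task is ready, and fan out per-page OCR in parallel) can be sketched roughly as follows. This is not Celery and not DocAssemble's code: it uses the standard library's `concurrent.futures` as a stand-in so it runs without a broker, and `assemble_document` and `ocr_page` are hypothetical placeholders for the real work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def assemble_document(answers: dict) -> str:
    """Hypothetical stand-in for a slow document-assembly job."""
    time.sleep(0.2)  # pretend this is the 30-second assembly step
    return f"Letter for {answers['name']}"


def ocr_page(page_number: int) -> str:
    """Hypothetical stand-in for running OCR on one page image."""
    time.sleep(0.05)
    return f"text of page {page_number}"


executor = ThreadPoolExecutor(max_workers=4)

# Start the slow job and hand control straight back to the user.
answers = {"name": "Ada"}
task = executor.submit(assemble_document, dict(answers))

# Meanwhile the interview can keep collecting other answers.
answers["phone"] = "555-0100"

# Later: check whether the task is ready before showing the document.
while not task.done():
    time.sleep(0.05)  # in a web app this would be a page refresh, not a loop
document = task.result()

# Fan out: process every page in parallel; map() returns results in page order.
page_texts = list(executor.map(ocr_page, range(1, 6)))
executor.shutdown()
```

With a real task queue the worker would run in a separate process, possibly on another machine, and the result would be written back to wherever the interview answers are stored.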

50:40 Nice. I guess maybe the last one to touch on here is the secure bit with server-side

50:45 encryption and document redaction and things like that. Maybe you're using Let's Encrypt or

50:51 something else along those lines. I want to talk about those things.

50:54 Yeah. A big concern of lawyers is that they're getting client information. They're getting personally

51:00 identifiable information and they want some reassurance that whatever software they're using

51:06 is not going to reveal those personal details to the world. So there are a number of features that I

51:13 use to try to increase security. One of which is I have Let's Encrypt built into the deployment system.

51:19 So all you need to do is give it your email address and set up your DNS properly and it'll do Let's Encrypt

51:24 for you and it'll renew your Let's Encrypt certificates. It also does server-side encryption.

51:30 And the way that interview answers are stored, I'm letting the interview answers just be a Python

51:36 namespace, which is just a dictionary. And I pickle that using the pickle package. And then I encrypt it

51:43 with the user's password. Every time they contact the server, they send this sort of secret password and

51:49 I use that to encrypt and decrypt. So that password is never stored on the server. And so I can have

51:55 encryption of the pickled, serialized data structure right there in the SQL database.

52:02 And then other features too, like a multi-factor authentication for login. That was very easy to

52:08 implement with the various apps and SMS messaging. And a redaction, like I figured out a way to put,

52:16 replace text with like blocks of black ink or whatever. So there are a lot of different ways

52:22 that people can feel secure. I haven't figured out the whole encryption of files on the file system

52:27 problem yet. So if you upload a file, that's not encrypted server side, but everything else is.
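
To make the "pickle the answers, then encrypt with the user's password" idea concrete, here is a minimal sketch that runs on the standard library alone. Everything in it is an assumption for illustration: the function names are hypothetical, the cipher is a toy keystream built from HMAC-SHA256, and none of it is taken from DocAssemble's source. A real deployment should use a vetted encryption library rather than hand-rolled code like this.

```python
import hashlib
import hmac
import os
import pickle


def _derive_keys(password: str, salt: bytes):
    """Stretch the password into an encryption key and a separate MAC key."""
    material = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, 200_000, dklen=64
    )
    return material[:32], material[32:]


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (HMAC-SHA256 over a counter). Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_answers(answers: dict, password: str) -> bytes:
    """Pickle the answers dict and encrypt it; the password is never stored."""
    salt, nonce = os.urandom(16), os.urandom(16)
    enc_key, mac_key = _derive_keys(password, salt)
    plaintext = pickle.dumps(answers)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(mac_key, salt + nonce + ciphertext, hashlib.sha256).digest()
    return salt + nonce + tag + ciphertext


def decrypt_answers(blob: bytes, password: str) -> dict:
    """Verify the tag first, so untrusted bytes are never unpickled."""
    salt, nonce, tag, ciphertext = blob[:16], blob[16:32], blob[32:64], blob[64:]
    enc_key, mac_key = _derive_keys(password, salt)
    expected = hmac.new(mac_key, salt + nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong password or corrupted data")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return pickle.loads(bytes(a ^ b for a, b in zip(ciphertext, stream)))


blob = encrypt_answers({"name": "Ada", "income": 1200}, "s3cret")
restored = decrypt_answers(blob, "s3cret")
```

The resulting `blob` is what would sit in a database column. Without the password that the client sends on each request, the stored bytes cannot be turned back into the answers.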

52:34 You know, the project is open source and you probably would accept a pull request that would

52:39 add that feature or something like that, right?

52:41 Oh, absolutely.

52:42 Speaking of which, are people, are you looking for contributors to this project? Are other people

52:46 already working on it with you? What's the story there?

52:49 Well, what I've found is that there's no magic to open source and crowdsourcing. Maybe there is in

52:55 other areas, but there are a lot of people who are using the system, some of which are programmers

53:01 themselves. But it's pretty rare that people contribute something substantive to the code.

53:08 So I've still found that even though like I never took any CS classes and I just do this on the nights

53:14 and weekends, I'm still doing 99.9% of the coding. It just hasn't magically happened that other people

53:21 contribute. So if there is somebody who wants to really dig into this and contribute, I would totally

53:26 welcome that. Although I think because I've had 99% control, I kind of have a strong feeling of

53:32 authorship. And so I might have strong opinions about what gets brought into the system.

53:37 Yeah, sure. I mean, one of the things that people do sometimes that can be frustrating is they see

53:40 something, they're like, oh, I should add this feature or fix this thing. And they'll create a

53:44 pull request and do all the work and then submit it. And the maintainer will say, well, that doesn't fit

53:48 with my view of this or whatever. And you're like, but I did all this work.

53:52 All right. So, you know, maybe people could open like an issue in GitHub, say, hey, I'm considering

53:57 this feature. Here's what I'm thinking about doing to add it for you. Would you be interested or would

54:03 you hate this idea of having it in this project?

54:06 I love that. Like if people talk about it, we also have a very active Slack channel where we discuss

54:12 these things.

54:12 Yeah, that might be also a decent place. But, you know, GitHub is nice because Slack is super transient,

54:16 right? Like there might be five people that all have this idea that would say, yeah, that's great.

54:20 Or actually I see it this way or right. But if it was in Slack, it's gone if you weren't there.

54:25 Yeah. And so I use both Slack and GitHub for that sort of stuff.

54:29 Yeah.

54:29 But I would be grateful if there were more Python programmers out there who wanted to

54:34 try their hand at adding something to the system or just using it and contributing bug reports is also

54:40 really helpful.

54:41 Right. For sure. So one of the things that I noticed that jumped out to me when I was going through

54:46 the demo example, right? I was going through and answering the various questions and there's a

54:52 button up at the top. I suspect this is not in the real one, but in the demo one it is where you can

54:56 press and say, show me the source. And it'll actually pull up and like show you the YAML file

55:02 and the various other pieces and like some of the performance analysis stuff, which is pretty cool.

55:09 People can go check that out if they are looking for help.

55:11 But another thing that I think is nice is you have this readability score, I guess.

55:18 Not readability, the app, but like how easy is this to read? Like the Flesch Reading Ease or the Flesch

55:25 Kincaid Grade Level, things like that. That sounds pretty helpful if you're trying to have

55:30 questions that people want to answer correctly. You want to keep that probably as simple as possible,

55:35 right?

55:35 Yeah. I think that's what a lot of people don't understand is that really the hard part about

55:39 developing guided interviews is getting the language right and being able to be precise,

55:45 but also use plain English. And lawyers have a tendency to just go on for too long and use big

55:52 words. But if you, so I added that tool to sort of tell you what the grade level of the language you're

55:59 using on that question was in the hopes that people would go for a sixth grade reading level

56:03 and keep working on their language until it got to that level.

56:07 Yeah. That's cool.

56:09 That's thanks to this great package called TextStat. So it was one of the things that because of the

56:14 magic of the community, I was able to integrate it very quickly.

56:18 Yeah. That's cool. And the main reason I wanted to prompt you about it is, like,

56:22 how you're generating that in Python.

56:23 Yeah. It's that TextStat package.
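
For a feel of what those scores measure, here is a rough sketch of the two Flesch formulas written out by hand. It is not how TextStat computes them: the syllable counter below is a crude vowel-group heuristic chosen so the example runs without dependencies, and the helper names are hypothetical. TextStat itself exposes functions such as `flesch_reading_ease` and `flesch_kincaid_grade` that do this more carefully.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid Grade Level for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


plain = readability("Do you rent your home? Tell us your name.")
legalese = readability(
    "Please indicate whether the residential property is encumbered "
    "by any outstanding obligations."
)
```

Short sentences with short words push the grade level down and the reading ease up, which is why the plain-English question scores as easier than the legalese version.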

56:25 Okay. That's really cool. I can see lots of uses. Maybe you don't show that to the user,

56:29 but in your app, you're like, actually, you know, the CMS or whatever you're building,

56:34 maybe it wants to have that kind of analysis in it. That's cool. Yeah. Awesome. Well,

56:39 Jonathan, I think that pretty much covers it. I think this is a nice project. And if you're out

56:45 there listening and you need to conduct surveys or interviews that are slightly better and more

56:49 controllable than like the standard SaaS products, this seems like a pretty good option.

56:53 Well, thanks for having me.

56:54 Yeah. You bet. Now, before we get out of here, though, I've got the two questions at the end

56:58 of the show that I always ask you. So let me ask those to you now.

57:01 Oh, good. If you work on DocAssemble, you can write some Python code. What editor do you use?

57:05 Emacs. I've always used Emacs. Yeah. And I don't think I'm ever going to do anything else. I

57:09 basically live in Emacs. I also use org mode to manage my life and track my time.

57:13 Okay. So yeah. So you run the Emacs operating system, basically.

57:17 Yeah, somewhat. Yeah. Cool. And then you've talked a lot about cool and interesting and unique

57:23 Python packages here. But maybe if you want to give a shout out to any additional packages that

57:29 you think are great for people to know about.

57:31 Well, one that I really encourage people to use is software called Lettuce, which is a Python

57:38 version of another package called Cucumber. So the idea of Lettuce is it's a testing platform

57:44 that uses behavior driven design, I think it's called, where you can express your tests in plain

57:50 English. And then it uses the Selenium package to do web browser automation, or some other type of

57:57 automation to then carry out those tests. So when I write a guided interview, I also in tandem write a

58:03 test script, which is human readable using this Lettuce package. And the Selenium package for web

58:10 browser automation is the best thing ever. I've done web browser automation with lots of other tools,

58:15 but Selenium is amazing.

58:16 Yeah. Oh, those are both really interesting. I love them.

58:18 Yeah. That's all the packages I can think of. Yeah.

58:20 Yeah. Great. So final call to action. People want to get involved with conducting these interviews

58:25 using DocAssemble. What do you tell them?

58:27 I think they can check out the website and join our Slack channel and get involved in the DocAssemble

58:33 community. We also have annual conferences now called Docacon, which take place in the summer every year.

58:39 Yeah.

58:40 And the other thing I just think people, what I would like Python developers to do is get jobs at law firms

58:47 and then sort of infiltrate and then find ways to automate what they're doing. Because there aren't

58:52 enough people with programming skills in the legal field as a whole. And so I think as a result,

58:58 we're kind of behind the times.

58:59 That's good advice. And it sounds like DocAssemble is coming along strong. It's pretty wild that you

59:04 already have a conference about it. So very cool. Well, congrats on the project and thanks for being on

59:09 the show.

59:09 Thank you.

59:10 You bet. Bye.

59:10 This has been another episode of Talk Python to Me. Our guest on this episode was Jonathan Pyle,

59:17 and it's been brought to you by Linode and Datadog. Linode is your go-to hosting for whatever you're

59:23 building with Python. Get four months free at talkpython.fm/Linode. That's L-I-N-O-D-E.

59:29 Datadog gives you visibility into the whole system running your code. Visit talkpython.fm/datadog

59:35 and see what you've been missing. They'll throw in a free t-shirt.

59:38 Want to level up your Python? If you're just getting started, try my Python Jumpstart by Building

59:44 10 Apps course. Or if you're looking for something more advanced, check out our new async course that

59:50 digs into all the different types of async programming you can do in Python. And of course,

59:54 if you're interested in more than one of these, be sure to check out our Everything Bundle. It's like

59:59 a subscription that never expires. Be sure to subscribe to the show. Open your favorite podcatcher

01:00:04 and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes,

01:00:09 the Google Play feed at /play, and the direct RSS feed at /rss on talkpython.fm.

01:00:15 This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it.

01:00:20 Now get out there and write some Python code.

01:00:21 I'll see you next time.
