#354: Sphinx, MyST, and Python Docs in 2022 Transcript
00:00 When you think about the power of Python, the clean language or powerful standard library may come to mind. You might certainly point to the external packages as well, but what about the relative ease of picking up new libraries or even parts of the standard library? Documentation plays an important role there, and the tools in the Python space for building solid documentation and even publishing articles and books involving live code are huge assets. In this episode, we have Paul Everitt, Pradyun Gedam, Chris Holdgraf, and Chris Sewell to update us on Sphinx, MyST Parser, Executable Books, Jupyter Book, Sphinx Themes, and much more. This is Talk Python to Me, episode 354, recorded January 19, 2022.
00:54 Welcome to Talk Python to Me, a weekly podcast on Python. This is your host, Michael Kennedy. Follow me on Twitter where I'm @mkennedy, keep up with the show and listen to past episodes at talkpython.fm, and follow the show on Twitter via @talkpython. We've started streaming most of our episodes live on YouTube. Subscribe to our YouTube channel over at talkpython.fm/YouTube to get notified about upcoming shows and be part of that episode.
01:21 This episode is brought to you by SignalWire and Tonic AI. Please check out what they're offering during their segments. It really helps support the show. Pradyun and Paul, Chris H and Chris S, welcome to Talk Python to Me. Great to have you all here. Good to meet you.
01:37 Thanks for doing this, Michael.
01:38 Nice to be here.
01:39 Thank you.
01:39 Yeah, absolutely happy to be doing it. I think talking about documentation and all the static site stuff and book generation stuff that we're going to talk about is going to be super fun. You all are at the center of various fulcrums of that throughout the ecosystem. So it's great to have you here. Great to be talking about that stuff. So let's just start really quick with a quick introduction and then what you're up to these days. Paul, you're a regular here, so you want to go first?
02:09 Sure. I'm Paul Everitt. I'm a JetBrains developer advocate, big fan of the show, big fan, total fanboy of the other three people here. I've got all kinds of stories about where I was when I found out about MyST and the parser, and desperately wishing I could join Pradyun and all the things that he's doing. And I will add, big fan of Carol Willing. When I grow up, I want to be Carol Willing.
02:36 Fantastic. And you're still JetBrains.
02:39 Still JetBrains.
02:40 Right on, keeping the PyCharm flow. Right on. Pradyun, how about you?
02:44 I work at Bloomberg, although nothing I'm going to talk about today is related to work. I'm a software engineer there, working on the Python infrastructure team.
02:53 I am a maintainer on pip. I have written a Sphinx theme or two, depending on how you count the second one. And I'm involved in a bunch of various efforts around Sphinx. At this point, I've sort of made myself comfortable in these spaces and sort of made my way into the various discussion forums, I guess.
03:18 Awesome. Well, welcome.
03:19 You seem to be the emergency maintainer of many open source projects like Live Reload.
03:26 I have a mild issue of not knowing when to stop.
03:30 That's good. All right. Chris H. Yeah.
03:32 So my name is Chris Holdgraf. I'm the director of the International Interactive Computing Collaboration, or 2i2c, which is a nonprofit that runs cloud infrastructure for interactive computing and the Jupyter ecosystem, and the sort of surrounding ecosystem alongside it, for research and education communities. My background is largely in the Jupyter ecosystem. I've worked a lot on JupyterHub, which serves multiple Jupyter sessions via some kind of centralized cloud infrastructure. I also work a lot on the Binder project, which focuses more on scientific reproducibility and shareable computing environments, in a kind of cloud-agnostic and pretty flexible manner.
04:09 That's fantastic. For people who don't know about Binder, that's basically if you are on a GitHub repo or some published notebook that's not interactive, you can click that and it'll fire up a little environment where you can go and explore the notebook for real, right?
04:22 Yeah, exactly. And so a lot of these things are kind of surrounding this general topic of scientific communication, scientific reproducibility, bringing those kinds of data workflows in and facilitating them with software development, which is related to some of the stuff we're talking about today as well.
04:37 Yeah, fantastic. You also have worked on the MyST project, right?
04:42 One of the projects that I'm focusing on right now, along with Chris S and a few other collaborators, is called the Executable Books Project. And this is basically an attempt at improving the state of open source, kind of community driven tools in the Python ecosystem around scientific communication and building a lot on top of the Sphinx ecosystem because there's a lot of good material there to work with and a lot of improvements that can be made that will benefit the broader Python community as well.
05:08 Yeah. Very cool. Looking forward to talking about Jupyter Book as well. That'll be fun. Chris, you're back. Tell us about yourself.
05:15 So I work in Executable Books. I guess, essentially, my focus is on making the best tools for scientists to do open, reproducible science. So I work 50% of the time here in Switzerland, at EPFL, on a package called AiiDA, which is a Python workflow engine for running simulations, orchestrating simulations, and then I work the other 50% of the time in Executable Books on MyST and Jupyter Book.
05:45 Yeah. That's really fantastic. Is any of your work in Switzerland to do with CERN or other projects?
05:50 No, not CERN. It's EPFL, in a group on materials discovery.
05:57 Yeah. Fantastic. Let's start by talking about Sphinx. So when I think about Sphinx, I think about Python documentation. So who wants to sort of set the stage for Sphinx? I guess it does say "Python documentation generator", but it does more than that as well, if you want it to. So, guys, what is Sphinx? When do I use this? What's its value? It's been around for a long time. I know that much.
06:22 Let's do a roundtable. I'd love to hear everyone's elevator pitch.
06:26 Exactly. What's the elevator pitch for Sphinx?
06:28 I can start with sort of how I got into Sphinx and then maybe how I use things now, which is a little bit different than how I initially used it. Sphinx has been around for a really long time, and it's a really powerful tool for documentation, and a lot of people use it alongside the software projects that they work on. So my first introduction to Sphinx was via a neuroscience analytics package called MNE-Python. And we needed documentation to describe the APIs and the functions and the classes and things like use cases and examples. And so one of the really useful things about Sphinx is that it has an inherent extensibility and a lot of flexibility, so that you can both generate narrative documentation with it, but also include programmatically generated documentation, API documentation and that kind of thing. So it lets you more seamlessly merge together code documentation, or documentation embedded within code, and your more traditional narrative examples, tutorials, that kind of thing.
07:25 I see. Almost like a wiki plus an API documentation generator in one.
07:31 Yeah. Plus it's been around for a long time, and with an inherent extensibility. So a lot of different sub-communities have kind of built out their own community-specific documentation that builds on top of the basic Sphinx building blocks.
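To make the "narrative plus generated API docs" point concrete, here is a minimal sketch of a Sphinx `conf.py` that turns on the built-in autodoc extension. The project name is hypothetical and the option values are only one reasonable choice, not a recommendation from the speakers.

```python
# conf.py: a minimal, hypothetical Sphinx configuration (sketch only)

project = "example-package"  # hypothetical project name

# Extensions that ship with Sphinx itself
extensions = [
    "sphinx.ext.autodoc",   # pulls API documentation out of docstrings
    "sphinx.ext.napoleon",  # understands NumPy- and Google-style docstrings
]

# Defaults applied to every autodoc directive unless overridden
autodoc_default_options = {
    "members": True,           # document the members of modules and classes
    "show-inheritance": True,  # list base classes for documented classes
}
```

With something like this in place, hand-written narrative pages can sit next to pages whose content is generated from the code's docstrings.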
07:44 Sure. I bet the scientific community has got a lot of specializations. We need to be able to express stuff like this, right?
07:50 Well, so that kind of gets me to the second part of what I was going to describe. And this is one of the inspirations behind Jupyter Book, to some degree: the realization, over time, that the things you need to build for really good technical and API documentation are really useful for other kinds of use cases as well. And so I saw some other groups in the scientific ecosystem. There's a really interesting one called SimPEG, which I think is just simpeg.xyz. I'm pretty sure that's what it is. And they basically built out a whole geospatial analytics tutorial and sort of documentation resource just built on Sphinx. And so for me, seeing that kind of unlocked an aha moment in my head, to realize that you could also use the same documentation engine for documenting lots of other things, not just software packages and things like that. Technical documentation is quite generic and overlaps quite a lot with scientific documentation and scholarly documentation. And so that's the space that we've been exploring over the last couple of months and years.
08:56 Yeah, fantastic. And a lot of the outputs are super flexible. It outputs HTML, which you might host somewhere like Read the Docs or Netlify, but also LaTeX, which is really important for publications, and EPUB for ebooks and whatnot. So pretty cool.
09:10 I think it's a documentation generator written in Python, primarily intended for technical documentation, with the ability to sort of intertwine narrative documentation with auto-generated documentation that picks up things from your code.
09:26 Right. Yeah.
09:27 And this whole thing is combined with the ability to have a variety of output formats as well as a variety of extension points within the tooling to extend basically every aspect of building that documentation.
09:43 Interesting. Yeah. So for example, testing the code snippets that you might have in your documentation or docstrings or something like that, maybe.
09:51 Yes. So that is doctest, and that's baked into Sphinx and Python's documentation, which happens to also be written in Sphinx.
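As a small illustration of the doctest idea, the sketch below uses only Python's standard `doctest` module; Sphinx's own `sphinx.ext.doctest` extension applies the same principle to snippets in documentation pages. The `slugify` function is a made-up example, not something from the episode.

```python
import doctest


def slugify(title: str) -> str:
    """Turn a heading into a URL-friendly slug.

    The examples below double as documentation and as tests.

    >>> slugify("Hello World")
    'hello-world'
    >>> slugify("  Sphinx and MyST  ")
    'sphinx-and-myst'
    """
    return "-".join(title.lower().split())


# Collect the examples from the docstring and run them, comparing the
# real output with the output shown in the docstring.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(slugify, name="slugify", module=False,
                        globs={"slugify": slugify}):
    runner.run(test)
results = runner.summarize(verbose=False)
```

If someone later changes `slugify` so that its behavior no longer matches the docstring, `results.failed` becomes non-zero, which is how stale examples get caught.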
10:02 There's a lot of capability and power hidden underneath the shell of Sphinx. And as Chris was mentioning, there's really extensive customizability here that you can then take and specialize to your specific use cases. That's the power, and there's also a con to that, which is, hey, you've got to maintain this and you've got to keep this functioning and stuff. But yeah, it's a really powerful documentation generation tool that is perhaps a little too powerful for its own good.
10:37 Very good. Yeah. So docs.python.org is generated by Sphinx.
10:42 Fantastic. All right, Chris.
10:44 I think the other point that I wanted to touch on is one of the bullet points there on the Sphinx site: the cross-referencing capability is just second to none. You can reference within a page, you can reference across pages within your own documentation. But also, and one of the things that's really helped to build on Sphinx is the Read the Docs community and the work they're doing, you can reference any other site that's built on Read the Docs as well, in a really nice way.
11:15 Yeah. That's something that really surprised me when I learned about Sphinx. Let me give a quick shout out to Paul. Paul wrote a course that we host over on Talk Python, generating static sites with Sphinx and Markdown, which is really cool. And one of the things that surprised me is when I think about creating multiple pages. So, for example, if I'm on GitHub and I want to have this part of some README here point to some other README or some other Markdown, I just go in there, I type the relative path over to that thing, and I come up with the text that goes there. Sphinx allows you to have kind of an index into all the sub-elements of the page, not just the pages, but parts of the pages, headers and whatnot, and you can link to them by name. Right. And so if for some reason you change the title of a header, the anchor-tag text that links to it will change too, which is pretty awesome, right?
12:09 Yeah. And also Intersphinx, right? Which is a way to expose those endpoints, those index points, across domains or across projects. Right. So I could reference something in the Python docs from my documentation, specifically by a reference point, not by URL, which is pretty excellent.
12:30 And I think that's a big part of why Sphinx is as widely adopted as it is: the fact that Intersphinx exists, and the fact that you can very easily cross-reference bits in other parts of the ecosystem, as long as that part of the ecosystem is in Sphinx, since Sphinx is what generates the Intersphinx file. Right. In theory, other ecosystems, other tools, could generate it as well. But yeah, I think it is quite capable, even if the fact that it's in reStructuredText has some interactions with all the linking syntaxes that exist. But it's genuinely very powerful, as I said.
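For readers who have not set up Intersphinx before, a minimal sketch of the relevant `conf.py` lines follows. The mapping to the Python documentation is the commonly used example; any other Sphinx-built site that publishes an inventory could be added the same way.

```python
# conf.py: sketch of enabling Intersphinx

extensions = ["sphinx.ext.intersphinx"]

# Map a short name to the base URL of another Sphinx-built site.
# None means "fetch the inventory file (objects.inv) from that base URL".
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
}
```

With this mapping, a reStructuredText reference such as :py:class:`pathlib.Path` resolves to the matching page in the Python docs by name rather than by a hard-coded URL.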
13:11 So let's talk about that for a little bit. Because when I thought about Sphinx originally, I always thought about reStructuredText, and reStructuredText is kind of funky. I don't know, I just have a hard time getting used to it. All those formats and whatnot, it takes some getting used to. Right. You've got, let me see if I can find an example and pull it up here, things like image, colon, colon, some other thing, and then sort of almost a YAML style. And to me, when I'm writing, I'm like, wow, I would just much rather write Markdown and just blaze through this and write it nice and clean. And I'm willing to give up a little formatting or something to allow me to live in a simpler world that doesn't require so much stuff. So traditionally, reStructuredText has been the way of documentation through Sphinx. It's also been the way that you put your information on PyPI and describe it there. Right. But PyPI.org moved over to at least support Markdown. And through some of the work you all are doing, Sphinx now has an integration layer for doing everything in Markdown as well. Right. You want to talk about that?
14:20 Maybe, yeah, I think so. As Pradyun mentioned, Sphinx is just incredibly powerful, but it's about trying to harness that power and make it less overwhelming to nontechnical users. For people who want it, it's brilliant, and you can do everything under the sun. But trying to sell it to the masses, as it were, and trying to make it as easy to use as possible for simple use cases, while still having that extensibility there, is, I guess, kind of what we've been looking at in Executable Books. And with Markdown, most everyone now knows about Markdown, CommonMark.
15:02 Right. We've got GitHub and Stack Overflow have basically forced the software development community to understand it right?
15:11 Whether people love it or hate it, it's there and you know it. So being able to just copy and paste things from GitHub or Stack Overflow, or just write something that's quite intuitive to write, that's what we're really trying to get at: trying to hide some of the intricacies of Sphinx and make it more user friendly on the front end.
15:36 This portion of Talk Python to Me is brought to you by SignalWire. Let's kick this off with a question. Do you need to add multiparty video calls to your website or app? I'm talking about live video conference rooms that host 500 active participants, run in the browser, and work within your existing stack, and even support 1080p without devouring the bandwidth and CPU on your users' devices. SignalWire offers the APIs, the SDKs, and edge networks around the world for building the realest of real-time voice and video communication apps, with less than 50 milliseconds of latency. Their core products use WebSockets to deliver 300% lower latency than APIs built on REST, making them ideal for apps where every millisecond of responsiveness makes a difference. Now you may wonder how they get 500 active participants in a browser-based app. Most current approaches use a limited but more economical approach called SFU, or selective forwarding units, which leaves the work of mixing and decoding all those video and audio streams of every participant to each user's device. Browser-based apps built on SFU struggle to support more than 20 interactive participants, so SignalWire mixes all the video and audio feeds on the server and distributes a single, unified stream back to every participant. So you can build things like live streaming fitness studios where instructors demonstrate every move from multiple angles, or even live shopping apps that highlight the charisma of the presenter and the charisma of the products they're pitching at the same time. SignalWire comes from the team behind FreeSWITCH, the open source telecom infrastructure toolkit used by Amazon, Zoom, and tens of thousands more to build mass-scale telecom products. So sign up for your free account at talkpython.fm/SignalWire and be sure to mention Talk Python to Me to receive an extra 5,000 video minutes. That's talkpython.fm/SignalWire.
And mention Talk Python to Me for all those credits.
What have you guys got to add to this reStructuredText and Markdown duality here?
17:30 I think it's an interesting duality, and MyST, I don't know how to say this.
17:37 When I first discovered this, I was really fascinated, because I was like, hey, this looks cool. I found it especially amusing to see, let me put it in one phrase: I really liked how MyST ends up reconciling the complexity, or power, depending on how you look at it, of Sphinx and sort of fitting that into Markdown and having it still look like it belongs there, right? There's a whole bunch of directives, roles, or whatever you want to call them, that you can use to manipulate the text and include things and have those extensible points.
18:18 Right. Because Markdown is not nearly sophisticated enough to handle things like Intersphinx and these other types of constructs that are in Sphinx, right?
18:28 Yes. So, like, one example of this would be the ability to have the cards that you're showing on the screen at the moment. Right. Like the closest thing you can do for that in markdown would probably be embed a bunch of HTML.
18:42 Yeah. Or tables, where you get basically no styling there.
18:46 It's not cards, it's all jammed together in the table.
18:49 And even when we get to inline markup, markup that's in the same line as the paragraph that you're writing in, you have a very limited set of those. Right. Like bold, italics, underlines, maybe strikethrough if the platform you're using supports it, and stuff like that. But with MyST or reStructuredText or whatever, you have a lot more capability there. You hook into Sphinx, you hook into Docutils.
19:16 Which is what implements reStructuredText. On this particular point of the power expressing itself into Markdown, I've enjoyed watching Chris and Chris over the years iterate through this, roll some things out, realize, wait, it's better in tooling if we do a triple colon instead of a triple backtick, because the body of the directive can then be rendered by some inline viewer or something. So I'd be interested in hearing Chris and Chris talk a little bit about your voyage of discovery for what's the gestalt of Markdown.
19:54 Yeah, it's interesting. So obviously with Jupyter Book and things, we're very much focused on Jupyter Notebooks, where you have the Markdown cells, everything is written in Markdown, and that historically has a single kind of renderer that renders things. So trying to harness the simplicity of Markdown and trying to make it look nice as you're writing it, so it doesn't look like a complete mess within these Markdown cells, whilst still having the capabilities of something like reStructuredText, has been an interesting challenge. So obviously we wanted to make the MyST format kind of degradable to CommonMark, so that it can be parsed normally with a normal Markdown parser, although that parser wouldn't know what to do with things like notes and all these other directives and roles that you can have within Sphinx, but at the same time have all the capabilities. So it's a hard balancing act, I'd say.
21:06 You don't want to spoil the simplicity of markdown, though, right?
21:09 Yeah, exactly. It's having the readability of source text, essentially weighing that against the flexibility of actually making these lovely HTML pages.
21:20 Right, exactly. Chris, you want to add something to that?
21:22 Yeah, I think an interesting thing about Markdown, and probably something that we should clarify for some of the people here, is that there is no one Markdown.
21:30 Yes. That surprised me, actually. I didn't realize that.
23:17 That is a big challenge. A quick question from the audience, Ryan asked, does GitHub use their own version, or do they use CommonMark?
23:25 GitHub Flavored Markdown.
23:28 Which is a superset of CommonMark.
23:31 Yes, indeed.
23:36 I think if you Google it, GitHub has its own specification, which is built on top of the CommonMark specification.
23:47 As well.
23:49 Yes. And it adds a few extensions, a few slightly different things, like tables.
23:56 The funny thing is that tables are not in CommonMark. It's quite a basic thing, but that's not CommonMark. That's only in GitHub Flavored Markdown.
24:05 Avoiding the politics of what it took to get it to where it got, there was a decent amount of "this has to be maximally compliant", like, this has to be the thing that works everywhere.
24:17 The minimum functionality, the maximum reach sort of story, right? Yeah.
24:22 And they sort of went, okay, what was basically everywhere. And that's sort of what ended up becoming.
24:28 So any effort to like, oh, how about we extend this has sort of not gone very well so far.
24:36 Sure. So I do want to give a quick shout out to this app I came across recently called 'Typora' for writing Markdown. And one of the things I think is interesting, the reason I bring it up is it has the standard markdown stuff, but it also has inline bits for mathematics and other stuff like diagrams and whatnot. And this is the kind of stuff you're talking about, Chris, right. Where you want to take the core, but there's ways to extend it in the MyST, right? Yeah.
25:03 I mean, that's the beauty of Python or a lot of languages where you can define functions is that you can show a lot more creativity and add extra functionality in a very intentional, structured way so that it's easier for you to replicate other people's work and build on top of it and things like that. In some ways that's how I think about directives and roles. It's like bringing functions but into a realm where you're dealing with human written text or code that is interpreted. But inside of these little directives and things like that, it just opens up a lot of extra room for creativity and trying out different kinds of things.
25:38 I guess, because we've mentioned these a bunch, in sort of Sphinx glossary or terminology:
25:42 Directives are essentially a block of text where you're also saying it has this characteristic associated with it, like, hey, present this in a note box or whatever. Whereas a role would probably be something inline: hey, make this bold; hey, this is actually a link to another thing, and stuff like that. In case folks who are listening aren't familiar, this would probably be helpful, although I wish we had said this sooner.
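The block-versus-inline distinction can be sketched with a toy renderer. This is not how MyST or Docutils are implemented; it is a regex-based illustration using MyST's triple-colon directive fence and its curly-brace role syntax, and the handler functions and registries are hypothetical stand-ins for real directive and role classes.

```python
import re


# Hypothetical handlers, standing in for real directive and role classes.
def note_directive(body: str) -> str:
    """Block level: wraps a whole block of text."""
    return f'<div class="note">{body.strip()}</div>'


def strong_role(text: str) -> str:
    """Inline: wraps a span of text inside a paragraph."""
    return f"<strong>{text}</strong>"


DIRECTIVES = {"note": note_directive}
ROLES = {"strong": strong_role}

# Block-level syntax:  :::{name} ... :::
DIRECTIVE_RE = re.compile(r"^:::\{(\w+)\}\n(.*?)\n:::$", re.MULTILINE | re.DOTALL)
# Inline syntax:  {name}`text`
ROLE_RE = re.compile(r"\{(\w+)\}`([^`]*)`")


def render(source: str) -> str:
    # Roles first, so a role inside a directive body is handled too.
    source = ROLE_RE.sub(lambda m: ROLES[m.group(1)](m.group(2)), source)
    return DIRECTIVE_RE.sub(lambda m: DIRECTIVES[m.group(1)](m.group(2)), source)


text = "Intro with {strong}`inline` markup.\n:::{note}\nA block of text.\n:::"
html = render(text)
```

In this sketch an unknown directive or role name simply raises `KeyError`; the point is only that both constructs boil down to "a name plus some text, handed to a function registered under that name".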
26:11 I'm a refugee from Gatsby, in which markdown is almost like a database format, a lot of structure, but when it comes to markdown and extension points and things like that, oh my God, the monkey business they jumped through to try and get information from the document to the extension point or whatever. And they all have to invent their own little mechanism for packing stuff into the space after the code fence invocation.
26:51 It's just clearly obvious that they need something like what MyST has done, which is a consistent syntax that just hands things over to reStructuredText directives.
27:03 I think what's interesting about Sphinx and Docutils is that intermediate document model that exists, which I think is something that differentiates it from a lot of other Markdown parsers and renderers that are out there. I think a lot of static website generators are effectively going from blobs of Markdown mapped onto blobs of HTML as a sort of heuristic. It's a very programmatic output kind of a thing. And actually the original version of Jupyter Book, maybe like two and a half years old now, was a wrapper around Jekyll, the Jekyll website generator. But because Jekyll was fundamentally just doing blobs of Markdown to blobs of HTML, there wasn't that intermediate rich document representation where you can do things like resolve cross-references and collect bibliographic entries and collect equation and figure labels so that you can refer to them elsewhere. And once you add that extra model in, it gives you a lot extra. I think that's where a lot of that extra power and complexity comes from in the Sphinx ecosystem. In some ways, MyST Markdown is almost like a front end on the user side. It's just giving you, at least in my opinion, a more user-friendly entry point into the Sphinx ecosystem.
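A toy version of what an intermediate document model buys you is sketched below: one pass collects every label across all pages, and a second pass resolves references against that index. The `Page` and `Section` classes and the URL scheme are invented for illustration and are not Sphinx's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    label: str   # stable name that other pages refer to
    title: str   # human-readable heading, free to change
    refs: list[str] = field(default_factory=list)  # labels this section links to


@dataclass
class Page:
    name: str
    sections: list[Section]


def build_index(pages):
    """Pass 1: collect every label across all pages."""
    return {s.label: (p.name, s.title) for p in pages for s in p.sections}


def resolve(pages, index):
    """Pass 2: turn each reference into (link text, URL) using the index."""
    links = []
    for page in pages:
        for section in page.sections:
            for ref in section.refs:
                target_page, title = index[ref]
                links.append((title, f"{target_page}.html#{ref}"))
    return links


pages = [
    Page("intro", [Section("getting-started", "Getting Started", refs=["api-overview"])]),
    Page("api", [Section("api-overview", "API Overview")]),
]
links = resolve(pages, build_index(pages))

# Renaming the heading changes the link text everywhere, but not the target.
pages[1].sections[0].title = "API Reference"
links_after = resolve(pages, build_index(pages))
```

A pure "blob of Markdown in, blob of HTML out" converter never sees more than one page at a time, so it has nowhere to keep the index that the second pass depends on.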
28:20 Now, most times you just want to write straight Markdown, but every now and then you need something like one of these references over to another part of the site, or you need more control. And so MyST has this ability to say, kind of like, run some inline reStructuredText here, right?
28:36 That, as well as the ability to just hook into the thing that reStructuredText would. Right. That would be the directive. So you can embed reStructuredText inline, but that's usually not what you need to do or want to do. Usually you're just able to directly use the thing that you wanted to use. There you go.
28:56 Okay, so you would go and write some Python code to process one of these directives. That's pretty excellent.
29:02 Yeah. And a bunch of these directives exist already in, well, Sphinx, essentially. And you can just use them in MyST as you would in reStructuredText, except with a slightly different syntax because you're operating in a different markup language now.
29:19 Yeah, exactly. The terminology they use is interpreted text, so it's just a block of text. And here's the name of the class or function that's going to interpret this. And here are maybe some options to help you interpret it, and then we hand it over.
29:37 Nice. So does MyST come with a bunch of these extensions already that you can use for maybe scientific graphs or things like that that you can pull in? Or where do I find more of these if I don't want to write them myself?
29:50 So one of the goals of the Jupyter Book project is trying to bring in functionality from the Jupyter ecosystem around interactive computational document models like ipynb, Jupyter notebook files, and also kernels that can run arbitrary, usually data-centric code for visualizations and analysis and things like that, where that code will generate outputs like PNG images or HTML interactive visualizations or tables with statistical analysis in them and that sort of stuff. And so one of the goals of the Executable Books project is also to build sort of entry points for the Jupyter ecosystem into Sphinx and into MyST Markdown. So you can kind of get the complexity of the PyData ecosystem or the R ecosystem or the Julia ecosystem, but with the ability to embed that into a documentation narrative structure as well. So I think that that's where a lot of the scientific use cases come from. It's like using scientific code that gets executed alongside your documentation build in a programmatic fashion, and where those outputs of the code are then inserted into your document in a way that, from a reader's perspective, looks like it's just part of the narrative flow of everything else that was there.
31:11 This portion of Talk Python to Me is brought to you by Tonic AI. Creating quality test data for developers is a complex, never-ending chore that eats into valuable engineering resources. Random data doesn't do it, and production data is not safe or legal for developers to use. What if you could mimic your entire production database to create a realistic data set with zero sensitive data? Tonic AI does exactly that. With Tonic, you can generate fake data that looks, acts, and behaves like production data because it's made from production data. Using their universal data connectors and a flexible API, Tonic integrates seamlessly into your existing pipelines and allows you to shape and size your data to the scale, realism, and degree of privacy that you need. Their platform offers advanced subsetting, secure de-identification, and ML-driven data synthesis to create targeted test data for all your preproduction environments. Your newly mimicked data sets are safe to share with developers, QA, data scientists, and heck, even distributed teams around the world. Shorten development cycles, eliminate the need for cumbersome data pipeline work, and mathematically guarantee the privacy of your data with Tonic AI. Check out their service right now at talkpython.fm/Tonic or just click the link in your podcast player's show notes. Be sure to use our link, talkpython.fm/Tonic, so they know you heard about them from us.
32:41 Maybe tell people a bit about the Jupyter Book project itself. It's an open source project for building beautiful, publication-quality books and documentation, as you said, sort of taking the code in the notebook and generating the output, like some of the graphs and whatnot, in a live way. It sounds fascinating. When would I use it? Maybe tell people a bit about that.
33:03 Yeah, I think the simplest way to put it, from a technical standpoint, since we're kind of riffing off of Sphinx: Jupyter Book is a distribution of Sphinx. It's basically a collection of preconfigured Sphinx extensions, some of which are developed by the Executable Books team and community, others of which have been developed by the broader Sphinx community and are just reused, and contributed upstream to, by people in Executable Books. And those extensions have been chosen to all feed into this use case of scientific and technical documentation. So, for example, there's a bibliography and citation extension for Sphinx, and that's activated automatically with Jupyter Book.
33:44 Right on. Because of course you're going to need that, and you want something like Evernote that's going to pull from some source, like, I always quote this article or something, right?
33:52 I mean, it's a use case that's not built into core Sphinx, like having references and citations and bibliographies, and so it pulls that workflow into Sphinx via an extension. So Jupyter Book is kind of a collection of these extensions, wrapped in a command-line interface and a configuration structure that's a little bit more user friendly. I think that's one of the things about Sphinx: at least historically, it has tended to be both developed by and catered to the developer community, which is a little bit different from the scientific community. A lot of scientists know just enough code to be dangerous, myself included, and they're often not as familiar with traditional software development workflows. And so, for example, Jupyter Book is configurable with a YAML file, whereas in Sphinx the default configuration is a conf.py Python file. So, little quality-of-life improvements to make it a bit easier for people to get started with this more opinionated distribution of Sphinx.
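For comparison with the YAML route, here is a sketch of what wiring up a couple of such extensions by hand in a plain Sphinx `conf.py` could look like. It is not the configuration Jupyter Book generates; it assumes the `myst-parser` and `sphinxcontrib-bibtex` packages are installed, and the bibliography file name is hypothetical.

```python
# conf.py: a hand-rolled sketch in the spirit of what Jupyter Book preconfigures

extensions = [
    "myst_parser",           # MyST Markdown support for Sphinx
    "sphinxcontrib.bibtex",  # citations and bibliographies
]

# sphinxcontrib-bibtex needs to be told which BibTeX files to read.
bibtex_bibfiles = ["references.bib"]  # hypothetical file name

# Optional MyST syntax extensions: triple-colon fences and $-delimited math
myst_enable_extensions = ["colon_fence", "dollarmath"]
```

The trade-off described above is visible here: this is ordinary Python, which is flexible but assumes the author is comfortable editing code, whereas a YAML file asks only for keys and values.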
34:48 Yeah, that sounds fantastic. What a fascinating resource for people. And I meant EndNote, not Evernote. Sorry. That's the one you use for references. There are some good questions in the audience I want to ask you all, but before we do, I want to ask about MyST just a little bit more. So MyST looks really interesting. And it looks like it allows me to do many more powerful things with Markdown than just straight Markdown. Right now, in my code, like, say, on the website, I might have to take some Markdown content and turn it into a page or something. I'm just using something like markdown2 or one of the other Markdown parsers. Does this make sense as something to run live in your application, rather than as a publication generation story? Would it make sense to replace that one function from the library with this, and then allow me to write directives that do more for the site, for example?
35:42 Not exactly sure what you're asking.
35:44 I think what Michael wants is the layer under MyST.
36:07 Yeah, you hand it markdown and you get HTML. You mean. Sorry.
36:09 Yeah, markdown. You get HTML.
36:12 Yes, that's what I want. Yeah, exactly what I want is a nice way to have richer markdown in an application in production, not just as something that I run like a build process against.
37:48 Right. Runtime. Yeah. Okay. Exciting to hear you're all working on it. All right, let's bring this maybe back to the documentation side. Right now, the audience asks, could we describe the whole process that a MyST document goes through before it becomes an HTML document? Like what happens to my inputs to get either the HTML or the EPUB or whatever I get essentially.
38:10 So you have your Markdown text file. We then parse it, as I just mentioned, with markdown-it-py, which turns it into a bunch of syntax tokens. And then we take those tokens and we convert them into the Docutils abstract syntax tree, which is what Sphinx works with. And then as we're converting that into the Docutils AST, we're running all of these directives and these roles, this interpreted text, against all of the functions, the extensions that are loaded within Sphinx. And we end up with this nice syntax tree, a Python thing with nodes.
38:58 And then we say Sphinx, there you go. Take it away.
39:02 Right. Traditionally, Sphinx has gone through some sort of reStructuredText process to generate that. And you're like, we're going to generate that and give it to you in a different way. Now just do what you do to generate your documentation. Right, exactly.
39:13 So once you've passed it through MyST, or you've done reStructuredText, you end up with exactly the same thing, the syntax tree with nodes, and then Sphinx can go. And that's kind of agnostic to any kind of output format; it's just: here's a paragraph, and within that, here's text and here's bold.
39:37 And to bring this back around to what we mentioned at the start: the power of Sphinx, the extensibility of it, is that the markup format that you're writing with is decoupled from anything else that you do with it, from the extensions, from the directives that you use. None of that. They're all sort of a separate step from that. And that separation happens through the intermediate document tree, or doctree, which anyone who's worked with Sphinx has probably seen mentions of in the build directory and stuff. So, yeah, the doctree sort of acts as that separator.
40:17 You go from your input text to your doctree, then to your output format, HTML, right. And on the way you mess around with the doctree to make sure you've got inter-document references and all this kind of thing.
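The flow described above (input text, then tokens, then a tree of nodes, then an output format) can be sketched in a few lines. This is a toy illustration only, not how MyST, markdown-it-py, or Docutils are implemented: the `Node` class, the function names, and the tiny bold-only markup are all assumptions made for the example.

```python
import html
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy doctree node: a tag name, optional text, and child nodes."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def tokenize(paragraph):
    """Split one paragraph into ("bold" | "text", value) tokens."""
    tokens = []
    for part in re.split(r"(\*\*.+?\*\*)", paragraph):
        if not part:
            continue
        if len(part) > 4 and part.startswith("**") and part.endswith("**"):
            tokens.append(("bold", part[2:-2]))
        else:
            tokens.append(("text", part))
    return tokens


def build_tree(source):
    """Turn the tokens of each blank-line-separated block into a paragraph node."""
    document = Node("document")
    for block in source.strip().split("\n\n"):
        paragraph = Node("paragraph")
        for kind, value in tokenize(" ".join(block.split())):
            paragraph.children.append(Node(kind, value))
        document.children.append(paragraph)
    return document


def to_html(node):
    """One possible writer: render the tree as HTML."""
    if node.tag == "text":
        return html.escape(node.text)
    if node.tag == "bold":
        return f"<strong>{html.escape(node.text)}</strong>"
    inner = "".join(to_html(child) for child in node.children)
    return f"<p>{inner}</p>" if node.tag == "paragraph" else inner


def to_plain_text(node):
    """A second writer over the same tree: plain text, one paragraph per line."""
    if node.tag in ("text", "bold"):
        return node.text
    inner = "".join(to_plain_text(child) for child in node.children)
    return inner + "\n" if node.tag == "paragraph" else inner


tree = build_tree("Hello **world**!")
print(to_html(tree))
print(to_plain_text(tree), end="")
```

The point of the sketch is the decoupling: both writers work from the same tree and never see the original markup, so a different parser that produced the same nodes could be swapped in without touching them.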
40:34 Well known fact, if you go to SourceForge and look at Docutils, you'll see that they from the very beginning have anticipated reStructuredText not being the only format. reStructuredText is just one. It happens to be the default parser, but it wasn't intended to be the only parser.
40:53 So what we've been discussing is things that Sphinx does. Sphinx, for the most part, is really a good wrapper around a lot of what Docutils provides. It's a much more friendly package, in my opinion, to interact with. And sort of how, in some senses, Jupyter Book wraps Sphinx into a nicer package to use, Sphinx does that for Docutils. Right. And it builds upon it to give you additional functionality. It gives you additional points to hook into the build process that you would still have with Docutils. It's clearly not doing anything in the build process in addition to what Docutils does. It just gives you a better framework to do that. And a lot of what we've been talking about, that intermediate format and stuff, those are all concepts that are coming up from Docutils into Sphinx.
41:51 Yeah, very interesting. Alvaro out there asks can we use the AST to translate between markup languages, something like Pandoc, which is a pretty neat thing. It sounds like that might already work. You've got the different output formats and stuff already, right?
42:03 Yeah. So an interesting example of this, actually, is there's a little helper tool that Chris wrote as a part of the Executable Books project called rst-to-myst. And essentially it's a converter. If you have a bunch of documentation written in reStructuredText and you want to automatically convert it into MyST Markdown, because MyST Markdown and reStructuredText have the same, like, fundamental vocabulary, they just have different syntaxes that map onto that Docutils doctree, you can go from one to the other relatively easily, and that's what the rst-to-myst package does. It parses RST into these abstract tokens, and then it can render those tokens as MyST Markdown rather than reStructuredText. And that's because of that sort of intermediate document format that's there. I think what Pandoc has is like this huge library of rules, basically, of how do you go from these abstract tokens into a billion different output formats? And that sort of speaks to the community of the Pandoc world that's been around for quite a long time and is doing a lot of really awesome work there, too.
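The "same vocabulary, different syntax" idea can be sketched with inline roles, which reStructuredText writes as :name:`content` and MyST writes as {name}`content`. This is a toy sketch and not the rst-to-myst implementation: the regular expression, the token tuples, and the function names are assumptions for illustration, and it only handles simple inline roles.

```python
import re

# Matches simple reStructuredText roles such as :ref:`install`.
RST_ROLE = re.compile(r":(?P<name>[\w-]+):`(?P<content>[^`]+)`")


def parse_rst_inline(text):
    """Parse text into abstract ("text" | "role", name, content) tokens."""
    tokens = []
    position = 0
    for match in RST_ROLE.finditer(text):
        if match.start() > position:
            tokens.append(("text", None, text[position:match.start()]))
        tokens.append(("role", match["name"], match["content"]))
        position = match.end()
    if position < len(text):
        tokens.append(("text", None, text[position:]))
    return tokens


def render_myst(tokens):
    """Render the abstract tokens using MyST role syntax."""
    return "".join(
        f"{{{name}}}`{content}`" if kind == "role" else content
        for kind, name, content in tokens
    )


def render_rst(tokens):
    """Render the same tokens back to reStructuredText role syntax."""
    return "".join(
        f":{name}:`{content}`" if kind == "role" else content
        for kind, name, content in tokens
    )


source = "See :ref:`install` and :doc:`usage`."
tokens = parse_rst_inline(source)
print(render_myst(tokens))
```

Because the tokens carry only the role name and its content, either renderer can be applied to them, which is the property that makes conversion in both directions relatively easy.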
43:05 Yeah, indeed. All right, let's bring it back to Sphinx a little bit. I know you all wanted to give a quick shout out to Juan about some of the tutorials that he created, right?
43:14 So one of the things about Sphinx, as we've been talking about, is it has a documentation site. Right. And it is a documentation generator, but it didn't have a tutorial to get you started. That's what Juan worked on recently. And I think a bunch of that work is actually piggybacking off of Read the Docs getting funding from CZI, the Chan Zuckerberg Initiative, as in Essential Open Source Software. I think it's under that program, but I could be wrong on that. And yeah, he worked on a tutorial for Sphinx that introduces people to Sphinx, essentially, and tells you what your first steps to document your project using Sphinx would be.
43:57 Yeah, very cool. It sounds certainly useful to have. Right. The API documentation is not enough to make it feel really great. All right, so the next thing I want to talk a bit about with Sphinx is sort of the look and feel side of things. Right.
44:15 Can I really quickly make a plug? Actually?
44:17 Yeah, absolutely. Go for it.
44:18 I think that one of the other reasons that Juan, for example, has been contributing some of the improvements to the Sphinx documentation is that Sphinx is pretty old in computer science world and technology world. It's like an ancient technology. It's like eight years old or something like that.
44:37 Yeah, that's true. That's like the early days of Python. So anyway, my point is that I think that the documentation about Sphinx has been around for a long time. But I think that the community's understanding of what makes for good documentation has evolved quite a lot in the last 15 years. There's just more expectations around different kinds of documentation that you expect to find embedded in one place. I put a link in the YouTube comments for a really interesting framework that's been gaining some traction lately called the Diátaxis framework. It's this idea that you sort of cleanly separate out tutorials and how-to examples and reference documentation and explanations. That's just one example. But I think that the community has sort of evolved and made more complex its own idea of what makes for good documentation, in some ways faster than a lot of these Python packages, which have been around for decades in some cases, have kept up with. And so I think there's a lot of low hanging fruit to improve a lot of these aspects by making contributions to the Sphinx docs and other pieces in the ecosystem.
45:38 If I can make an addition to your addition, too. One of the things I think of Sphinx as is I think of it as a miracle. It's an underappreciated miracle.
45:47 They crank out bug fix updates with long lists of features and bug fixes, over and over, year after year. They don't get to go do greenfield development. They're still stuck in Python 2/3 land or something like that. They've got this main template, which was written back when Marc Andreessen was still in grad school. It's just heroic what they do.
46:13 Yeah. It's amazing when software continues to live like that.
46:16 Right. It's foundational software to the ecosystem. And yeah, there's very little greenfield development there and there's very little sort of exciting work. It's all complicated problems with lots of complexity to deal with, both in terms of compatibility and in terms of not having enough visibility into how your users are using things to know for sure if a change will break them. Basically everything is a breaking change. Let's operate with that, and the constraint that brings with it.
46:49 That's a big constraint for sure.
46:52 All right, so let's talk about look and feel. So there's the whole idea of Sphinx themes. Right.
47:00 And maintained by some people we know.
47:04 Yes. That is.
47:07 Going to give a quick shout out to the themes and tell us a bit about it.
47:11 So one of the things that Sphinx has, as we've mentioned multiple times, is a variety of outputs. Right. And even within the HTML output format, you have the ability to change how your output is styled, right, what theme you end up using. This is roughly analogous to, I guess, Jekyll themes or Hugo themes, in that you give it a bunch of templates, maybe a bit of logic depending on what you're doing, and yeah, it renders. Sphinx has that. And what's probably on screen at the moment as we talk about this is sphinx-themes.org, which is a site I helped update to be prettier and more up to date, with a more curated set of useful Sphinx themes that you can and maybe should be using when you move away from the default. Essentially, as Chris mentioned, Sphinx is fairly old. And when you look at the themes that ship with it, they bear that aesthetic with them. They don't look like they were built last week by someone who has been doing this since not too long.
48:24 Yeah. If you ever go to the way back machine and you look at something that is a popular website from the early days.
48:35 It's an amazing experience to just pull up Google or Yahoo, to look at something like this and say, yeah, that's one of the biggest companies in the world. It kind of reframes your opinion. But having these themes, I think, is an important aspect. I do think having them look really good is something that is starting to come along. Right. Like, I think the Furo theme and the Book theme look really nice. There's the PyData theme. If you've done Read the Docs.
49:07 Read the Docs is one of the themes that people I'm sure are familiar with.
49:10 I think for a very long time, the only major good theme, or one of the two major good themes, was the Read the Docs one. The other one is Alabaster, which is now the default. And it's been fairly recent that these new themes have sort of come in and gained major adoption in sort of the timeline of Sphinx, I would say. And I think that's a good thing because I'm personally motivated to do a lot of this. Right.
49:42 One of the themes you mentioned is one that I wrote from scratch, and it's been an interesting, fun experience, and I can see why there's not a lot of these.
49:54 It's tricky.
49:55 I really would like to give you credit for this. I think Furo is the tipping point, the exemplar that made people rethink what to expect from Sphinx themes. If I was to guess the two reasons people are switching to MkDocs, number one is Markdown. So Chris and Chris, thank you for that. And number two is it just looks a lot better.
50:16 Part of the reason MkDocs looks really good is because there is one theme that's really good there: MkDocs Material.
50:24 Yeah, the material.
50:25 Yeah. Right.
50:26 That's all it takes.
50:27 And I've spent a decent amount of time, as part of the research for all of this work that I do, looking at: is there any other major theme there? And there isn't. There's just one. Interestingly, most of the ecology there just revolves around that. That isn't to say MkDocs is just that; it's not.
50:51 But the overlap there is huge, is what I would go for. And kudos to squidfunk, whose name is not coming to my head at the moment, for the amount of work he's put into that.
51:06 Yeah, absolutely.
51:06 You and Chris are working on the new chapter of this idea, right.
51:11 Which Chris, Chris H.
51:12 I believe the two of you are kind of thinking about what if we didn't have to live with the old contract?
51:18 I think what you're referring to is some of the infrastructure improvements around developing the Sphinx Ecosystem. Is that what you.
51:24 No, I thought there was a theme itself you were working on that kind of threw out the basic theme, which is the predecessor of all themes. And I thought that you had a repo where there was a new theme you were working on. There is almost like an abstract theme or something.
52:22 The name escapes me at the moment. And one of the things we found out was there's a lot of common work across these themes. We put that into a layer on top so that it reduces the amount of duplicated work. It still gives us the bits of flexibility that we want in the individual themes to make opinionated choices, design choices and whatnot. So there's a decent chunk of, oh, we all will do the same thing, like breadcrumbs have this HTML structure.
52:55 Doing things like that will reduce the amount of duplicated effort and lower the barrier of entry, essentially, into writing themes, which is sort of what I'm personally motivated by at the moment. Having written one, it's like, oh, this thing goes by the right sort of brain areas, because as I've mentioned, I'm a maintainer on pip, and developer workflows are a thing that I happen to be interested in and, I would say, have a decent amount of experience dealing with. So I was like, hey, this looks like a great place to put in a bunch of HTML web tech experience combined with a bunch of Python packaging and user workflows around that, and sort of put in energy there. So that led to sphinx-theme-builder.
53:42 Which is like fantastic. Yeah, it's been excellent.
53:45 I think it's kind of just bringing in maybe what Sphinx has lacked in the past, that kind of expertise in web design, really. I mean, there's a lot of excellent people working, as you say, within Sphinx and Docutils and things on the kind of back end and how all that works and all the Python code. Possibly there's been less in the past of going, right, let's actually make all this good work and actually show it off and have these just lovely things that actually show what you can do.
54:16 It was super fascinating for me because when I jumped in, I was like, oh, recreationally, right? And to be honest, my motivation jumping into this Sphinx ecosystem was precisely this.
54:30 I would like it to look better. And funny enough, none of this would have happened if pip hadn't gotten a grant where we had a bunch of user experience experts sit with us and our users and sort of have that channel of feedback through them as well as their expertise. And just having them state multiple times, like, hey, pip's documentation is not that great to sort of navigate and stuff, like there's content, they don't know how to get there.
54:58 In those conversations, I was like, yeah, I don't like this site. Now that you made me look at it a bunch of times.
55:05 Let me put something on the screen for you all just to put side by side here and just think about this. You all who are listening can pull this up really easily. Just think about this from the perspective of someone who's choosing a programming language, or who is new to programming, deciding: is Python the space for me? If on one hand we've got just docs.python.org and on the other something like Tailwind, where you look at it and it just feels so fresh and welcoming, whereas the other.
55:39 This is not to take away from the hard and important work of writing the documentation, but the way it feels when you land there, I think is in desperate. Yeah.
55:48 But Michael, have you seen what it looks like in a man page?
55:52 I know it probably is looking the same.
55:55 Those three man page users are really happy.
55:59 Pretty close. And that isn't to say those users are under. But to be clear. Right.
56:05 It's kind of like moving from MySpace to Instagram.
56:30 And this is really an aspect where I would sort of think, I would like to put my energy into this and make these improvements. And I'm by far not the only one. There's, in fact, now a documentation working group being formed in the core development, like the CPython development community, the folks who develop the language, around improving this. And there's like a public issue tracker there that's hopefully going to start ramping up soon, because I would like to be involved with that. And I'm aware that for this as well. And this is by no means news to the folks involved. They're aware of this. And I think the first issue itself in the docs community issue tracker that they have is moving to a more modern documentation theme. As it turns out, there aren't many in Sphinx. So that's sort of been another thing I've picked up and gone.
57:25 Yeah. So maybe one more shout out to the Sphinx theme builder that you put together. Right?
57:28 Yeah. sphinx-theme-builder comes in nicely into this. So there's a bunch of Sphinx themes today, right? There's Furo, there's Sphinx Book Theme, there's PyData Sphinx Theme, there's Alabaster, there's a bazillion of those. The first four, three that I mentioned. Right.
58:11 It even comes with a development server that does refresh the browser on change, right?
58:17 Yes. And I think that's also really nice to have in Sphinx documentation authoring as well. And Paul sort of mentioned this earlier, like taking up maintainership of too many things.
58:29 I went, I like this. And the repository had "maintainers needed" on it. So I just went and opened an issue and said, hi, add me. And I'm basically the de facto maintainer on, like, livereload as well as sphinx-autobuild, which is nice. If you don't use sphinx-autobuild and write a lot of Sphinx docs, start using it, because it's great, I think.
58:52 Yeah, awesome. I love those auto reload aspects. All right, guys, we are just about out of time. Maybe I'll open it up if anyone else wants to just give a shout out to something or mention something while we're all here together. What have we not covered that we need to quickly talk about?
59:08 The main thing that I would just say is to reiterate what a lot of these theme conversations and improving developer workflows around the Sphinx ecosystem show. One of the reasons that they are so successful is because I think there's a lot of low hanging fruit in the Sphinx ecosystem to basically signal boost all of this work that's been done over the last 14 years in Docutils and Sphinx, building this documentation engine and this whole community of people writing extensions for it. In a lot of ways, I think that there's a lot of potential energy there that hasn't been unlocked yet, in part because of some of the things around developer friction or themes that don't look kind of modern and nice and web developer. I think that there's still a lot more low hanging fruit that can be accomplished with, for example, improving documentation about Sphinx itself or the extensions or whatever. And so I think that all of the success of these projects is largely possible only because they're kind of piggybacking on top of this really well established community where a lot of work has already been done, and a lot of kudos go out to that broader community for sure.
01:00:13 And one other thing that I'm sort of hoping to see coming out of this is that, as these efforts around making it easier to present your thing nicely and making it easier to write your docs all happen, the general quality of the documentation in the Python ecosystem improves. Because when you have something that looks nicer and is easier to write, it also results in a better quality of documentation, because you're able to go, oh, actually I can see the structure of my site clearly. I think I would like to restructure it slightly to make it clearer what the flow is. And oh, I'm missing this bit of content in this section, but I have it there. Or maybe I should just add it. And when these things become more obvious to you through either clearer markup or clearer site design or whatever, it will lead to better documentation. And also more people would be willing to write those.
01:01:11 Like, why am I writing this?
01:01:13 And also moving away from Sphinx just being seen as developer documentation, into trying to make it available for scientists and things to go, wait a minute, I can use this relatively easily and I can share my work, or I can write tutorials, share my research and things like that.
01:01:37 That part is really interesting to me, and it's why I did the course with Michael on static websites, not static documentation: to think bigger than just docs. I have an interest in knowledge bases. I'm a developer advocate. We create artifacts that are rich and interconnected and richly linked. Sounds like something Sphinx has inside of it. Sphinx and Docutils have this engine inside, which nothing else has. The very first contact you have with MyST in the course that I did is just the humble link. But when you do it in MyST, it will tell you if the thing on the other side isn't there, and it'll extract the title and inline it on your side. That is magic to all these other systems, and it's knowledge-base kinds of things that are valuable. And I think that we could tell the story of Sphinx in a bigger way, beyond documentation, and start doing all of the things that people on technical teams want to do for storytelling.
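The "humble link" behavior described here, warning when the target is missing and filling in the target's title when the link text is left empty, can be sketched as a small resolver. This is a toy illustration, not how Sphinx or MyST resolve references: the `resolve_links` function, the `titles` mapping, and the file names are hypothetical.

```python
import re

# Matches Markdown links with empty text, such as [](install.md).
EMPTY_LINK = re.compile(r"\[\]\(([^)]+)\)")


def resolve_links(text, titles):
    """Fill empty link text from a {target: title} map and collect warnings."""
    warnings = []

    def replace(match):
        target = match.group(1)
        if target not in titles:
            # Leave the link untouched, but report the dangling reference.
            warnings.append(f"unknown target: {target}")
            return match.group(0)
        return f"[{titles[target]}]({target})"

    return EMPTY_LINK.sub(replace, text), warnings


titles = {"install.md": "Installation"}  # hypothetical set of known documents
resolved, warnings = resolve_links(
    "Read [](install.md) then [](missing.md).", titles
)
print(resolved)
print(warnings)
```

A real build has the whole project's doctrees available, which is what lets it check every reference and pull titles across documents.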
01:02:42 I feel like I want to interject and mention that, hey, you know, Sphinx can do blogs.
01:02:47 Yes, there's the ABlog extension.
01:02:49 Yeah, right on.
01:02:50 And Chris H at least has his blog. Yeah, that's right. He's posting the website as far as I'm there.
01:02:57 Look inside the code of ABlog, which I've followed for years. It shows you tapping into the equivalent of front matter and walking through all the doctrees and looking for structure and doing back references.
01:03:12 People don't expect to be able to do that in Markdown. It's a document database. That's what you should think of Sphinx as.
01:03:19 And I think there's a lot there of unlocking. I really do.
01:03:25 To reiterate what Chris said, there's a huge body of excellent work that has been done over more than a decade. I mentioned older than four years younger or something like that. Sure, I'm four years older than it. Okay, whatever. I was wrong.
01:03:43 But yeah, there's a huge body of work there, and it sort of needs a little touch of, hey, paint, to look like it belongs in the sort of modern web.
01:03:57 But it also needed MyST. It needed MyST to come along and express that power in a human oriented way. And there are so many things in MyST that are mind blowing. People haven't scratched the surface with MyST yet.
01:04:11 Actually, to that point, one other quick plug that I would give: I know we've been talking about Sphinx in this conversation, but I really think that the goal of MyST is to be a tool-agnostic, implementation-agnostic specification for Markdown. And I think that there's a lot of exciting possibilities if we can sort of find the right standards to apply at the MyST directives level or roles level or whatever, and see if you could get a flavor of Markdown that is flexible enough that it can be reused across a few different kinds of applications, maybe some that are pure web pages, maybe some that are full blown documentation, maybe Pandoc or whatever. And so I invite people, if they can think of use cases or tools that they're working on or ecosystems that they're working in that would benefit from something like a tool-agnostic flavor of Markdown that has natural block and inline level extension points, to reach out. And we would love to chat, because I think it would be really cool to see MyST being applied in other kinds of contexts as well.
01:05:15 Yes, it would just help it grow, get stronger.
01:06:08 Yeah. Fantastic. Good call to action. Let's leave it there. Let me ask you all one of the two final questions I typically ask before we get out of here.
01:06:18 Pradyun, I'll start with you. If you're going to write some Python code, what editor are you using these days?
01:06:22 I am using Visual Studio Code.
01:06:25 Used to use Sublime Text, but I've sort of switched over. Yeah.
01:06:28 I feel like that's the natural transition for Sublime users. That seems to be good. I'm wondering which one you might be using.
01:06:38 No, I love PyCharm.
01:06:39 The best, baby. The best.
01:06:41 Absolutely. Right on. Chris S.? Well, mainly Visual Studio Code. Is there probably some notebooks in there as well?
01:06:51 There's probably some notebooks in there as well, like some Jupyter, partly Jupyter lab.
01:06:55 But mainly Visual Studio Code. And there's also this extension for Visual Studio Code, and we're working on the extension for Jupyter Lab as well.
01:07:05 Fantastic. All right, cool.
01:07:07 Chris H. I kind of split 50/50 between VS Code and Jupyter Lab. Basically, if I'm doing software development, then I use VS Code, and if I do data exploration, interactive computing, that kind of workload.
01:07:18 I use Jupyter lab on the notebooks.
01:07:21 Right on.
01:07:21 All right. Well, thank you all for being here. It's been really fun. And thank you for all the hard work on bringing all this stuff up to 2020.
01:07:29 Plus, wait, two questions.
01:07:33 I know. Really quick then, notable PyPI package. I don't want to take too much of y'all's time.
01:07:38 Okay. I guess for me it will be PursuedPyBear, ppb. It's a really cool educational game engine. That's way better than what I had when I got started with gaming.
01:07:49 Yeah, I just finished doing the Python Bytes podcast with Brian before we started this one. And ESA, the European Space Agency, just put two Raspberry Pis on the International Space Station for kids and students to program against. I'm like, that's way better than the turtle I got to drive around when I was in school.
01:08:08 All right, Paul, really quick. Is there a package or library you want to just give a quick shout out to?
01:08:12 I have a fascination with dependency injection in a human oriented way. Antidote.
01:08:20 Very interesting. That's new to me. Chris S., got a library you want to give a quick shout out to?
01:08:30 I hadn't used it until recently and yeah, I love you.
01:08:38 I'm all for that. People talk about how it's hard to distribute Python applications and little utilities by packaging them up because, well, you've got to download the scripts and then set up the environment. That is the Homebrew of the Python world. If you want something where I can go to the terminal and just type a command and it runs, well, if you pipx install it, it's going to just be there, and it's fantastic. I love it. People should use it more for that use case.
01:09:02 Chris H. I feel like in the name of improving UI, UX and window dressing on technology that some people think is outdated, I would shout out to Rich, like UI components, visualizations, whatever. I would love to find a way to get Rich into Jupyter Book or even Sphinx or something like that, because I think it makes for a really nice user experience. The only problem is that the maintainer of Rich is much better at Wordle than I am, and so he's consistently beating me by two or three tries every day.
01:09:34 Fantastic wrestling with Bit because this seems relevant. One of the things I've been doing through work as well as personal time is improving error messages in pip, and Rich has played a decent role, so it's pretty likely that in the coming weeks you'll see better messages, partly thanks to Will's work on Rich.
01:09:54 Yeah, absolutely. Got the whole traceback enhancements and everything. All right. Thank you all for being here. It's been great. Chat with you later.
01:10:01 Thanks Michael.
01:10:03 This has been another episode of Talk Python to Me. Thank you to our sponsors. Be sure to check out what they're offering. It really helps support the show. Add high-performance multiparty video calls to any app or website with SignalWire. Visit 'talkpython.fm/SignalWire' and mention that you came from Talk Python to Me to get started and grab those free credits. Tonic.AI creates quality test data that does not contain personally identifiable information. Your generated data sets are safe to share with developers, QA and data scientists. Most importantly, they behave like production because they're made from production data. Check them out at 'talkpython.fm/tonic'. Want to level up your Python? We have one of the largest catalogs of Python video courses over at Talk Python. Our content ranges from true beginners to deeply advanced topics like memory and async. And best of all, there's not a subscription in sight. Check it out for yourself at training.talkpython.fm. Be sure to subscribe to the show: open your favorite podcast app and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, the Google Play feed at /play and the direct RSS feed at /rss on 'talkpython.fm'.
01:11:16 We're live streaming most of our recordings these days. If you want to be part of the show and have your comments featured on the air, be sure to subscribe to our YouTube channel at talkpython.fm/youtube. This is your host, Michael Kennedy. Thanks so much for listening. I really appreciate it. Now get out there and write some Python code.