New course: Agentic AI for Python Devs

diskcache: Your secret Python perf weapon

Episode #534, published Mon, Jan 12, 2026, recorded Fri, Dec 19, 2025
Your cloud SSD is sitting there, bored, and it would like a job. Today we’re putting it to work with DiskCache, a simple, practical cache built on SQLite that can speed things up without spinning up Redis or extra services. Once you start to see what it can do, a universe of possibilities opens up. We're joined by Vincent Warmerdam to dive into DiskCache.

Watch this episode on YouTube
Watch the live stream version

Episode Deep Dive

Guest Introduction

Vincent Warmerdam joins Michael Kennedy to dive deep into DiskCache. Vincent has an extensive background in data science and machine learning, which is what many in the Python community know him from. He currently works at Marimo (marimo.io), a company building modern Python notebooks that take lessons from Jupyter and apply a fresh, reactive approach. Vincent is also a prolific content creator, maintaining educational resources at Calmcode (calmcode.io) and contributing to open source projects like scikit-lego. His practical experience spans both data science workflows in notebooks and web development, giving him unique insight into how caching benefits different parts of the Python ecosystem.


What to Know If You're New to Python

If you are newer to Python and want to get the most out of this episode analysis, here are some foundational concepts that will help:

  • Dictionaries in Python: DiskCache behaves like a Python dictionary with square bracket access (cache["key"] = value), so understanding how dictionaries work is essential.
  • Decorators: The episode discusses using @cache.memoize decorators to automatically cache function results, similar to the built-in functools.lru_cache.
  • Serialization with Pickle: Python's pickle module converts objects to bytes for storage; DiskCache uses this under the hood for complex objects.
  • Multi-processing basics: Understanding that web apps often run multiple Python processes helps explain why cross-process caching matters.

Key Points and Takeaways

1. DiskCache: A SQLite-Backed Dictionary That Persists to Disk

DiskCache is a Python library that provides a dictionary-like interface backed by SQLite, allowing you to cache data that survives process restarts. Unlike functools.lru_cache which stores everything in memory and disappears when your Python process ends, DiskCache writes to a file on disk. This means your cached data persists across restarts, deployments, and even Docker container rebuilds. The library handles all the complexity of SQLite transactions, thread safety, and process safety behind a simple API where you just use square bracket notation like a regular dictionary.
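
To make that concrete, here is a minimal sketch of the dictionary-style API (the directory name is just a placeholder):

    from diskcache import Cache

    # Point the cache at a directory; DiskCache creates the SQLite file(s) inside it.
    cache = Cache("./demo-cache")

    cache["greeting"] = "hello"           # stores like a dict, persisted to disk
    print(cache["greeting"])              # -> "hello", even after a process restart
    print(cache.get("missing", "n/a"))    # dict-style .get with a default
    del cache["greeting"]                 # dict-style deletion

    cache.close()  # or use it as a context manager: with Cache("./demo-cache") as cache: ...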

2. Thread Safety and Cross-Process Sharing

One of DiskCache's standout features is that it is both thread-safe and process-safe out of the box. This is critical for web applications that typically run multiple worker processes (a "web garden") where each process needs access to the same cached data. Traditional in-memory caches like LRU cache are isolated to a single process, meaning each worker would have to build its own cache independently. With DiskCache, all processes can read from and write to the same SQLite file, and the library handles the locking and concurrency concerns automatically. Michael uses this on Talk Python's website where multiple Docker containers share a common cache volume.

  • SQLite's built-in locking mechanisms
  • Works across Docker containers with shared volumes
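
As a toy illustration of the cross-process behavior (a sketch, not Michael's actual deployment), several worker processes can each open the same cache directory and see one another's writes:

    from multiprocessing import Process

    from diskcache import Cache

    CACHE_DIR = "./shared-cache"  # in production this might be a shared Docker volume

    def worker(n: int) -> None:
        # Each process opens its own handle on the same directory; DiskCache handles the locking.
        with Cache(CACHE_DIR) as cache:
            cache[f"result-{n}"] = n * n

    if __name__ == "__main__":
        procs = [Process(target=worker, args=(i,)) for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

        with Cache(CACHE_DIR) as cache:
            print(sorted(cache))  # keys written by four separate processes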

3. Massive Cost Savings: Disk is Cheap, Memory is Expensive

The episode makes a compelling economic argument for disk-based caching. Modern NVMe SSDs are incredibly fast, often approaching memory speeds for read operations, but cost a fraction of what RAM costs on cloud providers. Michael mentioned paying around $5 for 400GB of disk space on his cloud VMs, while the equivalent RAM would cost orders of magnitude more. This flips the traditional "keep it in memory because it is faster" advice on its head, especially for caching scenarios where the alternative is recomputing expensive operations or making network calls to Redis.

  • NVMe SSD performance approaches memory for many use cases
  • Reduces cloud hosting costs significantly
  • No need for separate Redis/Memcached servers

4. LLM and Machine Learning Use Cases

Vincent highlighted DiskCache as essential for anyone working with LLMs or machine learning models. When running benchmarks or experiments, you often need to call expensive LLM APIs or run inference on local models repeatedly. If the same input produces a deterministic (or acceptable) output, caching prevents wasting compute, time, and money on redundant calls. This is especially valuable during development when you might restart notebooks or rerun experiments many times. The @cache.memoize decorator makes this trivially easy to implement on any function.

  • Prevents redundant LLM API calls during benchmarks
  • Saves money on cloud API costs
  • Essential for iterative notebook workflows
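
Here is a hedged sketch of that pattern; call_llm below is a stand-in stub for whatever OpenAI or local-model client you actually use:

    import time

    from diskcache import Cache

    cache = Cache("./llm-cache")

    def call_llm(prompt: str) -> str:
        # Placeholder for a real (slow, costly) LLM call.
        time.sleep(2)
        return f"answer for: {prompt[:30]}"

    @cache.memoize()
    def classify(text: str) -> str:
        # The first call with a given text hits the model; repeats come back from disk,
        # even after a notebook restart or a new benchmark run.
        return call_llm(f"Classify the topic of this text: {text}")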

5. Web Application Caching Patterns

Michael shared several practical examples from the Talk Python website. He caches rendered Markdown-to-HTML conversions, YouTube video IDs parsed from show notes, and the results of the HTTP requests used to compute cache-busting file hashes. Each of these represents a computation that does not need to happen on every request. He maintains separate cache instances for different purposes, making it easy to clear a specific cache without affecting the others. Using content hashes as part of cache keys ensures that cached data automatically invalidates when the source content changes.

  • Markdown to HTML rendering
  • YouTube ID extraction from show notes
  • HTTP cache-busting hash computation
  • Separate caches for different concerns
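
A simplified sketch of the Markdown pattern described above (not the Talk Python code itself; the third-party markdown package is assumed as the renderer, and the key is a content hash so edits invalidate automatically):

    import hashlib

    import markdown  # pip install markdown (assumed renderer; any renderer works)
    from diskcache import Cache

    markdown_cache = Cache("./cache/markdown")

    def render_markdown(text: str) -> str:
        # Content-addressed key: if the source text changes, the key changes with it.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        html = markdown_cache.get(key)
        if html is None:
            html = markdown.markdown(text)
            markdown_cache[key] = html
        return html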

6. The Memoize Decorator for Automatic Function Caching

DiskCache provides a @cache.memoize decorator that works similarly to functools.lru_cache but persists to disk. You decorate a function, and DiskCache automatically creates cache keys from the function name and its arguments. The decorator supports expiration times, so you can say "cache this for 5 minutes" for data that should refresh periodically, like a Reddit-style front page. Vincent discovered you can even exclude certain arguments from the cache key calculation, which solved his problem when a progress bar object was causing cache misses in notebook workflows.

  • Expiration/TTL support for automatic cache invalidation
  • Argument exclusion for objects that should not affect caching
  • Works with any picklable Python objects
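
A short sketch of both features; the ignore keyword shown here is how recent DiskCache releases expose argument exclusion, so check the docs for your installed version:

    import time

    from diskcache import Cache

    cache = Cache("./cache/memoize-demo")

    @cache.memoize(expire=300)  # recompute at most every five minutes
    def front_page_items() -> list[str]:
        time.sleep(1)  # stand-in for an expensive query
        return ["story-1", "story-2", "story-3"]

    # Exclude an argument (here a hypothetical progress-bar object) from the cache key,
    # so a fresh object on every notebook re-run does not cause cache misses.
    @cache.memoize(ignore={"progress"})
    def analyze(repo_url: str, progress=None) -> int:
        time.sleep(1)  # stand-in for the real work
        return len(repo_url)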

7. FanoutCache for High-Concurrency Scenarios

For applications with many concurrent writers, DiskCache offers FanoutCache which automatically shards data across multiple SQLite files. Since SQLite allows concurrent readers but writers block other writers, sharding reduces contention by spreading writes across multiple database files. The default is 8 shards, but you can configure this based on your expected number of concurrent writers. This is particularly useful for high-traffic web applications or parallel data processing pipelines.

  • Automatic sharding across multiple SQLite files
  • Reduces write contention
  • Django integration uses FanoutCache by default
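
A minimal sketch of switching to FanoutCache (the shard count and timeout values are just examples):

    from diskcache import FanoutCache

    # Spread writes across 8 SQLite shards to reduce writer contention.
    cache = FanoutCache("./cache/fanout", shards=8, timeout=1)

    cache["hits"] = 0
    print(cache.get("hits"))

    # FanoutCache exposes the same memoize decorator as Cache.
    @cache.memoize()
    def slow_square(n: int) -> int:
        return n * n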

8. Built-in Django Integration

DiskCache ships with a Django-compatible cache backend that you can drop into your Django settings file. This replaces the need for Redis or Memcached as your Django cache backend while maintaining full compatibility with Django's caching APIs. You simply configure the backend as diskcache.DjangoCache and specify a location, and Django's existing caching decorators and low-level cache API work seamlessly. This is especially valuable for smaller deployments where running a separate cache server adds unnecessary operational complexity.
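
A settings sketch roughly mirroring the example in the DiskCache documentation; double-check the exact option names against the docs for your version:

    # settings.py
    CACHES = {
        "default": {
            "BACKEND": "diskcache.DjangoCache",
            "LOCATION": "/var/tmp/django-cache",   # any writable directory
            "TIMEOUT": 300,                        # default entry TTL in seconds
            "OPTIONS": {"size_limit": 2 ** 30},    # cap the cache at roughly 1 GB
        }
    }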

9. Custom Serialization for Compression and Special Types

While DiskCache uses Python's pickle by default, you can implement custom disk classes to control serialization. The documentation includes an example using JSON with zlib compression, which can achieve 80-90% size reduction for text-heavy data like LLM responses or API results. Vincent experimented with quantized NumPy array storage, trading minimal precision loss for 4x disk space savings. For JSON serialization, the hosts recommended orjson over the standard library for better performance and type support including dates and NumPy arrays.

  • github.com/ijl/orjson - Fast JSON library with extended type support
  • zlib compression for text-heavy caches
  • Custom disk classes for specialized serialization needs
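
The sketch below is loosely adapted from the JSONDisk recipe in the DiskCache tutorial. It only changes how values are serialized (the documented recipe also handles keys), so treat it as an illustration rather than the canonical implementation:

    import json
    import zlib

    from diskcache import Cache, Disk

    class CompressedJSONDisk(Disk):
        """Store values as zlib-compressed JSON instead of pickle (illustrative only)."""

        def __init__(self, directory, compress_level=6, **kwargs):
            self.compress_level = compress_level
            super().__init__(directory, **kwargs)

        def store(self, value, read, **kwargs):
            if not read:
                value = zlib.compress(json.dumps(value).encode("utf-8"), self.compress_level)
            return super().store(value, read, **kwargs)

        def fetch(self, mode, filename, value, read):
            data = super().fetch(mode, filename, value, read)
            if not read:
                data = json.loads(zlib.decompress(data).decode("utf-8"))
            return data

    # Extra disk_* keyword arguments are forwarded to the Disk subclass constructor.
    cache = Cache("./cache/json", disk=CompressedJSONDisk, disk_compress_level=9)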

10. Eviction Policies and Cache Size Management

DiskCache includes several eviction policies to manage cache size automatically. The default policy is "least recently stored" (LRS), but you can also use "least recently used" (LRU) or "least frequently used" (LFU). The default size limit is 1GB, which prevents unbounded cache growth but might catch developers off guard if they expect unlimited storage. You can also set expiration times on individual cache entries, which is useful for data that should automatically refresh after a certain period.

  • Least Recently Stored (LRS) - default
  • Least Recently Used (LRU)
  • Least Frequently Used (LFU)
  • Configurable size limits and TTL
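
A small sketch of the knobs involved (the values shown are only examples):

    from diskcache import Cache

    cache = Cache(
        "./cache/bounded",
        size_limit=2 ** 30,                      # ~1 GB, which is also the default
        eviction_policy="least-recently-used",   # default is "least-recently-stored"
    )

    # Per-entry TTL: this value expires after five minutes regardless of the eviction policy.
    cache.set("front-page", ["item-1", "item-2"], expire=300)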

11. Advanced Data Structures: Deque and Index

Beyond simple key-value caching, DiskCache provides higher-level data structures. The Deque (pronounced "deck") class provides a persistent double-ended queue useful for cross-process communication or simple job queues, potentially replacing Celery for simpler use cases. The Index class provides an ordered dictionary with transactional support, allowing you to retrieve multiple values atomically. These structures enable patterns like work distribution across processes without requiring external message brokers.

  • Deque for persistent queues and cross-process communication
  • Index for ordered dictionaries with transactions
  • Potential replacement for simple Celery use cases
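
A brief sketch of both structures (directory paths are placeholders):

    from diskcache import Deque, Index

    # A persistent, process-safe double-ended queue: one process can append jobs
    # while another pops them, with no broker in between.
    jobs = Deque(directory="./cache/jobs")
    jobs.append({"task": "render", "episode": 534})
    next_job = jobs.popleft()

    # An ordered, persistent mapping; transact() groups multiple updates atomically.
    index = Index("./cache/index")
    with index.transact():
        index["a"] = 1
        index["b"] = 2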

12. Related Tools in the SQLite Ecosystem

The conversation touched on several complementary tools in the SQLite ecosystem. Litestream provides continuous streaming backup of SQLite databases to S3-compatible storage, making SQLite viable for production deployments with proper backup strategies. Plash is a new Python-focused hosting platform from Answer AI (Jeremy Howard's company) that provides persistent SQLite as a first-class database option. These tools reflect a broader trend of reconsidering SQLite for production use cases that previously required PostgreSQL or MySQL.

13. Vincent's Code Archaeology Project

Vincent built a visualization project called "Code Archaeology" that demonstrates DiskCache in a real-world data science context. The project analyzes Git repositories by running git blame across 100 time samples to show how code evolves over time, with sedimentary-style charts showing which lines of code survive versus get replaced. Processing large repositories like Django (550,000 lines) took over two hours, making caching essential for iterative development. The project is open source and welcomes contributions of additional repository analyses.

  • koaning.github.io/codearch - Live visualization
  • Threading combined with DiskCache for parallel processing
  • Real-world example of caching expensive git operations
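
This is not Vincent's actual code, but a rough sketch of what memoizing an expensive git operation and fanning it out over threads might look like:

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    from diskcache import Cache

    cache = Cache("./cache/git-blame")

    @cache.memoize()
    def blame_at(repo_path: str, commit: str, path: str) -> str:
        # git blame for one file at one point in history; cached, so re-runs are instant.
        result = subprocess.run(
            ["git", "-C", repo_path, "blame", commit, "--", path],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    def blame_many(repo_path: str, commit: str, paths: list[str]) -> list[str]:
        # The cache is thread-safe, so worker threads can share it freely.
        with ThreadPoolExecutor(max_workers=8) as pool:
            return list(pool.map(lambda p: blame_at(repo_path, commit, p), paths))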

14. Project Maintenance Status and Longevity

The hosts acknowledged that DiskCache has not had a release since 2023, with the maintainer (Grant Jenks) possibly busy with work at OpenAI. However, both Vincent and Michael emphasized this should not discourage adoption. The library is mature, stable, and built on SQLite which is actively maintained. Vincent stated he would need to see the library "break vividly in front of my face" before considering alternatives. The codebase is open source and could be forked if necessary, but the underlying SQLite dependency makes breaking changes extremely unlikely.

  • Last PyPI release: 2023
  • Built on actively-maintained SQLite
  • Considered stable/"done" rather than abandoned

Interesting Quotes and Stories

"It really behaves like a dictionary, except you persist to disk and under the hood is using SQLite. I think that does not cover everything, but you get quite close if that is the way you think about it." -- Vincent Warmerdam

"Your cloud SSD is sitting there, bored, and it would like a job." -- Michael Kennedy (from episode summary)

"I pay something like $5 for 400 gigs of disk. Do you know how much 400 gigs of RAM will cost on the cloud? There goes the college tuition." -- Michael Kennedy

"I vividly remember when I started college, people were always saying, keep it in memory because it is way faster than disk. But I think we have got to let a lot of that stuff just go." -- Vincent Warmerdam

"This cache needs to break vividly in front of my face for me to consider not using it. Because it does feel like it is done, and in a really good way." -- Vincent Warmerdam

"There are only two hard things in computer science: naming things, cache invalidation, and off by one errors." -- Referenced during discussion

"One thing I learned is that caching is actually hard to get right. It is on par with naming things." -- Vincent Warmerdam

"How do you fix that with a whole bunch of infrastructure? No, with a decorator." -- Vincent Warmerdam on the simplicity of DiskCache

Story: The Progress Bar Bug

Vincent shared a debugging story from building his code archaeology project. He was using the memoize decorator but noticed his cache was never being hit. After investigation, he discovered the problem: one of his function arguments was a Marimo progress bar object. Every time he reran the notebook, a new progress bar instance was created with a different object ID, causing every cache lookup to miss. The solution was DiskCache's ability to exclude specific arguments from the cache key calculation - a feature he was relieved to find already existed in the library.


Key Definitions and Terms

  • LRU Cache: Least Recently Used cache, a caching strategy that evicts the least recently accessed items first. Python's functools.lru_cache implements this in memory.

  • Memoization: An optimization technique that stores the results of expensive function calls and returns the cached result when the same inputs occur again.

  • Serialization/Pickle: The process of converting Python objects into a byte stream for storage or transmission. Pickle is Python's built-in serialization format.

  • Sharding: Distributing data across multiple storage locations (in this case, multiple SQLite files) to reduce contention and improve performance.

  • TTL (Time To Live): An expiration time set on cached data after which it is automatically considered stale and removed.

  • ACID Compliance: A set of database properties (Atomicity, Consistency, Isolation, Durability) that guarantee reliable transaction processing. SQLite is ACID-compliant.

  • Web Garden: A deployment pattern where multiple worker processes handle web requests, typically managed by a WSGI server like Gunicorn or uWSGI.

  • NVMe SSD: Non-Volatile Memory Express Solid State Drive, a modern storage interface that provides significantly faster read/write speeds than traditional SATA SSDs.


Learning Resources

Here are resources to learn more and go deeper on topics covered in this episode:

  • LLM Building Blocks for Python: Vincent's course that originally sparked this episode, covering practical LLM techniques including caching strategies for API calls and benchmarks.

  • Agentic AI Programming for Python: Collaborate with AI like a skilled junior developer. Build production features in hours with Cursor and Claude. Get real results.

  • Python for Absolute Beginners: If you are new to Python and want to understand dictionaries, decorators, and other fundamentals referenced in this episode.

  • HTMX + Flask: Modern Python Web Apps: Covers web development patterns where DiskCache caching techniques would be immediately applicable.


Overall Takeaway

DiskCache represents a powerful example of choosing the right tool for the job rather than reaching for the most complex solution. In an era where developers often default to running Redis or Memcached servers for caching, DiskCache offers a compelling alternative that requires no additional infrastructure, leverages the rock-solid reliability of SQLite, and takes advantage of modern fast SSDs that have closed much of the performance gap with RAM. Whether you are building web applications, running LLM experiments, or processing data in notebooks, the pattern is the same: expensive computations should not be repeated unnecessarily.

The library embodies the Unix philosophy of doing one thing well. Its dictionary-like API means there is virtually no learning curve for Python developers, while advanced features like sharding, transactions, and custom serialization are available when needed. Vincent's observation that this is in his "top five favorite Python libraries" and Michael's extensive production use on Talk Python speak to its real-world reliability.

Perhaps most importantly, this episode challenges conventional wisdom about caching architecture. You do not always need a separate cache server. You do not always need to keep everything in memory. Sometimes the simplest solution - a well-designed SQLite file on a fast SSD - is exactly right. As Vincent put it: "Give this cache thing a try. It is just good software."

diskcache docs: grantjenks.com
LLM Building Blocks for Python course: training.talkpython.fm
JSONDisk: grantjenks.com
Git Code Archaeology Charts: koaning.github.io
Talk Python Cache Admin UI: blobs.talkpython.fm
Litestream SQLite streaming: litestream.io
Plash hosting: pla.sh

Watch this episode on YouTube: youtube.com
Episode #534 deep-dive: talkpython.fm/534
Episode transcripts: talkpython.fm

Theme Song: Developer Rap
🥁 Served in a Flask 🎸: talkpython.fm/flasksong

---== Don't be a stranger ==---
YouTube: youtube.com/@talkpython

Bluesky: @talkpython.fm
Mastodon: @talkpython@fosstodon.org
X.com: @talkpython

Michael on Bluesky: @mkennedy.codes
Michael on Mastodon: @mkennedy@fosstodon.org
Michael on X.com: @mkennedy

Episode Transcript


00:00 Your cloud SSD is sitting there, bored, and it would like a job.

00:03 Today, we're putting it to work with DiskCache, a simple, practical cache built on SQLite

00:08 that can speed things up without spinning up Redis or other extra servers.

00:13 Once you start to see what it can do, a universe of possibilities opens up.

00:17 We're joined by Vincent Warmerdam to dive into DiskCache.

00:21 This is Talk Python To Me, episode 534, recorded December 19th, 2025.

00:27 Talk Python To Me, yeah, we ready to roll.

00:29 Upgrading the code, no fear of getting old Async in the air, new frameworks in sight

00:35 Geeky rap on deck, Quart crew It's time to unite We started in Pyramid, cruising old school lanes

00:41 Had that stable base, yeah sir Welcome to Talk Python To Me, the number one Python podcast for developers and data scientists.

00:48 This is your host, Michael Kennedy.

00:49 I'm a PSF fellow who's been coding for over 25 years.

00:54 Let's connect on social media.

00:55 You'll find me and Talk Python on Mastodon, Bluesky, and X.

00:58 The social links are all in your show notes.

01:01 You can find over 10 years of past episodes at talkpython.fm.

01:05 And if you want to be part of the show, you can join our recording live streams.

01:08 That's right.

01:09 We live stream the raw uncut version of each episode on YouTube.

01:13 Just visit talkpython.fm/youtube to see the schedule of upcoming events.

01:17 Be sure to subscribe there and press the bell so you'll get notified anytime we're recording.

01:22 Vincent, hello.

01:23 Michael, Michael, we're back.

01:25 Awesome.

01:26 Awesome to be back with you.

01:27 Yeah, this is almost the sequel to the last time you were on the show.

01:32 So it's going to be fun.

01:34 Yeah, so sequel in this case, not the query language,

01:36 like an actual sequel of events.

01:38 Yes.

01:39 Yeah, you can correct me if I'm wrong, but I think what happened is you had me on a podcast a while ago

01:45 to talk about a course that I made, and a big chunk of the course that we were very enthusiastic about

01:49 was about this tool called DiskCache.

01:51 And then we kind of came to the conclusion, well, we had to cap it off.

01:54 Maybe it's fun to do an episode on just DiskCache.

01:57 since we're both pretty huge fans of it.

01:59 I think that's how we got here.

02:00 I think that is how we got here as well.

02:02 And we're going to dive into this.

02:05 Honestly, it's a pretty simple library called DiskCache,

02:09 but what it unlocks is really, really sweet.

02:11 And I'm going to talk about a lot of different angles.

02:14 And now, even though it's just been not that long since you were on the show,

02:18 maybe just give us a quick intro of who you are.

02:20 Hi, my name is Vincent.

02:21 I've done a bunch of data machine learning stuff, mainly in the past.

02:25 That's sort of what a lot of people know me from.

02:27 These days, though, I work for a company called Marimo.

02:29 You might have heard from us.

02:30 We make very modern Python notebooks.

02:32 We took some lessons from Jupyter, and we take a new spin of it.

02:35 So that's my day to day.

02:37 But I still like to write notebooks and do kind of fun little benchmarks and also stuff

02:42 with LLMs.

02:42 And I've just noticed that for a lot of that work, boy,

02:45 disk cache is amazing.

02:47 And I also use it for web stuff.

02:48 And I think that's also what your use case is a little bit more of.

02:51 But yeah, in notebook land, you also like to have a very good caching mechanism

02:56 And on the Marimo side of things, we are also working on different caching mechanisms, which I might talk about in a bit.

03:01 But just for me, the bread and butter, the thing I've used for years at this point is disk cache whenever it comes to that territory.

03:06 Yeah, it's funny.

03:07 This was recommended to me for Python Bytes as a news item over there quite a while ago, like years ago.

03:13 And I'm like, oh, that's pretty interesting.

03:15 And then I saw you using it in the LLM Building Blocks course, and it just unlocked for me.

03:20 Like, oh, my.

03:22 Oh, this is something else.

03:24 And so since then, I've been doing a bunch with it, and I'm a big fan.

03:27 I've been on this, like trying to avoid complexity, but still getting really cool responses, performance, et cetera, out of your apps.

03:35 And I think this is a really nice way to add multi-process, super fast caching to your app without involving more servers and more stuff that's got to get connected and keep running and so on.

03:47 But before we get into the details of that, maybe let's just talk about caching in general.

03:53 Like what types of caching is there?

03:55 You know, I sort of give a little precursor there.

03:57 But yeah, dive into it.

03:58 So like in the course, the main example I remember talking about was the one--

04:03 you've got this LLM, and you want to do some benchmarks.

04:05 And it might be the case that, I don't know, using an LLM for, let's say, classification,

04:09 like some text goes in, we got to know whether or not

04:12 it's about a certain topic, yes, no, or something like that.

04:14 Then it would be really great if, suppose, the same text came

04:17 by for whatever reason, that we don't run the query on the LLM

04:21 Again, it's like wasted compute, wasted money.

04:23 So it'd be kind of nice if the same text goes in that we then say,

04:27 oh, we know what the answer to that thing is already.

04:29 We cached it, so here you can go back.

04:31 And that's the case when you're dealing with heavy compute ML systems.

04:35 But there's a similar situation that you might have, I guess,

04:37 with expensive SQL queries, or you want to reduce the load on a database somewhere.

04:41 Then having some sort of a caching layer that's able to say,

04:43 oh, you're querying for something, but I already know what it is.

04:47 Boom, we can send it back.

04:49 I think the classical thing you would do in Python is you have this decorator in functools, I think, right?

04:53 The LRU_cache.

04:57 Yeah, exactly.

04:58 Yeah.

04:58 That's a hell of a world to that.

04:59 But the downside of that thing is that it's all in memory.

05:02 So if you were to reboot your Python process, you lose all that caching.

05:05 So that's why people historically, I think, resorted to--

05:08 I think Redis, I think, is the most well-known caching tool.

05:12 It's the one I've always used.

05:13 There's Memcache, I think.

05:14 There's other tools.

05:15 You could use Postgres for some of this stuff as well.

05:18 But recently, especially because disks are just getting quicker

05:21 and quicker, people have been looking at SQLite for this sort of a thing as well.

05:25 So that's, I think, the quickest summary and also sort of the entryway to how I got started with disk cache.

05:31 Yeah, and so for this example that you highlight in the LLM

05:34 Building Blocks course, it's not a conversation.

05:38 It's like a one-shot situation, right?

05:41 You come up-- you say, I have some code or some documents,

05:43 and I have almost like an API.

05:45 I'm going to send that off to the LLM and ask it, tell me X, Y, and Z about it.

05:51 And sure, it's got some kind of temperature and it won't always give an exactly the same answer,

05:56 but you're willing to, you know, you're willing to accept an answer.

06:00 And at that point, like why ask it again and again and again, which it might take seconds,

06:05 it might cost money.

06:06 Whereas if you just remember through caching somehow, you remember it, it's like, boom, instant.

06:13 Yeah, and it tends to come up a lot in when you're doing benchmarks, for example.

06:16 So you have this for loop, you want to go over your entire data set, try all these different approaches.

06:21 And if you've got a new approach, then you want that to run, of course.

06:23 But if you accidentally trigger an old approach, then you don't want to incur the cost of like going through all those different LLMs.

06:29 I should say, like, even if you just forget about LLMs, let's just say machine learning in general.

06:33 Let's say there's some sort of image classification thing you're using in the cloud.

06:36 There also, you would say, like, file name goes in.

06:39 that's an image. And if the same file name goes in, we don't want the expensive compute cost to happen

06:43 either. So it's definitely more general than LLMs, but LLMs do feel like it's the zeitgeisty thing to

06:48 worry about. Yeah, I think for two reasons: one, because they're just the topic du jour, and two,

06:54 because they're, I think, a part of computing that most people experience that is way slower than

06:59 they're used to. Yeah, well, and especially if, you know, suppose that you have an

07:05 attic somewhere and you're a dad and you want to do home lab stuff and you're playing with all

07:09 these open source LLM models, then you also learn that, yeah, they're fun to play with, but they also

07:14 take a lot of time to compute things. So then immediately you get the motivation to do it the

07:18 right way. Yeah, I built a couple of little utilities that talk to a local LLM. I think it's

07:25 the OpenAI OpenWeights one, that 20 billion parameter one I have running on my Mac Mini,

07:31 and it's pretty good, a little bit slow, but, you know, it's fine for what it's being used for. And

07:35 put-- use your disk cache technique on it.

07:39 And if I ask it the same question again, it's like, boom.

07:41 You don't need to wait 10 seconds.

07:43 Here's the answer.

07:43 Yeah.

07:44 So that-- and I guess like-- but I guess from your perspective,

07:46 I think your main entry point to this domain was a little bit more from the web dev perspective, right?

07:51 Like that's-- and I suppose you're using it a lot for preventing expensive queries to go to Postgres,

07:57 or I don't exactly know your backend.

07:59 You know how-- you won't believe how optimized my website is.

08:02 There's not a single query that goes to Postgres, because they go to MongoDB.

08:06 I'm just kidding.

08:06 There you go.

08:07 No, but your point is totally valid.

08:10 Go into the database, right?

08:11 Now, I don't actually cache that many requests.

08:15 I don't avoid that many requests going to the database.

08:17 They're really quite quick, and so I'm OK with that.

08:19 But when you think about a feature-rich database, feature-rich web app, there's just tons of these little edge

08:26 cases you're like, oh, got to do that thing.

08:28 And it's not a big deal, but we've got to do it 500 times in a request.

08:31 Then it is kind of a thing.

08:34 So let me give you an example.

08:35 I'll give you some examples.

08:36 So for example, the good portions of the show notes on talkpython.fm are in Markdown.

08:43 I don't want to show people Markdown.

08:44 I want to show them HTML, right?

08:47 So when a request comes in, it'll say any fragment of HTML that needs

08:53 to be turned into Markdown instead of just going, oh,

08:56 let me process that.

08:57 It just goes, all right, what is the hash of this or some other indicator of the content?

09:03 And then I've already computed that and stored it in disk cache.

09:06 So here's the HTML result.

09:08 Another example is there's a little YouTube icon on each page.

09:13 And that's actually in the show notes, but then the website parses the YouTube ID out

09:17 and then embeds it with an, like, there's a bunch of stuff going on there to keep YouTube

09:22 out of spying on my visitors.

09:25 But stuff happens, YouTube ID is used.

09:27 That could be parsed every time.

09:29 Or I can just say this episode has this YouTube ID.

09:33 That information goes into a cache, right?

09:35 And because it's a disk cache sort of scenario, like a file-based one, not an LRU cache.

09:42 It doesn't change the memory footprint and it's shared across processes.

09:46 So in like the web world, it's really common to have a web garden

09:48 where you've got like two or four processes all being like round robin to

09:53 from some web server manager thing, right?

09:56 If you don't somehow out of process that, either Redis or SQLite or database or something,

10:03 then all of those things are recreating that, right?

10:05 They can't reuse that, right?

10:07 So there's a lot of interesting components there.

10:09 And I suppose your web deployment, you have like a big VM, I suppose,

10:11 and then there's like multiple Docker containers running,

10:14 but they do all have access to the same volume, and that's how you access SQLite.

10:18 Bingo, yeah, exactly, exactly.

10:21 And how am I doing?

10:22 Yeah, so what I have done is in the Docker Compose file,

10:26 I have an external, This is also important for Docker.

10:29 So I have an external folder on a big hard drive in the big VM that says, here's where

10:34 all the caches go.

10:36 And then depending on which app, it'll pick like a sub directory it can go look at or

10:40 whatever that it's using.

10:41 And so that way, even if I do a complete rebuild of the Docker image, it still retains

10:48 its cache from version to version and all that kind of business.

10:51 You could do that with a persistent VM as well, volume as well.

10:55 But I've just decided--

10:57 you can go and inspect it a little easier and see how big the cache is and stuff like that.

11:00 OK, so we're going to get into the weeds of how disk cache works exactly.

11:04 But I'm triggered here because it sounds like you've done

11:06 something clever there.

11:07 Because what you can do in disk cache is you can say, look, here's a file that's SQLite.

11:11 And then it behaves like a dictionary, but it's persisted on disk.

11:14 But what I just heard you say is that you've got multiple caches.

11:16 So am I right to hear that, oh, for some things that

11:19 need to be cached, let's say the YouTube things, that's a separate file.

11:22 And then all the markdown stuff, that's also a separate file, and therefore if connections need to be made to either,

11:27 it's also kind of nicely split.

11:29 Is that also the design there?

11:30 Yeah, that is.

11:30 And actually, before, like, we're going to dive into all the details of how it works,

11:33 but I'll just go, I'm just to give people a little glimpse.

11:36 I'll go ahead and show, I've got this whole admin back in here.

11:39 And I've got different caches for different purposes.

11:42 Because they're just SQLite files, you can either say, give me the same one,

11:45 or you can say, this one is named something else, and it has a different file name or different folder or whatever.

11:50 Right, so I've got one that stores things like that YouTube ID I talked about

11:53 any markdown, any fragment of markdown anywhere in the web app that it needs to say that needs

11:58 to go to HTML, like just.

12:00 Yeah, and it's like 8,000 items in that thing.

12:03 Yeah.

12:04 In this one, there's 8,970 items, which is nine megs, right?

12:08 I mean, it's not huge, but it's not too bad.

12:10 And you can actually even see where it thinks it lives, but that's not really where it lives

12:14 because there's, you know, the volume redirects and stuff.

12:17 But I've also got stuff for directly about the episodes that it needs to pull back.

12:22 And then I do a lot of HTTP caching.

12:25 And one of the things that I think is really wrong with web development is people say,

12:30 well, that's like a stale image or that's a stale CSS file or JavaScript, you know,

12:33 all that kind of stuff.

12:34 So if you just do like super minor tricks and just put some kind of hash ID on the end

12:41 of your content, it will, and you teach your CDN or whatever, that that's a different file

12:47 if it varies by query string, then you never, ever have to worry about stale content ever.

12:52 Right. But computing that can be expensive, especially for remote stuff, like if it's on a different--

12:57 it's like an S3 thing, but you still want to have it do that. So I have a special cache for that, and

13:01 that takes-- that's like pretty complicated to build up, because it's got to do almost 700 web

13:06 requests to figure out what those are. But once they're done, it's blazing fast. You don't have to

13:10 do it again, right? Unless it changes, and then it doesn't change much, and so on. So that's the way

13:14 that I'm sort of using and appreciating disk cache. Yeah, it works well in your setup because you've

13:19 gone for the VM route. I mean, if you go for something like Fly.io or maybe even

13:24 DigitalOcean has like a really, I think it's a nice like app service, but that

13:27 all revolves around Docker containers that like spin up horizontally. And I

13:31 don't think those containers can be configured in such a way they share the volume.

13:36 So in that sense, you could still use disk cache, but then

13:40 each individual instance of the Docker container would have its own cache, which still could

13:43 work out.

13:45 Not going to be as well well functional. It's going to be better with your setup, though.

13:50 Yeah, absolutely. I agree, though. You could still do it. Or you could go, I'll take the

13:55 zen of what Vincent and Michael are saying today, and I'll apply that to Postgres, or

13:59 I'll apply that to whatever data. You could pull this off in a database.

14:03 You would just have to do more work. Yeah. I mean, I've had a couple of, I think

14:07 it was like a Django conference talk I saw a while ago. They were also raving about

14:11 disk cache. But the merits of disk cache do depend a little bit on your

14:15 deployment, though.

14:15 That is, I think, one observation.

14:17 Like in your setup, I can definitely imagine it.

14:18 Interesting.

14:19 Yeah.

14:19 Yeah.

14:20 Well, I don't even think we properly introduced this thing

14:22 yet, so.

14:23 But let's maybe go there.

14:24 Yeah.

14:24 Let's start there.

14:25 Let's start there.

14:26 It's time.

14:26 OK.

14:27 It's time.

14:27 Yeah.

14:29 I guess the simplest way I usually describe it, it really behaves like a dictionary,

14:33 except you persist a disk and under the hood is using SQLite.

14:36 I think that's the-- it doesn't cover everything, but you get quite close, if that's the way it is.

14:40 I think there might be--

14:42 you know, I keep harping on this on the show, but there are so many people that are new to Python

14:45 and programming these days.

14:47 Many, many of them, almost half of them.

14:49 I think it's worth pointing out, just like, what is SQLite?

14:51 Like, why is it different than any other database?

14:54 Like, why have I been using the word database or SQLite

14:56 when SQLite is a database, right?

14:57 That's weird.

14:58 - So, I never really took a good database, of course.

15:01 I might be ruining the formalism of it.

15:04 But the main, like, for me at least, the way I like to think about it is Postgres,

15:08 that's a thing I can run on a VM, and then other Docker containers can connect to it

15:13 because it's running out of process.

15:14 There's some other process that has the database somewhere,

15:17 and I can connect to it.

15:18 And I think the main thing that makes SQLite different

15:20 is that, no, you got to run it on the same machine,

15:23 on the same process where your program is running.

15:25 And that's, I think, the main--

15:26 and there's all sorts of little details, like how the data structures are used internally,

15:30 and SQLite doesn't have a lot of types.

15:32 There's lots of other differences.

15:33 I think that's the main one.

15:35 Unless, Michael, I forgot something.

15:36 Yeah, no, I think it's--

15:38 and it's--

15:40 operationally, it's a separate thing run. It has to have both, it has to be secure because if your data gets exposed, like-

15:49 For Postgres, is it not for SQL? Yes, it's running somewhere. People can SSH in if you're

15:54 not careful. You've got to be mindful of passwords and all that stuff. That's totally true.

15:58 Right. And it can go down. Like it could just become unavailable because you've screwed up

16:02 something or whatever, right? It's a thing you have to manage in the complexity of running your app

16:07 when it's like, well, it used to just be one thing I could run in a Docker container. Well,

16:10 now I got different servers, they got to coordinate and there's firewalls and there's like, it's just,

16:14 it just takes it so much higher in terms of complexity that like SQLite is a file.

16:19 Yes.

16:20 I mean, I do want to maybe defend Postgres a little bit there.

16:22 Cause one thing that's like really nice and convenient in terms of like CICD and deployments

16:26 and all that, oh, suppose you want to scale horizontally and there's like Docker containers

16:31 running on the left and there's this one Postgres thing running on the right.

16:34 I mean, you can just turn on and off all those Docker containers as you see fit.

16:38 they're just going to connect to the Postgres instance.

16:40 And I've done this trick for Calm Code a bunch of times

16:43 where I just switch cloud providers, because Postgres is running there,

16:46 and I can just move the Docker containers to another cloud provider, and it all works fine.

16:50 No migration necessary.

16:52 With SQLite, that aspect is a little bit more tricky.

16:54 You have to be a bit more mindful.

16:56 Although, I should mention, might be worth a Google.

16:59 There's actually this one new cloud provider that's very much Python-focused.

17:02 It's called Plash, P-L-A dot S-H, I think.

17:06 Oh, this is new to me.

17:07 Yeah, so I think--

17:08 Wow, OK.

17:09 Look at this.

17:09 From.py to.com in seconds.

17:12 Yeah, it's the Answer AI, Jeremy Howard and friends.

17:15 I don't know to what extent this is super production ready.

17:18 And SQLite, you've got to be mindful of the production aspect

17:22 for some reasons as well.

17:23 But one thing that is kind of cool about them is they give you a persistent SQLite as a database

17:29 and a pipeline process that can just kind of attach to it.

17:32 And they just-- in their mind, that's the simplest way that a cloud provider should be.

17:36 take a very opinionated approach.

17:38 So yeah, if you're interested in maybe running this

17:40 as a web service, migrations are a little bit tricky

17:43 in that realm, because you do have to download the entire data set due to migration

17:47 and upload it again, I think, if I recall correctly.

17:50 And for some apps, that's no big deal.

17:52 Others, that's a mega deal.

17:53 Depends how big that data is.

17:55 So I'm not suggesting this is going to be for everything and everyone,

17:58 but I do think it's cool, which is why I figured I'd mention it.

18:00 Oh, it's new to me.

18:03 I'm going to follow up with litestream.io.

18:06 Have you seen this?

18:07 Yeah, that is also really neat.

18:11 So basically, what if you want to back up your SQLite?

18:13 Like, how could you do that?

18:15 Oh, it might be nice to do that with S3.

18:17 And I think it's like the guy who made the thing works at Fly.io.

18:21 He's doing a bunch of low-level stuff.

18:23 One thing about that open source package is also really interesting, by the way,

18:26 is I think he refuses PRs from the outside.

18:30 He just wants to have no distractions whatsoever.

18:33 He has a very interesting way of developing software.

18:35 You can submit issues, of course.

18:38 I think if you scroll down, there used to be a notice that

18:40 basically said, hey, this is a--

18:42 I'm not running this--

18:43 Yeah.

18:44 There you go.

18:45 We welcome-- yeah, contribution guide.

18:48 We welcome bug reports.

18:51 Yeah, this is a way where you can basically stream updates to S3.

18:54 And the main observation there is S3 is actually really cheap

18:58 if all you do is push stuff into it.

18:59 If you never pull it out, usually getting it out is the expensive bit of S3.

19:03 So this is like pennies on the dollar for really decent backup.

19:08 And you can also send it to multiple--

19:09 you can send it to Amazon and also to DigitalOcean,

19:11 if you like.

19:12 Yeah.

19:13 Yeah, because these days, S3 is really a synonym for blob storage on almost any hosting platform.

19:20 Like, it used to be S3 might go to literally S3 at AWS.

19:23 But now it's like, or DigitalOcean object spaces, or to you name it.

19:29 They've all adopted the API, kind of like OpenAI's API.

19:32 Yeah, I will say it's a little bit awkward that you have to--

19:35 like, sometimes you go to a cloud provider, and they say, you have to download a SDK

19:40 from a competing cloud provider, and then you can connect to our cloud bucket.

19:44 I know.

19:44 And it's usually Boto3.

19:46 And Boto3 is--

19:48 if you want to cry because you're using a library, like, Boto3 has a good chance of being the first one

19:53 to make you do it.

19:53 It is so bad for me.

19:55 It's so not custom--

19:57 It's not built with craft and love.

19:59 It's like auto-generated where you pass these--

20:02 like, you pass this kind of dictionary, and then the other argument takes a separate dictionary

20:05 that relates back-- it's just like, could you give me a real API here?

20:09 I mean, the one thing I can appreciate about Boto that I do think is honest to mention

20:12 is they do try to just maintain it.

20:15 The backward compatibility of that thing also means it can't move in any direction as well.

20:19 And I can't-- there is this meme where Google kills all

20:22 of its products way too early, and Amazon's meme that they kill them way too late, sometimes never.

20:27 Right?

20:28 So in that sense, I can appreciate that they just try to keep Boto just not necessarily as user friendly,

20:33 but they do keep it super stable.

20:34 Like, I get there's a balance there.

20:36 Yeah.

20:37 I feel like we still haven't really introduced this cache.

20:39 We've kind of set the stage.

20:41 Anyway, but yeah, SQLite, super cool.

20:44 How does it work under the hood?

20:45 Well, it's really just like a Python dictionary.

20:47 So you can say something like, hey, make a new cache.

20:49 And then you can do things like cache, square brackets,

20:52 string name, equals, and then whatever Python object you like can go in. And Python has this serialization method called a pickle.

21:00 Serialization just means, well, you can persist it to disk in some way, and then you can sort of

21:05 get it back into memory again. And that's what disk cache just uses under the hood. So in theory,

21:10 any Python object that you can think of can go into disk cache. The only sort of thing to be

21:16 mindful of is if you have like Python version, if NumPy version 1 in Python 3.6, and you're going

21:21 to inject a whole lot of that into this cache.

21:24 Don't expect those objects to serialize nicely back

21:26 if you're using Python 3.12 and NumPy version 2 or something.

21:29 Right, because pickle is almost an in-memory representation

21:33 of the thing.

21:34 And that may have evolved over time.

21:36 That's also a true statement about your own classes, potentially.

21:39 Yeah, so if you're dealing with multiple Python versions

21:41 and multiple versions of different packages, there's a little bit of a danger zone to be aware of there.

21:47 That said, for most of the stuff that I do, that's basically a non-issue.

21:50 But I do get this nice little object that can just store stuff into SQLite and can get it out.

21:56 And it's very general.

21:58 It's going to try to be clever about it.

21:59 Like if you give it an int, it's going to actually store it as an int and not use the pickle format.

22:03 So there's a couple of clever things that it can do.

22:06 And it's also really like a Python dictionary.

22:07 So you can do the square bracket thing.

22:09 You can also do the delete and then cache square bracket thing to delete a key from the cache.

22:15 Just like a Python dictionary, you have the get method.

22:17 So you can say dot get key.

22:19 And if it's missing, you can pass a default value.

22:22 So it's very much like a dictionary.

22:25 I think Bob's your uncle on that one.

22:27 Unless, Michael, I've forgotten something.

22:29 But I think that's the simplest way to do it.

22:30 Yeah, pretty much.

22:31 Yeah, I think so.

22:32 The difference being it's not in memory.

22:34 It's stored to a file.

22:36 It happens-- it's not always a SQLite file.

22:39 But often, it is a SQLite file as its core foundation

22:43 that it's stored to.

22:43 So it gives you process restart ability, where it still remembers the stuff you cached.

22:49 It's not like LRU cache.

22:50 We got to redo it every single time.

22:52 And I think, I don't know where it is in the docs here,

22:56 but the thread safety bit of it and the cross-process safety

23:00 is really nice about, is it persistent?

23:03 You've got this whole table here, things like, is it persistent?

23:06 Yes.

23:06 Is it thread safe?

23:07 Yes.

23:07 Is it process safe?

23:08 Yes.

23:10 Compared against other things people might choose.

23:13 And that, honestly, I think that is the other half of the magic.

23:17 Yeah, so especially for your web stuff, I would say that that's the thing you really want.

23:21 And some of that, of course, is just SQLite itself.

23:25 Historically, one reason why people always used to say, like, use Postgres, not SQLite,

23:28 has to do with precisely this concurrency stuff.

23:32 My impression is that SQLite is really good at reading, but writing can be slow if multiple processes do it.

23:37 Some of that, I think, is related to the disk as well.

23:39 I don't know to what extent that has changed.

23:41 But historically, at least, whenever I was doing Django, hanging out at Django events,

23:45 People are always saying, like, just use Postgres because it's better for the web thing.

23:48 But it is safe, the SQLite.

23:51 It might become slower, but it is thread safe if it's--

23:53 Right.

23:54 There's actually-- they've thought a lot about in this thing

23:58 about transactions, concurrency, and basically dealing with that.

24:02 But it is ultimately, for the most part, still SQLite underneath.

24:07 But the thing with a cache is if you're writing it more

24:10 than you're reading it, you probably shouldn't have a cache.

24:13 Yeah.

24:13 I mean, like...

24:15 That beats the purpose.

24:18 Exactly.

24:18 Like, you get no value if you're recreating it.

24:21 You're only probably just doing overhead and wasting memory or disk space.

24:24 So it's inherently a situation where it's going to be pretty read-heavy,

24:30 and SQLite is good at read-heavy scenarios.

24:32 And maybe it's also fair to say, like, the LRU cache that you get with basic Python,

24:36 so also to maybe explain that one, so the LRU cache is a little bit different because you decorate a function with it,

24:41 and then given the same inputs, one output goes out,

24:44 you can kind of keep track of a dictionary that's in memory.

24:47 If you don't have a lot of stuff to keep in the back of your mind,

24:49 then maybe you don't have to write to disk, right?

24:51 So there's also maybe a reason to just stick to caching mechanisms

24:54 that use Python memory, because I also think, I would imagine it to be quicker too.

24:59 But maybe that's also...

25:01 Probably, yes.

25:01 That should be quicker.

25:02 It's just that if you're capped at memory, then you might want to spill to disk,

25:05 and then disk cache becomes interesting too.

25:06 Right, for example, you have literally zero serialization, deserialization.

25:11 What you put in LRU cache is the pointer to the object

25:14 that you're caching, right?

25:15 If you've got a class or a list that's part of the LRU cache.

25:18 The one thing that is good to mention is also a really nice feature of disk cache

25:21 is just like LRU cache has a decorator, so you can decorate a function, disk cache also has that.

25:27 And it works kind of interestingly, too.

25:29 So when you decorate the function, you do have to be a little bit careful if you use that.

25:34 But then disk cache will--

25:36 I think it will hash the function name and the inputs

25:39 that you pass.

25:40 I don't know if it also hashes the contents of the function.

25:45 Like if you change the function itself, I don't know if this cache will actually put that

25:49 in a different slots, if that makes sense.

25:51 Yeah, you can say @cache.memoize, which is the design pattern speak for just remember this.

25:57 Yeah.

25:57 It takes the arguments.

25:59 And then it has like the Fibonacci sequence, which is the classic example, of course.

26:02 And like there are some extra things you can set there as well.

26:05 So you can say things like, hey, I think you're able to--

26:08 yeah, you're able to set the expiry.

26:10 So you can say things like, I want to cache this, but only for the next five minutes or so,

26:14 which can make a lot of sense if you're doing a front page

26:16 kind of a thing.

26:17 So like the Reddit front page or something like that that updates, but not every second.

26:20 It probably updates once every five minutes or something like that.

26:23 And then you do want to have something that's cached,

26:25 but then after that, you want the cache to maybe basically just reset.

26:28 And that is something you can also control with a few parameters here.

26:31 Right, that's interesting.

26:32 There's a couple good use cases that come to mind for me.

26:35 Like one, if I put this on the function that generated the RSS feed for Talk Python,

26:39 I could just say every one minute and then it might be a little bit expensive to compute

26:44 because it's got a pars, you know, 535 episodes or whatever.

26:48 But then for one minute, all the subsequent requests,

26:50 just here's the answer, here's the answer.

26:52 And then without me managing anything, it will just automatically the next minute refresh itself

26:58 by the nature of how it works, right?

27:00 How much traffic do you get on that endpoint?

27:01 Just roughly, like you're asking.

27:02 One terabyte of RSS a month.

27:05 Okay, gotcha.

27:06 Okay, but there you go.

27:08 Like then just doing that like once a minute instead of like many times a minute will be a huge cookie.

27:13 I would say it's probably more than one request a second.

27:17 And the file size, the response size of the RSS feed is over a meg.

27:21 And so it's a non-trivial amount of asking, you know.

27:24 Yeah.

27:24 And then like, and how do you fix that with a whole bunch of infrastructure?

27:27 No, with a decorator.

27:28 Like that feels...

27:29 Exactly.

27:30 Exactly.

27:31 You pretty much summed up all the reasons why I'm so excited about this,

27:34 because it's like you could do all of this complex stuff

27:36 or just like you could just literally in such a simple way,

27:41 just not recompute it as often.

27:43 Yeah.

27:43 Here's the danger.

27:44 What if there's a race condition?

27:46 And oh my goodness, two of them sneak in.

27:48 You know what I mean?

27:49 Like, okay, so you've done a little bit extra work and you throw it away.

27:52 Who cares?

27:52 I want to use Redis now.

27:53 And Redis is cool, but I've never used it before.

27:56 Better buy a book.

27:57 Okay, no.

27:58 With this cache, if you, I mean, I'm sure it won't solve everything,

28:02 but this, I make a bit of a joke by saying, just use a decorator.

28:05 But it's honestly that feeling that this library really

28:07 does give you.

28:08 You can just use it as a decorator, which has a lot of great use cases.

28:12 You can just use it as a dictionary.

28:13 So it still feels like you're writing Python.

28:16 It's just Python with one concern less.

28:19 And that is the magic.

28:20 And it takes on so many of the cool aspects

28:22 of these high-end servers like Postgres or Redis or Valkey.

28:27 Valkey is sort of the shiny new Redis, right?

28:29 I would actually love to do a Redis benchmark.

28:31 I haven't done that yet.

28:32 But one thing I do wonder with disks are getting so much faster.

28:36 Yes.

28:36 Right?

28:37 And so you can actually at some point wonder like how much faster is Redis really going to be

28:40 and how much money are you willing to spend on it?

28:43 Because if your cache is huge and it allows to go in memory in Redis,

28:47 it could be wrong, but Redis is fully in memory, I think, right?

28:50 I believe so.

28:51 There is a database aspect.

28:53 Redis is weird because it could be so many.

28:54 Redis is cool.

28:55 It can do a lot.

28:57 But they do have, they actually have benchmarks here on.

29:00 Ah, there you go.

29:01 Compared against memcached and Redis.

29:06 And it has the get speed and the write speed.

29:08 And this is smaller is better.

29:10 Yeah, look at this.

29:11 DiskCache beats Redis.

29:13 And I imagine that's because of the network hop or something.

29:17 Yeah, exactly.

29:17 I bet it's the network, the network connection.

29:20 So if you're running it on the same machine, you would have a different number there.

29:24 Might be good to maybe caveat that.

29:25 Yeah, it might be.

29:26 I mean, but also that I think that I don't know when they ran this benchmark,

29:29 but I just checked on PyPI.

29:32 This project started in 2016.

29:34 So it might-

29:35 I bet this is 2016 data right here.

29:38 If I know how these docs go.

29:40 Yeah, so it could also be that those are old disks comparing old memories, right?

29:45 So then this is one of those weird benchmarks.

29:47 You got to really run them every six months or so for them to remain relevant.

29:51 Yeah, yeah.

29:51 I mean, the new NVM, VVM, whatever, disks, SSD disks are so fast.

29:58 And also they're not memory.

29:59 Memory is expensive nowadays.

30:01 Yes, it is.

30:02 People want to build data centers with them, I've heard.

30:04 Yeah, yeah.

30:05 And on the cloud there, this is a totally, this is another really interesting aspect to discuss.

30:11 Probably more of a web dev side of things.

30:13 But if you do LRU caches or even to a bigger degree, I run a whole separate server,

30:18 even if it is just a Docker container that holds a bunch of this stuff in memory,

30:22 that's going to take more memory on your VM or your cloud deployment or whatever.

30:26 And if you just say, well, I have this 160 gig hard drive, and that's an NVMe high-speed drive, like maybe I could just put a

30:33 bunch of stuff there, and you could really thin down your deployments. Not just because it's not in

30:39 memory in a cache somewhere, but compared to not having any form of cache, you might be able to

30:42 dramatically lower how much compute you need and avoid all that. Right. Like there's layers of how this

30:48 could shave things off. And again, it's one of those things of like, oh, can I just pay for disk

30:53 instead? Oh, that's a whole lot cheaper. What else do I got to do? You just got to write a

30:56 decorator. Yeah, I think I pay five dollars for, I don't remember exactly, but I pay something like

31:03 five dollars for 400 gigs of disk. There you go. And do you know how much 400 gigs of RAM will cost

31:09 on the cloud? Um, well, I mean, more. There goes the college tuition. Exactly, sorry kids. Yeah,

31:18 no, but like, again, I vividly remember when I started college, people were

31:23 always saying, oh, keep it in memory because it's way faster than disk, but I think we've got to let

31:26 a lot of that stuff just go.

31:28 Interesting idea. Yeah, I agree, though.

31:30 I think you're right. But anyway, so far, we've mainly

31:34 been discussing the mechanics of it, but there's some bells and whistles I think we should maybe also mention.

31:38 The expiry definitely is one of them.

31:41 There's also first in, first out kinds of things that you can do.

31:45 So maybe if we could go back to that.

31:46 Yeah, there's a bunch of features, actually, there.

31:50 The expiry is interesting already because

31:54 if you do regular, say, LRU caching like that, you have a natural expiry.

31:59 It's going to go away when the process restarts, and that's going to happen eventually.

32:03 Even in a web app, you ship a new version, you've got to restart the thing or something.

32:07 But when it goes to the disk, it starts to pile up, right? That's why I have this little admin

32:10 page that has, like, how big is this and a button to clear it.

32:14 In fairness, that can blow up quite a lot as well, if you're not careful.

32:18 It is a concern. Yeah, it is definitely a concern.

32:20 And so I think a big way to fix it, like the expiry, we already talked about why it's

32:24 interesting for stale data.

32:26 You want it to just like auto refresh, but it's also just a safeguard of maybe we'll just recompute this once a month.

32:32 It's really quick and easy.

32:34 Yeah.

32:34 Maybe just don't let it linger forever.

32:37 I think what you can also do though, if I'm not mistaken,

32:39 is I think you can also set like a max number of keys.

32:41 So you can say this cache, this particular disk cache can only have 10,000 keys in it

32:46 and use a first in first out kind of a principle.

32:49 Right, or last accessed or number of times.

32:52 There's a bunch of metrics there actually for how that works.

32:54 Yeah, it's pretty interesting.

32:56 I've never had to fiddle around with them too much,

32:59 but it's one of those things where even if I don't need it right now,

33:02 it is just a relief to see that the feature is there

33:04 in case you might need it later.

33:06 Yeah, yeah, for sure.

33:08 Let me see if I can find out where that is.

33:09 I don't know, keep bouncing around the same spot.

33:11 I've got it, let's just talk through them.

33:12 So I think the tags and expiries are pretty interesting,

33:15 but then there's also, I think, something that surprised me a little bit

33:18 is these different kinds of caches.

33:20 So there's like a fan out cache.

33:23 Have you looked at these?

33:24 These are interesting.

33:25 I remember reading about, I've never used them, but I do remember reading them.

33:29 So let me give you the quick rundown and then you'll understand instantly.

33:32 It's super quick.

33:32 So it uses sharding, which is a database term, right?

33:36 So sharding is like, if I've got a billion records and it's a challenge to put them all

33:40 in the same database entry, database table or server, I could actually have 10 servers

33:45 and decide, well, okay, what we're going to do is if it's a number of the ID of the user,

33:49 if the first number is one, then they go into this database.

33:52 If it's two, then they go in that, right?

33:54 So 2005 goes into the second database and so on.

33:57 So it does that as well.

33:59 This is one of the things it does to try to avoid the issues of multiple writers, I believe.

34:04 So you're less likely to write to the same database.

34:07 So it doesn't have to lock as hard.

34:08 It's kind of what you do where you say, oh, I've got this for the YouTube link.

34:12 And I've got this for the HTML markdown.

34:14 Except those are like different chunks because of their use case.

34:19 But you can also imagine, well, I've got this long list of users.

34:22 but I still want to benefit from having multiple SQLite instances.

34:25 And I suppose that's when you use this, right?

34:26 Yeah, I think that is why.

34:28 So it says it's built on top of Cache; the FanoutCache automatically shards.

34:33 Automatically, you just say how many you want, and it figures out what that means.

34:36 And it says, while readers and writers don't block each other,

34:38 writers block other writers.

34:39 Therefore, a shard for every concurrent writer is suggested.

34:44 This will depend on your scenario, default is eight.

34:46 So that's pretty cool.
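A quick sketch of the FanoutCache idea just described, sharding across several SQLite files so concurrent writers are less likely to contend. The directory path is a placeholder, and the shard count and timeout shown here are the documented defaults.

```python
from diskcache import FanoutCache

# 8 shards is the documented default; each shard is its own SQLite file,
# and keys are distributed across them so writers rarely block each other.
cache = FanoutCache("/tmp/fanout-cache", shards=8, timeout=0.010)

cache["user:2005"] = {"name": "Ada"}   # lands in whichever shard the key maps to
print(cache["user:2005"])
```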

34:48 Yeah, okay.

34:48 And presumably internally does something like hashing to figure out how to like

34:53 send it around the shards.

34:55 Right.

34:55 The keys themselves have to be hashable anyway, probably.

34:57 So just hashes the key and then shards on like the first couple letters or whatever.

35:02 Yeah, this is cool.

35:03 So it avoids the concurrency crashes.

35:06 The one difference for me, the reason I didn't choose fanout cache is because I want to be

35:10 able to say, I want to clear all the YouTube IDs, but I want to keep the really expensive

35:13 to compute hashes.

35:15 I want to be able to clear stuff if I really have to by category.

35:19 And I guess you could also do that with tags, but I'm just not that advanced.

35:22 Well, and also keep it simple, right?

35:23 And again, it's one of those things where it's, oh, it's nice to know that this is in here,

35:27 even if you don't use it directly.

35:28 I do agree.

35:29 This is a really nice feature.

35:30 It is.

35:31 And I probably will never in my life use it, but it's really cool that it's like, you know,

35:34 it's one of those things about a library that when you're thinking about picking it, it's

35:38 like, okay, the core feature is great, but if I outgrow it, what is my next step?

35:42 Do I have to completely switch to something really different

35:45 like Redis or Valkey?

35:46 Or do I just change the class I'm using?

35:49 I had that with this.

35:51 So I actually had that feeling a while ago.

35:53 We're going to get to my example I think a bit later.

35:54 But I was using the Memoize decorator to decorate a function to properly cache that.

36:00 But the one issue I had is an input to that function

36:04 was a progress bar.

36:05 It's kind of a Marimo-specific thing.

36:06 I wanted this one progress bar to update from inside of this one function.

36:10 But the downside was that every time I rerun the notebook, I make a new progress bar object.

36:15 Oh, and that means that the input has a different object going in.

36:17 So you never actually hit the cache.

36:19 So, oh my god, I'm never hitting the cache.

36:21 This is horrible.

36:23 Then it turns out the Memoize decorator also allows you to ignore a couple of the inputs

36:27 of the function.

36:28 So you can also--

36:28 Oh, interesting.

36:28 Like ignore by keyword or something, keyword argument.

36:31 Precisely.

36:31 And there's a bunch of use cases for it, and this was one.

36:34 And you can just imagine my relief after writing the entire notebook to then look at the docs

36:39 and go, oh, sweet.

36:42 Yeah, that's super sweet.
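The fix Vincent describes looks roughly like this. If I recall the docs correctly, memoize accepts an `ignore` argument listing positions or keyword names to leave out of the cache key; the progress-bar parameter and the blame function here are made-up stand-ins, not his actual notebook code.

```python
from diskcache import Cache

cache = Cache("/tmp/notebook-cache")

# Ignore the 'progress' argument when building the cache key, so passing a
# freshly created progress-bar object on every run still hits the cache.
@cache.memoize(ignore={"progress"})
def blame_lines(repo_path: str, commit: str, progress=None):
    if progress is not None:
        progress.update(1)   # hypothetical progress-bar API
    return f"blame for {repo_path}@{commit}"   # placeholder for the real git work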

36:43 Yeah.

36:44 Okay.

36:45 I think there's right below this.

36:46 Yeah, there's a couple.

36:47 We can just go down this list here.

36:48 There's some cool.

36:48 Oh, Django.

36:49 So they have a legit Django cache.

36:52 Yeah, yeah, yeah.

36:52 Straight in.

36:53 Sweet.

36:54 Yeah, I recall a little bit.

36:56 It was like a huge Django following for this thing.

36:58 Yeah, and I think this is a part why.

36:59 And when I first saw it, the reason it got sent to me is somebody's like,

37:02 oh, this disk cache, Django cache is a really cool thing to just drop into Django.

37:07 And I'm like, that's cool.

37:09 I'm not using Django, but I admire it.

37:11 That's why I didn't really look into it until I saw your use case outside of Django.

37:14 I'm like, oh, okay, I understand how much this can do.

37:17 So the Django disk cache says it uses the FanoutCache,

37:20 which we just discussed with the sharding, to provide a Django-compatible cache interface.

37:25 And you just do that in your settings file, and you just say the back end is diskcache.DjangoCache,

37:29 and you give it a location.

37:31 Boom, off it goes, right?

37:32 So really, really nice.
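For reference, the wiring they're describing is just a settings entry, along these lines per the diskcache docs. The location path and the tuning values are examples, not anything from the episode.

```python
# settings.py
CACHES = {
    "default": {
        "BACKEND": "diskcache.DjangoCache",
        "LOCATION": "/var/tmp/django-diskcache",
        "TIMEOUT": 300,                      # default expiry for cached entries, seconds
        "SHARDS": 8,                         # FanoutCache shards under the hood
        "DATABASE_TIMEOUT": 0.010,           # how long to wait on a busy shard
        "OPTIONS": {"size_limit": 2**30},    # cap the cache at roughly 1 GB on disk
    }
}
```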

37:33 Cool.

37:34 Yeah, and it sounds like you've done more Django than me.

37:37 How's this sit with you?

37:37 I mean, to be very clear, I do think Django is really nice and really mature.

37:42 I do sometimes have a bit of a love-hate relationship with it because Django can go really, really deep.

37:47 And some of that configuration stuff definitely can be a little bit in your face.

37:51 So the main thing I just want to observe is doing everything manually inside of Django can be very time-consuming.

37:57 So it's definitely nice to know that someone took the effort to make a proper Django plugin in the way that Django wants it.

38:03 That's definitely the thing to appreciate here.

38:05 I've never really used this in a Django app, to be honest.

38:08 Yeah.

38:09 You know, it has a lot of nice settings here.

38:11 Like you can set the number of shards, the timeout, so in case there's a write or read

38:16 contention, it can deal with that or it can at least let you know you're failing.

38:19 It even has a size limit.

38:21 Does it say how you can configure what to cache and whatnot?

38:24 Or is like the-- I've never really used caches in Django in general.

38:27 So I don't know if there's a general cache feature in Django itself that it will just

38:31 plug into or if--

38:32 I think there is a general cache feature in Django.

38:35 The Django people are like screaming silently.

38:38 Yes.

38:38 I apologize.

38:39 I know, I know.

38:40 But I'm pretty sure it's just like a built-in Django cache functionality.

38:44 Exactly.

38:45 Yeah, okay.

38:46 It just routes into this thing.

38:47 Exactly.

38:48 So instead of configuring Redis, you just feed it this and you're good.

38:50 That's the idea.

38:51 Yes, exactly.

38:53 Exactly.

38:53 So there's more.

38:54 The next one, I have to rage against the machine.

38:57 I'm sure that's the way, but deque, pronounced "deck."

39:01 Yeah.

39:02 So I don't know.

39:03 For me, I still say DQ.

39:04 I don't say deck.

39:04 like it's spelled D-E-Q-U-E.

39:06 And I know a lot of computer science people just call that deck,

39:09 but this diskcache.Deque, or deck, however you want to say it.

39:13 It provides, there are a couple of higher-order data structures that operate on what we

39:18 talked about so far, but give you data structure behavior, right?

39:21 Like what we talked about so far is sort of dictionary,

39:24 but not list or order or any of that.

39:26 But with this, you can actually go and add a thing.

39:31 How do we add one?

39:32 Anyway, you can say popleft, popleft,

39:34 over and over to get things out, and I guess you just append to add them.

39:37 It's kind of like a queue.

39:38 Yeah, like a queue, exactly.

39:40 With the goal of taking stuff out of it instead of into it.

39:42 But you don't normally think of a cache as doing that.

39:44 But it'd be a cool way actually to fan out work across processes.

39:48 I was about to say, that's a really good, I think, I mean, there's people that have made, I forget the name,

39:55 the Python queuing system, Celery.

39:59 So that one is also built in such a way that you can say like,

40:02 oh, where do you have the list of jobs that still need doing?

40:04 And I think also Redis is used--

40:06 Yeah, right, Redis and Kombu.

40:07 Yeah, exactly.

40:08 Or Rabbit and Kombu as well.

40:10 Yeah, exactly.

40:11 But you can configure SQLite if you want to, though,

40:14 if I recall with those.

40:15 It's just that in this particular case, if you don't want to use Celery, you can still kind of roll

40:19 your own by using this cache as well.

40:20 I'm assuming it uses the same pickle tricks, so you can do general Python things

40:24 and if the process breaks for whatever reason, you still have the jobs that need doing.

40:27 Yeah, we're still going to have to talk about this serialization thing,

40:31 These pickles.

40:33 Yes.

40:33 Not yet.

40:34 Let's go through this.

40:34 Let's go through this first.

40:35 Let's first.

40:36 Before we get distracted, because it's a deep one.

40:40 So deque, I guess we'll go with "deck."

40:44 Deque provides an efficient and safe means for cross-thread, cross-process communication.

40:49 Like, you would never think you would get that out of a cache, really.

40:51 But it's...

40:52 You would do that in SQLite.

40:54 Yeah, exactly.

40:55 But you would do work to do that, right?

40:57 You would do like transactions.

40:58 And you would do sorting.

40:59 You would figure out, well, what if there's contention?

41:01 I mean, the fact that it's just kind of a pop, it's pretty nice.
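A tiny sketch of the producer-and-consumer pattern being hinted at here, using diskcache's Deque so the pending items survive a crash. The directory and the job contents are placeholders.

```python
from diskcache import Deque

jobs = Deque(directory="/tmp/job-queue")

# One process (or notebook cell) appends work...
jobs.append({"file": "models.py", "commit": "abc123"})

# ...and another pops it off the other end. Because it's backed by SQLite,
# anything not yet processed is still there after a restart.
while len(jobs) > 0:
    job = jobs.popleft()
    print("processing", job)
```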

41:04 Yeah, that's definitely nice.

41:05 No, that's definitely true.

41:06 Although one thing that makes it easy in this case, though, is again, it is all running in like one process.

41:11 So it's not like we've got SQLite running in one place and there's 16 Docker containers that can randomly interact with it.

41:16 No, that can, though.

41:17 Is that the case?

41:18 Because disk cache itself is already cross-process safe.

41:23 Like, that's why I was so excited about it.

41:25 But it has to be on the same machine, though.

41:27 Like, that's what I do think.

41:28 Yes, it's got to at least be accessible.

41:30 That's right, that is true.

41:31 because technically there's nothing that says you can't put the file anywhere.

41:35 I think there's mega performance issues and locking issues on,

41:38 it says basically don't use it on network drives.

41:40 Yeah, so that's the thing.

41:42 Some of this is like, okay, you can do the locking.

41:44 You can do all those things.

41:45 You can do it well, but the practicality of the network overhead

41:47 is something that usually causes a lot of kerfuffle,

41:50 at least in my experience.

41:51 Yeah.

41:52 Okay, another one is disk cache Index, which provides a mutable mapping and ordered dictionary interface.

41:57 So if you kind of really want to lay into the dictionary side,

42:00 Yeah, you can do that.

42:01 That one has transactions as well.

42:04 So you can actually, it has like sort of in-place updates

42:06 and other things you can do.

42:08 So you can say, I want to make sure that I'm going to get

42:11 two different things out of the cache.

42:13 And I want to make sure that they're not changed while I'm doing that, right?

42:17 Just like you would with threading or something.

42:18 Yeah, so nothing can happen in between.

42:20 So they both have to come out at the same time.

42:22 So the two values that I get, they were, they were,

42:26 they both existed at the same time in the cache at the point in time that I was retrieving it.

42:29 Yeah, yeah, exactly.

42:30 So just with cache.transact and you just go to town on it.

42:35 That's pretty straightforward, right?
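What that looks like in practice, roughly: group the reads inside a transaction so they see one consistent snapshot. The keys and values here are invented for illustration.

```python
from diskcache import Cache

cache = Cache("/tmp/demo-cache")
cache["episode:534:title"] = "diskcache: Your secret Python perf weapon"
cache["episode:534:youtube"] = "dQw4w9WgXcQ"   # made-up ID

# Both values come out of the same transaction, so neither can change
# in between the two reads.
with cache.transact():
    title = cache["episode:534:title"]
    youtube = cache["episode:534:youtube"]
```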

42:36 Yep.

42:37 Are there any more in here?

42:38 There's a bunch of recipes for like barriers and throttling and probably semaphore-like stuff,

42:44 but I don't really want to talk about.

42:47 But you touched on these eviction policies.

42:48 Here's where I was looking for.

42:49 There's these different ones here that are kind of cool.

42:52 Whoops.

42:52 I didn't go away.

42:53 Yeah, so you can set a maximum to the cache.

42:57 I think you do that by number of items typically in it.

43:00 It could also be the case.

43:01 Size or something, yeah.

43:02 Yeah, or like total disk size, maybe we should double check.

43:06 The default for the disk size is one gig.

43:08 Yeah, there you go.

43:09 So there's already a built-in one, yeah.

43:11 Which might catch people off guard.

43:14 Much of the stuff is cached, but not always.

43:15 I don't understand.

43:17 Yeah, you've got to be a little bit mindful of that,

43:18 I suppose.

43:19 But it's a sane default, not to have it go to infinity.

43:24 Agreed.

43:26 Yeah, I guess small screen on my side, but like, yeah, least recently...

43:30 I'll read them out.

43:31 I'll read them out for you.

43:31 Yeah.

43:31 So we've got least-recently-stored as the default: every cache item

43:35 records the time it was stored in the cache, and it adds an index to that

43:40 field.

43:40 So it's nice and fast, which is cool.

43:41 There are some other ones that are more nuanced, like least

43:46 frequently used: not in terms of time, but we've got one that was accessed a hundred

43:51 times and one that was accessed two times.

43:53 Even if the one that was accessed two times was just accessed,

43:55 that one's getting kicked out because it's not as useful.

43:57 I don't know.

43:58 That's a pretty neat feature.

43:59 And then the one people would expect is, I don't know,

44:02 maybe least recently used.

44:04 How are you?

44:04 Yeah.

44:06 Yeah, exactly.
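Those knobs are constructor settings; here is a sketch of what picking a policy and a size cap looks like. The values are illustrative, and least-recently-stored with a 1 GB limit is the documented default this replaces.

```python
from diskcache import Cache

# Cap the cache at 512 MB and evict the entries that are used least often
# once the limit is hit, instead of the default least-recently-stored policy.
cache = Cache(
    "/tmp/bounded-cache",
    size_limit=512 * 1024 * 1024,
    eviction_policy="least-frequently-used",
)
```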

44:07 And there's also pruning mechanisms, if I'm not mistaken.

44:10 So there's all sorts of fun.

44:11 You can argue they're bells and whistles until you need them.

44:15 And one thing I have always found is every item that I see here,

44:19 you might not need it right now, but for every item you see, you do plausibly go,

44:23 oh, but that might be useful later down the line somewhere.

44:25 Like the transaction thing where you retrieve two things at the same time.

44:29 I don't really have a use case for it, but I can imagine it one might,

44:33 where the consistency really matters.

44:35 Yeah, I can.

44:36 I could see using the fan out cache.

44:38 Yeah, definitely.

44:39 But probably not the transaction.

44:41 But I'm already talking to MongoDB, which doesn't have transactions effectively.

44:45 So not really.

44:47 What about performance?

44:48 Should we talk about your graphs?

44:50 You brought pictures.

44:51 Yes.

44:51 So that might be, so when you told me like, hey, let's do an episode on disk cache,

44:55 and I kind of told myself, okay, then I need to do some homework.

44:58 Like, I actually have to use it for something real.

44:59 It's a bit complex.

45:01 So what we're going to try and do is we're looking at a chart right now,

45:04 and I'm going to explain to Michael what it does, and I'm going to try to explain it in such a way such that if you're not watching

45:09 but listening, that you're also going to be fairly interested in what you're seeing.

45:12 And I'll link to the chart, of course, so people can check it out.

45:15 Yeah, so this is all running on GitHub pages, and the charts that you see here definitely needed a bit of disk cache

45:20 to make it less painful.

45:23 So what I've done is I've downloaded a Git repository.

45:26 What you're looking at right now is the Git repository for Marimo.

45:29 And then I just take a point in time and I say, okay, let's just see all the lines of code.

45:34 And then I take another point in time.

45:36 And then I basically just do kind of a Git blame to see if the line got changed in between.

45:41 So what you're looking at here is kind of a chart over time

45:44 where it's basically like a bar chart, but it changes colors as time moves forward.

45:49 and the shape that you see is that things that happened early on,

45:53 well, there's a nice thick slab, but it gets a little bit thinner and thinner as time moves forward

45:57 because some of those lines of code got replaced.

45:59 But in the case of Marimo, you can see that, you know,

46:02 most of the lines of code actually stay around for a long time

46:05 and it's kind of like a smooth sedimentary layer every time we move forward.

46:10 It's compressing a little over time.

46:12 Like there's the weight of the project has sort of compressed it.

46:15 So, yeah, it's pretty interesting.

46:17 So, okay, so that's pretty cool.

46:18 But you can also go to Django.

46:21 So there's a selector on top.

46:22 So if I go to Django.

46:24 Oh, you can put this cache in here.

46:26 This is really different.

46:27 Yes.

46:28 What is this telling us?

46:29 So what you can see here is that at some point in time, there's

46:33 a huge shift in the sediment.

46:35 There's a lot of light sand and a lot of the dark sand goes away.

46:38 There's also a button that allows you to show the version number.

46:41 So I've--

46:42 yep, there you go.

46:43 There we go, yeah.

46:44 So you can see that right before a new version,

46:46 a bunch of changes got introduced or right after.

46:49 It's usually around the version number that you can see that shift.

46:51 Right, once the feature freeze is lifted, some stuff comes in, PRs come in maybe or something.

46:58 Yes.

46:59 And one other thing that's actually kind of fun, if you go--

47:02 there's this project called scikit-lego that you can also go ahead and select.

47:05 And folks--

47:06 I've heard a pretty cool guy maintains that, yeah.

47:09 Well, so the funny thing is you can see that there's

47:11 a massive shift there at some point.

47:12 OK.

47:13 That's when we got the new maintainer.

47:16 Are you on the purple or the green side?

47:18 So there's this dark blue sediment that sort of goes down massively at that point.

47:22 But yeah, no, so but in this case, like the first thing he did is redid all the docs. So we went

47:27 from Sphinx to MkDocs. And that's like a huge, if you look at the lines of code that changed as

47:31 a result that, you know, that's quite a lot. But if we now start talking about how you make a chart

47:36 like this, you got to imagine like, I take the start of the GitHub history, I take the end of the

47:40 GitHub history, I sample like 100 points in between, and then for every line in every file,

47:46 I do a git blame.

47:49 I think Django is something like 300,000 lines

47:52 of code.

47:52 I mean, that's a lot of--

47:54 A lot of--

47:54 So that thing took two hours and 15 minutes

47:57 on my M4 Mac.

47:58 And if you go there, you can actually select it.

48:02 That one-- that was a chunky boy, is what I'll say.

48:06 Yeah, 550,000 lines.

48:08 Yeah.

48:09 There you go.

48:09 But you can see that there's one version change, I think,

48:12 where they made a bunch of changes.

48:13 And it could be that--

48:14 Yeah.

48:15 That might have been, again, because I checked the docs

48:17 as well on Markdown files, it might have been a big docs

48:20 change.

48:20 But hopefully by just looking at this, you can go like, oh yeah, this is probably a notebook somewhere.

48:25 And there's a huge for loop that does threading and tries to do as much in parallel as possible.

48:31 And there's a progress bar.

48:33 Right?

48:34 Yeah.

48:35 And we don't--

48:37 Now I see why you had this problem with the caching.

48:39 That's right.

48:41 But yeah, but here's also where the threading came in.

48:43 Because the moment you say this point in time, now do all the files, do the git blame, that's definitely something that can happen in parallel.

48:49 But then for every file, for every point in time, you do want to have something in the cache that says, okay, if I have to restart this notebook for whatever reason, that number is just known.

48:58 Don't check it again.

48:59 Yeah.

48:59 Yeah, super interesting.

49:00 Okay.

49:01 I wonder if this 4.0 in 2022, is that maybe when they switched to async?

49:08 They started supporting async.

49:10 It could be docs as well.

49:11 I'm not sure.

49:12 Well, yeah, so that's kind of the hard thing of some of these charts.

49:15 Like, I could expand these charts by saying things like,

49:18 okay, only the Python files, et cetera.

49:22 But, like, the way that this is hosted, this is, like, really using the disk as a cache

49:27 because all these charts are Altair charts, and you can save them to disk,

49:29 and then you can easily upload them to GitHub pages.

49:32 So I do everything in disk cache to make sure that if I,

49:35 for whatever reason, the notebook fails, I don't have to sort of do anything fancy to get it back up.

49:41 But then once it's time to actually put it on the site,

49:44 I could use disk cache to show the charts, but then I would need a server.

49:47 So actually using disk to actually just serve some files

49:51 is also just a pretty fine and good idea.

49:54 There are some things on the Marimo side where we are also hoping to maybe give better caching tools

50:00 to the library itself.

50:02 It's just that when I was doing this, I actually found a bug in our caching layer,

50:05 so then I switched back to disk cache.

50:07 You know what?

50:07 Look, that's valuable.

50:08 That's maybe not the way you will find, but it's valuable.

50:11 It's-- oh, so one thing you learn is that caching is actually

50:15 hard to get right.

50:16 Oh, it is.

50:17 It is.

50:18 Very hard.

50:18 It's on par with naming things.

50:22 It is one of the two things that goes wrong--

50:25 naming things, cache invalidation, and off by one errors.

50:28 Yes, exactly.

50:29 It's the middle one.

50:30 Yeah.

50:32 Dad jokes are amazing.

50:33 Anyway, so one thing about this repo, by the way, this is all my--

50:38 we're going to add a link to the show notes.

50:40 There is a notebook, so if you feel like adding your own project

50:43 that you want to just add, feel free to spend your compute

50:46 resources two and a half hours to add a popular project.

50:49 I would love to have that.

50:50 One thing I think will be cool with these sorts of charts

50:52 is to see what will change when LLMs kind of get into the mix.

50:55 Do we see more code shifts happen if more LLMs get used

50:59 for these libraries?

50:59 Very interesting.

51:01 I don't know if more code--

51:02 old code will get changed, but they are verbose code writers,

51:06 those things.

51:06 Yes.

51:07 So this, assuming we can do this over time and we're going to start tracking this,

51:13 I'm calling this code archaeology.

51:15 I do think it will be an interesting chart.

51:18 As is, I think it's already quite interesting to see differences

51:20 between different projects.

51:21 I think if you go to sentence transformers, you can also see when the project got moved from an academic lab

51:25 to hugging face.

51:26 So there are interesting things you can see with these charts.

51:30 But you are going through every file, every line, git blame, 100 times.

51:35 Yeah, a lot.

51:36 You got to do 100 times per line as well.

51:39 Well, so there's the start of the project's Git history,

51:42 and then to make a chart like this, you got a sample over the entire timeline.

51:46 And it is a bit cheeky because sometimes you can go like,

51:48 OK, but there's a character that changed because of a linter.

51:51 And then is that really a change?

51:53 Does it really matter?

51:55 It's whoever decided to run the formatter on the thing or whatever.

52:00 We're looking at the Django chart.

52:01 It could also just be that black just got an update

52:03 or something like that, right?

52:03 Yeah, exactly.

52:04 It's also possible.

52:06 It's very possible.

52:07 It's unlikely, but it's not impossible, let me say.

52:09 But yeah, anyway.

52:11 Yeah, this was one of the benchmarks that I did with disk cache that I thought was pretty amusing

52:16 and pretty interesting.

52:18 But there's this one other feature that I think we should also talk about, which

52:21 is that if you want to, disk cache actually lets you

52:24 do the serialization yourself.

52:26 So normally, what it would do is it would say, like, OK, let's do the pickle thing.

52:31 And it's a bit clever, right?

53:32 So if the thing you're storing is like an integer, it doesn't go through the whole pickle thing and just stores it as an integer. There are these native types

53:38 that SQLite has, and then, you know, it's able to do something clever. But right, as soon as it becomes

53:42 like a custom class or a list of weird things, then yeah, then it's... I personally don't like

53:49 pickling. I would prefer that it makes me do something. I think it's weird. Well, so the

53:53 thing is, you can write your own Disk class, and then what you can do is you can pass that Disk class

53:58 onto the disk cache itself. And I'm just kind of wondering, like, when might it make sense to do

54:03 this sort of a thing? And if you go to the docs, there's actually a really good example just

54:07 right there, which is JSON. Yeah, I lost your example. I'll get it back. If you type "disk,"

54:13 you'll find it. So JSON has this interesting thing: it is text, if you think about it, but

54:18 it's text that has a bit of structure, and that means that there are these compression libraries

54:23 you can actually run on them. And especially if you have a pattern that repeats itself, let's say

54:28 a list of users or something like that, and there's always the user key and there's always maybe

54:32 the email key, and those things just repeat themselves all over the place,

53:36 then there is an opportunity to, there's this library called zlib where you can just take that string,

53:41 you can compress it and then that compressed representation can go into disk cache instead.

53:46 Yeah, I figured that sounds like a lot of fun.

53:49 You can just grab the implementation there.

53:51 I have this notebooks repository where I have LLMs just write fun little notebooks.

53:55 I always check the results obviously just to be clear on that.

53:58 But one thing that was I think pretty cool to see, if you just do the normal data type and you pickle it, then you get a certain size. And if you just

54:08 have a very short, normal Python dictionary basic thing, then it's negligible. You shouldn't use this

54:14 JSON trick. But the moment you get text heavy, there's just a lot of text that you're inputting

54:19 there and there's some repetition of characters. Or if you really do something that's highly

54:23 compressible, it is not unheard of to get like 80%, 90% savings

54:30 on your disk space, basically.

54:32 Now, there is a little bit of overhead because you were doing the compression and decompression.

54:36 But if you're doing text-heavy stuff, this is something that can actually save you a whole bunch.

54:40 And I can imagine for LLMs, this would also be a win.
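To make the compression idea concrete without reproducing the docs' custom Disk example, here is a minimal sketch of the same trick done by hand: serialize to JSON, compress with zlib, and store the bytes. The data and the compression level are arbitrary; the docs show how to wrap this into a Disk subclass so it happens automatically.

```python
import json
import zlib

from diskcache import Cache

cache = Cache("/tmp/compressed-cache")

# Repetitive, structured text compresses very well.
users = [{"user": f"user-{i}", "email": f"user-{i}@example.com"} for i in range(1000)]

raw = json.dumps(users).encode("utf-8")
packed = zlib.compress(raw, 6)
print(len(raw), "->", len(packed), "bytes")

cache["users"] = packed                        # stored as bytes, no pickling involved
restored = json.loads(zlib.decompress(cache["users"]))
```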

54:43 OK.

54:43 So this JSON disk, not only does it serialize to and from JSON,

54:47 which I think is safer.

54:49 It can be a pain if you had date times.

54:51 You've got to do something about that.

54:52 or JSON or something like that.

54:53 You can.

54:54 Yeah, yeah, I think.

54:55 But then it's using Zlib here.

54:57 You know, I just actually did something like this for just something in my database.

55:00 It had nothing to do with caching.

55:02 But these records are holding tons of text for something sort of tangential to the podcast.

55:10 And I'm like, I don't really want to put 100K of text

55:13 that I'm not going to query against or search into the database.

55:17 So I used Python's XZ implementation.

55:20 And like you said.

55:21 There's a bunch of compression algorithms you could use.

55:22 It was way fast.

55:23 So I just store it as bytes now, and it's like a tenth the size.

55:26 It's great.

55:26 So I guess this is the same, but for the cache back end, right?

55:29 Yeah.

55:30 And I think-- well, you can see, I think Zlib is being used internally.

55:33 Yeah.

55:33 I mean, it's not XZ, but it's the same idea.

55:36 Yeah, exactly.

55:38 And there's always new compression algorithms.

55:40 Like, feel free to check whatever makes sense.

55:42 But the fact that you have one very cool example to add on the docs, because you can just copy and paste it.

55:47 A lot of people benefit from it.

55:48 But why stop here?

55:50 Because this is what you can do for JSON.

55:52 What else can you do?

55:53 Before we move on, though, if I were writing this, I would recommend using ujson or orjson

56:00 or whatever, some of the more high performance versions

56:03 right there.

56:03 Yeah, and I think ORJSON--

56:05 I mean, performance is cool.

56:07 The reason I use ORJSON a lot more has to do with the types that it supports.

56:10 So it can accept the NumPy arrays, for example, and just listifies it.

56:14 And I think it has a few things with dates as well.

56:17 It just has a slightly better support for a few things.

56:19 OK, good to know.

56:20 All right, where are we going?

56:21 What's next?

56:23 Numpy arrays.

56:24 OK.

56:25 So a lot of people like to do things with embeddings nowadays.

56:29 So like text thing goes in, some sort of array thing comes out.

56:31 And then hopefully, if two texts are similar, then the arrays are also similar.

56:35 So you can do all sorts of fun little lookups.

56:37 And I do a fair share of doing things with embeddings.

56:40 And embeddings are also not notoriously expensive to calculate, but still pretty expensive to calculate.

56:46 OK, but you can write your full Python thing in there.

56:48 So if you compare storing NumPy as bytes compared to that

56:54 to a pickle, it's actually even.

56:55 There's very little to gain there.

56:57 But one thing you could do is you could say, well, let's maybe bring it down to float 16.

57:03 That's a thing you can do.

57:04 You can sort of say, before we save it, we actually make it just a little bit less accurate

57:09 on the numeric part of it.

57:10 But that'll save us a whole bunch of disk space.

57:11 So that's already kind of old.

57:13 Well, you need it to be super precise when it's involved in calculations.

57:17 But then in the end, if you're not going to report the numbers

57:20 to great decimal places, maybe going down is good, yeah.

57:23 Yeah, it depends on the use case.

57:25 But typically, you could argue maybe a 1% difference

57:28 in similarity if we have 100x savings on disk.

57:32 That'll be kind of a win.

57:34 So one thing I was sort of focusing on is just this--

57:37 you can do things like, OK, come up with your own little weird data structure where you say,

57:42 OK, let's pretend we're going to quantize the whole thing.

57:45 So we're going to calculate the quantiles of the float values that it can take.

57:50 And we're going to take basically 256 buckets.

57:55 We're going to store the scale.

57:57 We're going to store the mean.

57:58 And then we're going to store in what bucket the number was in.

58:01 And you can turn that into a string representation.

58:03 These things are pretty fun to write.
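A rough sketch of the quantization trick Vincent is describing: map each float in an embedding to one of 256 buckets and keep the offset and scale alongside, trading a little precision for a much smaller payload. This is a generic NumPy illustration, not his notebook code.

```python
import numpy as np

def quantize(vec: np.ndarray) -> tuple[bytes, float, float]:
    # Store the range so the 0..255 bucket indices can be mapped back to floats.
    lo, hi = float(vec.min()), float(vec.max())
    scale = (hi - lo) / 255 or 1.0
    buckets = np.round((vec - lo) / scale).astype(np.uint8)
    return buckets.tobytes(), lo, scale

def dequantize(data: bytes, lo: float, scale: float) -> np.ndarray:
    buckets = np.frombuffer(data, dtype=np.uint8).astype(np.float32)
    return buckets * scale + lo

emb = np.random.randn(768).astype(np.float32)     # stand-in for a real embedding
packed, lo, scale = quantize(emb)
approx = dequantize(packed, lo, scale)
print(emb.nbytes, "->", len(packed), "bytes")     # 4x smaller, slightly lossy
```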

58:05 Nice.

58:06 And yeah, and then you scroll down into your big notebook and then you find this.

58:12 There you go.

58:13 That's a retrieval time.

58:14 I think I got like a 4x improvement in terms of like disk space being saved.

58:20 It was like a 1% similarity score that I had to give up for doing things like this.

58:24 Mileage can vary, of course, but like, again, these are fun things to sort of start playing with

58:29 because you have access to the way that you write that down.

58:32 So that was also like a fun little exercise to do.

58:36 Yeah.

59:36 Could you save NumPy arrays by just converting them to bytes?

59:42 There's probably some efficient way.

58:42 You know what?

58:43 What about a parquet file?

58:45 Like in an in-memory Parquet file, then you just say,

58:47 here's the value in bytes.

58:50 So I tried the bytes thing and compared it to the pickle thing,

58:53 and that was basically the same size.

58:55 OK.

58:55 That barely led to anything.

58:57 About the Parquet one, I mean--

59:00 You do get compression.

59:01 Well, yeah, but I could be wrong on this one.

59:03 But I think Parquet is optimized to be a disk representation.

59:07 And then once you want to have it in memory, it becomes an arrow representation.

59:11 I see.

59:11 Yeah, probably.

59:12 So in that sense, what I would do is, OK, if you have something in Arrow, you use this cache

59:16 to make sure it's written as parquet.

59:18 But then you have to be-- you kind of have to know what you're doing if you're going to make parquet

59:23 files.

59:24 And also, the benefit of a parquet file is that you have one huge table.

59:27 Because then--

59:27 Right, you can scan it, yeah.

59:28 Yeah, it's a columnar format.

59:30 So then if I were a column, boy, would I want to have all the rows in me.

59:35 So in that sense, what you--

59:38 yeah, so in that sense, what I would do instead is if you, for whatever reason, you just have a lot of data,

59:43 It's still kind of a cache, but it makes more sense to store all of it in like a huge parquet file.

59:47 In parquet, you can store a partition.

59:49 So you can say this one column that's partitioned, a date would be like a very typical thing to partition on.

59:54 And then if you point polars to like parquet, but you say, I only want to have this date,

59:58 it can sort of do the forward scan and only pick the rows that you're interested in.

01:00:03 And I would imagine that that would beat anything we might do with this cache, especially if the table is big.

01:00:09 So you don't always want to cache stuff.

01:00:11 Like I said, I actually don't avoid hitting the Mongo database

01:00:15 a lot for my projects because it's like the response time

01:00:18 is quick enough and might as well.

01:00:20 I want to take two little avenues here.

01:00:22 But the first one is, what about DuckDB?

01:00:25 I know at least on the data science side and the analytics

01:00:28 side, DuckDB is really popular, really well respected,

01:00:31 really fast.

01:00:32 Maybe you don't even cache it.

01:00:33 Maybe you just use DuckDB as a thing.

01:00:36 How do you feel about that?

01:00:37 I mean, DuckDB does solve a very different problem than SQLite or Postgres in a way.

01:00:42 So I don't believe-- to name one thing, I believe DuckDB does assume that everything under the hood

01:00:47 is immutable.

01:00:49 So it will never be ACID compliant, because it doesn't necessarily have to be.

01:00:52 You can still insert rows, if I'm not mistaken.

01:00:54 But the use case is just assumed to be analytical in general.

01:00:58 That like--

01:00:58 I see.

01:00:59 --it's really designed to sort of fit that use case.

01:01:03 You can insert rows, though.

01:01:04 So like--

01:01:05 I mean, you might be caching data science things that you're only computing once.

01:01:08 Like, for example, your charts.

01:01:11 Once you've computed that, it's not going to change because it's historical.

01:01:14 I mean, I might want to rerun it a month later or something like that.

01:01:16 That's something I might want to do.

01:01:18 And in that particular case, it would be cool if the sampling is the same

01:01:21 and I just want to add one sample at the end that all those samples I had before,

01:01:25 that those are in a cache somewhere.

01:01:26 Or maybe you want faster, better resolution.

01:01:29 Instead of going 100, you're going to go to 1,000 points,

01:01:31 but you could do 10% less because those are done, right?

01:01:34 So stuff like that.

01:01:35 But then what you would never do with a cache is do a group by

01:01:37 and then a mean, for example.

01:01:39 It's like--

01:01:41 It's not-- it's outgrown its use at that point.

01:01:43 That's for sure.

01:01:45 Yeah.

01:01:45 And if it were a part of it, then the docs would say so.

01:01:47 But like-- no, so in my mind, DuckDB really just solves a different problem, similar to like general SQLite

01:01:54 also solves a different problem than disk cache.

01:01:56 And also Postgres is also solving a slightly different problem.

01:01:59 Sure.

01:01:59 All right, fair.

01:02:00 Like the other angle--

01:02:01 Yeah?

01:02:02 No, go ahead.

01:02:02 Finish your thoughts.

01:02:03 Well, I also really love Postgres, I got to say.

01:02:06 Like the thing I really like about it is that it is like boring, but in a good way, software

01:02:09 where like I have a Postgres thing running there and whatever SSH thing I need, I can just swap the cloud

01:02:17 provider and it'll just go ahead and still run without me having to move the data

01:02:20 or do any migration or anything like that.

01:02:22 That is also just like a nice feeling, but it solves a different problem.

01:02:25 Yeah.

01:02:25 Yeah.

01:02:26 These would very likely be used together, not instead of--

01:02:29 I mean, Postgres can be used instead of disk cache, but disk cache, definitely not instead of Postgres.

01:02:34 So the other one, the other angle I wanted to riff on, have you riff on just a little bit is

01:02:38 think people, especially people who are maybe new to this idea of caching, they can end up thinking,

01:02:43 okay, I'm going to store stuff. We talked a lot about like, I get something back from the database.

01:02:47 I could store that in a cache. So I don't have to query it again or whatever. And those are

01:02:51 certainly good use cases. But I think a lot of times an even better use case is if you're going

01:02:55 to get 20 rows back from a database, do some Python and construct them along with a little

01:03:00 other information into some object.

01:03:03 And then that's really what you want to work with.

01:03:04 Store that constructed thing in the cache.

01:03:07 You know what I mean?

01:03:08 Like, as far as you can go down the compute layer, like, don't just stop like, well, it

01:03:12 comes back from the database, so we cache it.

01:03:14 Like, if there's a way to say it, like, there's a bunch of work after that, think about how

01:03:16 you might cache at that level.

01:03:18 Okay, I'm going to pitch you a dream then.

01:03:21 Imagine you have a Python notebook and, oh, you're running a cell, you're running a cell,

01:03:26 you're running a cell, and halfway the kernel dies for whatever weird reason.

01:03:29 Right.

01:03:30 it'd be nice if I could just reboot the notebook and it would just pick it up again and move further.

01:03:33 Because again, I'm picking something out of a database and I'm doing something little with it

01:03:37 and processing, processing, processing. But wouldn't it be nice if maybe every cell had a caching mechanism?

01:03:43 If only you had some influence. If only we're a company that did this kind of stuff.

01:03:51 You can imagine these are things that we are thinking about. And again, what I'm about to

01:03:55 suggest is definitely a dream. This is not something that works right now. Don't pin me on this.

01:03:59 like we're thinking out loud here.

01:04:01 You can also imagine this being super useful where an entire team can share the cache.

01:04:05 Yeah.

01:04:05 Right?

01:04:06 So if your colleague already calculated something, you don't have to recalculate it again.

01:04:10 There's all sorts of use cases like that as well.

01:04:13 But there may be, there are these moments when you want to have very tight manual control

01:04:17 over what goes into the cache.

01:04:19 That makes a lot of sense and it's great.

01:04:21 But there are also moments when you just really don't want to think about it at all.

01:04:24 And you just want everything to be cached.

01:04:25 Could I just use this thing as a checkpoint?

01:04:28 I can autosave as my code runs.

01:04:30 Yeah, that's cool.

01:04:31 Yeah.

01:04:31 And again, doing this right is hard, because there's all sorts of weird Python types.

01:04:36 And I mentioned the progress bar thing.

01:04:38 And there's all sorts of things that we've got to be mindful of here.

01:04:42 But if you're thinking about really, how would you use this in data science

01:04:46 when you fetch a little bit of data and deal with it,

01:04:48 to me, it is starting to feel more natural than thinking about cells in a notebook,

01:04:52 maybe cache on that level.

01:04:53 Yeah, that's pretty interesting.

01:04:55 just sort of cascade them along as a hash of the hashes

01:05:00 of the prior cells or--

01:05:01 Well, and this is where things become tricky, of course,

01:05:04 because then, OK, I've got this one cell, and I change one function in it.

01:05:07 Oh, if you're going to cache on the entire cell, oh, everything has to rerun.

01:05:10 And then, oh, if you're not careful, your cache is going to be huge.

01:05:13 So like, OK, how do you do this in a user-friendly way?

01:05:16 There's all sorts of--

01:05:17 it sounds easier than it is the one thing I want to say.

01:05:19 Yeah, well, I think you've got a better chance with Marimo than with Jupyter, at least,

01:05:23 because you have a dependency graph.

01:05:25 So you can at least say, if this one is invalid, that means the following three are also invalid,

01:05:32 being sort of propagated there.

01:05:33 Totally.

01:05:34 But I try to focus a little bit more on the user experience

01:05:38 side of things.

01:05:38 And one thing I've really learned from the notebook

01:05:40 with the progress bar is just there were moments when I felt like, oh, I just

01:05:44 want this entire thing to be automated.

01:05:46 Don't make me think about this.

01:05:47 And then there were moments where I thought, oh, it's really

01:05:49 nice to have tight manual control.

01:05:51 How do I provide you with both?

01:05:53 Yeah.

01:05:54 That's quite tricky.

01:05:55 But it is a dream, so that's something to keep in mind.

01:05:57 Yeah, maybe someday there'll be a "turn on caching" checkbox or

01:06:01 something.

01:06:01 Yeah, or well, at least till then, I do think having something that works on disk instead of memory

01:06:07 these days is also just a boon.

01:06:10 Right.

01:06:10 So this works in data science notebooks.

01:06:12 It works in web apps.

01:06:13 It works in little TUIs.

01:06:15 It doesn't care.

01:06:16 It works with LLMs.

01:06:19 And if you have a kind of--

01:06:20 actually, similar to your setup. If you have multiple processes with your web app running on one VM,

01:06:25 if you have one big VM that you share with your colleagues,

01:06:27 you can also just share the cache, actually.

01:06:29 Yeah, that's true.

01:06:30 So there's no reason you can't just point at the same file, yeah.

01:06:32 Yeah, and especially if you're doing like big experiments

01:06:34 like grid search results or stuff like that, that you really don't want to recalculate the big compute thing.

01:06:39 That's actually not too unreasonable.

01:06:41 Yeah, that's cool.

01:06:42 Yeah, if you have a shared Jupyter server.

01:06:44 Yeah, and a bunch of universities have that, right?

01:06:47 Yeah, exactly.

01:06:48 I don't want to go down this path because we're basically out of time.

01:06:51 I respect your time.

01:06:52 However, I do think there's a whole interesting conversation to be had about like, how do you

01:06:57 choose the right key for your, what goes into the cache?

01:07:01 Because you can end up with staleness really easy if something changes.

01:07:05 But if you incorporate the right stuff, you might never run into stale data problems because,

01:07:11 you know, like for example, I talked about the YouTube ID.

01:07:15 Basically the cache key is something like episode, episode data, episode YouTube thing,

01:07:20 colon YouTube, colon the hash of the show notes, right?

01:07:25 Something like that, where there's no way that the show notes are gonna change

01:07:28 and I'll get the old data because guess what?

01:07:32 It's constructed out of the source, right?

01:07:34 Things like that.

01:07:35 There's probably a lot, especially like in your notebook side.

01:07:39 There's a lot to consider there, I think.

01:07:41 Yeah, I mean, so I remember this one thing with the course where I still wanted it to be cached,

01:07:46 but I wanted to have, like, text goes in and then five responses from the LLM go out.

01:07:51 And the way you solve that is you just add another key.

01:07:53 But you have to be mindful of the cache key.

01:07:55 And you can-- oh, and you can use tuples, by the way.

01:07:57 That's also something you can totally use as a cache key.

01:07:58 Right, right.

01:07:59 So that was easy to fix.

01:08:01 It's just that you have to be mindful.
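As a concrete illustration of the key-design point (the names and fields here are invented), deriving the key from the source content means a change in the source automatically produces a new key, so you never serve stale results:

```python
import hashlib

from diskcache import Cache

cache = Cache("/tmp/key-demo")

def notes_digest(show_notes: str) -> str:
    return hashlib.sha256(show_notes.encode("utf-8")).hexdigest()

def youtube_id_for(episode: int, show_notes: str) -> str:
    # Tuple key: the episode, what the value is, and a hash of the source text.
    key = ("episode", episode, "youtube", notes_digest(show_notes))
    if key in cache:
        return cache[key]
    value = expensive_lookup(episode)   # hypothetical slow call
    cache[key] = value
    return value

def expensive_lookup(episode: int) -> str:
    return "dQw4w9WgXcQ"   # placeholder result
```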

01:08:02 Yeah, that's kind of-- I want to give a quick shout out to that.

01:08:05 I want to leave--

01:08:06 I don't want to leave on a sour note.

01:08:08 But I think it's necessary to give this shout out, or this like call this out, rather, is the way I should say it,

01:08:14 is I think this project is awesome.

01:08:15 You think it's awesome.

01:08:17 Honestly, I think it doesn't need really very much.

01:08:19 But if you look at last updated, if you look at the last updated date,

01:08:24 it's really, it hasn't got a lot of attention in the last six months or something like that.

01:08:29 Yeah, and if I look at PyPI, the last release was 2023,

01:08:34 which, yeah, a year and a half ago.

01:08:36 Yeah, and it's okay to, I would like to say that it's okay for things to be done.

01:08:40 It doesn't have to, things don't have to change, but there's also a decent amount of,

01:08:45 like conversations on the issues and they haven't, you know,

01:08:48 like a couple of days ago, actually someone asked about this, but you know,

01:08:52 the last change, I believe the guy Grant who works on it,

01:08:57 started working at OpenAI about the time the changes stopped.

01:09:02 I'm not entirely sure. I feel like I could be confused with another project.

01:09:06 So Grant, if that's not true, I apologize, but I think that it is.

01:09:11 Pretty sure that is. Do we have LinkedIn?

01:09:13 I mean, I am comfortable stating is--

01:09:16 let me put it this way.

01:09:18 Yes, my colleague might make a different caching mechanism.

01:09:21 And yes, I might use that at some point.

01:09:22 But at least for where I'm at right now, this cache needs to break vividly in front of my face

01:09:28 for me to consider not using it.

01:09:30 Because it does feel like it's done in a really good way.

01:09:33 The main thing that needs to happen, I think, functionally to make sure this doesn't get deprecated too

01:09:38 badly is just you got to update the Python version.

01:09:40 When a new Python version comes out, you got to update PyPI to confirm,

01:09:43 like, OK, we do support this Python version.

01:09:45 But I mean, most of the--

01:09:48 if you look at the area that needs to be covered, a lot of that has been covered by SQLite.

01:09:51 And that thing is definitely still being maintained.

01:09:54 It's getting mega maintained.

01:09:56 That's right.

01:09:57 So I also don't see the problem.

01:09:59 I'm not going to not use it.

01:10:01 I just want to put it out there on the radar for people

01:10:04 who might go, oh, Michael and Vincent

01:10:06 were so psyched, and I started to use this,

01:10:08 and now I'm really disappointed because of whatever

01:10:10 I saw.

01:10:11 I mean, the only real doom scenario I can come up with

01:10:14 is if SQLite made like a breaking change.

01:10:16 That's the only thing I can kind of come up with.

01:10:18 But the odds of that seem very low.

01:10:20 Yeah, and it's on GitHub.

01:10:21 You can fork it.

01:10:22 I fork it.

01:10:22 Exactly, exactly.

01:10:23 So, no, I definitely am still super excited about it.

01:10:27 I just want to make sure that we put that out there.

01:10:29 I'd intended to talk about it sooner in the conversation,

01:10:32 but you know what?

01:10:33 We were just so excited.

01:10:34 Yeah, no.

01:10:36 This is definitely in my top five favorite Python libraries

01:10:38 outside of the standard lib.

01:10:40 Awesome.

01:10:40 Yeah, I've really, really have gotten awesome results out of it as well.

01:10:43 So remember the way we opened the show and you talked about this,

01:10:45 like when we talked about the LLM building block stuff

01:10:48 on the previous time you were on the show, it was like, oh, we better not go too deep on this,

01:10:53 even though we're both so excited because it's going to derail the show.

01:10:56 We're now one hour and 15 minutes into it.

01:10:58 We kind of cut ourselves off.

01:11:00 I think that was accurate.

01:11:02 Yeah, I mean, you get two dads making dad jokes and riffing on tools they both like.

01:11:06 It's bound to exceed a barbecue.

01:11:10 Yes, I know.

01:11:11 I wonder what would happen if sometime we just removed the time limit,

01:11:15 just got real comfortable and just riffed on something.

01:11:17 It could be hours.

01:11:18 It would be fun.

01:11:18 But maybe not today.

01:11:20 Two-hour live streams exist, Michael.

01:11:22 I know.

01:11:24 I've listened to some podcasts at over three hours.

01:11:26 I'm like, how is this still going?

01:11:27 But you know what?

01:11:28 Yeah.

01:11:28 It's all good.

01:11:29 But yeah, this is a good point in time.

01:11:32 We're both excited.

01:11:33 That's the summary.

01:11:34 That is the summary.

01:11:35 And I think I'm going to let you have the final word on this topic here.

01:11:39 like maybe speak to people just about caching in general

01:11:42 and disk cache in particular as we close it out?

01:11:45 I mean, I guess the main thing that I learned with the whole caching thing in the last couple of years,

01:11:49 I always thought it was kind of like a web thing.

01:11:52 Like, oh, you know, front page of Reddit, that thing has to be cached.

01:11:55 That's the way you think about it.

01:11:55 Yeah, of course.

01:11:56 And thinking about it too much that way totally blocked me from considering, like, oh,

01:12:00 but if you do stuff in notebooks and data science land,

01:12:02 then you need this as well.

01:12:03 And I think there's actually a little emerging discovery

01:12:07 phenomenon happening where people that do things with LLMs at some point go like, oh, I need a cache.

01:12:11 And then, oh.

01:12:15 So that's the main thing I suppose I want to say.

01:12:16 Like, even if you're doing more data stuff, like give this disk cache thing a try.

01:12:19 It's just good.

01:12:20 Yeah, it's so easy to adopt and try out.

01:12:23 Like, you can throw it in there.

01:12:24 Just add a decorator.

01:12:26 Exactly, see what you get.

01:12:27 See what you get.

01:12:28 All right, Vincent, welcome.

01:12:30 Oh, thank you for coming back.

01:12:31 I really appreciate it.

01:12:32 Thanks for having me.

01:12:33 Always good to talk to you.

01:12:34 Yeah.

01:12:34 And yeah, see you next time when we, again, find out there's a cool Python library.

01:12:38 Yeah, that's going to be the three-hour episode.

01:12:41 Watch out, y'all.

01:12:43 Have a good one.

01:12:44 Later.

01:12:45 This has been another episode of Talk Python To Me.

01:12:48 Thank you to our sponsors.

01:12:49 Be sure to check out what they're offering.

01:12:50 It really helps support the show.

01:12:52 If you or your team needs to learn Python, we have over 270 hours of beginner and advanced courses

01:12:58 on topics ranging from complete beginners to async code,

01:13:02 Flask, Django, HTMX, and even LLMs.

01:13:05 Best of all, there's no subscription in sight.

01:13:08 Browse the catalog at talkpython.fm.

01:13:10 And if you're not already subscribed to the show on your favorite podcast player,

01:13:14 what are you waiting for?

01:13:15 Just search for Python in your podcast player.

01:13:17 We should be right at the top.

01:13:19 If you enjoyed that geeky rap song, you can download the full track.

01:13:22 The link is actually in your podcast player's show notes.

01:13:24 This is your host, Michael Kennedy.

01:13:26 Thank you so much for listening.

01:13:27 I really appreciate it.

01:13:28 I'll see you next time.

01:13:41 I'm out.
