TinyDB: A tiny document db written in Python

Episode #80, published Sun, Oct 16, 2016, recorded Thu, Oct 13, 2016

NoSQL and document databases like MongoDB have made building fast, scalable software that is easy to evolve and maintain much easier for a broad class of applications. Embeddable, file-based databases like SQLite have made shipping an application that requires a database a no-brainer. The database just runs in process, so there is no setup or maintenance.

Yet, when you try to intersect these two excellent capabilities, you'll find the options are very limited. There just aren't many embeddable document databases. If you're a Python developer, and you want a native Python solution, the options are much slimmer still.

That's why I'm excited to introduce you to Markus Siemens and TinyDB. This is a 100% pure Python, embeddable, pip-installable document DB for Python.

Links from the show:

Markus on Twitter: @siem3m
Markus on the Web: m-siemens.de
TinyDB (GitHub): github.com/msiemens/tinydb
TinyDB (docs): tinydb.readthedocs.io
TinyDB (PyPI): pypi.org/project/tinydb
CodernityDB: labs.codernity.com/codernitydb
Buzhug: buzhug.sourceforge.net
Ultra JSON package: pypi.org/project/ujson
How to Extend TinyDB: tinydb.readthedocs.io/en/latest/extend.html
Extensions: tinydb.readthedocs.io/en/latest/extensions.html

Episode Deep Dive

Guest Introduction and Background

Markus Siemens is a Python developer who created TinyDB, a pure-Python, document-oriented database designed to be embedded directly into Python applications. He began programming in Java and C++, but got seriously into Python when tasked with building a web tool for configuring servers. Discovering that he enjoyed Python for its simplicity and directness, Markus found a lack of small, lightweight embeddable NoSQL databases. This led him to build and share TinyDB, which has become surprisingly popular in the Python community.

What to Know If You're New to Python

Here are a few quick insights to prepare you to get the most from this discussion:

Be aware that Python can easily handle database interactions through libraries and ORMs; TinyDB is just one of many options.
Understand that Python’s readability makes experimenting with new libraries (like TinyDB) a lot simpler than in some other languages.
Working with dictionaries (key-value pairs) is a basic Python skill. Many NoSQL databases, including TinyDB, store data in JSON or dictionary-like structures.

Key Points and Takeaways

TinyDB as a Pure-Python Document Database: TinyDB is a 100% Python-based, embeddable, document-oriented database designed for small to medium projects. It doesn’t rely on an external server process: run pip install tinydb and you’re set. This means you can include a database in your Python app without extra setup or maintenance. It’s excellent for quick prototypes, internal tools, or small-scale applications.
- Links and Tools:
  - TinyDB on PyPI
  - TinyDB GitHub Repo
Why TinyDB over Other Databases: Markus originally looked for an existing embeddable, pure-Python database but found only CodernityDB (a much larger codebase) and Buzhug (relational, and without the kind of query API he wanted), alongside heavier options like MongoDB that need an external server. TinyDB’s goal is ease of use and minimal overhead, letting the data structure evolve without migrations. That removes a common annoyance when the shape of the data keeps changing.
- Links and Tools:
  - CodernityDB GitHub
  - Buzhug on PyPI
Main Use Cases and Limitations: TinyDB is a good fit for single-user or small-team scenarios, configuration storage, small desktop apps, or simple web apps. Its main downside is limited concurrency handling: by default nothing locks the database file, so writes from multiple processes are unsafe. Searches also scan every document because there are no indexes, so very large datasets can become a performance bottleneck. Community extensions like TinyRecord add some concurrency and transaction-like features.
- Links and Tools:
  - TinyRecord Extension
Flexible Query Language: TinyDB’s query system is reminiscent of Pythonic code or even mini ORM-like approaches. You can filter data using the Query object, forming clear and readable expressions such as db.search(User.name == "Markus"). It’s an intuitive style for Python developers, removing the overhead of learning a separate query language.
- Links and Tools:
  - TinyDB Docs - Query
Community Extensions and Ecosystem: The database’s extension system lets developers create custom features: custom storage engines, middlewares, and table classes. Examples include storing data in-memory, adding caching, or using specialized file formats like YAML. More advanced add-ons exist, such as TinyMongo, which mimics the PyMongo interface, and TinyDB SmartCache for more selective caching of queries.
- Links and Tools:
  - TinyMongo GitHub
  - TinyDB SmartCache
Serialization and Special Data Types: By default, TinyDB stores data as JSON, so Python objects that JSON cannot represent, such as datetime, do not serialize out of the box. The TinyDB Serialization extension allows custom serialization rules, letting you store more complex objects seamlessly. This extensibility means you can adapt TinyDB to store domain-specific data without rewriting the core database.
- Links and Tools:
  - TinyDB Serialization
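
The JSON limitation described above can be seen with the standard library alone. The following is a minimal sketch, assuming ISO 8601 strings are an acceptable stored form for dates. The helpers `encode_doc` and `decode_doc` are hypothetical names for illustration and are not part of TinyDB or of the serialization extension.

```python
import json
from datetime import datetime

doc = {"name": "Markus", "joined": datetime(2016, 10, 13, 9, 30)}

# Plain JSON cannot represent a datetime, so json.dumps raises TypeError.
try:
    json.dumps(doc)
    plain_json_ok = True
except TypeError:
    plain_json_ok = False


def encode_doc(d):
    """Hypothetical helper: turn datetime values into ISO 8601 strings."""
    return {k: v.isoformat() if isinstance(v, datetime) else v
            for k, v in d.items()}


def decode_doc(d, datetime_fields):
    """Hypothetical helper: parse the named fields back into datetimes."""
    return {k: datetime.fromisoformat(v) if k in datetime_fields else v
            for k, v in d.items()}


stored = json.dumps(encode_doc(doc))
restored = decode_doc(json.loads(stored), datetime_fields={"joined"})
```

A serialization extension automates this kind of encode-on-write and decode-on-read step so application code does not have to do it by hand.
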
Performance Tips (uJSON and More): Since TinyDB is built on JSON reads and writes, parsing performance can be a bottleneck. Installing ujson (a C-based JSON library) can yield significant speedups. For truly large datasets or multi-process support, the best practice is either to switch to a more robust system like MongoDB or carefully craft custom storage or concurrency features around TinyDB.
- Links and Tools:
  - ujson on PyPI
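
A common way to benefit from an optional, faster JSON library is a guarded import. This is a sketch only: it assumes ujson may or may not be installed, the alias `json_impl` is a name chosen for illustration, and it does not show how any particular TinyDB version selects its JSON library.

```python
# Prefer ujson when it is installed, otherwise fall back to the standard library.
try:
    import ujson as json_impl
except ImportError:
    import json as json_impl

records = [{"id": i, "name": f"user{i}"} for i in range(1000)]

# Both libraries expose dumps/loads with the same basic call shape.
text = json_impl.dumps(records)
round_tripped = json_impl.loads(text)
```

Because the calling code only touches `dumps` and `loads`, it behaves the same whichever library was imported.
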
Extending with Custom Storage Engines: TinyDB is designed so you can swap out the storage layer. This is a powerful approach for specialized use cases like storing data in AWS S3, adding caching and locking, or hooking in a more advanced backend. It’s as simple as writing a class with read and write methods.
- Potential Ideas:
  - Locking storage for multi-process safety
  - Network-based storage (S3, REST endpoints)
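
To make the "class with read and write methods" idea concrete, here is a standalone sketch of a JSON file store that guards access with a lock. `LockedJSONStore` is a hypothetical class written for illustration; it does not subclass anything from TinyDB, and a real storage engine would follow the base class described in the "How to Extend TinyDB" docs. Note that `threading.Lock` only protects threads within one process, not multiple processes.

```python
import json
import os
import tempfile
import threading


class LockedJSONStore:
    """Toy storage with the read/write shape described above (hypothetical)."""

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()  # guards threads in this process only

    def read(self):
        # Return the whole stored state, or None if nothing was written yet.
        with self._lock:
            if not os.path.exists(self.path):
                return None
            with open(self.path, "r", encoding="utf-8") as handle:
                return json.load(handle)

    def write(self, data):
        # Replace the whole stored state in one go.
        with self._lock:
            with open(self.path, "w", encoding="utf-8") as handle:
                json.dump(data, handle)


path = os.path.join(tempfile.mkdtemp(), "db.json")
store = LockedJSONStore(path)
before = store.read()
store.write({"_default": {"1": {"name": "Markus"}}})
after = store.read()
```

Swapping the file operations for calls to a network service would keep the same two-method shape, which is what makes the storage layer easy to replace.
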
Transaction-Like Functionality (TinyRecord): While TinyDB doesn’t feature full ACID transactions out of the box, an extension called TinyRecord adds a transaction-like approach by handling locks at the database level. This can preserve data integrity if multiple threads are writing at once. It’s still best for moderate use, but it showcases how the flexible extension system can address advanced needs.
- Links and Tools:
  - TinyRecord Extension
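
The general idea of applying a batch of writes while holding a lock can be sketched with the standard library. This is not TinyRecord's API or implementation: `LockedTable` and `insert_many` are hypothetical names, and the sketch only illustrates why serializing writers keeps concurrent threads from interleaving half-finished batches.

```python
import threading


class LockedTable:
    """Hypothetical toy table: every write happens while holding one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs = {}
        self._next_id = 1

    def insert_many(self, docs):
        # The whole batch is applied under the lock, so other threads never
        # interleave with a half-applied batch or reuse an ID.
        with self._lock:
            ids = []
            for doc in docs:
                self._docs[self._next_id] = dict(doc)
                ids.append(self._next_id)
                self._next_id += 1
            return ids

    def __len__(self):
        with self._lock:
            return len(self._docs)


table = LockedTable()


def worker(worker_no):
    # Each worker writes 50 batches of 2 documents.
    for batch_no in range(50):
        table.insert_many([{"worker": worker_no, "batch": batch_no, "part": p}
                           for p in range(2)])


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = len(table)
```

With four workers the table ends up with 400 documents and 400 distinct IDs, regardless of how the threads were scheduled.
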
Philosophy and Project Status: Markus initially wrote TinyDB for a small internal tool and never expected it to gain traction. The project’s popularity highlights the Pythonic demand for simple, minimal solutions. He sees TinyDB as feature-complete for its core goals, with future improvements coming mainly from community extensions, possible API cleanups, and thorough documentation.
- Links and Tools:
  - TinyDB Documentation

Interesting Quotes and Stories

"I didn’t plan to write a NoSQL database. I just wrote a simple database with dictionary objects, and it turned out to be quite popular." -- Markus Siemens

"After a couple of months, I got the first issues. And like a year later, I got the first pull requests without any involvement on my side, which was really a real great example of open source, how it works." -- Markus Siemens

Key Definitions and Terms

NoSQL: A non-relational database approach that typically stores data in a more flexible format, such as documents (JSON-like structures), instead of tables and rows.
Document Database: A type of NoSQL database that stores data in document-like structures, often in JSON, enabling easier schema evolution.
Embedded Database: A database that runs within the application’s process and doesn't require a separate server environment.
Serialization: The process of converting objects (like Python dictionaries or more complex data types) to a storable or transmittable format (e.g., JSON).
Middleware: A layer of software that sits between the database core and your application logic, intercepting operations to provide extra behavior (caching, logging, etc.).

Learning Resources

Below are some resources to help you continue your Python learning journey and better understand databases and Python's ecosystem.

Python for Absolute Beginners: Learn the fundamentals of Python in an approachable course.
MongoDB Quickstart with Python: A short course on working with MongoDB and Python if you need to move up from a lightweight solution to something more scalable.
Write Pythonic Code Like a Seasoned Developer: If you want to get a deep sense of creating expressive, maintainable Python, this pairs well with using TinyDB in clean, Pythonic ways.

Overall Takeaway

TinyDB fulfills a niche: a lightweight, purely Pythonic, document-oriented database that’s embeddable with no external dependencies. It’s perfect for small projects, prototypes, and utilities, thanks to an intuitive design and flexibility. If you keep its concurrency and performance limitations in mind, TinyDB can be both a learning tool and a handy solution for quick-turnaround Python applications.

Episode Transcript

00:00 NoSQL and document databases like MongoDB have made building fast, scalable software that is easy to evolve and maintain much easier for a broad class of applications.

00:09 Embeddable, file-based databases like SQLite have made shipping an application that requires a database a no-brainer.

00:16 The database just runs in process, so there's no setup or maintenance.

00:20 Yet, when you try to intersect these two excellent capabilities, you'll find the options are very limited.

00:25 There just aren't many embeddable document databases.

00:28 If you're a Python developer and you want a native Python solution, the options are much slimmer still.

00:33 That's why I'm excited to introduce you to Markus Siemens and TinyDB.

00:37 TinyDB is a 100% pure Python, embeddable, pip-installable document DB for Python.

00:43 This is Talk Python To Me, episode 80, recorded October 13, 2016.

00:56 Welcome to Talk Python To Me, a weekly podcast on Python.

01:18 Thanks for listening to my podcast.

01:42 I've heard from so many of you that the insight into the industry you gain from each week is very important to you.

01:47 I want to take this moment and tell you about my online courses for those of you who want to go deeper and convert your enthusiasm to working knowledge.

01:55 At Talk Python, we currently have three courses available.

01:58 Python Jump Start by Building 10 Apps, for those of you who are just getting into Python.

02:04 Write Pythonic Code Like a Seasoned Developer, covering over 50 hard-won coding tips.

02:08 And Python for Entrepreneurs, available for early access already, covering web development, design, and everything else you need for an online Python-based business.

02:18 See these courses and more at training.talkpython.fm to hone your Python skills, no matter your experience level.

02:25 Now, I hope you enjoy this week's interview.

02:27 Markus, welcome to Talk Python.

02:30 Thanks for having me.

02:31 Yeah, it's really great.

02:32 I was talking to Austin, I think it was show 63, possibly, about mutation testing.

02:39 And he said, I'm doing this really cool work with mutation testing.

02:42 Oh, and for the database, I'm using this cool, embedded, document-oriented database called TinyDB.

02:49 And since then, I was like, wow, an embedded document database?

02:52 How awesome.

02:53 And it's 100% in Python.

02:54 So even cooler.

02:56 And since then, I've wanted to have you on the show and talk about it.

02:59 So welcome.

02:59 Yeah, thank you.

03:00 Yeah, it's going to be fun to talk about this database.

03:03 It's tiny, from what I can tell.

03:05 It's great.

03:06 Yeah, it is.

03:07 But before we get into that, of course, let's talk about your story.

03:11 How did you get into programming in Python?

03:12 When I was about 10 years old, I kind of got into programming because my dad bought me a book called C++ for Kids.

03:21 I was 10 years old.

03:23 I didn't really understand much.

03:25 I just copied the examples and made sure it compiled.

03:29 And that was basically it.

03:31 So later when I had a book, Python for Kids and Java for Kids, then with around 14 or 15 years, I started really doing some programming in Java on my own.

03:44 So that's when I actually got really started with programming.

03:49 Cool.

03:49 And what were some of the first apps you built with Java?

03:52 What kind of work was that?

03:53 My first application was a Hangman implementation in Java.

03:58 So I had some problems with how I structured the code because I had to make everything public static in Java.

04:05 So it compiled.

04:06 And then something didn't work and I asked my dad for help.

04:11 And he was horrified when he saw the code because everything was public static.

04:16 And he then helped me out with sorting out where to pass an object and so on.

04:23 So, yeah, I got some help from him.

04:26 And, well, that's how I first got into working with Java.

04:31 And a couple of years later, I applied to university to study electrical engineering.

04:38 And before that, you had to do an eight-week internship.

04:41 So six of these weeks at a small company, which mainly sells servers, which they sell servers with support contracts for enterprises.

04:52 And they asked me to create an application so they can configure the servers via the web.

04:58 And the project was they asked me to do it in Python.

05:01 So that's how I got started with Python and wrote my first serious Python code.

05:08 That's cool.

05:08 And what was your impression?

05:09 Were you like, oh, yeah, Python's awesome.

05:11 Or were you like, whoa, where'd public static void go?

05:13 Where'd public static go?

05:15 Yeah.

05:16 It was, I think it was way easier than I expected.

05:20 Because after like one or two weeks, I was already quite confident that I could do the whole project.

05:26 So I didn't really have any experience beforehand with Python.

05:30 And with that project, after one or two weeks, I was really confident that I could do the whole project without big problems.

05:39 Yeah, that's great.

05:40 Very nice.

05:41 I find a number of people get into Python that way.

05:44 They're like working on some project and people come along, their co-workers or boss or somebody

05:50 will say, hey, can you do this thing in Python?

05:53 They're like, Python?

05:54 Well, I don't know anything about it, but let me give it a try.

05:55 It turns out to be really easy.

05:57 So that's really great.

05:58 Yeah, it is.

05:59 Definitely.

05:59 It's very easy to start.

06:01 And I would say it's one of the best programming languages to learn programming.

06:07 Yeah, I certainly think it is as well.

06:10 In the United States, they used to teach Java as the primary computer science 101,

06:16 first course you take, programming language.

06:19 And in the recent years, it's switched to be primarily Python.

06:22 So that's great.

06:23 At my university, people have to start with C and C++.

06:27 And most people have quite a lot of problems with that.

06:31 And I guess it would be kind of an easier route to go with Python first.

06:36 But I don't know.

06:37 It's their choice.

06:38 Yeah, of course.

06:39 I think there's something to understanding a C-oriented language where you actually work

06:45 with pointers and directly with the memory and stuff.

06:48 But I'm not sure you should start there.

06:49 You know what I mean?

06:50 Yeah, it's your very first exposure.

06:52 Just for comparison, my first computer science 101 course in college was Lisp.

06:59 So that was really different.

07:02 All right.

07:03 So that's how you got started.

07:04 What do you do day to day?

07:05 University, I'm still studying.

07:07 And next week, my lectures start again.

07:11 Till then, I do some freelancing work, mainly web development.

07:15 Nice.

07:16 And is that web development in Python?

07:17 No, it isn't.

07:18 It's mainly JavaScript.

07:20 Right now, I'm doing some front-end development, which is mainly JavaScript.

07:24 And sometimes I wish it were Python, but I can't always choose.

07:29 That's right.

07:30 Sometimes if there's good work to be done, you just got to go do it.

07:33 All right.

07:34 So the database that you created called TinyDB is a document-oriented database.

07:40 And that falls into the realm of NoSQL databases.

07:42 And so just for everyone out there listening, I realize many of you know, but maybe some

07:46 of you don't really totally know the difference between, say, relational databases and NoSQL

07:51 databases.

07:51 So relational ones, I think we're all pretty familiar with.

07:55 There's a bunch of tables.

07:55 There's foreign key relationships between them.

07:57 We try to normalize our data, minimize duplication, and sort of set it up for querying from any

08:03 particular angle, right?

08:04 You have a wide range of queries you can write.

08:07 But in the NoSQL world, there's a different set of trade-offs that those databases are making,

08:12 right?

08:12 Like the document databases don't have as much of a flexible set of queries you can write.

08:20 But if you build the documents right, you can really optimize them for exactly the use case

08:24 that you're looking for, right?

08:26 So your data is more hierarchical and more directly models like how you might actually want to work

08:32 with objects and relationships in your code rather than in storage.

08:35 So there's also other ones like key value stores and even graph databases, although in my mind,

08:41 those are not NoSQL databases.

08:43 That's a different debate for a different time.

08:46 Yeah, it is.

08:47 Yeah, but so yours, TinyDB, falls into this document database.

08:51 So why don't you tell us, what is TinyDB?

08:53 Yeah, TinyDB is, like you said, a NoSQL, basically, a database.

08:59 And actually, I didn't plan to do a NoSQL database.

09:03 I just planned to do something that is easily usable.

09:09 I happened to write TinyDB for another project, which was in total, I think, like 300 lines.

09:16 And I didn't really feel like doing a MySQL or SQLite database for it, because if I had to change

09:24 anything about the data structure, I would have to write migrations and so on.

09:30 So I just wrote a really simple database with a bunch of dictionary objects, which contain the data,

09:40 and then added some query language for it.

09:43 And it happened to become quite popular.

09:47 Yeah, that's really cool.

09:48 And it is quite popular.

09:49 It's got over a thousand stars on GitHub, which is very cool.

09:52 Yeah.

09:54 You must be proud of that.

09:55 That's great.

09:55 Yeah.

09:56 Yeah.

09:56 What was interesting is that I didn't really do any marketing or promotion at all.

10:02 I just put it on GitHub, uploaded some documentation, and just left it as it was.

10:07 And after a couple of months, I got the first issues.

10:12 And like a year later, I got the first pull requests without any involvement on my side,

10:18 which was really a real great example of open source, how it works.

10:22 Yeah, that's a great validation that people see what you've made, and they actually want to contribute back.

10:27 You're like, wait, what?

10:28 Yeah.

10:28 Where did this person come from?

10:30 They are working on my project.

10:31 How great.

10:32 Yeah, that's nice.

10:33 You made a really interesting comment when you described why you didn't want to use MySQL or SQLite,

10:38 whichever one you were referring to.

10:40 And that's, you don't want to deal with migrations and all the headaches of evolving your schema.

10:45 And I think one of the really, really powerful things about document databases that you don't really appreciate until you try it

10:54 and you've sort of lived with an application over time is the ability to easily evolve the schema, right?

11:01 Like if you want to add a field, you add a field to either your dictionary or class in your application,

11:08 and that just becomes part of your schema.

11:10 It's usually much easier, I think, to maintain these applications that are based on document databases.

11:17 That's my experience anyway.

11:18 Yeah.

11:18 It was my experience too with the project I was working on originally because I didn't really know how the data would look like.

11:26 So I needed something that would allow me to change the data as I need or as I find out I need another field so it wouldn't be hard to edit.
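
The schema-evolution point can be illustrated with plain dictionaries. This is a minimal sketch with made-up example data: older documents simply lack the new field, and the reading code supplies a default instead of requiring a migration.

```python
# Two "generations" of the same kind of document living side by side.
users = [
    {"name": "Ada", "email": "ada@example.com"},                       # older shape
    {"name": "Markus", "email": "m@example.com", "newsletter": True},  # newer shape
]

# Reading code supplies a default for documents written before the field existed.
subscribers = [u["name"] for u in users if u.get("newsletter", False)]
```

Nothing had to be altered in the older document for the new field to be usable.
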

11:37 Yeah, absolutely.

11:38 And I think it's really cool that you have this as an embedded database because I would say probably the most popular choice for these types of databases would be MongoDB.

11:48 But then you've got to deal with another server and, you know, configuring that thing and the connections.

11:54 And just, you know, sometimes you just want a small embedded thing that's just part of your app.

11:59 Yeah.

11:59 What I really like about SQLite is that it's embedded so you don't need any server, you don't need any maintenance or anything.

12:07 So basically Python, basically TinyDB combines the best of SQLite being embedded and MongoDB being a document database.

12:19 Yeah, that's really cool.

12:20 So you said it's written 100% in Python.

12:23 How many lines of code is that?

12:25 Yeah, it's about 1,200 lines of code of which 40% approximately are documentation.

12:32 And in addition to that, like 1,000 lines of test code.

12:37 That is awesome.

12:38 Yeah, and you have 100% test coverage as well on your project, which is really nice when you've got something that's actually managing your data.

12:45 You want to be really pretty sure that it's working right, yeah?

12:48 Definitely.

12:49 Okay, so if it's 100% in Python, how do I get it into my application?

12:56 Like, can I pip install it?

12:57 Yeah, definitely.

12:58 Installing should be very easy.

12:58 Just run pip install tinydb, and then you just have to open a Python REPL to get started, import tinydb, and basically you are ready to go.

13:14 Yeah, that's really nice.

13:15 And so there's no server to start, like if your app is running, the database is running, right?

13:20 Yeah.

13:21 Yeah, yeah, very cool.

13:22 So you had some reasons why you should use TinyDB and some reasons maybe why you might not.

13:30 So what use cases are really good for considering TinyDB?

13:35 I think it's just like the project I was working on originally.

13:39 I think it works best for small projects where you don't really have or you don't really want to use a bigger database.

13:48 So a small web application, I think some guy used it for a password manager or something.

13:56 So really projects where having MySQL database or something like that would just add too much complexity.

14:04 Yeah, certainly if you're going to deploy an application or library and you don't want to have the getting started steps, say,

14:13 now you need to set up this server and you need to run this script to build this schema,

14:18 and then you need to set the connection string here, and then you can get going, right?

14:24 So to me it feels pretty similar, not an exact fit, but similar to, say, SQLite, when you might use SQLite.

14:31 Yeah.

14:32 It's similar in that you can really easily get started and you don't have to manage a server or something like that.

14:40 Sure.

14:41 So because it is a pretty small database, it doesn't try to do everything, right?

14:45 It's not like MongoDB rewritten in an embedded Python variation.

14:51 It really does do less.

14:52 So there are some things it doesn't do, and maybe if you really depend upon them, you should not focus.

14:59 You should pick something else.

15:00 What are those cases?

15:01 Just so we know people can decide if it fits for them.

15:04 I think the main disadvantage of TinyDB is when you're using multiple threads.

15:11 In that case, there is an extension for TinyDB you can use.

15:15 If you have multiple processes, then you are really out of luck because there is no type of locking the file you're writing to.

15:25 So you would probably run into issues with that.

15:28 And also, it might turn out to be a bottleneck in performance.

15:33 If you have hundreds of thousands of entries in your database and you search them, because TinyDB doesn't have any kind of index, it will have to scan the entire list every time you do a search.

15:45 So you probably shouldn't do really involved data processing with it.
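
The cost of searching without an index can be sketched in a few lines. This is a toy illustration, not TinyDB's code: `scan_search` and `name_index` are hypothetical names, and the index is shown only for contrast, since TinyDB as described here has none.

```python
# 100,000 toy documents, held in a plain list like a table without an index.
docs = [{"id": i, "name": f"user{i}"} for i in range(100_000)]


def scan_search(documents, name):
    """Check every document against the condition, as an index-free search must."""
    checked = 0
    hits = []
    for doc in documents:
        checked += 1
        if doc["name"] == name:
            hits.append(doc)
    return hits, checked


scan_hits, checked = scan_search(docs, "user99999")

# An index, by contrast, is built once and then answers lookups directly.
name_index = {}
for doc in docs:
    name_index.setdefault(doc["name"], []).append(doc)

index_hits = name_index.get("user99999", [])
```

Both approaches return the same document, but the scan had to look at all 100,000 entries to find it, and it would do so again on every search.
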

15:51 Okay.

15:51 Yeah.

15:51 And that definitely, some of those are serious.

15:54 Some of them might not be.

15:55 It depends on your use case, right?

15:57 Now, I don't want to talk about it yet.

15:59 We'll get to it.

15:59 But there's a bunch of extensions.

16:01 And I think there are some that will come and solve some of those performance problems, which are interesting to some degree.

16:10 And also, they open the possibility for a listener, an enterprising listener out there, to come and add something like multi-process or multi-thread support, things like that, right?

16:22 So we'll get to the extensibility.

16:24 There are a number of extensions now.

16:26 I hope it's quite easy to add a new one.

16:30 So if there are a couple of things where TinyDB doesn't support them out of the box, but if you want to, you can write an extension, you can write your own storage mechanism, and so on.

16:42 Yeah.

16:43 Great.

16:43 Yeah.

16:43 We'll talk about that in a little bit.

16:44 Awesome.

16:45 So when I think of competitors, you have a few listed, and we've talked about a couple of them.

16:52 Competitors, like we said, is not exactly the right word because it's all open source, but choices you might choose from.

16:59 On one hand, like MongoDB, if you're doing document databases and you're doing real large data production apps, maybe this is, you want to scale out, maybe this is the right choice.

17:10 But there's also some other pure Python databases that I had not heard of that I thought were pretty interesting.

17:15 So do you want to tell us about the two that you listed on your site?

17:19 What are their names?

17:20 Before starting with TinyDB, I first did some research if there already is some kind of embedded Python database I can use, and that fits my use case.

17:32 And there wasn't really, I found two projects which are somewhat similar, but not really.

17:38 For example, there's a project called CodernityDB, which is also pure Python and NoSQL.

17:46 But as far as I know, it's much more complex.

17:49 You have the ability to use indexes.

17:52 You have HTTP support.

17:54 So it might be a great fit for a project.

17:57 But in my case, it was just too much code.

18:01 I think it's like 7,000 lines of code, which would be seven times the size of my project.

18:07 So, yeah.

18:10 And I think that's without unit tests, the size of the code.

18:13 So it's even bigger, yeah?

18:14 Yeah.

18:14 And the other one was Buzhug.

18:16 Buzhug is also quite interesting because it's optimized for speed.

18:21 But what I didn't really, the reason why I didn't use Buzhug was that it didn't have the kind of query language I wanted to use.

18:29 So while it is really fast and it's also pure Python, it just didn't have API in the way I wanted to have or the kind of API I imagined to use.

18:43 So I chose to build my own project.

18:47 And also, it's like twice the size of TinyDB without tests.

18:53 Yeah, definitely.

18:54 So the CodernityDB, that one is a key value store.

18:59 So in some ways, maybe not quite as capable as the document databases.

19:04 It depends.

19:05 They do have some additional indexes and queries you can write.

19:08 But maybe not quite as flexible.

19:10 And then Buzhug is relational.

19:12 So you have, again, all the trade-offs with managing the schema and evolving it in migrations.

19:17 So, yeah, to me, it feels like SQLite is kind of the biggest competitor to you, you guys.

19:24 Yeah.

19:24 Yeah, cool.

19:26 So can you maybe, I don't necessarily want to talk about code exactly in audio format because that doesn't usually go so well.

19:34 But just give us a sense of, like, what is the API like?

19:37 Like, if I wanted to create, you know, get started with TinyDB, I wanted to create a database and insert some records.

19:44 Like, what does that look like?

19:46 Yeah.

19:46 So you start with importing TinyDB.

19:50 And there are two main classes you will probably use, which is the TinyDB class and the Query class.

19:59 So you basically create a new instance of TinyDB.

20:03 And you already have your database.

20:06 You don't need any further setup code or anything.

20:10 And then you can just call db.insert, or take the instance and call the insert function, pass some Python dictionary, and you already inserted your first object into the database.

20:23 Yeah, very nice.

20:24 And so it just basically, you either can directly or into a table insert more or less Python dictionaries, right?

20:31 Yeah.

20:32 And basically, you have all the basic functions, create, read, update, and delete.

20:36 You can do some interesting searching.

20:39 Search for objects with specific parameters.

20:43 You can do some quite involved searching with the API I provided.

20:48 But yeah, I think besides the database itself, the query language is like the second biggest or has the second most code in the whole project.

21:01 If you have a list, you can search for an item in that list with specified properties and so on.

21:09 So if you want to, you can really go crazy with searching.

21:12 But still, the basic API is insert, delete, and then, of course, querying and update.

21:21 Yeah, nice.

21:21 And to me, the query language looks a little bit like SQLAlchemy in that you do it sort of in a Python language and it's translated to the query that goes down to the system, right?

21:32 I actually found out about the SQLAlchemy project.

21:36 I think I just discovered it.

21:39 After I wrote TinyDB, so it happened to have a very similar query language.

21:45 Yeah, you didn't take it straight from there.

21:47 But it is a very natural way of doing it.

21:49 So basically, the fact that you use the native Python sort of equality and things like that and then it translates it over is really nice.

21:57 The exchange of data, if you just go against your API directly, is really in dictionaries.

22:04 And sort of extended a little bit with your primary key being on there as a separate type of attribute and so on.

22:11 But more or less as dictionaries.

22:12 Is there an object data mapper layer, kind of like Mongo has MongoEngine or MongoKit, where you work in higher level stuff?

22:21 Like I would write, say, like, here you have, in your example, you had a person.

22:25 Like, could I create a person class and maybe put rules or columns or structure on that and then use that to query with?

22:32 Yeah, in a way, you can already use the query language of TinyDB as an ORM.

22:39 So you can create a new instance of the query object and call it user.

22:45 And then you can call something like user.name equals to the name you're looking for.

22:52 And you already have a query, which you can use to search the database.

22:58 So you can do that or use that as some kind of ad hoc modeling.
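
How an expression like user.name == "Markus" can turn into a reusable query, rather than an immediate True or False, can be sketched with operator overloading. This is a toy illustration of the general technique and not TinyDB's actual implementation; `Field` and `search` are hypothetical names, and the documents are made-up example data.

```python
class Field:
    """Hypothetical stand-in for a query object: remembers a field name."""

    def __init__(self, name=None):
        self._name = name

    def __getattr__(self, name):
        # Only called for attributes not found normally, so user.name
        # produces a new Field that selects the "name" key.
        return Field(name)

    def __eq__(self, expected):
        # Instead of comparing now, return a test to run later
        # against each stored document.
        key = self._name
        return lambda doc: doc.get(key) == expected


def search(documents, condition):
    """Apply the condition to every document and keep the matches."""
    return [doc for doc in documents if condition(doc)]


people = [
    {"name": "Markus", "role": "guest"},
    {"name": "Michael", "role": "host"},
]

user = Field()
condition = user.name == "Markus"   # a callable, not a bool
matches = search(people, condition)
```

Naming the object `user` is what gives the expression its model-like, ad hoc ORM feel, even though no class describing a user exists.
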

23:04 Okay. Yeah, nice.

23:05 And do you support Python 3?

23:08 Python 2?

23:10 What versions of Pythons do you support?

23:11 I tried or I still try to support as many versions of Python as possible.

23:16 I'm not sure about Python 2.6.

23:18 But upwards from Python 2.7, I think I support every version.

23:23 Also because TinyDB doesn't have any external dependencies even on other Python packages.

23:30 So it's quite easy or it was quite easy for me to write code which works in both Python 2 and 3.

23:37 So if you're using Python 3 or Python 2, either way, you're good to go.

23:43 Yeah, that's really excellent.

23:44 So I'm very happy to see the increased Python 3 support for your project, other projects.

23:50 I had the BeeWare guys on last time and they're like, it's all Python 3.

23:55 So that's great.

23:56 The other thing, if it's pure Python, the very next question I had was, well, if it supports Python 3, what's the story with PyPy, the JIT compiled version of the interpreter?

24:07 I didn't myself really use it with PyPy, but I still run every commit I do with continuous integration, which tests all the Python 2 and 3 versions and also tests that it works with PyPy.

24:21 But myself, I didn't really have a use case where I would need to use PyPy.

24:27 So I don't know if it's faster or slower than you would expect.

24:31 Okay.

24:32 But it definitely works.

24:34 It definitely works.

24:35 Okay.

24:35 That's really cool.

24:36 And so maybe it's a lot faster.

24:38 Actually, we'll get to some stuff that talks a little bit about speedups that you can do.

24:42 And maybe that just takes PyPy out of the loop in terms of being necessarily important.

24:48 But anytime you have something like this that's pretty central to how your application performs and you can do it on PyPy, that's pretty interesting to see.

25:10 This portion of Talk Python To Me is brought to you by Hired.

25:12 Hired is the platform for top Python developer jobs.

25:15 Create your profile and instantly get access to 3,500 companies who will compete to work with you.

25:20 Take it from one of Hired's users who recently got a job and said, I had my first offer on Thursday after going live on Monday, and I ended up getting eight offers in total.

25:28 I've worked with recruiters in the past, but they've always been pretty hit and miss.

25:32 I tried LinkedIn, but I found Hired to be the best.

25:35 I really liked knowing the salary up front.

25:37 Privacy was also a huge seller for me.

25:40 Sounds awesome, doesn't it?

25:41 Well, wait until you hear about the signing bonus.

25:43 Everyone who accepts a job from Hired gets a $1,000 signing bonus.

25:46 And as Talk Python listeners, it gets way sweeter.

25:49 Use the link hired.com/talkpythontome and Hired will double the signing bonus to $2,000.

25:54 Opportunity is knocking.

25:56 Visit hired.com/talkpythontome and answer the door.

26:07 Speaking of that, one thing that I saw that was cool, and this is something I learned today,

26:11 I hadn't really paid any attention to it before, is you said one of the speed-ups you can do is,

26:16 if you install a package called uJSON, you can get TinyDB to run a lot faster.

26:23 Because as you can imagine, it's all about JSON parsing, multi-directional JSON parsing, right?

26:28 And so what's uJSON and why is it faster?

26:32 uJSON is a really cool project.

26:34 I think the reason why JSON parsing matters so much is that, due to the way TinyDB works,

26:43 every time you interact with the database, you have to read the database file from the file system.

26:54 So you really need a fast JSON parser.

26:57 And the JSON parser that comes with Python is really powerful.

27:02 You can extend it in a few different ways.

27:05 But if you really need performance, this kind of extensibility really only gets in the way,

27:13 because the whole parsing is probably implemented in Python, and then you lose a lot of speed.

27:21 So uJSON is one project I found which also implements JSON parsing,

27:29 but it does it all in C.

27:32 So it's really fast.

27:34 You do have a trade-off that it's not as extensible.

27:37 But if you don't need to override some internals of the JSON parser, then it's really good to use uJSON to get more performance out of it.

27:50 Yeah, that's really cool.

27:52 And you can just pip install that.

27:53 And you said that TinyDB will look for its existence and prefer it if it's there.

27:59 But if it's not there, it's fine.

28:00 It'll just use the standard library JSON, right?

28:03 Yeah, so if you install it, it works like a drop-in.

28:08 You just install it, and TinyDB auto-detects it, and you get the speedup instantly.
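The "use it if it is installed, otherwise fall back" behavior described here is commonly written as an optional import. The following is a minimal sketch of that general pattern, not a copy of TinyDB's source, and whether a particular TinyDB release does exactly this is an assumption. The `record` value is made up for the example.

```python
try:
    # Optional C-accelerated JSON library; used only if it happens to be installed.
    import ujson as json
except ImportError:
    # Standard library fallback, always available.
    import json

# Hypothetical record; both libraries expose dumps() and loads() for this case.
record = {"name": "John", "age": 22}
encoded = json.dumps(record)
decoded = json.loads(encoded)
```

Because both modules are bound to the same name, the rest of the code does not need to know which parser it ended up with.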

28:16 Yeah, nice.

28:17 Let's talk about extensibility a little bit.

28:19 So there was a couple of things I asked about, and you're like, well, I'm not sure it's really in there, maybe.

28:25 But you've written a pretty cool framework for extending TinyDB.

28:31 And so some of the things you can extend are like you could write a custom storage engine.

28:36 You can write custom middleware, custom table classes.

28:40 And then there's a bunch of projects that people have written that then you can plug into TinyDB to make it nicer, faster, etc.

28:49 Yeah, extensibility was something I first started with having support for multiple storages.

28:58 So you could swap out the JSON storage and use memory-only storage.

29:03 I decided to provide even more extensibility so you can write a middleware,

29:09 which modifies the way TinyDB works with any kind of storage.

29:15 And also there are quite a few extensions out there you can use to modify how TinyDB works overall.

29:24 So there's a bunch of extensions, and one of them is TinyRecord.

29:28 What's the story of TinyRecord?

29:30 TinyRecord was basically written because TinyDB itself didn't have any real support for multi-threading.

29:39 So with TinyRecord, you basically have some sort of transactions, and it also uses locking.

29:46 So you can use TinyDB from multiple threads.

29:49 And then you have an object where you can add a bunch of modifications you want to use.

29:57 And yeah, basically it's transaction support for TinyDB, which is quite handy if you need multi-threading.

30:03 It depends on your use case, of course.

30:05 Yeah, of course.

30:05 But if you need it, that is really cool.

30:07 So another one that someone wrote, these are all separate projects on GitHub, by the way.

30:13 Another one that somebody wrote is called TinyMongo.

30:16 Yeah, TinyMongo.

30:17 So does that match the MongoDB API or something, but then actually store the data through TinyDB?

30:25 Or what's the story with that?

30:26 Yeah, TinyMongo, as far as I know, allows you to use an interface or an API which matches the MongoDB

30:35 API, but instead uses TinyDB.

30:38 I think one day I just got an issue on GitHub where someone said, hey, I have this extension

30:45 for TinyDB so it can replace MongoDB.

30:48 And I included it into the list of extensions.

30:52 Yeah, that's really cool.

30:54 So it does look like it more or less just replaces the PyMongo API.

30:59 So you've got like insert and find one and it even has like the dollar operators.

31:05 So like dollar set and so on.

31:07 Yeah.

31:08 So one of the things that comes to mind for this for me is because TinyDB stores its files,

31:15 at least by default, in a simple JSON document and there's no setup for it.

31:21 If you were doing unit testing and you needed something better than just simple mocking

31:26 and stubbing out your test data against MongoDB, it might be cool to switch it to this and use

31:33 this as a way to kind of have a slightly better test backend for a real Mongo system or something

31:39 like that.

31:40 If you don't want to start up a separate server just for testing, that might be a really cool

31:46 solution for that problem.

31:48 Yeah, nice.

31:49 So then there's a couple of others.

31:50 One is TinyDB serialization.

31:53 And I noticed that when I was playing around with TinyDB that I, for example, put like a

31:59 datetime in as part of my record.

32:02 And of course, you can't, people may or may not know, you can't just go to the JSON module

32:07 and go save this thing if it has a raw datetime in it.

32:10 It just, it doesn't support it.

32:11 And that's what happened when I tried to save my thing in TinyDB.

32:16 So does this address that kind of problem and other stuff?

32:18 Yeah.

32:19 I even think that storing datetime objects was the reason why the extension came into

32:25 existence.

32:26 Because for TinyDB to store objects other than dicts, or really for the JSON module to support

32:35 it, you have to modify it yourself.

32:40 So the extension allows you to register your own serialization code.

32:45 So you can store all kinds of objects that you want to store.

32:50 And it allows you to specify how exactly it will be stored.

32:54 I see.

32:54 So I can basically tell it like, hey, if you see a datetime, serialize it like this.

32:58 And if you see some other type that it doesn't necessarily know about, store it like that.
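The underlying problem can be reproduced with the standard library alone. The sketch below shows the stock `json` module rejecting a `datetime`, and then one possible way to register custom encode and decode logic through the `default` and `object_hook` parameters. This is not the API of the tinydb-serialization extension; the `__datetime__` tag and both helper functions are hypothetical choices for this example.

```python
import json
from datetime import datetime

# Hypothetical record containing a raw datetime.
record = {"name": "John", "created": datetime(2016, 10, 13, 9, 30)}

# The standard json module rejects datetime objects out of the box.
try:
    json.dumps(record)
    plain_dump_failed = False
except TypeError:
    plain_dump_failed = True


def encode_extra(obj):
    """Called by json.dumps for objects it cannot serialize itself."""
    if isinstance(obj, datetime):
        # Tag the value so it can be recognised again when loading.
        return {"__datetime__": obj.isoformat()}
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def decode_extra(dct):
    """Called by json.loads for every decoded JSON object."""
    if "__datetime__" in dct:
        return datetime.fromisoformat(dct["__datetime__"])
    return dct


text = json.dumps(record, default=encode_extra)
restored = json.loads(text, object_hook=decode_extra)
```

The same idea extends to other types: each type gets a rule for how it is written out and how it is read back.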

33:03 Okay.

33:04 Yeah.

33:05 Very nice.

33:05 What about TinyDB SmartCache?

33:08 It's also a really interesting project.

33:10 I think it was created after a pull request or actually, yeah.

33:16 So there was an addition to TinyDB because as I said before, searching can get really slow

33:24 if you have a lot of objects.

33:27 So what TinyDB does is that as long as you don't modify the database, it stores the results

33:34 of your search queries.

33:35 So it doesn't have to redo the work if you didn't change anything at all.

33:40 So really handy in some cases.

33:43 So if you do like a query and you give it some parameters and it comes back with 20 records,

33:49 the SmartCache will say, if I see this query again, just give them those 20 records, more

33:54 or less.

33:55 Actually, TinyDB itself will already do that caching for you.

33:58 So what SmartCache does is basically take it one step further: if you run a search query,

34:06 it stores the result.

34:09 And then if you do some updates on the database, it doesn't throw away the search results, but

34:15 instead replaces them or updates the cache with the new results.

34:21 So if you have a query which matches some elements and if you insert a new element which also

34:27 matches that query, it will go directly to the cache.

34:30 So it doesn't have to redo the whole search, which TinyDB itself would have to do.

34:38 But of course, it's also a trade-off because it uses more memory, because it has to store more results and also do more work.

34:47 There is some overhead on every insert and update because it has to check every query if it matches and update the cache in place.

34:57 Yeah, but in some cases it can be a really handy thing to use if you have a lot of updates and deletes in your code and you don't want to reprocess all the entries in your database for every search.

35:13 That might be an extension you want to use.
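The difference between the two caching strategies discussed here can be sketched with a toy in-memory table. This is an illustration of the idea only, not the code of TinyDB or of the SmartCache extension; `CachedTable`, its `smart` flag, and the `scans` counter are hypothetical names.

```python
class CachedTable:
    """Toy in-memory table with a query cache (illustration only)."""

    def __init__(self, smart=False):
        self.docs = []
        self.cache = {}   # query function -> list of matching documents
        self.smart = smart
        self.scans = 0    # number of full scans performed

    def search(self, query):
        if query not in self.cache:
            # Cache miss: scan every document once and remember the result.
            self.scans += 1
            self.cache[query] = [d for d in self.docs if query(d)]
        return list(self.cache[query])

    def insert(self, doc):
        self.docs.append(doc)
        if self.smart:
            # Smart strategy: update each cached result in place.
            for query, results in self.cache.items():
                if query(doc):
                    results.append(doc)
        else:
            # Simple strategy: any write throws away all cached results.
            self.cache.clear()


def is_adult(doc):
    return doc["age"] >= 18


def run(table):
    table.insert({"name": "John", "age": 22})
    table.search(is_adult)
    table.insert({"name": "Anna", "age": 31})
    return table.search(is_adult), table.scans


simple_results, simple_scans = run(CachedTable(smart=False))
smart_results, smart_scans = run(CachedTable(smart=True))
```

Both tables return the same documents, but the simple one has to rescan after the second insert while the smart one does not. The cost shows up in `insert`, which now checks every cached query, matching the trade-off described in the conversation.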

35:16 Okay, yeah, that sounds really cool.

35:18 Very nice.

35:20 So let's also talk a little bit about the extensibility.

35:23 So you said there's three basic places that we could extend TinyDB, and one of them is writing a custom storage engine.

35:33 So by default, your data is more or less stored in a single JSON file.

35:40 And that's more or less the connection string when you create the database, right?

35:44 You say, here's the file.

35:45 Yeah, you just pass the file name and it uses it.

35:49 Yeah, nice.

35:50 Again, very SQLite-like.

35:53 But then you said you can create alternate storage engines.

35:58 And this got me thinking about some ideas.

35:59 But there are some that you talked about already, right?

36:02 Yeah, for example, there is a storage which uses or just stores the data in memory.

36:10 The first storage I wrote even was for YAML.

36:14 So you had these YAML files. I later switched to JSON because it was faster, because YAML can get really complex in some cases.

36:23 If you get really fancy, you could even do some kind of HTTP stuff.

36:29 But I don't think that there is an extension for that yet.

36:33 But that would be something really interesting to try.

36:36 Yeah, absolutely.

36:37 And so when I saw this, I was thinking, well, okay, maybe there are certain things you can do to address some of the shortcomings that you had talked about, like maybe not working so well from multiple processes, stuff like that.

36:52 So, for example, MongoDB recently switched from storing binary JSON in a variety of files to something called WiredTiger, which is apparently much faster.

37:03 And they support multiple processes from there through certain types of locking and so on.

37:09 I was thinking, hmm, could you take one of these storage engines and somehow, you know, with some work, plug it into TinyDB?

37:15 Yeah, that should be possible.

37:17 Writing a custom storage is really easy because you just need to create a class with a constructor, which takes the parameters you need to create the storage.

37:29 And then one method for reading and one method for writing the data to the storage.

37:35 So you could do some locking there.

37:38 I think that would be an interesting thing to do, locking on storage level.

37:44 So basically, then it has to wait until the database file itself is unlocked to return the data.
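The shape described here (a constructor, one method for reading, one for writing) with a lock around file access could look roughly like the following standalone sketch. It does not subclass anything from TinyDB, so wiring it into the real library would need whatever base class and registration the TinyDB docs prescribe. `LockedJSONStorage` and the sample data are hypothetical, and the lock shown only coordinates threads within one process, not separate processes.

```python
import json
import os
import tempfile
import threading


class LockedJSONStorage:
    """Hypothetical storage: a constructor plus read() and write(), guarded by a lock."""

    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()  # thread-level only, not cross-process

    def read(self):
        # Readers wait here while another thread holds the lock.
        with self._lock:
            if not os.path.exists(self.path) or os.path.getsize(self.path) == 0:
                return None  # nothing stored yet
            with open(self.path, encoding="utf-8") as handle:
                return json.load(handle)

    def write(self, data):
        with self._lock:
            with open(self.path, "w", encoding="utf-8") as handle:
                json.dump(data, handle)


# Usage with a throwaway file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "db.json")
storage = LockedJSONStorage(path)
empty = storage.read()
storage.write({"_default": {"1": {"name": "John"}}})
loaded = storage.read()
```

Anything that can be read and written as a whole could sit behind the same two methods, which is why an in-memory store, a YAML file, or a remote bucket are all plausible backends.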

37:53 That would be interesting.

37:55 It would be interesting, right?

37:56 And so, I mean, it definitely seems like you could unlock some potential there as well.

38:01 Okay, cool.

38:02 So custom storage engines.

38:04 And you could even do crazy stuff, right?

38:06 Like you could say, I'd like to use an S3 bucket as my storage location or those types of things, right?

38:12 Whatever you're looking at.

38:13 As long as you can read it and write it, it should be possible to use it as a storage.

38:19 Interesting.

38:20 Okay.

38:20 The other thing you talked about was custom table classes.

38:23 What's that?

38:24 Yeah, custom table classes mainly came into existence to modify the way the database implementation works.

38:32 So basically, in other cases you would have to override things or use a subclass.

38:39 And so with custom table classes, what you basically do is have a subclass of TinyDB.

38:47 And then you can really do a lot of modifications.

38:51 For example, the smart cache extension uses a custom table class to intercept every insert, update, delete, and so on to add its own logic behind it.

39:04 Okay.

39:05 Could you use something like a custom table class to say, like, this field is required?

39:10 Or this one must match like some kind of constraint, like it must be an email address or something like that?

39:16 And then like not let it insert if you tried to save the wrong one?

39:20 Yeah, that's definitely an interesting idea.

39:22 Yeah.

39:22 That would be a really interesting extension for TinyDB, validation, because you can intercept all the inserts and updates.

39:33 And then you could check if it matches and raise an exception if it's not valid.
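As a rough illustration of the validation idea floated here, the sketch below intercepts inserts on a toy table and raises an exception for invalid documents. No such extension is confirmed to exist in the conversation, and this is not built on TinyDB's table class; `ValidatingTable`, `ValidationError`, and the very naive email check are hypothetical.

```python
class ValidationError(ValueError):
    """Raised when a document breaks one of the table's rules."""


class ValidatingTable:
    """Toy table that checks every document before storing it."""

    def __init__(self, required):
        self.required = set(required)
        self.docs = []

    def insert(self, doc):
        # Rule 1: every required field must be present.
        missing = self.required - set(doc)
        if missing:
            raise ValidationError(f"missing fields: {sorted(missing)}")
        # Rule 2: deliberately naive email check, for illustration only.
        if "email" in doc and "@" not in doc["email"]:
            raise ValidationError("email must contain '@'")
        self.docs.append(dict(doc))
        return len(self.docs)  # position-based id of the new document


table = ValidatingTable(required=["name", "email"])
first_id = table.insert({"name": "John", "email": "john@example.com"})

try:
    table.insert({"name": "Anna"})  # no email, so this is rejected
    rejected = False
except ValidationError:
    rejected = True
```

The invalid document never reaches the stored list, which is the "don't let it insert" behavior asked about.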

39:38 Yeah.

39:38 Yeah, that's really nice.

39:40 The other way, I guess, that we haven't really spoken about yet is middleware.

39:44 So what kind of things can you do with regard to extensibility and middleware?

39:49 Yeah.

39:49 Middlewares are the third way to do modifications or to modify the way TinyDB works.

39:55 Basically, it acts as a layer between TinyDB and the storage you pass to it.

40:00 So you can do some interesting things over there.

40:04 For example, in TinyDB, there is a caching middleware, which provides a way so it doesn't have to hit the file system on every read and write.

40:14 Only every couple of reads and writes.

40:17 So you don't have that overhead.

40:20 And yeah, so you can do some modifications there, different ones from a custom storage.

40:27 I think the main reason middlewares are interesting is that you can use multiple of them at the same time.

40:34 You can use it with any kind of storage behind it and do some modifications over there.
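The layering described here can be sketched without TinyDB at all: each middleware wraps something with `read()` and `write()` and exposes the same two methods, so middlewares stack. The classes below are hypothetical stand-ins, not TinyDB's own caching middleware, and the flush-every-N-writes policy is an assumption made for the example.

```python
class CountingStorage:
    """Stand-in storage that counts how often it is really touched."""

    def __init__(self):
        self.data = None
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return self.data

    def write(self, data):
        self.writes += 1
        self.data = data


class BufferingMiddleware:
    """Keeps the latest data in memory and only flushes every few writes."""

    def __init__(self, storage, flush_every=3):
        self.storage = storage
        self.flush_every = flush_every
        self.cache = None
        self.pending = 0

    def read(self):
        if self.cache is None:
            self.cache = self.storage.read()
        return self.cache

    def write(self, data):
        self.cache = data
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        self.storage.write(self.cache)
        self.pending = 0


class LoggingMiddleware:
    """Second layer, to show that middlewares can be stacked."""

    def __init__(self, storage):
        self.storage = storage
        self.log = []

    def read(self):
        self.log.append("read")
        return self.storage.read()

    def write(self, data):
        self.log.append("write")
        self.storage.write(data)


backend = CountingStorage()
stack = LoggingMiddleware(BufferingMiddleware(backend, flush_every=3))
for n in range(1, 7):
    stack.write({"count": n})
latest = stack.read()
```

Six writes pass through the stack, but the backend is only written on every third one, and the final read is served from memory. A real implementation would also need to flush pending writes on close, which this sketch leaves out.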

40:40 Yeah.

40:41 For example.

40:41 Yeah.

40:42 Okay.

40:43 Very cool.

40:44 So what's the future of TinyDB?

40:46 Have you got anything you're adding or anything like that?

40:49 I don't really have any big plans for TinyDB.

40:52 I think right now I'm quite happy with how the project works and how the API looks.

40:59 There might be some renaming of methods.

41:03 So the intent of the method becomes clearer.

41:07 But apart from that, right now I don't have really any big plans to change how it works.

41:13 Because, I mean, the project itself is quite small.

41:16 So there's not much I would want to add into the TinyDB core itself.

41:23 But of course, there's still room for a lot of extensions to write.

41:28 Yeah, that sounds great.

41:30 Are you looking for people to create extensions or suggest changes or basically open source contributors?

41:36 Yeah, I think writing extensions for TinyDB can lead to some really interesting projects.

41:43 As we already talked about a couple of extensions.

41:46 And apart from that, I'm always glad if people improve the documentation.

41:51 Because I'm not a native English speaker.

41:55 So there might be a few rough places in the documentation.

41:59 So if people think that the wording might need some improvement, I'm definitely happy to accept that pull request.

42:07 And also, if you write an extension, just open an issue on the project on GitHub.

42:12 And I will add it to the list.

42:14 Okay, excellent.

42:15 Well, I think it's a very cool project.

42:17 I would love to see a nice, robust, embedded document database.

42:21 So thanks for taking us a little closer to that world.

42:25 Yeah.

42:26 Awesome.

42:26 All right, so before I let you go, I have two quick questions for you, as always.

42:30 So first of all, you're welcome to name your own stuff if you want.

42:34 What's your favorite PyPI package?

42:36 I just noticed we now have over 90,000 PyPI packages.

42:40 And so you must have some exposure to a few that are pretty interesting.

42:44 Yeah, I think the package I would recommend is the one we already talked about, which is uJSON.

42:52 Because in many cases, it can improve performance by a lot.

42:56 If you don't need to customize the way the JSON parser of Python works, then using uJSON can be really cool.

43:06 Yeah, that's great.

43:07 And so many people are processing JSON in one way or another these days.

43:11 So it's very broadly applicable.

43:13 A good choice.

43:14 I'll throw in TinyDB for you as well.

43:17 So pip install TinyDB.

43:18 And how about your editor?

43:20 If you're going to write some Python code, what do you open up?

43:22 It depends on the project, really.

43:24 For small projects, I probably use Sublime Text.

43:27 But if it's more than two or three files, I probably fire up PyCharm and use it.

43:35 Okay.

43:35 Also because, yeah, that's an interesting development in Python with gradual typing.

43:40 It's type hints.

43:41 Yeah.

43:41 And PyCharm, as far as I know, already supports them to some extent.

43:49 So it's really handy for that as well.

43:52 Yeah, that's really cool.

43:53 If I'm trying to understand some code and I'm like, oh, these three things that are coming in are these types.

43:59 If you tell it, it'll give you a lot more assistance trying to understand something.

44:04 If you're jumping into something you don't totally know the API for, that's cool.

44:07 All right.

44:08 Yeah.

44:08 Awesome.

44:09 PyCharm is a good one.

44:10 So Sublime Text.

44:11 All right.

44:11 Final call to action.

44:12 People should get out there.

44:14 Write some extensions.

44:15 What do you think?

44:15 Yeah, definitely.

44:17 Write some extensions.

44:19 Yeah.

44:20 And if you're looking at the API, discuss the documentation.

44:21 I'm always happy to hear feedback.

44:33 And also if you run into difficulties, even just with how to get started with TinyDB, I think that's worth an issue on GitHub.

44:45 So other people can have an easier way to get started if you run into problems.

44:53 Yeah, just open an issue on GitHub and we can discuss things over there.

44:58 All right.

44:59 That sounds great.

45:00 So, Markus, thanks so much for being on the show.

45:03 Yeah.

45:04 Thank you for having me.

45:05 Yeah.

45:05 Bye.

45:06 This has been another episode of Talk Python To Me.

45:10 Today's guest has been Markus Siemens.

45:12 And this episode has been sponsored by Hired.

45:14 Thank you for supporting the show.

45:16 Hired wants to help you find your next big thing.

45:19 Visit hired.com/talkpythontome to get five or more offers with salary and equity presented right up front and a special listener signing bonus of $2,000.

45:28 Are you or a colleague trying to learn Python?

45:30 Have you tried books and videos that just left you bored by covering topics point by point?

45:35 Well, check out my online course, Python Jumpstart, by building 10 apps at talkpython.fm/course to experience a more engaging way to learn Python.

45:44 And if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic.

45:51 You can find the links from this episode at talkpython.fm/episodes/show/80.

45:57 Be sure to subscribe to the show.

46:00 Open your favorite podcatcher and search for Python.

46:02 We should be right at the top.

46:03 You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.

46:13 Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.

46:17 Corey just recently started selling his tracks on iTunes, so I recommend you check it out at talkpython.fm/music.

46:24 You can browse his tracks he has for sale on iTunes and listen to the full-length version of the theme song.

46:29 This is your host, Michael Kennedy.

46:32 Thanks so much for listening.

46:33 I really appreciate it.

46:34 Smix, let's get out of here.

46:41 I'll see you next time.
