A Pythonic Database Tour

Episode #105, published Mon, Mar 27, 2017, recorded Thu, Mar 16, 2017


There are many reasons it's a great time to be a developer. One of them is that there are so many choices around data access and databases. So this week we take a tour with our guest, Jim Fulton, of some databases you may not have heard of or haven't yet tried.

You'll hear about the pure Python database ZODB. There's ZeroDB, an end-to-end encrypted database in which the database server knows nothing about the data it is storing, and NewtDB, spanning the worlds of ZODB and JSON-friendly Postgres.

Links from the show:

Jim on Twitter: @j1mfulton
ZODB: zodb.org
ZODB Book: zodb.readthedocs.io
ZeroDB: opensource.zerodb.com
NewtDb: newtdb.org
Buildout: docs.buildout.org
Two-tiered Kanban: github.com/feature-flow/twotieredkanban
Jim's Webcast: Why Postgres Should be your Document Database: blog.jetbrains.com/pycharm/2017/03/why-postgres-should-be-your-document-database-webinar-recording

Sponsored items
GetStream Feed API: talkpython.fm/getstream
Our courses: training.talkpython.fm
Podcast's Patreon: patreon.com/mkennedy

Episode Deep Dive

Guest introduction and background

Jim Fulton is a long-time Python developer deeply involved with the Python community. He has worked extensively on Zope, an early and influential open-source web application server, and is the original architect of ZODB (the Zope Object Database). Jim has a background in civil engineering and hydrology, but his curiosity for software drew him into building tools, libraries, and databases that leverage Python’s object-oriented power. More recently, he has been contributing to projects like ZeroDB (an end-to-end encrypted database) and NewtDB, while also sharing insights into agile processes and automation tooling.

What to Know If You’re New to Python

If you're new to Python and want to follow the database discussions in this episode, here are a few helpful points:

Understanding the basics of object-oriented programming in Python will help you see why storing data as objects (instead of rows) can be more intuitive in certain cases.
Familiarity with package management in Python (e.g., pip install) will clarify how to install specialized libraries like ZODB or Buildout.
Knowing how transactions work in general (the idea of committing or rolling back changes) will also help you grasp the concurrency model of these object databases.

Key points and takeaways

ZODB: The Pythonic Object Database
ZODB is an object-oriented database for Python, written in Python (its C extensions all have pure-Python versions, so it also runs on PyPy). It eliminates much of the traditional mismatch between Python’s in-memory object model and relational database schemas. Rather than performing SQL queries and manually translating data into objects, ZODB lets you store Python objects directly and loads them transparently when their attributes are accessed. It supports transactions and multiversion concurrency control, and each connection keeps a local object cache that is invalidated at transaction boundaries, so the objects you see stay consistent with the committed database as of some point in time.
- Links and Tools:
  - ZODB
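To make transparent loading concrete, here is a stdlib-only toy model of the mechanism Jim describes in the interview: a base class whose instances start as "ghosts", load their state on the first attribute access (method lookups included), and mark themselves dirty when an attribute is assigned. `ToyPersistent`, the `_toy_*` attributes, and the loader callback are hypothetical names for illustration only; the real `persistent.Persistent` base class has its own, C-accelerated implementation.

```python
from typing import Any, Callable, Dict


class ToyPersistent:
    """Toy persistent base class (illustration only, not persistent.Persistent).

    An instance starts as a "ghost" holding only a loader callback. The first
    ordinary attribute access (including looking up a method) loads its state;
    any attribute assignment marks it dirty so a commit would know to save it.
    """

    def __init__(self, loader: Callable[[], Dict[str, Any]]) -> None:
        # object.__setattr__ bypasses our override, so setup isn't "dirty".
        object.__setattr__(self, "_toy_loader", loader)
        object.__setattr__(self, "_toy_ghost", True)
        object.__setattr__(self, "_toy_dirty", False)

    def _toy_activate(self) -> None:
        # Turn a ghost into a live object by loading its saved state once.
        if object.__getattribute__(self, "_toy_ghost"):
            state = object.__getattribute__(self, "_toy_loader")()
            object.__getattribute__(self, "__dict__").update(state)
            object.__setattr__(self, "_toy_ghost", False)

    def __getattribute__(self, name: str) -> Any:
        # Bookkeeping names and dunders are served without loading state.
        if not name.startswith(("_toy_", "__")):
            object.__getattribute__(self, "_toy_activate")()
        return object.__getattribute__(self, name)

    def __setattr__(self, name: str, value: Any) -> None:
        self._toy_activate()  # load first so unloaded state isn't lost
        object.__setattr__(self, name, value)
        object.__setattr__(self, "_toy_dirty", True)


class Account(ToyPersistent):
    def deposit(self, amount: int) -> None:
        self.balance += amount


loads: list = []


def load_account() -> Dict[str, Any]:
    """Hypothetical loader standing in for a database read."""
    loads.append("account")
    return {"owner": "jim", "balance": 10}


acct = Account(load_account)
was_ghost = acct._toy_ghost  # True: nothing has been loaded yet
acct.deposit(5)              # looking up the method activates the ghost
```

Reading an attribute such as `owner` on a fresh instance loads it without setting `_toy_dirty`; only assignments mark an object for saving at commit.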
ZeroDB: End-to-End Encrypted Data Storage
ZeroDB is an encrypted database system originally built on ZODB. All data is encrypted on the client side, meaning the server (and even a cloud provider) cannot read the data in memory or on disk. This approach keeps private data protected even if someone gains access to the physical server. While the company behind it shifted focus, the concept of combining client-side encryption with an object-oriented storage model remains an important innovation.
- Links and Tools:
  - ZeroDB on GitHub
NewtDB: Bridging ZODB and Postgres
NewtDB is designed to bring the best of both worlds: an object database model via ZODB plus the robust querying and JSON capabilities of PostgreSQL. It stores objects through ZODB but also generates JSON documents in Postgres, allowing powerful SQL-based searches and easier external reporting. This integration addresses one of the biggest concerns about ZODB: the difficulty of querying data in a more conventional, relational manner.
- Links and Tools:
  - Newt (GitHub repo)
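The dual-write idea can be sketched without a database server. In this stdlib-only toy, each commit stores an object's state twice: as a pickle the object layer reads back, and as a JSON document a query layer can search. `ToyNewtStore`, `commit`, `load`, and `search` are hypothetical names, and the Python predicate stands in for the SQL query you would run against the JSON in PostgreSQL; this is not NewtDB's API.

```python
import json
import pickle
from typing import Any, Callable, Dict, List


class ToyNewtStore:
    """Toy dual-write store: pickles for the object layer, JSON for search."""

    def __init__(self) -> None:
        self._pickles: Dict[int, bytes] = {}  # what the object layer loads
        self._json: Dict[int, str] = {}       # what the query layer searches

    def commit(self, oid: int, state: Dict[str, Any]) -> None:
        # Write both representations in the same commit so they agree.
        self._pickles[oid] = pickle.dumps(state)
        self._json[oid] = json.dumps(state)

    def load(self, oid: int) -> Dict[str, Any]:
        return pickle.loads(self._pickles[oid])

    def search(self, predicate: Callable[[Dict[str, Any]], bool]) -> List[int]:
        # Stand-in for a SQL query against the JSON documents.
        return sorted(oid for oid, doc in self._json.items()
                      if predicate(json.loads(doc)))


store = ToyNewtStore()
store.commit(1, {"title": "Fix login", "state": "done", "size": 3})
store.commit(2, {"title": "Add search", "state": "development", "size": 5})
store.commit(3, {"title": "Write docs", "state": "development", "size": 1})

in_development = store.search(lambda doc: doc["state"] == "development")
```

The object side never parses JSON and the search side never unpickles, which is the point of keeping two representations.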
When (and When Not) to Use ZODB
ZODB shines for applications that naturally work with Python objects and do not require heavy traditional querying. It simplifies development by removing the need for an object-relational mapping layer. However, highly complex queries or large-scale analytics might still benefit from relational or specialized NoSQL solutions. Tools like NewtDB help bridge that gap, but understanding your data-access patterns is key.
- Links and Tools:
  - ZODB Documentation
Testing Strategies with ZODB’s Pluggable Storages
One of ZODB’s standout features is its flexible “storage” backends. You can test your application with a lightweight in-memory storage or a “demo storage” that layers temporary changes over a base state. This approach lets you reuse a known dataset and then discard changes at the end of each test, simplifying integration and staging environments.
- Links and Tools:
  - ZODB Storage Adapters (ZEO, demo storage, etc.)
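Python’s `collections.ChainMap` happens to model the demo-storage layering closely, so this toy uses it: reads fall through to a shared base, writes land only in the top "changes" layer, and discarding that layer restores the base for the next test. It is not ZODB’s `DemoStorage` class, and the fixture data is hypothetical; `new_child()` and `parents` also mimic the push/pop style of stacking layers that Jim mentions using in Selenium tests.

```python
from collections import ChainMap

# Hypothetical base dataset shared by a whole test suite.
base = {"user:1": {"name": "Ada"}, "user:2": {"name": "Grace"}}


def fresh_demo_layer(base_data: dict) -> ChainMap:
    """Return a view whose writes go to a new, empty changes layer."""
    return ChainMap({}, base_data)


def test_rename_user() -> None:
    db = fresh_demo_layer(base)
    # Replace whole values: mutating a nested dict read through the view
    # would modify the base itself, since ChainMap copies nothing.
    db["user:1"] = {"name": "Ada Lovelace"}
    assert db["user:1"]["name"] == "Ada Lovelace"


def test_add_user() -> None:
    db = fresh_demo_layer(base)
    db["user:3"] = {"name": "Barbara"}
    assert len(db) == 3  # two base users plus one new one


test_rename_user()
test_add_user()  # starts from the original base, not the previous test's changes

# Push/pop: stack another layer on top, change it, then pop back.
db = fresh_demo_layer(base)
db["user:2"] = {"name": "Grace Hopper"}
pushed = db.new_child()
pushed["user:2"] = {"name": "G. Hopper"}
popped = pushed.parents
```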
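How Transactions and Conflict Resolution Work
ZODB, like many modern databases (Postgres among them), uses an optimistic concurrency model. Multiple clients or threads can read the same data without taking locks; when a transaction tries to commit changes to an object that another transaction has already changed and committed, it gets a “conflict error” and must be retried. While this approach can seem surprising at first, it avoids the overhead and complexity of locking, provided applications handle conflict errors gracefully by re-running the failed work. Side effects outside the database, such as calling a web service or charging a credit card, can’t be rolled back, so they need extra care.
- Tools / Terminology:
  - Multiversion Concurrency Control (MVCC)
  - Conflict Errors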
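Here is a stdlib-only sketch of optimistic concurrency with retries. Each transaction remembers the version of every object it loads; at commit it fails with a conflict if an object it changed has been committed by someone else since then, and the caller re-runs the whole unit of work in a fresh transaction. `ToyDB`, `ToyTxn`, `ToyConflictError`, and `run_with_retries` are hypothetical; checking only changed objects is meant to resemble ZODB’s write conflicts, but the real mechanism (and ZODB’s own `ConflictError`) differs in detail.

```python
import threading
from typing import Callable, Dict, Tuple


class ToyConflictError(Exception):
    """Another transaction committed a change to an object we also changed."""


class ToyDB:
    def __init__(self) -> None:
        # oid -> (version, value); version counts committed writes.
        self.records: Dict[str, Tuple[int, int]] = {}
        # Only the short validate-and-write step is serialized; reads never block.
        self.commit_lock = threading.Lock()

    def begin(self) -> "ToyTxn":
        return ToyTxn(self)


class ToyTxn:
    def __init__(self, db: ToyDB) -> None:
        self.db = db
        self.loaded: Dict[str, Tuple[int, int]] = {}  # first-seen (version, value)
        self.writes: Dict[str, int] = {}

    def _load(self, oid: str) -> Tuple[int, int]:
        # Remember what we saw the first time, like a per-transaction snapshot.
        if oid not in self.loaded:
            self.loaded[oid] = self.db.records.get(oid, (0, 0))
        return self.loaded[oid]

    def get(self, oid: str) -> int:
        return self.writes[oid] if oid in self.writes else self._load(oid)[1]

    def set(self, oid: str, value: int) -> None:
        self._load(oid)  # record the version we are about to overwrite
        self.writes[oid] = value

    def commit(self) -> None:
        with self.db.commit_lock:
            for oid in self.writes:
                if self.db.records.get(oid, (0, 0))[0] != self.loaded[oid][0]:
                    raise ToyConflictError(oid)
            for oid, value in self.writes.items():
                self.db.records[oid] = (self.loaded[oid][0] + 1, value)


def run_with_retries(db: ToyDB, work: Callable[[ToyTxn], None],
                     attempts: int = 3) -> int:
    """Run `work` in fresh transactions until one commits; return tries used."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        txn = db.begin()
        work(txn)
        try:
            txn.commit()
            return attempt
        except ToyConflictError:
            if attempt == attempts:
                raise  # give up and let the caller see the conflict
    raise RuntimeError("unreachable")


def bump(txn: ToyTxn) -> None:
    txn.set("hits", txn.get("hits") + 1)


db = ToyDB()
t1, t2 = db.begin(), db.begin()
bump(t1)
bump(t2)       # both transactions read hits == 0
t1.commit()    # first committer wins
try:
    t2.commit()
    conflicted = False
except ToyConflictError:
    conflicted = True  # t2's read is stale; redoing the work is the fix

tries = run_with_retries(db, bump)
```

The retry helper is only safe because `bump` has no side effects outside the transaction.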
Buildout: Environment Automation and Deployment
Buildout is a tool Jim worked on to automate building and deploying Python applications. It goes beyond pip install by allowing you to define everything your app needs in a single configuration, including which Python packages to install, how to generate config files, and even how to run extra processes. This is particularly helpful for reproducible builds and consistent deployments across environments.
- Links and Tools:
  - Buildout GitHub
Two-Tier Kanban for Project Management
Jim described a Kanban approach where high-level features (the “top tier”) explode into detailed development tasks (the “lower tier”) only when they hit the development column. This helps teams focus on delivering business value, not just churning out discrete tasks, and makes it clearer how individual tasks roll up into a feature that end-users can actually see and benefit from.
- Concepts:
  - Feature-based Kanban boards
  - Exploding tasks at development time
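As a small data-structure sketch of the two-tier idea, assuming hypothetical column names and a `plan_tasks` callback standing in for the team’s planning conversation (this is not the data model of the twotieredkanban project): a feature card carries no tasks until it moves into development, at which point it is broken down.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical columns for the top-tier (feature) board.
COLUMNS = ["backlog", "ready", "development", "deployed"]


@dataclass
class Feature:
    """A user-visible feature card on the top-tier board."""
    title: str
    column: str = "backlog"
    tasks: List[str] = field(default_factory=list)  # lower-tier task cards


def move(feature: Feature, column: str,
         plan_tasks: Callable[[Feature], List[str]]) -> None:
    """Move a card; it is broken into tasks only on entering development."""
    if column not in COLUMNS:
        raise ValueError(f"unknown column: {column}")
    if column == "development" and not feature.tasks:
        feature.tasks = plan_tasks(feature)  # "explode" into detailed tasks now
    feature.column = column


def plan(feature: Feature) -> List[str]:
    # Hypothetical breakdown produced when work actually starts.
    return [f"{feature.title}: design", f"{feature.title}: implement",
            f"{feature.title}: test"]


card = Feature("CSV export")
move(card, "ready", plan)
tasks_before_dev = list(card.tasks)  # still empty: no premature breakdown
move(card, "development", plan)
```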
Balancing Object Databases with Search and Analytics
While ZODB is efficient for object traversal and day-to-day CRUD operations, advanced searching or analytics may call for external tools. In this conversation, Jim explained how layering solutions like NewtDB or adopting specialized indexes can help. The question of transactions versus scalability, or simple object access versus heavy queries, often boils down to picking the right tool (or combination of tools).
- Links:
  - Newt (GitHub)
  - PostgreSQL JSON features
Developer Tools: Editors, IDEs, and More
Jim primarily uses Emacs but expressed admiration for PyCharm’s extensive feature set (like built-in database browsing, REST clients, and more). Choosing the right environment can significantly speed up development, testing, and deployment, especially in projects spanning multiple services or microservices.
- Tools:
  - PyCharm IDE
  - Emacs (text-based editor)

Interesting quotes and stories

"ZODB is an object-oriented database for Python... If you want to avoid that painful impedance mismatch, you just store your objects directly." -- Jim Fulton

"When you talk about ZeroDB, you're literally talking about a database that doesn't know anything about the data it's storing. It’s all encrypted on the client." -- Michael Kennedy

Key definitions and terms

ZODB (Zope Object Database): A Pythonic database where objects are stored and retrieved without needing SQL or an ORM.
ZeroDB: A variation on ZODB that employs end-to-end encryption, so data is encrypted client-side.
NewtDB: A hybrid approach that stores objects in ZODB while syncing JSON representations to PostgreSQL for searching.
Multiversion Concurrency Control (MVCC): A concurrency model where each transaction operates on a snapshot of data, allowing non-blocking reads.
Buildout: A tool to automate deployment and environment setup of Python-based projects.
Two-Tier Kanban: A project management approach combining high-level feature boards with lower-level task boards.

Learning resources

If you’d like to deepen your Python skills to explore databases like ZODB or tools like Buildout, check out these courses from Talk Python Training:

Python for Absolute Beginners: Perfect if you’re new to Python’s syntax and core ideas.
MongoDB Quickstart with Python: While not about ZODB, this free course demonstrates how Python interacts with document databases, showing parallels and contrasts.
Building Data-Driven Web Apps with Flask and SQLAlchemy: Understand how relational databases compare to object databases in a web setting.

Overall takeaway

This episode highlights the diversity of Python’s data storage ecosystem, from purely object-oriented databases like ZODB to end-to-end encrypted solutions such as ZeroDB, and hybrid approaches via NewtDB. Jim Fulton’s unique perspective shows how careful design, transactions, and the right tooling can simplify or even eliminate many of the complexities we take for granted with traditional database development. Whether you’re seeking to store complex object graphs or needing specialized searching and analytics, Python’s adaptability, and the work of developers like Jim, continues to open new possibilities.

Episode Transcript

00:00 There are many reasons it's a great time to be a developer.

00:02 One of them is because there are so many choices around data access and databases.

00:06 So this week, we take a tour with our guest, Jim Fulton, of some of the databases you may not have heard of or haven't given a try yet.

00:15 You'll hear about the pure Python database, ZODB.

00:18 There's ZeroDB, an end-to-end encrypted database in which the database knows nothing about the data it's even storing.

00:25 And NewtDB, spanning the world of ZODB and JSON-friendly Postgres.

00:30 This is Talk Python To Me, episode 105, recorded Thursday, March 16, 2017.

00:37 Developers, developers, developers, developers.

00:40 I'm a developer in many senses of the word because I make these applications, but I also use these words to make this music.

00:48 I construct it line by line, just like when I'm coding another software design.

00:52 In both cases, it's about design patterns.

00:55 Anyone can get the job done.

00:57 It's the execution that matters.

00:58 I have many interests.

01:00 Sometimes...

01:00 Welcome to Talk Python To Me, a weekly podcast on Python, the language, the libraries, the ecosystem, and the personalities.

01:07 This is your host, Michael Kennedy.

01:09 Follow me on Twitter, where I'm @mkennedy.

01:11 Keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython.

01:19 This episode is brought to you interruption-free by GetStream.

01:23 That's right, GetStream, a new sponsor of the show, has a really cool offer for you guys.

01:27 If you're building an application that has some form of activity stream like you might see in Slack or Facebook or Instagram and others,

01:33 then you owe it to yourself to have a look at GetStream.

01:36 They provide scalable, reliable, and personalizable hosted API feeds as a service.

01:42 The feed is the most intensive component of these types of applications,

01:46 yet there's no need for you to reinvent the underlying feed technology when GetStream has the infrastructure and a Python API already in place.

01:54 Go from zero to scalable feed in hours, not weeks or months.

01:58 They even use advanced machine learning to serve up personalized results to each and every user.

02:03 Stream powers the feeds for over 500 companies, including Makerspace and Fabric,

02:09 with a total of 70 million end users.

02:11 Try the API yourself in a short five-minute interactive tutorial at talkpython.fm/stream.

02:18 Be sure to create an account and try it for yourself.

02:21 It helps support the show.

02:22 Jim, welcome to Talk Python.

02:25 Thank you. It's nice to be here.

02:26 It's great to have you here.

02:27 We have a whole bunch of really cool topics generally around data, but not all data, right?

02:34 So we're going to talk about ZODB, something called ZeroDB, which is something I'd never heard of and really interesting, actually.

02:41 NewtDB.

02:42 And then a little bit more process with some agile concepts and continuous integration and so on.

02:48 But of course, before we get to all those, let's start at the beginning.

02:51 What's your story?

02:52 How did you get into programming?

02:53 I was exposed to programming fairly young, although back then it wasn't very common or very accessible.

02:58 I'd say I really got hooked in grad school when I was doing research on rainfall runoff model calibration.

03:06 And I had to hack some alternate statistical techniques, calibration techniques, into a rainfall runoff model.

03:13 And I found that I enjoyed that quite a bit.

03:15 That became, for years, I was a civil engineer slash hydrologist.

03:20 And the software aspect of it kept pulling me and pulling me until it finally extracted me.

03:27 I think that's really interesting.

03:30 A lot of people get into programming that way and somewhat grudgingly like,

03:35 okay, I have to learn this programming thing to make whatever it is I'm doing actually work, right?

03:39 But I sort of went down that path myself to some degree.

03:42 And after a few years, I realized, actually, what am I doing this other stuff for?

03:47 This programming stuff is really great.

03:48 I'm just going to go do more of that.

03:49 And it's funny how life is sort of serendipitous like that.

03:54 But it's also good, right?

03:56 So was that original bit of work?

03:57 Was that in Python or was that in something else?

03:59 Oh, no.

04:00 That was in Fortran.

04:00 Oh, yeah.

04:02 Fortran.

04:02 I mean, I went through a lot of languages over my career.

04:07 That work was in 1981.

04:11 Okay.

04:12 So probably not Python.

04:14 Yeah, definitely not Python.

04:16 Because that was 10 years before it was released.

04:17 But, yeah, I've used a lot of different languages.

04:20 I used Fortran for a long time.

04:22 I used PL/I for a little while.

04:24 I used Ada for a little while.

04:27 I really like OO languages.

04:29 I used – I couldn't afford Smalltalk for a long time.

04:32 So I used a language called Actor for a while.

04:34 And then much later, I did an interesting application with GNU Smalltalk, which was an adventure in and of itself because it was fledgling and the garbage collector was broken.

04:44 So I had to use a special branch with a non-broken garbage collector.

04:47 So anyway, I've had lots of fun with different languages over the years.

04:50 Yeah, that sounds like you've really been through a lot of them.

04:53 So are you doing mostly Python these days?

04:56 Yeah.

04:57 Although I did – a couple years ago at Zope Corporation, we did a bunch of Android development.

05:02 I got to use Scala, which I really enjoyed.

05:05 I like to describe Scala as a beautiful evil language because it just invites so much abuse.

05:12 But it allows you to produce beautiful code within the JVM and just insane, mind-blowing notions of type-based development where you – people do interesting development tasks in the compiler.

05:26 Wow.

05:27 That sounds interesting and evil.

05:30 It was a lot of fun.

05:31 I haven't done any of that.

05:32 I did some Rust lately, which kind of reminded me of a lightweight version of that this last year.

05:37 Sure, sure.

05:38 Okay.

05:38 Yeah, I've been wanting to learn Rust, but I haven't really gotten into it.

05:41 I did look at Go recently this year, but I don't know.

05:44 I'm just not sold on Go.

05:45 I still like Python a lot better.

05:47 We'll see about that.

05:48 I'm actually very anti-Go.

05:50 Yeah.

05:50 I think it's bad on multiple levels, but I like Rust quite a bit.

05:54 Okay.

05:54 Well, that's interesting.

05:55 Maybe I'll be learning Rust eventually.

05:56 But what do you do day-to-day these days?

05:59 You're not still at the Zope Corporation doing Android development, right?

06:01 Nope.

06:02 Nope.

06:02 At Zope, I did a ton of different things.

06:05 But towards the end, we were doing some Android development, among other things.

06:09 But these days, I'm splitting my time between paid work to sort of keep the lights on and

06:15 open source work.

06:16 I got an opportunity to work with a company called ZeroDB about a year ago.

06:21 That and also my sons are grown and they've moved away.

06:25 And so we were sort of downsizing.

06:27 So that was an opportunity to sort of have enough money and reduce my run rate and focus

06:33 on some open source stuff for a while.

06:35 And so that's really what I'm doing right now is paying attention to some open source

06:40 projects that have been neglected for a while, as well as exploring some new ideas.

06:44 That's really, really great.

06:46 And it must feel really good.

06:49 It must just be great to just stop, look at these projects that are pretty mature and say,

06:54 okay, I'm going to work on these things.

06:55 And I don't have to go to meetings.

06:57 I don't have to hit some silly deadline that's not realistic or work on some feature that I

07:04 think adds no value, right?

07:05 Just be able to focus on what you want, right?

07:06 Yep.

07:06 Yeah.

07:07 Excellent.

07:08 So we'll be touching on some of these projects, I'm sure.

07:11 So let's start with one of the older projects, I guess.

07:14 It's been around since 1996 with ZODB.

07:16 What is ZODB?

07:17 So ZODB is an object-oriented database for Python.

07:21 And when I say object-oriented, I contrast that with object-based because lots of people

07:26 refer to databases that are object-based that I don't really consider object-oriented.

07:32 The original goal of object-oriented databases, which were a pretty exciting thing back in

07:37 the, I don't know, late 80s, maybe early 90s, was to try to reduce or eliminate the impedance

07:44 mismatch between programming languages and databases.

07:47 So in databases, you know, you have a very different computational model than you do in a programming

07:53 language, especially some of the, you know, especially object-oriented programming languages.

07:57 Yeah.

07:58 You have hierarchies of object graphs in object-oriented languages, and you have highly

08:04 normalized data that work to minimize duplication and let you approach the data from any angle

08:12 you want.

08:12 But there's always this.

08:14 I pull it into Python and build it into an object graph, and then I tear it back apart

08:19 into all the other tables and put it back again, right?

08:22 So these object databases, they try to just say, let's keep them in the same shape, something

08:27 like that?

08:28 Well, again, I don't think there are many object-oriented, I don't know, I'm not sure I know of any object-oriented

08:34 databases today other than ZODB.

08:36 I mean, I'm sure there are some.

08:37 My sense is that a lot of the object-based databases let you get objects, but they don't necessarily

08:43 avoid having to do queries and doing assembly.

08:46 Like, for example, some databases referred to as object-based seem to be more like graph-based,

08:50 where you have the ability to query graphs, but it's still somewhat of a foreign object.

08:58 I see.

08:59 Like, when you use ZODB, there are some exceptions, but you have to subclass a special

09:03 base class, and you have to identify transaction boundaries, which that latter aspect is usually

09:09 automated, depending on your situation.

09:12 But beyond that, it's literally just as if you were working with objects in memory.

09:17 You don't really query a database.

09:19 You know, the way you query a database is the way you query something in Python.

09:23 You maybe look up a key in a mapping, or maybe you access an object's attribute.

09:28 Accessing an object's attribute might cause data to be loaded from the database, but that's

09:33 transparent to you.

09:34 Okay.

09:34 How interesting.

09:35 So does it use, like, interesting descriptors or something like that for attributes to do

09:40 that?

09:41 That's where the base class comes in.

09:42 So the base class, I can't remember.

09:44 There must be a meta class.

09:45 It's been so long since I've implemented it that I don't remember if there's a meta class

09:50 lurking.

09:50 I wouldn't be surprised if there was.

09:53 But basically, yeah, the base class does a couple of things.

09:56 I've actually had a project that I've wanted to do for some time, which is to get rid of

10:00 the base class.

10:01 I have some hacks in mind involving weak reference data structures.

10:04 But basically, the base class, it watches attribute accesses.

10:09 And so when you modify an attribute, it marks the object as dirty.

10:13 And when you access an attribute, if the object is something we call a ghost.

10:18 So in ZODB, when you first load an object from the database, typically by referencing it from

10:22 some other object, it's loaded as a ghost.

10:24 And then when you actually access an attribute, which includes any method, then the ghost is

10:30 activated and its state is loaded into memory.

10:33 I see.

10:33 There's an in-memory object cache that is effectively an incomplete database replica.

10:40 At transaction boundaries, any changes that other clients have made in the database

10:45 cause any affected objects that are in your cache to be invalidated.

10:50 And so then the next time you access them, they're loaded automatically.

10:53 So the data in memory is always consistent with the committed database as of some point in time.
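A stdlib-only toy model of the invalidation scheme just described, with hypothetical names (`ToyStore`, `ToyConnection`, `sync`) rather than ZODB’s real classes: the store keeps every committed revision plus a log of which object ids each commit touched; a connection reads as of the last commit it has synced to, and at each transaction boundary it evicts cached objects that later commits changed.

```python
from typing import Dict, List, Set, Tuple


class ToyStore:
    """Shared store keeping every committed revision of each object."""

    def __init__(self) -> None:
        self.history: Dict[str, List[Tuple[int, str]]] = {}  # oid -> [(tid, value)]
        self.log: List[Set[str]] = []  # log[tid - 1] = oids changed by that commit

    def commit(self, changes: Dict[str, str]) -> None:
        tid = len(self.log) + 1
        for oid, value in changes.items():
            self.history.setdefault(oid, []).append((tid, value))
        self.log.append(set(changes))

    def load_as_of(self, oid: str, tid: int) -> str:
        # Newest revision committed at or before `tid`.
        for rev_tid, value in reversed(self.history[oid]):
            if rev_tid <= tid:
                return value
        raise KeyError(oid)


class ToyConnection:
    """One connection (used by one thread): a cache plus its synced commit."""

    def __init__(self, store: ToyStore) -> None:
        self.store = store
        self.cache: Dict[str, str] = {}
        self.tid = len(store.log)

    def get(self, oid: str) -> str:
        if oid not in self.cache:  # cache miss: load as of our snapshot
            self.cache[oid] = self.store.load_as_of(oid, self.tid)
        return self.cache[oid]

    def sync(self) -> None:
        """Transaction boundary: drop cached objects others have changed."""
        for changed in self.store.log[self.tid:]:
            for oid in changed:
                self.cache.pop(oid, None)
        self.tid = len(self.store.log)


store = ToyStore()
store.commit({"a": "a1", "b": "b1"})
conn = ToyConnection(store)
conn.get("a")                            # "a" is now cached
store.commit({"a": "a2", "b": "b2"})     # another client commits
before = (conn.get("a"), conn.get("b"))  # still one consistent snapshot
conn.sync()
after = (conn.get("a"), conn.get("b"))   # invalidated objects reload
```

Between boundaries the connection sees `a1` and `b1` together even though `b` was never cached; without the as-of read it could mix an old `a` with a new `b`.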

10:59 Okay. Yeah. So you have transaction support and all those sorts of things as well.

11:03 That's pretty interesting.

11:04 So I can get an object from the database and pass it around.

11:07 And maybe it was passed off to some other module.

11:10 But eventually, that reference will be updated because someone committed a transaction.

11:15 Right.

11:15 The only sort of caveat there is that the data...

11:19 So when you access the database, you open a connection.

11:21 And on that connection is a root object.

11:23 And then all other accesses you make are from that root object, possibly through many steps.

11:30 And then there's an object cache associated with that connection.

11:32 And so that connection, its cache, can only be accessed by one thread at a time.

11:36 So you couldn't hand it off to a different process.

11:38 And you couldn't hand it off to another thread and have both threads operating on it.

11:42 But you can have multiple threads with their own database connections.

11:45 And they're essentially coordinating their activities via transaction commit.

11:50 Very much in the way that software transactional memory either was going to...

11:55 I'm not sure this is the current status, but I should have looked it up.

11:57 Much in the way that software transactional memory is supposed to do that for PyPy.

12:02 Interesting. Okay.

12:03 So I don't think I've talked about software transactional memory previously.

12:07 Maybe you could just give us the quick elevator pitch for what that is.

12:11 Well, it's like ZODB, but not persistent.

12:13 I see.

12:15 You know, there are lots of different sort of models for managing concurrency.

12:19 And so some of the traditional models like locking are very expensive.

12:24 And what a lot of systems have moved towards is something called the actor model, where you

12:31 have different independent actors and message queues.

12:33 And that's a model that works really well.

12:35 But of course, it's fairly invasive.

12:37 You have to architect your model or your application around that.

12:41 I think what the PyPy people were wanting to do was to get rid of the GIL and trying to find some way to get rid of the GIL without being crushed by all the locking overhead of managing concurrency.

12:52 So with transactions, you basically have multiple copies of the object space, possibly sharing memory via copy-on-write, etc.

13:00 I'm not really super familiar with either their implementation or their status.

13:04 But the idea is that you have basically different copies of memory.

13:09 Those copies get synchronized when you reach a transaction boundary.

13:14 And that means that at the transaction boundary, that's when you sync everything up.

13:18 And everything else is completely independent.

13:20 So you don't need any locks because you've only got one logical thread of control accessing the data.

13:25 Right.

13:26 And so that sounds really cool.

13:28 Like, basically, it's a very optimistic view of the world, right?

13:30 We're going to grab all this stuff in memory and we're going to try to make a bunch of changes.

13:33 And it's probably fine.

13:34 But if it's not fine, then we're actually going to have to retry that function or whatever was working on it.

13:40 Right. So there's this instead of taking locks, it'll basically restart parts of your code, which is really quite a different way of thinking about solving this problem, isn't it?

13:49 It's how most modern databases work now.

13:52 I know Postgres uses multiversion concurrency control, which is basically the same idea.

13:57 I think Oracle does as well.

13:58 But yeah.

13:59 And so you sort of have to come to terms with what we call conflict errors.

14:03 Yeah.

14:04 You have to instead of being blocked, you have to deal with here's how I resolve it when something went wrong.

14:09 I mean, it's fine if it's all within one database or within your memory.

14:13 But if I've called two web services and written a file and then it says, no, no, roll back.

14:19 Right.

14:19 Well, now what, right?

14:20 Yep.

14:21 Already charged your credit card.

14:22 Roll back.

14:22 What are you talking about?

14:23 Yep.

14:23 So it's just a, yeah, it's interesting.

14:26 So would you call ZODB a NoSQL database?

14:30 I mean, was it NoSQL before NoSQL was a thing?

14:33 Well, one of the stories I like to tell about how I learned Python was I was at USGS and we were using this system called RAND RDB, which was based on an earlier system.

14:45 But basically it was based on managing relational data as flat files.

14:49 And since I stopped using that project, it sort of evolved and it called itself NoSQL because it didn't use SQL.

14:56 Right.

14:58 Really, NoSQL is a terrible name.

15:00 To me, in my mind, modern NoSQL databases have nothing to do with not having SQL.

15:05 Yes, I agree.

15:06 Some of them do have SQL.

15:07 You know, really, a better characterization of most of the NoSQL databases that I'm familiar with is that they're no transaction.

15:16 And they're no transaction because transactions at some point do limit scalability.

15:23 Although there's, you know, continuing to be work to make databases like Oracle and Postgres scalable even with transactions.

15:30 But the NoSQL databases have much weaker notions of consistency and really are optimized to allow very fast writes.

15:38 And I, the sort of problem domain that I think they're really well suited to is collecting massive amounts of data that you collect and, and analyze, but never really have to update and aren't really part of any business processes.

15:50 Some kind of analytics or something.

15:51 Right.

15:52 And so, in that sense, ZODB is not a NoSQL database, but it doesn't use SQL.

15:56 Although with Newt, you can now start to leverage SQL in ZODB.

16:02 Yeah.

16:03 We'll talk about Newt as well.

16:04 Yeah.

16:04 I saw a really great quote by somebody who was trying to talk about NoSQL and said something like, my toaster doesn't use SQL.

16:10 Is it NoSQL?

16:11 No.

16:12 We need a better definition.

16:14 And I really feel like, I feel like, like my definition and your definition are probably quite similar from what you said.

16:20 I feel like NoSQL databases are the ones that give up some relational features in order to be more scalable, possibly more horizontally scalable, things like that.

16:31 Right.

16:32 Like a lot of them give up joins.

16:33 A lot of them give up transactions, but not all of them.

16:37 Right.

16:37 They give up different things depending on what they're optimizing for.

16:40 I'm not aware of many. There was FoundationDB, which is no longer a thing, that had transactions, but some databases will talk about atomicity, and their notion of atomicity is kind of laughable because, well, we update a single record atomically.

16:57 Yes, exactly.

16:58 But that's not really, I'm sure there's some NoSQL databases out there that are transactional, but if there are, they're probably not scalable the way that some of the other ones are.

17:08 I mean, I think that's, I, to me, in my mind, transactions are the big trade-off.

17:12 And I think it's a trade-off that most people don't really understand.

17:16 Yeah, I think you're probably right.

17:17 If I, if I think of them, the one, the thing they give up first is probably transactions.

17:20 The thing they give up second is probably joins.

17:22 Right.

17:23 I mean, MongoDB does have the isolated operator, which does let you work on multiple documents, but it's, it's not quite the same as, as this just global isolation level serializable that you get in a lot of relational databases.

17:36 Well, and in fact, the giving up joins is really closely related to that because giving up joins means that there are more problems for which only being able to do one operation at a time atomically makes sense.

17:48 Yep.

17:48 Absolutely.

17:49 Nice.

17:49 Okay.

17:50 So if I want to use ZODB, how do I get started?

17:52 Can I just pip install it?

17:53 Yep.

17:53 Okay.

17:54 What's it written in?

17:55 Is it written in Python?

17:55 Yep.

17:56 It has some C extensions.

17:57 It has some C extensions, but it also works with PyPy.

18:00 All of the C extensions have Python versions.

18:03 Right.

18:03 So if you run it with, if you run it with PyPy, then it'll, it'll use the Python versions.

18:08 And zodb.org has some pretty decent documentation.

18:11 I was noticing yesterday that there are some topics I need to add, but getting started is pretty easy.

18:17 You can run it with an in-memory database if you want, just while you're playing around.

18:21 Okay.

18:21 Yeah.

18:22 That's really nice.

18:22 Nice for testing as well, right?

18:24 Well, its testing story is especially strong.

18:27 ZODB has what I call a pluggable storage architecture.

18:31 So the, there's a defined API or set of APIs that storages can provide.

18:36 And then there are a bunch of different storage implementations ranging from an in-memory implementation to a, a file-based implementation to a client server implementation to, or to an implementation that sits on top of, well, there are a couple of client server implementations actually.

18:53 And then there's a implementation that sits on top of a relational database.

18:56 And then there are also, we sort of follow a pattern of, of layering those with adapters.
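A sketch of the pluggable-storage idea, assuming a deliberately tiny two-method interface (`load` and `store` are illustrative names; ZODB’s real storage API is larger and includes two-phase-commit methods): application code is written once against the interface, storages can be swapped, and adapters wrap other storages to change their behavior. All class names below are hypothetical.

```python
import os
import tempfile
from typing import Dict, Protocol


class Storage(Protocol):
    """The one small interface the (toy) database layer depends on."""

    def load(self, oid: str) -> bytes: ...

    def store(self, oid: str, data: bytes) -> None: ...


class MemoryStorage:
    """In-memory: handy for experiments and tests."""

    def __init__(self) -> None:
        self._records: Dict[str, bytes] = {}

    def load(self, oid: str) -> bytes:
        return self._records[oid]

    def store(self, oid: str, data: bytes) -> None:
        self._records[oid] = data


class DirectoryStorage:
    """File-based: one file per object id (oids assumed filename-safe)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self, oid: str) -> bytes:
        with open(os.path.join(self.path, oid), "rb") as f:
            return f.read()

    def store(self, oid: str, data: bytes) -> None:
        with open(os.path.join(self.path, oid), "wb") as f:
            f.write(data)


class ReadOnlyAdapter:
    """Adapter layer: wraps any storage and refuses writes."""

    def __init__(self, base: Storage) -> None:
        self.base = base

    def load(self, oid: str) -> bytes:
        return self.base.load(oid)

    def store(self, oid: str, data: bytes) -> None:
        raise PermissionError(f"read-only storage: cannot store {oid!r}")


def round_trip(storage: Storage) -> bytes:
    """Application code written once against the interface."""
    storage.store("root", b"hello")
    return storage.load("root")


with tempfile.TemporaryDirectory() as tmp:
    results = [round_trip(MemoryStorage()), round_trip(DirectoryStorage(tmp))]

try:
    round_trip(ReadOnlyAdapter(MemoryStorage()))
    blocked = False
except PermissionError:
    blocked = True
```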

19:01 And so one of the interesting adapters for testing is something called the demo storage.

19:06 A demo storage wraps two storages: a base storage and a changes storage.

19:14 Okay.

19:14 And so in testing, what you'll typically do is, for a suite or set of tests, you might set up a base database.

19:22 And then each test will use a demo storage on top of that.

19:26 And then, you know, whatever changes are made are made in the changes storage.

19:29 And then the demo storage is discarded.

19:31 And then the next test creates a new one.

19:33 Oh, that's a really cool feature because one of the super painful things of testing is, well, how do I load up the test data?

19:39 How much is enough to be representative, et cetera, et cetera.

19:42 So you can put, like, a snapshot on top of the data, in a sense, right?

19:46 Basically.

19:46 And you can layer that as many levels deep as you want.

19:49 In fact, I've written Selenium tests where there were basically push and pop operations on your database.

19:55 So you make some changes and then push another demo storage on top of that.
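The demo-storage layering, including the push and pop, can be approximated with the standard library's `collections.ChainMap`: reads fall through to lower layers, and writes land only in the top one. This is a sketch of the idea, assuming a storage is just a mapping from object id to record; ZODB's real demo storage works at the storage-API level, and this toy cannot delete records that live in the base.

```python
from collections import ChainMap

# Hypothetical base data shared by a whole test suite: object id -> record.
BASE = {
    "page-1": {"title": "Welcome"},
    "page-2": {"title": "About"},
}


def demo_storage(base):
    """Return a writable view: reads fall through to base, writes stay on top."""
    return ChainMap({}, base)


def rename_page(storage):
    """Stand-in for whatever a single test does to the database."""
    storage["page-1"] = {"title": "Renamed"}
    storage["page-3"] = {"title": "Brand new"}
    return storage


# Each test gets a fresh layer; the base is never modified.
layer = rename_page(demo_storage(BASE))

# "Push" another layer on top, write to it, then "pop" back to the layer below.
pushed = layer.new_child()
pushed["page-2"] = {"title": "Only in the pushed layer"}
popped = pushed.parents
```

Discarding a layer is just dropping the reference to it, which is what makes per-test cleanup cheap.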

20:00 And then for staging, here's what we've often done. One of the layers you can add is something called a before storage.

20:08 What it does is wrap a writable storage, like our client-server storage.

20:12 But it says, okay, only show me the data as of this point in time.

20:17 And then that becomes the base for a demo storage.

20:20 And then you have a file storage as your changes.

20:22 And now you can stage a large production database and make substantial changes to it, but it's all in this sort of layered snapshot.

20:30 You can then discard it after staging, and it doesn't affect any of the actual production data.
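The staging trick combines a point-in-time view with a writable layer on top. Below is a toy version, assuming an append-only store where every write gets an increasing transaction id; `VersionedStorage` and `BeforeView` are hypothetical names, not ZODB classes.

```python
from collections import ChainMap
from collections.abc import Mapping


class VersionedStorage:
    """Toy append-only storage: every write gets an increasing transaction id."""

    def __init__(self):
        self._revisions = {}  # oid -> list of (tid, value), oldest first
        self.last_tid = 0

    def store(self, oid, value):
        self.last_tid += 1
        self._revisions.setdefault(oid, []).append((self.last_tid, value))
        return self.last_tid

    def load_before(self, oid, tid):
        """Return the newest revision of `oid` written strictly before `tid`."""
        for rev_tid, value in reversed(self._revisions.get(oid, [])):
            if rev_tid < tid:
                return value
        raise KeyError(oid)

    def oids(self):
        return list(self._revisions)


class BeforeView(Mapping):
    """Read-only view of a storage as of a fixed point in time."""

    def __init__(self, storage, before_tid):
        self._storage = storage
        self._before = before_tid

    def __getitem__(self, oid):
        return self._storage.load_before(oid, self._before)

    def __iter__(self):
        return (oid for oid in self._storage.oids() if oid in self)

    def __len__(self):
        return sum(1 for _ in self)


production = VersionedStorage()
production.store("settings", {"theme": "blue"})
production.store("report", {"rows": 10})
snapshot = BeforeView(production, production.last_tid + 1)  # everything so far

# Production keeps changing after the snapshot is taken...
production.store("report", {"rows": 11})

# ...while staging edits go into a throwaway layer over the frozen snapshot.
staging = ChainMap({}, snapshot)
staging["settings"] = {"theme": "red"}
```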

20:37 Yeah, that's really cool.

20:38 Okay. So that sounds like the storage system is really robust there.

20:43 And of course that's going to play into ZeroDB when we get into it.

20:48 But let me ask you two quick questions on ZODB before we move on.

20:52 When is it a good idea to use ZODB?

20:54 Like what's the ideal use case for this?

20:57 I think a really good time to use it is when you don't want to spend a lot of time writing software.

21:01 Yeah, sure.

21:03 So it makes writing software a lot easier in a lot of ways because, again,

21:08 you don't have that database impedance mismatch.

21:11 Does it store things in basically pickled form, or something to that effect?

21:16 Right.

21:16 Okay.

21:17 So you could just say, these are the things I want in the database, put them in the database,

21:21 and they're in the database, right?

21:22 As long as they're pickleable.

21:23 And we could or could not have a discussion about pickle.

21:26 Pickle has a bit of a bad reputation.

21:30 That's a little bit FUD-tastic.

21:32 Yeah, sure.

21:34 But anyway, yeah, it basically uses pickle.

21:36 So you can store anything that's pickleable.
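Since storing objects here roughly means pickling them, a quick way to see what "pickleable" rules in or out is to try it with the standard `pickle` module. This sketch ignores ZODB's own persistence machinery, and `is_storable` is a hypothetical helper. Keep in mind that unpickling also needs the object's class to be importable.

```python
import pickle
import threading


def is_storable(obj):
    """Rough check: can pickle serialize this object at all?"""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


record = {"title": "Quarterly report", "tags": ["finance", "q1"], "pages": 12}
restored = pickle.loads(pickle.dumps(record))  # ordinary data round-trips

print(is_storable(record))               # plain containers and values: yes
print(is_storable(threading.Lock()))     # OS-level resources: no
print(is_storable(n for n in range(3)))  # generators: no
```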

21:38 All right.

21:39 Nice.

21:40 And you talked about the good testing story as well.

21:42 When should we not use ZODB?

21:44 You shouldn't use ZODB...

21:46 And I think this is changing actually, but.

21:49 Maybe it's being changed by things like NewtDB, right?

21:52 Well, it's being changed by, for example... I think Newt can help

21:57 quite a bit.

21:58 Yeah.

21:58 Because ZODB can sit on top of, say, Postgres or Oracle, it can scale more or less as far

22:06 as they can scale.

22:06 Right.

22:07 Okay.

22:08 Traditionally, ZODB has provided its own search facilities on the client side.

22:13 And so when you do that aggressively, you end up with lots of extra objects in the

22:19 database to support indexing.

22:21 So there are ZODB-based implementations of B-trees.

22:24 And then on top of that, various sorts of indexes, like inverted indexes and regular B-tree

22:29 indexes, and things that are a little bit like Postgres GIN indexes.

22:35 But that tends to bloat the database quite a bit and cause lots of extra writes and lots

22:39 of opportunities for conflicts.

22:41 So in the past I've sort of said, well, if your application is very

22:46 search-intensive, then maybe you don't want to use ZODB.

22:50 If your application is sort of object-intensive and, you know, you're primarily working

22:55 on application objects and traversing application objects, then it's a much better fit.

23:00 But I think especially with Newt, by pushing the search back into the relational database,

23:06 it can greatly reduce some of the challenges.

23:10 And plus you just have a much more powerful search engine.

23:12 Sure.

23:13 So let's talk about NewtDB a little bit, then we'll come back to ZeroDB.

23:17 So Newt is kind of a marriage between ZODB, whose features we've talked about a lot,

23:23 and Postgres.

23:23 And one of its shortcomings, I guess you could say, is that it's really hard to query.

23:28 You talked about how if your app is search-intensive, then maybe you don't want to use it, because

23:34 this is not really normalized.

23:36 It's not flat text and integers and stuff in columns, but it's object graphs as binary

23:42 stuff.

23:42 So doing that is challenging, but something like Postgres is really good at storing that

23:49 data and querying it.

23:50 So NewtDB, you called this the amphibious database, which I think is really interesting.

23:55 Right.

23:56 What is it?

23:56 Okay.

23:56 Well, I'd like to argue with your previous assertion, but let's, let's come back to that.

24:00 Which one is that?

24:02 Well, in terms of searching, the search capabilities

24:07 of ZODB catalogs, which are sort of the common pattern for this, are not really that different

24:13 from a lot of the NoSQL database search mechanisms.

24:17 In fact, a lot of the NoSQL mechanisms, and even something like SQLAlchemy, to a fault, I think,

24:22 try to express searches as data.

24:25 And so the catalog is often quite good at that.

24:29 In fact, if your indexing data fits in memory, searching in ZODB, I mean, I've

24:36 seen it actually smoke Postgres.

24:39 Wow.

24:40 Okay.

24:40 But for larger databases where it's not all in memory, then, you know, Postgres ends up

24:45 being a win.

24:46 But it's not so much that it's hard to search, other than that you can't use SQL.

24:50 But I think most humans can't use SQL anyway.

24:52 Right.

24:53 But anyway, so the ease of search is debatable, but I think it's reasonable to expect that on

25:00 average, Postgres is going to do a lot better.

25:02 And so the reason I called Newt the amphibious database was that it sort of gives you two views

25:09 on your data.

25:09 It gives you a very Python-centric, object-oriented view on your data via ZODB.

25:13 One of the problems that traditional object-oriented databases have had in terms of what they've

25:20 been criticized for is that they're kind of closed.

25:22 They're limited to a single language.

25:23 And they may even depend very heavily on the classes.

25:27 I mean, that's the whole point: in ZODB, when you're storing objects, they're

25:31 objects that have specific classes.

25:32 And traditionally in ZODB, if you wanted to access the data, you had to have the class around.

25:37 So it's a little bit more of a restrictive environment.

25:40 Right.

25:41 So if I want to call it from like JavaScript, it's not going to be fun.

25:44 Right.

25:44 So the idea is that in Newt, you've got your regular OO Python view of your data.

25:51 And then you also have a Postgres view of your data.

25:53 And in Postgres, you can see your data as JSON.

25:56 You can access it from anything that can access JSON in Postgres.

26:00 So you could conceivably write reporting applications that reported against it.

26:04 You can index it and you can search it using PostgreSQL.

26:08 Okay.

26:09 Interesting.

26:09 So basically it stores two copies of any given record and it keeps them in sync.

26:13 One pickled version and one JSON version.

26:16 And then you leverage the JSON capabilities of Postgres to work with that thing from other

26:21 languages.
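A minimal sketch of the dual-representation idea, using the standard library's `sqlite3` as a stand-in for Postgres: each save writes a pickle for the Python side and a JSON copy for everyone else, in one transaction. The table and column names are hypothetical (they are not Newt's schema), and `json.dumps(vars(obj))` only works for objects whose attributes are already JSON-friendly; Newt's real conversion handles much more.

```python
import json
import pickle
import sqlite3
from dataclasses import dataclass


@dataclass
class Task:
    title: str
    done: bool = False


def open_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per object, holding both representations.
    conn.execute(
        "CREATE TABLE objects ("
        " oid INTEGER PRIMARY KEY,"
        " class_name TEXT NOT NULL,"
        " pickle BLOB NOT NULL,"  # what the Python side loads
        " state TEXT NOT NULL)"   # read-only JSON copy for other clients
    )
    return conn


def save(conn, oid, obj):
    cls = type(obj)
    with conn:  # both representations commit (or roll back) together
        conn.execute(
            "INSERT OR REPLACE INTO objects VALUES (?, ?, ?, ?)",
            (oid, f"{cls.__module__}.{cls.__qualname__}",
             pickle.dumps(obj), json.dumps(vars(obj))),
        )


conn = open_db()
save(conn, 1, Task("Write docs"))
save(conn, 2, Task("Ship release", done=True))

# The Python side gets real objects back...
(blob,) = conn.execute("SELECT pickle FROM objects WHERE oid = 1").fetchone()
task = pickle.loads(blob)

# ...while any other client only needs to understand JSON.
states = [json.loads(s) for (s,) in conn.execute("SELECT state FROM objects ORDER BY oid")]
```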

26:22 Okay.

26:23 There's also sort of lurking around there some interesting patterns about synchronizing

26:28 your data.

26:28 So Newt has sort of two modes you can use it in.

26:32 It has the sort of default mode where it writes the JSON data as it's committing the transaction.

26:37 But there's also an asynchronous mode where you can run a separate updater process that

26:43 watches the database and generates the JSON asynchronously.

26:47 And one of the things I think that's interesting about that is that you can generalize that.

26:52 So you could, for example, eventually, instead of updating JSON in Postgres, you could update

26:59 an Elasticsearch database.

27:01 Or you could even conceivably asynchronously update a relational representation of the data.
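The asynchronous mode can be pictured as an updater that remembers the last transaction it processed and catches a downstream view up from there. In this sketch the transaction log, the `Updater` class, and the plain-dict sink are all hypothetical; the sink could just as well be a search index or relational tables, which is the generalization described above.

```python
import json


class TransactionLog:
    """Toy commit log: each committed change gets the next transaction id."""

    def __init__(self):
        self.entries = []  # (tid, oid, state) tuples, in commit order

    def commit(self, oid, state):
        self.entries.append((len(self.entries) + 1, oid, dict(state)))


class Updater:
    """Runs separately from the committers and catches a sink up with the log."""

    def __init__(self, log, sink):
        self.log = log
        self.sink = sink  # any mutable mapping: a JSON table, a search index, ...
        self.last_tid = 0

    def poll(self):
        """Apply every change committed since the last poll; return how many."""
        applied = 0
        for tid, oid, state in self.log.entries:
            if tid > self.last_tid:
                self.sink[oid] = json.dumps(state, sort_keys=True)
                self.last_tid = tid
                applied += 1
        return applied


log = TransactionLog()
json_view = {}
updater = Updater(log, json_view)

log.commit("task-1", {"title": "Write docs"})
log.commit("task-1", {"title": "Write docs", "done": True})
pending = len(json_view)  # commits returned immediately; the view lags behind
applied = updater.poll()
```

The trade-off is the lag: commits stay fast, but readers of the JSON view may briefly see older data.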

27:06 Right.

27:06 Exactly.

27:07 Right now you're just taking it and turning it into JSON because it's such a close fit.

27:12 But theoretically, you could have a SQLAlchemy type representation as well.

27:18 Something to that effect, right?

27:20 So what flexibility does the Postgres add besides just other clients or other technologies?

27:27 Is there better searching?

27:28 Can I work with more data?

27:31 What's the story?

27:32 Postgres has a large community behind it.

27:34 So there are lots of people working on scaling Postgres in various ways.

27:38 So yes, I'm sure you can work with more data than you can with, say, the built-in client server

27:45 storage in ZODB.

27:47 Although there is a project called NEO where they're doing some interesting things in terms

27:52 of scaling the database without Postgres.

27:56 But also, Postgres has this interesting model.

27:59 I don't know if Oracle does this, but in Postgres, when you create an index, you can index expressions

28:07 rather than indexing columns.

28:08 Okay.

28:09 And that gives you a lot of power.

28:11 So for example, if you're building a text index, you can have, instead of saying, I want to

28:16 index this column, you can say, well, here's a Postgres function, which could be written

28:20 in Postgres's stored procedure language, or it could conceivably be written in Python.

28:27 But here's a function that will extract the text from this data record.

28:32 And this function that's extracting the text from the data record could actually make queries

28:37 and get text from related data records.

28:40 We're actually using that in the project.

28:42 And then what happens is then you say, okay, now I want to build an index on this function.

28:48 And what happens is that at index time, it goes through the data and calls that function,

28:54 gets the result of that function, and builds the index based on that.

28:57 And so the function could be doing pretty interesting, possibly expensive things,

29:03 and none of that has to happen at search time.

29:06 It can all happen at index time.

29:08 Right.

29:09 Okay.

29:10 That's really interesting.

29:10 So basically, your inserts might get a little slower and your updates might get a little slower,

29:15 but it could be really worth it if it dramatically improves your query speed.

29:19 Something that came to mind, I was thinking, it's like, well, if you have, say, an email address,

29:23 you could index just the domain part of the email address.

29:27 I want to find everybody at this company, everybody with google.com or whatever in

29:33 their email address, right?

29:34 Would something like that work?

29:35 Absolutely.

29:36 Okay.

29:37 Absolutely.
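The email-domain idea maps directly onto an index on an expression. The sketch below uses SQLite through Python's built-in `sqlite3` module as a stand-in for Postgres, since SQLite also supports indexes on expressions; the table, data, and index name are made up. The expression is defined once and reused so the query matches the indexed expression exactly, which is what lets the planner use the index. `EXPLAIN QUERY PLAN` is SQLite's rough counterpart to Postgres's `EXPLAIN`.

```python
import sqlite3

# Expression that extracts the lower-cased domain from an email address.
DOMAIN = "lower(substr(email, instr(email, '@') + 1))"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO people (name, email) VALUES (?, ?)",
    [("Ada", "ada@Example.com"), ("Bob", "bob@other.org"), ("Cy", "cy@example.com")],
)
# Index the expression, not the column: the extraction runs when rows are written.
conn.execute(f"CREATE INDEX people_email_domain ON people ({DOMAIN})")

query = f"SELECT name FROM people WHERE {DOMAIN} = ? ORDER BY name"
names = [name for (name,) in conn.execute(query, ("example.com",))]

# Ask the planner whether it will use the expression index.
plan = " ".join(
    row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("example.com",))
)
```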

29:37 Well, for example, in Postgres,

29:42 say you have a text column that's not free text.

29:45 It's like a person's name or a city name or something like that.

29:49 I think most people tend to index those incorrectly, because they index them

29:56 just by creating an index on the column.

29:59 And, A, there's a certain way that you build those indexes so that they're usable in LIKE queries.

30:06 But also, if you want to be able to search case-insensitively, what you really need to do is index calling lower on it.

30:14 Exactly.

30:15 Yeah.

30:16 Yeah.
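For the case-insensitive lookup, the same technique means indexing `lower(name)` and querying with the same expression, so there is no need for an extra lowercase column. This again uses SQLite via `sqlite3` as a stand-in (in Postgres, `CREATE INDEX ... ON cities (lower(name))` reads the same); the table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO cities (name) VALUES (?)",
    [("Portland",), ("SPRINGFIELD",), ("Richmond",)],
)
# Keep names exactly as entered; index their lower-cased form instead.
conn.execute("CREATE INDEX cities_lower_name ON cities (lower(name))")

QUERY = "SELECT name FROM cities WHERE lower(name) = lower(?)"


def find_city(name):
    """Case-insensitive exact match that can be answered through the index."""
    return [row[0] for row in conn.execute(QUERY, (name,))]


plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("x",)))
```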

30:17 I find that case insensitivity in a lot of databases can be really challenging

30:24 if you want to index the thing that has to be case-insensitive, right?

30:27 You've got to maybe even change your schema a little bit, like store the original and a lowercase version and put the index on the lowercase version

30:34 or something funky like that, right?

30:36 But you don't have to do that.

30:37 See, that's the beauty of this feature of Postgres.

30:40 And I have to be careful to say this feature of Postgres because I don't know that it's not

30:44 in other databases.

30:45 But this pattern of indexing expressions is wildly powerful.

30:50 And it's one of these things that people should zen up on because once you start thinking about

30:57 it that way, then lots of doors open up.

31:00 Yeah.

31:01 It sounds really powerful to me.

31:02 And I can certainly think of some places I would have used it had I had it available,

31:07 but I don't.

31:08 Another interesting example is that in a lot of applications that I work with, the data are

31:13 hierarchical.

31:14 Think of a content management system where the content is arranged hierarchically, possibly

31:19 by organization.

31:20 And there's often interesting security policies about what you can access based both on who

31:26 you are and where you are in the tree.

31:27 And the sort of most common case is to ask, can you view

31:35 this document?

31:36 And so you can write a function that says, okay, for any particular piece of content, which

31:42 principals can view this document?

31:44 And you can write a function that returns an array of the principals that can view that

31:49 document, and then create something called a GIN index, which is basically an inverted index

31:54 that allows you to say, okay, here's a set of principals.

31:57 Can any of them view this document?

31:59 Where the set of principals may be the user and the groups that they're in.

32:02 Yeah.

32:03 And you basically can say, okay, can this set of principals access this particular record?

32:08 And that can be an index query, even though in order to make that decision, at some point

32:13 you have to walk the tree to find all the security assertions.

32:17 Excuse me, security assertions.

32:19 Yeah, yeah.

32:20 You can have sort of inherited security stuff that flows down the tree and use your little

32:25 function to build the index without actually putting it on every single level.

32:29 Okay.

32:30 That sounds awesome.
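In Postgres this would be a function returning an array of principals plus a GIN index over it. The sketch below models the same shape in plain Python, with a dict of sets standing in for the inverted index; the tree, grants, and principal names are invented for illustration. The cost of walking the tree is paid once, when the index is built, and a query is just a few set lookups.

```python
from collections import defaultdict

# Hypothetical content tree: path -> parent path (None for the root).
PARENT = {
    "/": None,
    "/sales": "/",
    "/sales/forecast": "/sales",
    "/hr": "/",
    "/hr/salaries": "/hr",
}
# Local grants: who may view each node and everything below it.
GRANTS = {
    "/": {"group:admins"},
    "/sales": {"group:sales"},
    "/hr/salaries": {"user:pat"},
}


def viewers(path):
    """Walk up the tree collecting inherited view grants (done at index time)."""
    found = set()
    while path is not None:
        found |= GRANTS.get(path, set())
        path = PARENT[path]
    return found


def build_index(paths):
    """Inverted index: principal -> documents that principal may view."""
    index = defaultdict(set)
    for path in paths:
        for principal in viewers(path):
            index[principal].add(path)
    return index


def visible_to(index, principals):
    """Query time: union a few set lookups, no tree walking."""
    result = set()
    for principal in principals:
        result |= index.get(principal, set())
    return result


INDEX = build_index(PARENT)
```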

32:31 All right.

32:32 So NewtDB is definitely an interesting project.

32:36 How does its ideal use case differ from, say, ZODB's?

32:42 Well, it addresses two of the major objections to ZODB.

32:46 I would say the major objections to ZODB would be it's transactional, which I believe limits

32:53 scalability at some point, although, again, that limit is getting higher and higher all

32:56 the time.

32:57 But the other – and I actually think that's a limitation that most people should ignore.

33:01 But then I'd say the two biggest objections are the searchability and the overhead associated

33:08 with trying to support that and the complexity associated with trying to support that and access

33:13 from outside.

33:13 So people with ZODB databases, there's a temptation to feel like their data is imprisoned, especially

33:19 if you're not very familiar with the technology.

33:22 So Newt basically makes the data accessible without Python, without

33:28 any special skills.

33:29 It's just sort of sitting there in JSON.

33:31 You can search it using a much more powerful search mechanism.

33:34 Now, you still – you know, there's no free lunch.

33:37 So you can search it using clever tricks like indexing functions against the JSON, but you

33:43 have to learn how to do that.

33:44 And you have to understand how to use Postgres's EXPLAIN so you can see how the query optimizer

33:52 is analyzing the query.

33:55 Sure.

33:55 That's a good thing to do anyway if you're working with data.

33:57 Know how to ask, are you using an index?

34:01 Which index are you using?

34:02 And so on.

34:02 Right.

34:03 But still, interesting.

34:05 Okay.

34:05 Can you update the JSON and have those updates flow to the ZODB Python side?

34:11 Or is it read-only on the JSON and read-write on the ZODB side?

34:16 The latter.

34:16 Okay.

34:17 Yep.

34:17 The JSON is a read-only representation.

34:19 Gotcha.

34:19 Okay.

34:21 That seems pretty reasonable.

34:22 All right.

34:23 Very, very nice.

34:24 So let's come back and talk about ZeroDB.

34:27 So the ZODB stuff that you've been doing kind of led you to work with ZeroDB.

34:32 And they actually were the catalyst for a really cool move for you.

34:36 But let's start with the basics: what is ZeroDB?

34:39 Well, ZeroDB was about trying to have your data be encrypted at rest.

34:44 So with ZeroDB, the goal was that only the database clients, the applications, would be able to decrypt and access the data, because the encryption would happen on the client.

34:57 Right.

34:57 There are different levels of encrypted at rest.

35:01 But you're talking about even encrypted in the memory of the database, and the database itself can't get it, right?

35:06 That's a different level than I've set up a file system where when I save the data finally to disk, that part is encrypted.

35:13 Like there's more to it than just that, right?

35:15 Well, not much more to it than that.

35:17 I mean, it was certainly encrypted in the memory of the database server.

35:20 So the database server itself couldn't see the data.

35:23 But by the time it reached the application, it was unencrypted in the application's memory.

35:28 Sure.

35:28 So they position this as a really great database for the cloud, because your data might live in the cloud.

35:38 But even if somebody were to get access to it and, you know, walk away with your virtual machine in some unknown way or even just log into the database server, potentially your data is still safe, right?

35:50 Right.

35:51 Okay.

35:51 That's pretty unique.

35:52 I don't really know a lot of other databases that have that.

35:55 And, you know, a decision that I made a long time ago with ZODB was that basically all the application logic, including search, would happen on the client, and that the server was really dumb.

36:11 That was partly a reflection of my inability to write a smarter server.

36:15 But that actually, you know, fit ZeroDB's use cases really well because, you know, by doing everything on the client, only the client needs to have unencrypted data.

36:26 I see.

36:27 So basically the client or the application, even if it's like a web app, it has some kind of private key that it can decrypt its data with.

36:35 So how does it do queries and things like that?

36:37 In ZODB, when you're doing a query the traditional way, you're accessing B-trees and higher-level facilities built on B-trees that are regular database objects, just like any other object.

36:52 So they're encrypted.

36:54 They're part of your database.

36:55 So when you – let's say that you want to look up something in a B-tree.

36:59 What happens is you access the top of the B-tree and that gets loaded from the server.

37:04 And then you start walking the nodes of the B-tree to find the value you're looking for.

37:08 And those nodes get loaded from the server as necessary.

37:11 And then they're all cached locally.
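Here is a toy model of that lookup path: the client asks the server for nodes by id, decodes them, and caches them, so the tree walk happens entirely on the client. The node layout, `SERVER`, and `Client` are hypothetical and much simpler than ZODB's BTrees package, and the encryption ZeroDB adds is left out.

```python
import bisect
import pickle

# The "server" holds only opaque bytes keyed by object id. In ZeroDB these
# bytes would also be encrypted; this sketch leaves encryption out.
SERVER = {
    "root": pickle.dumps({"keys": ["m"], "children": ["leaf-1", "leaf-2"]}),
    "leaf-1": pickle.dumps({"items": {"alice": 1, "bob": 2}}),
    "leaf-2": pickle.dumps({"items": {"mallory": 3, "zed": 4}}),
}


class Client:
    """Loads tree nodes on demand and caches them locally."""

    def __init__(self, server):
        self.server = server
        self.cache = {}  # oid -> decoded node
        self.loads = 0   # number of round trips to the server

    def node(self, oid):
        if oid not in self.cache:
            self.loads += 1
            self.cache[oid] = pickle.loads(self.server[oid])
        return self.cache[oid]

    def lookup(self, key):
        node = self.node("root")
        while "items" not in node:
            # Pick the child whose key range covers `key`.
            node = self.node(node["children"][bisect.bisect_right(node["keys"], key)])
        return node["items"].get(key)


client = Client(SERVER)
bob = client.lookup("bob")      # loads the root, then leaf-1
loads_after_bob = client.loads
alice = client.lookup("alice")  # same path: answered from the local cache
```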

37:13 I see.

37:13 So the execution of the actual WHERE clause or whatever happens on the client.

37:21 And so you said it was the ZeroDB guys that made it possible for you to make this transition to sort of being independent, working on these open source projects and so on.

37:29 Yeah.

37:30 Yeah.

37:30 You want to tell us that story?

37:31 Well, I don't know that there's much to tell.

37:33 They needed some scalability help.

37:38 And also they didn't really have a lot of deep knowledge of ZODB.

37:40 So I could sort of provide a lot of help in terms of how they architect their application.

37:46 The client-server part of ZODB was written a long, long time ago, and it used asyncore.

37:54 And it really needed to be modernized for performance, for maintainability, and also to facilitate adding SSL support.

38:05 Sure.

38:06 And so they funded that along with a bunch of other work.

38:08 Nice.

38:09 And I saw they released ZeroDB on GitHub not too long ago.

38:13 So that's pretty cool.

38:13 They've really sort of switched gears.

38:15 In fact, I think they've renamed the company.

38:17 So I don't think that the project of ZeroDB on top of ZODB is actually active at this point.

38:24 Okay.

38:24 Their customers were banks and those sorts of financial institutions.

38:29 And so having a Python database wasn't really all that interesting to them.

38:34 Sure.

38:34 And so they've changed their focus towards dealing with big data.

38:39 And I don't really know all the details, but basically it's the same sort of thing.

38:44 Your data is encrypted at rest, and even while you're processing it, it's encrypted in the processing pipelines.

38:52 Sure.

38:53 Okay.

38:53 I see. So maybe they've changed the underlying storage engine, but the general idea is still probably more or less the same.

39:01 So that's three databases that are probably not super familiar to people.

39:06 Four if you count Postgres, but that one's more familiar to folks.

39:10 I think it's really interesting to look at all these different tradeoffs and study the different databases.

39:14 It gives you a sense for what the value of the tradeoffs are, right?

39:19 Yep.

39:19 Yeah.

39:19 Cool.

39:20 All right.

39:21 So let's switch gears just a little bit towards the process side of things and talk about two projects that you're working on.

39:27 One, a tool for continuous-integration-like things, and one that's more about Kanban-type stuff.

39:35 So first one I want to talk about is Buildout.

39:39 So this is an automation tool written in and extended with Python.

39:43 So is this a continuous integration server, or is this more than that, or what is Buildout?

39:49 It's something different than that.

39:51 So it's really about, let's say you want to work on a Python project.

39:56 So you check out the code, and now you want to actually run stuff.

40:01 And so what a lot of people do is, there's a requirements.txt file sitting around.

40:06 Maybe they create a virtualenv, and then they run pip against that requirements.txt file.

40:11 Or, sadly, what many people will do is just run pip with their machine's system Python

40:18 and install a bunch of things in there.

40:22 And then they'll run whatever scripts are generated.

40:26 And if the scripts need configuration files, well, maybe they'll write them,

40:30 and they'll check them into version control.

40:33 And if they need extra processes on top of that, it's sort of outside the realm of pip.

40:39 And then the question is, well, what do you do to automate all of that?

40:42 And so Buildout: you know, when we were working on projects many years ago at Zope Corporation,

40:49 and this was actually before there was even distutils, we were in a mode for a while of creating applications for customers.

40:57 And then the customers would run them on their machines, and their environments were totally different.

41:02 Their environments were typically completely uncontrolled, and usually bad things would happen.

41:06 And so we needed to automate that.

41:09 And in those days, the automation typically involved building Python from source

41:14 because most people's Python environments are in an unpredictable state.

41:17 Okay.

41:18 So you would get like some well-known version and download it and compile it and say,

41:22 we're going to start from here?

41:23 Well, the biggest problem isn't the well-known version, although that certainly is part of it, but the contents of site-packages.

41:30 Right. Okay.

41:31 Over the years, that evolved.

41:32 And so Buildout was very much geared towards installing exactly the packages you need

41:39 and then generating the artifacts around that.

41:41 So, for example, I have a project related to the Kanban where the JavaScript client is significant,

41:48 and I need to assemble all those artifacts.

41:50 And maybe I'm old-fashioned, but it offends me to check them into version control.

41:55 Yeah.

41:55 I have a Buildout configuration that, among other things, runs Grunt,

42:00 or, what is it, Grunt or... I forget what it runs, maybe Gulp.

42:03 It runs some JavaScript tools to assemble all the JavaScript requirements.

42:08 And, of course, it uses Buildout's own mechanisms to assemble the Python requirements.

42:13 It generates configuration files that something like PasteScript would need to use.

42:18 It generates daemon configs.

42:22 So, for example, when I run the process, I usually don't run it in the foreground.

42:25 I mean, I may, but I may want to run it in the background.

42:28 And so there's a tool called zdaemon, which is kind of like supervisord, but a little bit more...

42:34 Okay.

42:35 A bit simpler.

42:36 And so that has a configuration file.

42:38 Or if you're using Supervisor, you would want to have a Supervisor configuration file.

42:43 And those files may depend on things that are specific to your environment.

42:48 They might depend on, you know, files that are outside the environment that have paths in them.

42:53 I mean, there are all sorts of reasons why you may not be able to have static configurations that are just checked in.

42:59 Right.

42:59 Okay.

42:59 So Buildout will look at the system, look at all the requirements, and put it together in just the way needed for that location, huh?

43:07 That's one way of putting it.

43:08 Basically, with Buildout, you give it a single configuration file that represents all of the parts of what you're trying to deploy, whether you're deploying to CI, to staging, or to production.

43:21 And it basically says, okay, I've got all these parts that I need to build, and it just basically builds them.

43:26 And it also keeps track of what it's built so that it can unbuild them.

43:29 And, like, if a part specification changes, it knows to uninstall what it did before and then reinstall it.
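The "remember what you built so you can unbuild it" behavior can be sketched as an installer that stores a fingerprint of each part's specification. This is not Buildout's recipe API; `PartsInstaller`, the part names, and the spec keys are all hypothetical.

```python
import hashlib
import json


class PartsInstaller:
    """Toy installer that remembers what it built so it can unbuild it."""

    def __init__(self):
        self.installed = {}  # part name -> fingerprint of the spec it was built from
        self.actions = []    # log of what happened, for illustration

    @staticmethod
    def fingerprint(spec):
        return hashlib.sha256(json.dumps(spec, sort_keys=True).encode()).hexdigest()

    def run(self, config):
        """Bring installed parts in line with config (part name -> spec dict)."""
        # Uninstall parts that were removed or whose spec changed.
        for name in list(self.installed):
            if name not in config or self.fingerprint(config[name]) != self.installed[name]:
                self.actions.append(("uninstall", name))
                del self.installed[name]
        # Install anything missing: new parts and parts whose spec changed.
        for name, spec in config.items():
            if name not in self.installed:
                self.actions.append(("install", name))
                self.installed[name] = self.fingerprint(spec)


installer = PartsInstaller()
installer.run({"app": {"eggs": ["myapp"]}, "daemon": {"program": "bin/app"}})
first_run = list(installer.actions)
installer.actions.clear()

# Change one part's spec and run again: only that part is rebuilt.
installer.run({"app": {"eggs": ["myapp"]}, "daemon": {"program": "bin/app", "user": "www"}})
```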

43:37 Okay.

43:37 That's really cool.

43:38 How much of this is a general software assembly tool and how much of this is for Python projects?

43:44 Like, could I work on a C++ project only with Buildout?

43:48 You could.

43:49 And there are people using Buildout in non-Python environments, but the vast majority is Python.

43:54 Right.

43:54 Because it's written in Python,

43:55 Python folks are, you know, disproportionately attracted to it.

43:59 Right.

44:00 And, of course, it has built-in support for assembling Python applications in a particular way that's interesting.

44:06 Right.

44:06 Okay.

44:07 There's a project called SlapOS, which seeks to be a lightweight virtualization environment that's built on top of...

44:14 Buildout.

44:15 And the things that they deploy in that environment, the vast majority of them are not Python.

44:20 All right.

44:21 Yeah, that sounds really interesting.

44:22 Cool.

44:23 One of the comments you made on the page is that software deployment should be highly automated and really should be able to, like, run one or two commands and just you're ready to go.

44:34 And I feel like the more of that that we can do, the better.

44:38 The more frequently we can release smaller versions, because it's not such a challenge for people to get the new version, and all sorts of stuff.

44:45 So I think that's a great philosophy there.

44:48 And I think that the sort of DevOps movement has kind of gotten stalled in too much of an ops rut.

44:54 So I see way too little automation in a lot of things that I see.

44:59 At Zope Corporation, we had things to the point where basically we had a representation of our system as a tree that we stored in ZooKeeper.

45:11 Each service was, you know, anywhere from two or three to ten lines of very high-level specification.

45:17 And then we had textual models of our entire system for multiple customers and multiple services and multiple applications and how they interconnected.

45:26 And when we wanted to deploy a change, all we did was modify that tree and check it into Git.

45:33 That's really cool.

45:35 And a few minutes later, it would be deployed.

45:36 Yeah, that's the way it should be, right?

45:38 Definitely the way it should be.

45:41 Cool.

45:42 Okay, so let's talk about your final project called Two-Tiered Kanban.

45:47 So I suspect most people know what Kanban is, but maybe you just give us the elevator pitch and then we can talk about the two-tiered version.

45:55 Sure.

45:55 The compelling thing about, well, there's sort of two compelling things about Kanban.

46:00 And one is sort of philosophical, which is that it's very focused on providing value as quickly as possible.

46:05 In contrast, something like Scrum, I think, focuses on doing work.

46:10 Sure.

46:11 So the concept of providing value as quickly as possible.

46:15 We sort of grew this culture at Zope Corporation, both as part of trying to be better software developers and as part of trying to follow some lean startup kinds of ideas.

46:29 Part of that was related to the fact that we could develop software and check it into Git, but until it was actually in front of customers, it wasn't really providing any value.

46:38 And then the other part of it is really sort of old-fashioned common sense: finish what you start, for which Kanban has the highfalutin term of limiting work in progress.

46:53 But that's just a fancy-pants way of saying finish what you start before you start something else.

46:57 Right.

46:57 Don't put more stuff on the board.

46:58 Yeah.

46:59 Get it to the end.

46:59 Right.

47:00 Then you put something on the board, right?

47:01 This is kind of like Trello boards if people haven't seen the Kanban boards, right?

47:04 You've got the columns.

47:05 You move the cards from left to right, like from planned to assigned to in-dev and test, whatever.

47:12 But you said that, or the project says, that typical Kanban boards focus on development.

47:17 And products don't, just because they've had development done on them, don't provide value.

47:22 They provide value when features land in customers' hands, hopefully through a single button press to deploy them, right?

47:28 And actually, when we started doing this, we were nowhere near a single button press.

47:31 So being able to track things beyond development was actually pretty valuable.

47:35 And often, even with a single button press, there are things that you have to do.

47:39 Like, for example, if your schema changes, you may have to migrate the schema, and you might have to do that before the software is deployed, and so on.

47:47 But I'd put it a slightly different way.

47:49 So a traditional Trello board or a traditional Kanban board, or even a Scrum board, you have all these trees sitting on the board, but you can't really see the forest.

47:59 Scrum addresses that a little bit through sprints.

48:02 So perhaps in a sprint, you're all focused on a single goal, which is good.

48:07 But, you know, the problem with Kanban is that it's always been just sort of this sea of separate tasks, and it's hard to know how they relate to each other, and it's hard to know how they relate to value.

48:18 There's this idea of two-tier Kanban, which, you know, I read about as I was learning about Kanban, but to this day have never really found an implementation of, although I've heard rumors of implementations.

48:30 The basic idea of a two-tier Kanban is that you have a high-level Kanban that represents units of value, typically, you know, features.

48:40 Right.

48:41 Where a feature may require a number of development tasks, and ideally as few as possible, but sometimes, for example, there might be a new feature that requires lots of UI components, and then lots more sort of below the waterline.

48:54 Right.

48:54 Like the designer work, the database work, the APIs to make it go, the data, the backup, the management.

49:01 There can be many things, right?

49:02 Right.

49:02 Of course.

49:03 So the idea is that you want to be able to have – you want to be able to represent the feature as a whole, the value as a whole, and really focus on moving that value to completion and getting the benefit of it.

49:15 But you also need to be able to manage the things that make that up.

49:18 And so you have this high tier, which is the features, and then the low tier, which is, you know, once you've entered development, all the things you need to do to actually, you know, implement that feature.

49:29 And so typically what you have is you have a board where you have features that move across various columns, and then they hit the development column, and then they explode into the various pieces that make up that feature.
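To make the two tiers concrete, here is a rough sketch of the data model described above: features move across a top-level board, and a feature that enters development "explodes" into its own task board. The class names, column names, and the rule that a feature only delivers value once deployed are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical column names for each tier.
FEATURE_COLUMNS = ["Backlog", "Ready", "Development", "Deploying", "Deployed"]
TASK_COLUMNS = ["Ready", "Doing", "Review", "Done"]


@dataclass
class Task:
    title: str
    state: str = "Ready"


@dataclass
class Feature:
    title: str
    state: str = "Backlog"
    tasks: list[Task] = field(default_factory=list)

    def start_development(self, task_titles: list[str]) -> None:
        # Entering development "explodes" the feature into its own task board.
        self.state = "Development"
        self.tasks = [Task(t) for t in task_titles]

    def task_board(self) -> dict[str, list[str]]:
        # The lower tier: task titles grouped by column, like a small Kanban board.
        board: dict[str, list[str]] = {col: [] for col in TASK_COLUMNS}
        for task in self.tasks:
            board[task.state].append(task.title)
        return board

    def advance(self) -> None:
        # A feature may only leave development once every task is done.
        if self.state == "Development" and any(t.state != "Done" for t in self.tasks):
            raise ValueError("unfinished tasks remain")
        index = FEATURE_COLUMNS.index(self.state)
        self.state = FEATURE_COLUMNS[min(index + 1, len(FEATURE_COLUMNS) - 1)]

    @property
    def delivers_value(self) -> bool:
        # Value only counts once the feature is in customers' hands.
        return self.state == "Deployed"


feature = Feature("Saved searches")
feature.start_development(["UI", "database schema", "API"])
for task in feature.tasks:
    task.state = "Done"
feature.advance()  # Development -> Deploying
feature.advance()  # Deploying -> Deployed
```

Keeping deployment as its own column on the feature tier reflects the point that work past development (migrations, rollout) still has to be tracked.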

49:40 Okay.

49:40 And each – yeah, each one of the things that moves down the board that is a feature, that's basically its own Kanban board as well, right?

49:48 Essentially, yeah.

49:48 Yeah.

49:49 Okay.

49:50 That's the two-tier part.

49:50 Yeah, it sounds really valuable to me.

49:52 I always find these hierarchical things in Scrum or in Kanban really hard to deal with, like, okay, well, this feature costs this much, but the thing I'm working on actually costs this other thing, and someone else has to work on the data part of it, and they need to estimate that.

50:07 And just, you know, it's challenging to represent those.

50:11 So this seems like a nice way to organize it.

50:13 And it provides a little bit of automation around that.

50:15 I mean, you know, most Kanban people will sort of pooh-pooh estimation.

50:20 And I've been around enough people who needed estimates to know that you can't sort of completely punt on that.

50:27 But I really am a fan of really low-rent estimation and then automation to track the low-rent estimates and basically keeping the process really simple.
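One way to picture "low-rent estimation plus automation" is coarse size labels on tasks that roll up automatically to each feature. The S/M/L scale and its point values below are invented for illustration; no particular scheme is described in the episode.

```python
# Hypothetical "low-rent" size scale; the point values are arbitrary.
SIZES = {"S": 1, "M": 2, "L": 4}


def remaining_points(tasks: list[tuple[str, str, bool]]) -> int:
    """Sum the size points of tasks that are not yet done.

    Each task is a (title, size, done) tuple.
    """
    return sum(SIZES[size] for _title, size, done in tasks if not done)


def rollup(features: dict[str, list[tuple[str, str, bool]]]) -> dict[str, int]:
    # Feature-level view: the remaining estimate per feature, computed automatically.
    return {name: remaining_points(tasks) for name, tasks in features.items()}


features = {
    "Saved searches": [("UI", "M", True), ("API", "S", False), ("schema", "L", False)],
    "Dark mode": [("CSS", "S", False)],
}
print(rollup(features))  # {'Saved searches': 5, 'Dark mode': 1}
```

The people doing the work only pick a size; the tool does the arithmetic, which keeps the ceremony to a minimum.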

50:38 I've been exposed to some environments, some Scrum environments where, and I think this is actually the norm, is that people sort of go through a bunch of motions, and there's a lot of ceremony.

50:49 And a heck of a lot of time gets sucked up in ceremony.

50:53 Right.

50:53 Yep.

50:54 I've seen it as well.

50:55 Okay, cool.

50:56 So we'll definitely include a link.

50:58 And the link goes to a GitHub project that looks like it is executable code.

51:02 What do you actually get when you go to that GitHub repo?

51:05 Well, you get – right now you get a substantial amount of bit rot.

51:08 Okay.

51:10 But that's why I need to – I want to get back to it.

51:14 Some of the bit rot is because initially I punted on authentication and used Persona, the Mozilla Persona project, which actually worked really well, but it relied on Mozilla doing a bunch of work.

51:27 And they finally got tired of doing that work.

51:29 And so they've – they no longer run that service.

51:33 And so I have to go back and – That's a challenge for your authentication and identity management.

51:38 Yeah.

51:39 So my – I need to go back and I want to add hooks to be able to use things like – I don't really want to manage – I don't really particularly want to manage usernames and passwords.

51:49 So I want to be able to work with like Google Auth and various others, you know, Facebook Auth, et cetera, and let people choose that.
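A pluggable identity layer like the one described could be as simple as a registry of provider hooks. This is a hypothetical sketch: `register_provider` and `authenticate` are made-up names, and the "fake" provider stands in for a real Google or Facebook login, which would verify tokens through that provider's own OAuth/OpenID Connect flow.

```python
from typing import Callable, Dict, Optional

# Hypothetical hook type: takes a provider-issued token and returns the user's
# email address if the token is valid, or None otherwise.
Verifier = Callable[[str], Optional[str]]

_providers: Dict[str, Verifier] = {}


def register_provider(name: str, verifier: Verifier) -> None:
    # Each identity provider plugs in its own verification function.
    _providers[name] = verifier


def authenticate(provider: str, token: str) -> str:
    # Look up the chosen provider and let it vouch for the user.
    try:
        verify = _providers[provider]
    except KeyError:
        raise ValueError(f"unknown identity provider: {provider}") from None
    email = verify(token)
    if email is None:
        raise PermissionError("token rejected")
    return email


# A fake provider standing in for a real one, for local testing only.
register_provider("fake", lambda token: "jim@example.com" if token == "ok" else None)
print(authenticate("fake", "ok"))  # jim@example.com
```

The application never stores passwords; it only maps a verified identity from whichever provider the user chose.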

51:56 Some other bit rot is that it was written for ZODB 4, and ZODB 5 changed in some ways.

52:05 There's actually discussion on the ZODB list right now about – I don't know if you're familiar with RethinkDB.

52:10 Yeah.

52:10 But so there's this idea of having data pushed to you.

52:14 And that's actually how ZODB works under the hood.

52:17 But that's never really been exposed.

52:20 Right.

52:20 With the transaction commits and sort of refreshing the objects people have in memory, right?

52:25 Yeah.

52:25 Right.

52:26 So when you use a number of the ZODB storages, when a transaction commits, the IDs of all the objects that were modified are pushed to all of the other clients.

52:36 And they're invalidated.

52:38 So there's already interesting information being pushed to clients.

52:41 But that's never really been surfaced at the application level.

52:44 And in ZODB 4, it was really easy with a small monkey patch to get at that.

52:49 And the Kanban relied on that.

52:52 But now in ZODB 5, that's no longer possible.

52:55 So I'm in the process of adding that feature to ZODB 5, adding it as an official feature.
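The invalidation flow described here can be pictured with a toy, in-memory model: on commit, the server sends the IDs of modified objects to every other connected client, which drops those objects from its cache and, in this sketch, also calls an application-level callback. Everything below (`ToyServer`, `ToyClient`, `on_invalidate`) is hypothetical and is not ZODB's API; the callback only stands in for the kind of official hook being added to ZODB 5.

```python
from typing import Callable, Dict, List, Set


class ToyServer:
    """Stand-in for a storage server that broadcasts invalidations."""

    def __init__(self) -> None:
        self.data: Dict[int, object] = {}
        self.clients: List["ToyClient"] = []

    def commit(self, sender: "ToyClient", changes: Dict[int, object]) -> None:
        self.data.update(changes)
        oids = set(changes)
        # Push the IDs of modified objects to every *other* client.
        for client in self.clients:
            if client is not sender:
                client.invalidate(oids)


class ToyClient:
    def __init__(self, server: ToyServer,
                 on_invalidate: Callable[[Set[int]], None] = lambda oids: None) -> None:
        self.server = server
        self.cache: Dict[int, object] = {}
        self.on_invalidate = on_invalidate  # hypothetical application-level hook
        server.clients.append(self)

    def load(self, oid: int) -> object:
        # Serve from the local cache; reload from the server after an invalidation.
        if oid not in self.cache:
            self.cache[oid] = self.server.data[oid]
        return self.cache[oid]

    def commit(self, changes: Dict[int, object]) -> None:
        self.cache.update(changes)
        self.server.commit(self, changes)

    def invalidate(self, oids: Set[int]) -> None:
        # Drop stale copies, then tell the application what changed.
        for oid in oids:
            self.cache.pop(oid, None)
        self.on_invalidate(oids)


server = ToyServer()
seen: List[Set[int]] = []
alice = ToyClient(server)
bob = ToyClient(server, on_invalidate=seen.append)

alice.commit({1: "todo"})
print(bob.load(1))  # todo
alice.commit({1: "doing"})
print(bob.load(1))  # doing (the stale cache entry was dropped)
print(seen)         # [{1}, {1}]
```

Surfacing the pushed IDs to the application is what lets something like a shared Kanban board update live in every browser without polling.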

53:01 Yeah.

53:01 That's the way to do it anyway, right?

53:03 Officially?

53:03 Yeah.

53:04 Right.

53:04 Well, the Kanban has been – the original version that we used at Zope Corporation was actually a client-server thing on top of the Asana API.

53:13 And so the one that we used there was built on top of Asana.

53:17 And Asana's API became really, really slow.

53:21 They, too, got tired of providing an expensive service for free.

53:24 Yeah.

53:25 Yeah.

53:26 We'll run on this one $10 server over there.

53:28 Exactly.

53:29 So since that, it's been kind of an R&D side project.

53:35 And I'd like to really push it to completion and maybe even try to offer some sort of – offer it as a service.

53:42 Especially at my last job: it was a great company, but they really struggled with process.

53:52 And I think they would have liked to have used the Kanban, but it wasn't quite ready.

53:56 And that was really frustrating for them and for me.

53:59 So I'd like to soon take some time to actually get it much closer to completion.

54:04 Yeah.

54:04 It sounds like a great software-as-a-service type thing.

54:07 So hopefully you can do that.

54:08 All right.

54:09 Very cool.

54:09 Well, it looks like we should probably leave it there.

54:11 We've covered a lot of ground on this episode, but we're pretty much out of time.

54:15 So before we move on, let me ask you two final questions.

54:18 We now have over 100,000 packages on PyPI, so hooray for that.

54:23 And there are many that I'm sure you've come across that are noteworthy, that are not necessarily the most popular, but would be really cool to find out about.

54:30 So what one would you like to recommend people look into?

54:32 Well, it really depends.

54:34 I mean, obviously it depends on what you do.

54:36 But Boto has delighted me over the years whenever I've had to touch AWS.

54:40 So I'm a big fan of Boto.

54:42 Yeah, I use Boto as well.

54:43 I'm also a huge fan of mock.

54:46 Right.

54:46 Okay.

54:47 I think he did a really nice job of balancing dynamism and functionality.
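Assuming the package mentioned here is `mock` (now part of the standard library as `unittest.mock`), this small example shows the balance being described: a plain `Mock` accepts any attribute or call dynamically, while `spec` restricts it to a real interface so typos fail loudly. The `Storage` class is a hypothetical example.

```python
from unittest import mock


class Storage:
    # Hypothetical interface we want to fake in a test.
    def load(self, oid: int) -> bytes:
        raise NotImplementedError


# Fully dynamic: any attribute or call is accepted and recorded.
loose = mock.Mock()
loose.lod(42)  # a typo, but no error is raised

# Constrained by a spec: only attributes that exist on Storage are allowed.
strict = mock.Mock(spec=Storage)
strict.load.return_value = b"data"
assert strict.load(42) == b"data"
strict.load.assert_called_once_with(42)

try:
    strict.lod(42)  # the same typo now raises AttributeError
except AttributeError as exc:
    print("caught:", exc)
```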

54:52 I could go on and on, but those are a couple that sort of come to mind.

54:57 And of course, ZODB.

54:58 Of course.

54:59 Yeah.

54:59 And NewtDB as well, right?

55:01 Very nice.

55:03 Very nice.

55:03 Okay, cool.

55:03 So thanks for that.

55:05 And finally, when you write some Python code, what editor do you open up?

55:08 Emacs, of course.

55:09 Emacs.

55:10 All right.

55:10 Right on.

55:12 That's definitely a popular one.

55:14 I'm giving a webinar next week on PyCharm.

55:16 And I have to say, I'm actually pretty impressed with PyCharm.

55:20 I like the, you know, as a straight text editor, I still like Emacs a lot.

55:25 But they really assemble a nice package of things along with that, like, you know, database access and REST clients.

55:34 And it's an interesting pile of functionality.

55:37 Yeah, absolutely.

55:38 And when you give that one, maybe if they have it recorded by the time we release this, we can put the link to your webcast in there.

55:44 That'd be cool.

55:44 Okay.

55:45 Awesome.

55:46 All right.

55:47 Well, that all sounds great.

55:48 Any final call to action for the listeners?

55:50 Anything you want them to check out or do?

55:53 Learn about transactions and then check out Newt.
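For listeners taking up the suggestion to learn about transactions, the core idea is atomicity: a group of changes either all commit or none do. This sketch uses the standard library's `sqlite3` rather than ZODB or NewtDB, and the `accounts` table is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> None:
    # Using the connection as a context manager commits on success
    # and rolls back if any statement in the block raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))


transfer("alice", "bob", 30)
try:
    transfer("alice", "bob", 500)  # would make alice negative, so the CHECK fails
except sqlite3.IntegrityError:
    pass  # bob's credit from this attempt was rolled back too

print(dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name")))
# {'alice': 70, 'bob': 30}
```

Note that the credit to `bob` ran first and succeeded on its own, yet it is undone because the transaction as a whole failed.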

55:55 I'll definitely have NewtDB and all the other ones in the show notes, so people should be able to get right to them.

56:00 Cool.

56:01 Yeah.

56:01 Jim, thank you for being on the show.

56:02 It's been great to learn about all these different projects with you.

56:05 Thank you for having me.

56:05 You bet.

56:06 Bye.

56:09 This has been another episode of Talk Python To Me.

56:11 Our guest on this episode has been Jim Fulton.

56:14 And this episode is brought to you by GetStream.

56:17 If you're building an app with a feed, make sure to check out GetStream at talkpython.fm/stream.

56:23 They have the intelligent, scalable, and tested feed API you need to be one step closer to launching your app.

56:30 Are you or a colleague trying to learn Python?

56:32 Have you tried books and videos that just left you bored by covering topics point by point?

56:37 Well, check out my online course, Python Jumpstart by Building 10 Apps, at talkpython.fm/course to experience a more engaging way to learn Python.

56:46 And if you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic.

56:53 Be sure to subscribe to the show.

56:55 Open your favorite podcatcher and search for Python.

56:58 We should be right at the top.

56:59 You can also find the iTunes feed at /itunes, Google Play feed at /play, and direct RSS feed at /rss on talkpython.fm.

57:08 Our theme music is Developers, Developers, Developers by Corey Smith, who goes by Smix.

57:14 Corey just recently started selling his tracks on iTunes, so I recommend you check it out at talkpython.fm/music.

57:20 You can browse the tracks he has for sale on iTunes and listen to the full-length version of the theme song.

57:25 This is your host, Michael Kennedy.

57:27 Thanks so much for listening.

57:29 I really appreciate it.

57:30 Smix, let's get out of here.

57:32 Smix.

57:32 I'll pass the mic back to who rocked it best.