#65: Jump on the real-time web with RethinkDB Transcript

Recorded on Wednesday, Jun 22, 2016.

00:00 Long gone are the days of the web acting as just linked documents and glorified brochures. Web apps of today are just that, rich interactive applications. But unlike desktop apps of old, these are apps with 100,000's or even millions of concurrent users.

00:00 We expect that these apps will instantly reflect changes to the data, potentially made by any of the users connected to the system while we are using them.

00:00 This has put a strain on the web servers, databases, and architecture of our web apps. Technology has responded by delivering amazing real-time capabilities with things like websockets and SignalR at the client layer and event driven systems on the web servers. But what about the database? Could it be events all the way down?

00:00 That was the goal of RethinkDB's cofounders when they pitched it to YCombinator.

00:00 Now it's time to hear the story of RethinkDB with Slava Akhmechet.

00:00 This is Talk Python To me, episode 65, recorded June 22nd, 2016.

00:00 [music intro]

00:00 Welcome to Talk Python To Me, a weekly podcast on Python: the language, the libraries, the ecosystem and the personalities.

00:00 This is your host, Michael Kennedy, follow me on Twitter where I am at @mkennedy, keep up with the show and listen to past episodes at talkpython.fm and follow the show on Twitter via @talkpython.

00:00 This episode is brought to you by Hired and SnapCI. Thank them for supporting the show on Twitter via @hired_hq and @snap_ci.

01:48 Michael: Slava, welcome to the show.

01:48 Slava: Thank you, it's good to be here.

01:50 Michael: Yeah, I'm really excited to talk about my favorite kind of databases, which are document databases and NoSQL databases. So, we are going to dig into RethinkDB today, and you guys have such a cool story and such a cool community that you built up, and we'll get into that. But before we do, let's start at the beginning: what's your story, how did you get into programming?

02:10 Slava: Well, so I was born in Ukraine, and when I was maybe seven or eight years old my parents got me a machine called the ZX Spectrum, which was a computer that plugged into a TV and a cassette player. They got it for me to play games, but when you booted it, it had a Basic interpreter. And somehow, I don't know what happened, but the Basic interpreter was more interesting to me than the video games, and I just kind of started learning Basic and learning how to type commands and what they do. Eventually I wanted to make games, so, you know, I used the Basic interpreter with some library and it was really slow, and that's how I got into assembly, because I realized I could speed things up. And, yeah, that was the beginning and-

02:55 Michael: That was a bit of a jump, right?

02:56 Slava: It was, it was a bit of a jump and at the time like there was no internet so I had to work off of a manual that was kind of printed on a printer and half the stuff was missing, but that's how I got into it and I just never stopped.

03:09 Michael: Yeah, that's awesome. I've also made that jump from high level languages to assembly for a little bit of time and yeah, I kind of left that behind. Those were basically the options in early days, right, Basic and assembly, you could pick your extreme and go to it I guess.

03:27 Slava: Yeah, on the ZX Spectrum there were really two options: the built-in Basic interpreter and assembly. There were no compilers, I don't think you could get a different language on it, so it was just those two things, and I figured out how to mix and match them and do all kinds of cool stuff on that machine. Yeah, I probably learned like 75% of everything I know now about programming just like [inaudible 03:56] the ZX Spectrum. And it was the kind of computer that is similar I guess to the [inaudible 03:59], but you couldn't get that in Ukraine, so I was stuck with like 4KB of memory, you know, I couldn't save anything. So that's how I got started.

04:11 Michael: Wow, that's cool. You said it connected to a television, what do you think the resolution in pixels of that thing was?

04:17 Slava: I don't remember, but I think it was like 120 pixels vertical or something like that, it was really small. I didn't feel that way at the time, but...

04:30 Michael: It felt amazing at the time, right, it was amazing.

04:32 Slava: It was amazing, I am sure it wouldn't be fun programming in it now.

04:35 Michael: Yeah, that's probably a thing better left to nostalgia.

04:38 Slava: Yeah, I was actually, I was thinking about buying it on ebay.

04:42 Michael: Nice.

04:42 Slava: They are available and they are pretty cheap, but I never actually got around to it and I don't know if I ever will.

04:48 Michael: [laugh] Yeah, nostalgia, how interesting. So let's talk about something a little more modern. You built a pretty amazing database, a document database called RethinkDB, and your tagline is that it's a database for the real-time web. What is RethinkDB?

05:05 Slava: So RethinkDB is a NoSQL, scalable, real-time database. When you start out using it, it's fairly similar to MongoDB in the sense that you can store and retrieve JSON documents, and you can scale it out to multiple machines or multiple data centers, but that's pretty much where the similarity ends. What's unique about RethinkDB, what you can't get in any other database that I know of right now, is that it's designed for real-time applications. The way that works is, in a traditional database you send a query and get a response and then you are done, and if you want an update you have to send another query. In RethinkDB you can subscribe to a query, so you could say I am interested in the top ten selling books in my bookstore, and then any time the data changes in a way that updates the results of the query, the database pushes a notification to the application. So if you are building any kind of collaborative application, or a multiplayer game where things change all the time, or some analytics software where you need updates right away, RethinkDB is a really great database for that. And it's unique, I think, because no one else does that, where your application is designed around this event-driven model where any time something changes in the database you get an event saying, hey, the relevant results that you are interested in are now different. And that makes it dramatically easier for people to build real-time apps.

06:28 Michael: Yeah, that is really cool and that is quite a unique capability. I am quite sure that MongoDB does not have that. You've got to ask the question over and over to get new answers.

06:39 Slava: In modern databases other than RethinkDB you can kind of fake this by doing various things: you could poll and ask the same question over and over again, or you can subscribe to the replication log. There are things users can do to get a semblance of this functionality, but it's really different when it's baked into the product or project on day 1. Because it just opens up a whole world of possibilities, and when it's a higher-level feature that you can run on most queries or almost all queries, it opens up so many possibilities that you can't really get by faking it in other systems.
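The difference between asking again and being told can be sketched in plain Python. This is a toy in-memory model of the "top selling books" example, not RethinkDB itself; the class `BookSales` and its methods are hypothetical names made up for illustration. `top_sellers()` plays the role of a request/response query, while `subscribe()` registers a callback that fires only when the result of that query actually changes.

```python
from typing import Callable, Dict, List, Tuple

Listener = Callable[[List[Tuple[str, int]]], None]


class BookSales:
    """Toy in-memory stand-in for a table of book sales (not RethinkDB itself)."""

    def __init__(self, top_n: int = 10) -> None:
        self._sales: Dict[str, int] = {}
        self._top_n = top_n
        self._listeners: List[Listener] = []

    def top_sellers(self) -> List[Tuple[str, int]]:
        """Request/response style: the caller must ask again to see changes."""
        ranked = sorted(self._sales.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[: self._top_n]

    def subscribe(self, listener: Listener) -> None:
        """Push style: the listener is called whenever the top-N result changes."""
        self._listeners.append(listener)

    def record_sale(self, title: str, copies: int = 1) -> None:
        before = self.top_sellers()
        self._sales[title] = self._sales.get(title, 0) + copies
        after = self.top_sellers()
        if after != before:  # notify only when the query result actually differs
            for listener in self._listeners:
                listener(after)


store = BookSales(top_n=2)
events: List[List[Tuple[str, int]]] = []
store.subscribe(events.append)

store.record_sale("Dune", 5)     # enters the top 2, so an event fires
store.record_sale("Emma", 3)     # enters the top 2, so an event fires
store.record_sale("Ulysses", 1)  # stays outside the top 2, no event
store.record_sale("Ulysses", 4)  # now at 5 copies, displaces "Emma", event fires

print(len(events), events[-1])
```

A polling client would have to call `top_sellers()` on a timer and compare results itself; the subscriber above gets three notifications for four writes and never sees the write that left its result unchanged.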

07:16 Michael: Yeah, I totally agree, that sounds great. I recall you telling a story, I heard it somewhere, where you sort of talked about the progression of real-time systems, right? So we have the front end being real time with things like Node.js and SignalR, and with some of these things that happen on the server you can get real-time stuff there, but then it kind of stops. Can you maybe tell that story? I think that will really help cement what your value proposition is.

07:48 Slava: Yeah, so traditionally web applications were built around request-response, like that is how HTTP works, right? You type the web address in the browser, the browser sends the request to the web server, the web server sends a request to the database to get some information, the database responds, the web server responds, and then you render the page. So that is how web apps were built traditionally, and then things started changing because people realized that to have really immersive experiences you have to push information to the browser without reloading or refreshing the whole page. So what started happening is that people started building applications around push functionality. You know, JavaScript is fundamentally event driven, and people started building front-end frameworks like Angular around the idea of events and event-driven programming. And then, as that became available, you needed that on the back end, and we have things like SignalR and web sockets to allow push to the browser, and on the back end Node.js is fundamentally an event-driven programming model, where you can respond to events.

08:52 But then when you get to the database it's still request-response, and the idea behind RethinkDB was, well, let's make a full-stack event-driven programming model where you don't stop at Node.js but the database itself is event driven, where you say I am interested in these things, and then any time something changes, it fires an event, and then Node.js processes that event and pushes it to the browser, and then the browser processes that event. So what that does is it gives you a complete full-stack event-driven programming model, and now you don't have to fake events when you go from the Node.js layer to the database layer. The whole thing is event driven, and that makes real-time architecture so much easier to deal with and opens up so many possibilities for building apps, because things that took weeks now take a couple of hours.
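The middle hop in that chain, a server process receiving database events and pushing them on to clients, might look roughly like this in Python rather than Node.js. It is a sketch under stated assumptions: `relay_changes` and the `push` callback are hypothetical names, and a plain list stands in for the live feed so the code runs without a server. The `old_val` / `new_val` shape of each event follows what RethinkDB changefeeds deliver.

```python
from typing import Any, Callable, Dict, Iterable


def relay_changes(feed: Iterable[Dict[str, Any]],
                  push: Callable[[Dict[str, Any]], None]) -> int:
    """Forward each change event from a feed to connected clients.

    `feed` is any iterable of change documents shaped like changefeed
    events, i.e. dicts with "old_val" and "new_val" keys.
    `push` is a hypothetical callback that would send data to browsers
    (for example over a websocket).
    """
    forwarded = 0
    for change in feed:
        new_val = change.get("new_val")
        if new_val is None:  # a deletion: nothing new to display
            continue
        push(new_val)
        forwarded += 1
    return forwarded


# With the RethinkDB Python driver the feed would come from something like:
#   feed = r.table("messages").changes().run(conn)
# ("messages" is an assumed table name). A list stands in for it here.
sample_feed = [
    {"old_val": None, "new_val": {"id": 1, "text": "hello"}},
    {"old_val": {"id": 1, "text": "hello"}, "new_val": {"id": 1, "text": "hello!"}},
    {"old_val": {"id": 1, "text": "hello!"}, "new_val": None},
]
sent = []
count = relay_changes(sample_feed, sent.append)
print(count, sent)
```

Against a real changefeed the `for` loop would simply block until the next event arrives, so the server never has to re-run the query to find out whether anything changed.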

09:39 Michael: That's really cool because those are hard problems to solve. And if you can just plug the pieces together, that's wonderful, right?

09:47 Slava: Yeah, for all of these problems you can kind of hack around them, and people have done that for a long time. I mean, people built real-time event-driven applications before RethinkDB existed, but hacking around this problem takes a long time, people solve the same problem over and over and over again, and it's really hard. You kind of need domain expertise, you know, a team of people that really understand this problem and got burnt a couple of times when trying to solve it, and the idea was like, hey, let's just subscribe so people never have to deal with this again.

10:15 Michael: Yeah, that's great, can you give some examples of some apps that people have built? Or the types of apps people have built?

10:21 Slava: Yeah, so RethinkDB has a pretty huge community. We have above 200,000 developers building applications on it now, and it's doubling about every 3 to 4 months, so people have built all kinds of apps. They built games, people built analytics apps, just anything you can imagine that can be built, from all over the web. One of the big examples is Fidelity Investments, which is a big investment firm, and they built the mobile app for their users to manage their accounts in real time. So you open the Fidelity mobile app, and all of that is backed by RethinkDB. I think that is probably the biggest use case I can talk about, where RethinkDB backs basically the 40 or 50 million users that Fidelity has. Some others: Rethink is used by NASA for some real-time updates on what happens with extravehicular activity, so every time an astronaut goes on a spacewalk, there is a bunch of data that is being generated and processed in real time. So it is all over the place, like people use it for financial applications, for just web apps, for collaborative apps, and for things I personally wouldn't have imagined, like space activity.

11:29 Michael: Yeah, that's really awesome. I would have never thought spacewalks would be one of the use cases, but that's really cool.

11:36 Slava: Yeah, we definitely did not design it with that in mind.

11:44 Michael: Let's see if we set up a vpn from the space shuttle, then- no, I'm just kidding.

11:50 Slava: No, it's an ssh connection probably.

11:54 Michael: Yeah, probably. Ok, so let's talk about the origins of RethinkDB, what gave you the idea that hey, we should create our own data base, like that's pretty daunting, right?

12:05 Slava: Before we started I was in grad school, and I was actually working on a mammalian brain simulation on supercomputers, so I knew a lot about distributed computing and I knew a lot about the idea of real time, because in brain simulation you generate a lot of data that you have to send between multiple machines, and there are a lot of challenges in figuring out how to parallelize all that stuff because the network is the bottleneck. And my co-founder, who was also at the same university, was more of a human interaction expert; basically he was really interested in user interfaces. We met, we just kind of got together, and we spent hours and hours talking about computing and the future of software and where things were going, and it was obvious to us that the world is really becoming more real time, that there is a ton of data being generated, and we generate more and more data every day, probably exponentially.

12:57 And it was very clear that these static user interfaces just aren't going to last very long and everything is becoming real time. So we looked at how applications are being built, how they are being deployed, and then it occurred to us that there is a lot of innovation around real time almost everywhere in the stack except for the database, and it was just the part we thought was very important to build to advance this mission of kind of unlocking real-time apps for anybody who wanted to build them. What we want to do, and it's our goal even now, is to make real time be the default, so when you build an app it should be real time by default, not static by default where people then have to do a lot of work. So that's how we got started, and yeah, building a database is a daunting process, it's a very hard problem, it's a lot of work, but we were excited, so we didn't look at the downsides. We mostly looked at, hey, what is this going to make possible, and that was just really exciting, so then we thought about how long it will take-

13:55 Michael: Yeah, sure, and people who are successful, right, they have this dream, this vision, and it's like, we are going to make it through the challenging bits, we are going to create this thing, right? So it sounds like you were really driven to sort of solve that real-time problem.

14:08 Slava: To be honest, we also thought it would be easier than it was, and we thought it would take a lot less time than it did, so a lot of it was naivety, but it worked out pretty well, I think, in hindsight.

14:20 Michael: Yeah, maybe it let you get through; otherwise it would have been like, well, it's too much work, forget it. That's cool, and one of the things you did pretty early on was you went through YCombinator, right? The accelerator.

14:31 Slava: Yeah, we did.

14:33 Michael: Yeah, so what was that experience like? What year was that?

14:36 Slava: So we went through YCombinator in the summer of 2009. At the time, me and Michael, my cofounder, we were in New York and we were in school, and we had this idea and we applied to YCombinator. We got an interview, so we flew to California. The interview was like 8 minutes, we got in, and we knew we wanted to go to California and start a company, so we just packed our bags and moved here. And YCombinator for us was great because we were new to the startup world. We kind of knew a lot about software, but we didn't know a lot about how to start a business and hire people and manage people and how any of that stuff works. YCombinator would invite speakers who were successful in the startup world, and every Tuesday they had a new speaker who would talk about their story. Just that process of listening to people who built successful things before was extremely useful, because it got us into the mindset of what it takes to build something that people really want to use, to build something that's valuable for a lot of people. So that was great. I mean, it was a phenomenal experience, I'd recommend it to anybody.

15:42 Michael: Yeah, I've never gone through an experience like that, but it sounds really cool and like it would be very beneficial. So, you know, I think when people are new and they are thinking about starting businesses, they often think about the technology, right? Like, if we build this app and have the best technology, and it works just the way that it should and solves whatever problem it's meant to solve. But that's only like 20% or 30% of what it takes to launch and be successful with a technology business, right? There is all the marketing, and the growth hacking, and the user outreach. How much of those types of things did you learn at YC?

16:22 Slava: We learned a lot about that at YC. I'd say everyone who goes to YC is very competent technically. And that was the whole premise of YC, that you get really technically competent people and then teach them all these other things that they need to know to build something successful. So technology itself, I mean, it permeated everything, it was in the background of everything because obviously it was about tech companies, but we didn't learn how to program at YCombinator or anything. It was all about how do you build a successful project, how do you grow it, how do you build a product that people really care about, how do you think about markets, things like that. Very little of it was about the technology itself. Like, the whole thing was under the [inaudible 17:06] that technology is super important and technology unlocks all kinds of possibilities for people, but everyone at YC was already good at that, so the much bigger part was around all these other things you mentioned.

17:19 Michael: Right, yeah, so the technology is basically table stakes like, you don't even get there if you don't have that skill set, but it's teaching you all the things you didn't maybe even know that you needed to know to succeed, right?

17:30 Slava: Yeah, I think the premise of YCombinator is that if you take technically competent people, all this other stuff you can teach them: marketing and growth hacking and all these things. It's hard but it's not rocket science; if you can build a compiler, you could probably figure out how to market your product. That's the premise of YC, and the idea was like, hey, there is a lot of information there, it's not very hard but it's hard to get, so we are just going to teach people all the stuff and see what happens. And I think the proof is that YC is really successful now.

17:30 [music]

17:30 This episode is brought to you by SnapCI, the only hosted, cloud based continuous integration and delivery solution that offers the multi stage pipelines as a built in feature.

17:30 SnapCI is built to follow best practices like automated build, testing before integration, and it provides high visibility into who is doing what. Just connect Snap to your GitHub repo and it automatically builds the first pipeline for you, it's simple enough for those who are new to continuous integration, yet powerful enough to run dozens of parallel pipelines. More reliable and frequent releases, that's Snap.

17:30 For a free, no obligation 30 day trial just go to snap.ci/talkpython.

17:30 [music]

19:03 Michael: One of the cool aspects of RethinkDB is it's open source, right? Like I can go to github.com/rethinkdb to get the database itself, and it's really popular, right, it has over 14,000 stars. So, you said that it was one of the biggest C++ open source projects on GitHub or something like that, is this true?

19:27 Slava: Yes, it depends on how you measure. So if you go to github.com/explore, they keep changing the interface and I don't know if you can check this now, but at least you used to be able to look at trending projects, I think that's still available, and you used to be able to filter by stars, by language, by all kinds of things. I haven't done this in a while, but yeah, Rethink has been the biggest trending C++ project on GitHub for a long time. Pretty much since we launched it, actually.

19:56 Michael: Yeah. That's really cool. When did you actually launch it, you said you did Ycombinator in 2009, when was there a thing that you let loose on the world?

20:05 Slava: I think it took about 3 and a half years to get the first version out. My memories are a little hazy on this because it's been a while, but yeah, it's a hard problem and it took a while to build the first version of the product.

20:16 Michael: How many people worked on it?

20:16 Slava: So right now we are 18 people, but before the first version I think the whole company was about 7.

20:24 Michael: Ok. It's definitely a big project to undertake so very cool. When I look at github, it tells me that it's like 50% C++, 26% Python, some Javascript, what are the technologies inside?

20:40 Slava: Yeah, so Rethink is pretty complex because it touches almost every part of the stack. At the bottom of the stack it touches the operating system and all the APIs that we have to deal with: disk access and network access and memory management and things like that. All of that stuff is done in C++, and even a little bit of assembly actually. Then around the core C++ database there are a lot of technologies to connect it to the user, so the drivers are done in a lot of different languages. There is a Node.js driver, a Python driver, a Java driver, a Ruby driver, just drivers for all kinds of languages, and each is written in whatever native language the driver is for. Then there is a lot of code for testing different things, for testing the query language, testing the distributed system, all kinds of tests, and most of that is written in Python. And then there is a lot of glue code. We try to minimize the number of languages and technologies that we use, just to kind of keep it contained. Most of it is C++, but we really have a lot of different things that we use. I'd say the biggest are C++, Python and Bash, and then there are drivers for almost every language.

21:52 Michael: Right, and you pretty much have to use those languages to write them, usually anyway.

21:57 Slava: Yeah.

21:57 Michael: Ok. Cool. So, when I think about NoSQL databases, you've got the document databases, you've got key-value stores, and those sorts of things. It feels to me like document databases are the kind of database that you could use as your sole database for your app, right? There are some things like Redis and DynamoDB where you probably use them for some particular use case but they are not your only data store. What do you think about this idea that document databases could replace relational databases rather than just be another thing that you use in parallel?

22:32 Slava: Well, so databases are fundamentally horizontal technologies, and they are applicable to a lot of different problems. And I think that relational databases used to be this Swiss army knife where you could use them for almost anything, because most of the data used to be relational. And now with modern apps, most of the data is not relational; most of the data is hierarchical, there are a lot of fields missing, what we think of as document-based data. So I think for most of the modern use cases, object-oriented or document databases are really, really versatile and they could be used for almost anything. Now, there is still data being generated that is relational, and just like with anything, you could use relational databases to store document data, you could use document databases to store relational data, and it will work fine, but you will have to hack around a bunch of problems. So I think document databases will be used more and more as more data is being generated and more apps are being built, and those are fundamentally not relational data models.

23:36 So document databases are great for that. I don't think they are going to replace relational databases, personally, because relational databases are just fundamentally better at storing relational data. And you know, some people decide to unify their infrastructure; they figure, I don't want to be [inaudible 23:51] technologies managing things, I want just one, and they could put relational data in the document database and it will work just fine, but it's not ideal. I think that you would have to hack around a bunch of things, and there are some things you can't do. So my view of the world is that people will continue using different specialized technologies for different use cases, and there is always a balance: if you use too many specialized technologies it's hard to manage, and if you use too few you have to hack around a bunch of stuff and you have operational problems. So there is a good middle ground there. And that sort of feels to me like that's where the world is going; that's what we see most of our customers do.

24:29 Michael: Yeah, ok, that's a really interesting way to phrase it and to look at it. So what kind of relational features does Rethink have? Like, does it have foreign keys, does it have joins, does it have transactions? Does it take any of these kinds of things on, or does it lean more towards the MongoDB style where it forgoes those for other reasons?

24:49 Slava: Ok, so RethinkDB does, in particular, distributed joins, and it does distributed subqueries, so as far as the complexity of a given query goes, you can do almost anything in RethinkDB and sometimes more than you could in a relational database. So we take on all of those challenges, and distributed joins in particular are extremely useful in almost every use case, actually. We don't do transactions, at least we don't do transactions for now; there are some proposals for integrating them, but they are still on the drawing board. We don't do foreign keys in the way they are understood in a relational database, so for example, we don't do cascading deletes or things like that. So RethinkDB has way more relational features than any other document database that I can think of, certainly way more than Mongo, but it's not quite as good for relational data as, like, Postgres or Oracle or SQL Server or MySQL or one of the many relational database management systems.

25:49 Michael: Ok, yeah, very interesting. What does it look like to use this? Maybe you could give us an example from the Python driver, if you know it off the top of your head, or if not, just from any of the other drivers. Like, if I create a new project and I want everything running, how do I connect to it and get going, what are the steps, you know after say code exactly.

26:10 Slava: One of the biggest design goals that we had with Rethink was to make it very easy to use, and we literally thought that every extra step that a developer has to go through will cut our user base in half. So we just made it as simple as possible and we spent a lot of time doing that. The easiest way to figure this out, I mean, if you go to rethinkdb.com/docs there is like a 30-second tutorial that shows people how to use it. But in Python it's very simple: you import rethinkdb, which is just the Python driver for Rethink, you say connect, and then you run a query. For example, you say table users, dot insert, you just put a document in there, and dot run, and that inserts the document into the database. It's really simple, it's just a couple of lines.
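The "couple of lines" described above can be sketched as follows. The calls `r.connect`, `r.table(...)`, `.insert(...)` and `.run(conn)` are part of the RethinkDB Python driver; the helper name `insert_user`, the `users` table and the sample document are assumptions for illustration. The driver handle `r` is passed in as a parameter so the function itself makes no assumption about which driver version is installed, and the wiring is shown in comments because it needs a running server.

```python
def insert_user(r, conn, user):
    """Insert one document into the `users` table and return the server's reply."""
    return r.table("users").insert(user).run(conn)


# Typical wiring (requires the `rethinkdb` package, a running server, and an
# existing `users` table in the default database):
#
#   import rethinkdb as r                  # import style at the time of this episode
#   conn = r.connect("localhost", 28015)   # 28015 is the default client driver port
#   insert_user(r, conn, {"name": "Slava"})
#
# Newer driver releases (2.4 and later) construct the handle instead:
#
#   from rethinkdb import RethinkDB
#   r = RethinkDB()
```

Nothing is sent to the server until `.run(conn)` is called; everything before it only builds a description of the query.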

26:55 Michael: Yeah, and you guys have a nice fluent API where you say I want to create a query, filtered by this, limited by that, ordered by that and so on, right? You can chain them together in a really nice style.

27:07 Slava: Yeah, so that was inspired by jQuery, and the goal of the query language was first of all to make it seem native in the programming language that people use. If they are using Python, the query language, I mean, it's just Python; if you are using Ruby it's just Ruby, and so on. And one of the biggest challenges that people run into when they use SQL is that if you look at the Stack Overflow questions that people ask about SQL, they are kind of different from questions about any other programming language. If someone is using Python and they look at Stack Overflow, the questions are like, I am trying to do this and it's not quite working, I don't understand what's going on, or how does this function work, or things like that. But in SQL, people ask weird questions like, I want to do this and I don't know how to express it, how do I even do this?

27:50 I think SQL was designed for business analysts, and it was sort of meant to be like English, but it's not really English because it's a programming language of course, or a query language, so it's kind of challenging. The goal with ReQL was to make it really intuitive, so you can think of it as a data flow language, kind of like Bash and pipes, where you start out with the table and then you say I want to run this transformation, and then I want to run that transformation after that, and then you can keep chaining things. It turns out to be really intuitive because people can write queries by just chaining on things and seeing intermediate results and then incrementally building up the query until they finally get to what they want. So with ReQL it just becomes really easy to express what you want in a way that I think SQL could never allow, just because of the fundamental differences in the design.
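The pipe-like style of building a query one step at a time might look like this with the Python driver. `table`, `filter`, `order_by`, `desc`, `limit` and `run` are real driver calls; the function name `newest_adults` and the `users` table with `age` and `joined` fields are made up for illustration. As in the earlier sketch, the driver handle `r` and an open connection are passed in, so nothing here talks to a server until the final `run`.

```python
def newest_adults(r, conn, limit=10):
    """Build a query step by step, the way one would chain shell pipes."""
    query = r.table("users")                             # start with the whole table
    query = query.filter(lambda user: user["age"] >= 18) # keep only adults
    query = query.order_by(r.desc("joined"))             # newest members first
    query = query.limit(limit)                           # cap the result size
    return list(query.run(conn))                         # only now is anything sent


# The same query written as a single chain:
#   r.table("users").filter(lambda u: u["age"] >= 18) \
#       .order_by(r.desc("joined")).limit(10).run(conn)
```

Because each intermediate `query` value is itself a runnable query, it can be executed on its own to inspect the result so far before the next transformation is added.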

28:41 Michael: Yeah, it feels a little more written for programmers in a simple way rather than trying to create yet another language. Yeah, that's really nice. So one thing that you guys focus on a lot, and I think it feels like it's a little bit of the influence of your co-founder Michael, is design and the way it feels to use software.

29:04 Slava: All of the design stuff and usability, that's mostly Michael's doing. Michael really cares about the user experience, and I mean, it's not just how things look or how many steps you have to go through. He thinks it through to the point of, what does the user feel when he interacts with a particular feature, what does the user feel when he interacts with the product as a whole? So he spends a lot of time just thinking holistically about the user experience and the kind of subjective experience that developers go through when they use the product. And that permeates everything: that permeates the query language, that permeates the install process, the [inaudible 29:46] that goes into RethinkDB and the documentation to make things easy, you know, the website, the admin console, all of these things.

29:55 Michael: Yeah, I was thinking specifically of the admin console, because a lot of databases, when you use them, feel not so great. Sometimes you just have a command line interface, and, you know, if you have a UI it usually looks like it was created by a DBA, something like this, right? It's not a great experience. But you guys have a beautiful, simple-to-use web management interface for your databases, even for things like sharding and replication and failover and all these kinds of things, right?

30:28 Slava: Yes, so our hypothesis was that if people are running a web application, there is a lot you are doing on the front end, sometimes a lot on the back end, but the database is the core of things, so people spend most of their time writing database queries. And if you think about what that's like for the application developer, well, that means they spend 6, 7, 8, sometimes more hours a day just dealing with the database product as their life at work, and it was very important that that experience is pleasant for them. That permeates a lot of different things, but one of the biggest ones is we thought that there needs to be almost like a development environment where people feel comfortable, that's easy to learn, easy to experiment with queries, easy to figure out what's going on, easy to test things and play around, easy to figure out what happened in your cluster. So that was the goal of the web UI. It wasn't [inaudible 31:23], it was something we thought was very important for our users, right, for the developers that are going to use Rethink, because they spend so much time in it every single day that not building it and not thinking it through didn't really make any sense. We were really surprised that that doesn't exist in a lot of other database systems, because if you think about how much time people spend in them, that just seems like a crucial thing. It would almost be like building a compiler without having a text editor; the text editor is the fundamental way of writing programs, it's really important. You need the compiler, but without the text editor the experience of using the compiler would be miserable.

32:05 Michael: Yeah, that's a good analogy. It feels like you guys really focus on making interacting with the whole system delightful.

32:12 Slava: Yes, that's very important.

32:13 Michael: Yeah, one of the things that really surprised me in a positive way was hearing that you guys have a full-time artist. If you look at your documentation, you have little cartoon characters and stuff to make it feel friendlier. For example, I have Rethinkdb.com/docs/quickstart pulled up and there is a little database walking up to this character. Most database companies, I think, are on the opposite end of the spectrum for this type of experience, so this is really cool.

32:47 Slava: Yeah, definitely. So when we were first shipping RethinkDB, before the first version, we thought it was important to do the basic things that you do, like branding, the kind of things that every open source project, or every project in general, does. So we hired an artist to do a lot of the work, and Anya came in to do those original things, but she also has an enormous amount of passion around art and she brought that passion to the company. It very quickly became obvious to Michael and me that Anya's work and her passion for the art could permeate a lot more than just the basic things like the logo and some illustrations on the front page. She was very adamant that this could really change the experience of people interacting with the product, and because Michael is a user experience person, he immediately grabbed onto this idea. We did a couple of things early on, like she made illustrations for the documentation, and people started commenting on Twitter, saying things like "wow, that makes everything feel way more accessible." Then it was just obvious that it makes everything way better for our users. No one else does that, so it was also differentiating for the company, because people noticed it. She has added this whole new dimension of interacting with the software project that you don't often see in other projects.

34:13 Michael: Yeah, I am sure you guys were really delighted, like "wow, this really does make this so much friendlier." It is cool, I really like it, it's a nice touch.

34:22 Slava: Yeah, it's kind of not obvious at the beginning that that would matter, but then when we did a couple of these things and like everyone started noticing and people started commenting like wow this makes everything way more accessible and we are like yeah, we need to do more of this.

34:34 Michael: Yeah, for sure.

34:34 [music]

34:34 This portion of Talk Python To Me is brought to you by Hired. Hired is the platform for top Python developer jobs. Create your profile and instantly get access to 3500 companies who will work to compete with you.

34:34 Take it from one of Hired's users who recently got a job and said, "I had my first offer on Thursday, after going live on Monday and I ended up getting eight offers in total. I've worked with recruiters in the past but they've always been pretty hit and miss, I've tried LinkedIn but I found Hired to be the best. I really like knowing the salary upfront, privacy was also a huge seller for me."

34:34 Sounds awesome, doesn't it? Wait until you hear about the signing bonus: everyone who accepts a job from Hired gets a $1000 signing bonus, and as Talk Python listeners it gets way sweeter. Use the link hired.com/talkpythontome and Hired will double the signing bonus to $2000. Opportunity is knocking, visit hired.com/talkpythontome and answer the door.

34:34 [music]

35:44 Michael: Can you talk about the community around RethinkDB? It has grown really quickly, you've got a very passionate user base, and you guys do a lot to engage the community. Can you talk about some of the things you do?

35:55 Slava: Yeah, so the user community is one of those things that is also at the core of the company. It's very important to do good community building and connect with our users. The idea that people usually have about open source is that you make the source code available, but that doesn't make a community happen, right? Actually most of the work is done by Michael, and we have someone here, Christina Keelan, who does a lot of community management. She is absolutely amazing at it. And so the way we approach the idea of building a community is that everyone who uses RethinkDB is kind of equal, and we, the employees of the company, just happen to be paid for our work, but we are also just members of the community.

36:45 And what that means is it's not just about publishing the source code, it's about doing everything openly so that users can communicate with us. For example, we do all of the design discussions on GitHub, and our employees comment on features and design proposals and things like that, but it's not only our employees, because anybody in the world who is using RethinkDB can go to GitHub and say, "hey, I think this should be done this way." People can contribute, so the whole thing is done in the open. That is huge for fostering a good community, because people are invested in the project and they feel like their opinions are really going to be heard, and that they can drive the direction of the project, the direction of the features, and how they are going to be designed.

37:25 So that's one of the things that we do. The art is really important, we do a lot of local meetups, and we try to engage everyone on social media, on Twitter, on Facebook, things like that. So community is as fundamental to RethinkDB as the software itself, and we take it really seriously. We think through a lot of the interactions, how users feel, how they interact with the project, so it's been a pretty big deal for us. At the beginning we didn't know whether any of this was going to work, but then the community grew really quickly and, yeah, it turned out to be really important.

38:01 Michael: That's great, it's one of the really cool aspects of open source, right? And you guys have a successful, thriving business based on a thing where I can go to GitHub, click download as a zip file or clone, and I basically have the product. So, I am really fascinated and delighted when I see companies making successful businesses out of open source projects. Can you talk about what it's like to run a business where the main thing you have is sort of given away, or out in the open?

38:38 Slava: Yeah. Sometimes I talk to my dad about it and he still doesn't understand how this works. He is like, "ok, so you give away the product for free, and anybody can get it for free, right? How does that work?" So what happens, particularly with RethinkDB, is that our goal was to make it available to anybody who wants it, and if the product is good and the world is really going in the direction of real time, then eventually RethinkDB is going to be in most development stacks. So what we wanted is to make it accessible to anybody who wants it. If it's a student who is building a new project or experimenting with new ideas, they should be able to get it for free.

39:17 With a product like a database, it's very easy to run it on your laptop, you don't need to pay for it, you don't need any support. But if you are a bigger organization that is deploying RethinkDB across 5 data centers around the world, there are enormous operational challenges that these companies have to deal with, and they are pretty risk averse too. If you are deploying a big application across the world, it can fail, you have to make sure you do health checks, you have to make sure everything works right. So for companies like that we sell support and services, and most of the revenue comes from that. That wouldn't work for every open source project, because for example if you are selling a developer tool, like a text editor, there is no operational component in it. But for open source projects that have a big operational component, where it has to run 24/7, people take that very seriously and they buy support if they are big organizations.

40:13 So that is how the business works, and it's not applicable to every open source project, it's only applicable to open source projects where there is a big operational part. If there isn't a big operational part, there may or may not be an open source business, I don't know exactly, but for us, it's the large scale operational component that makes the financial aspect of the company work.

40:37 Michael: Yeah, that makes a lot of sense to me. I guess if you are building a product where you draw the architectural diagram and your product is at the bottom, or if it's like a hub-and-spoke and your product is in the center, that's a thing that can't fail, and databases cannot. Not in the sense that it has a bug or something, but if somehow it goes down, or it can't connect, or it doesn't replicate, all sorts of bad stuff happens when the data stops flowing, right? So you are right that you are in a somewhat unique place where this is really something that people depend upon.

41:15 Slava: Yeah, absolutely, and another thing about databases is that, so Rethink is really easy to use, probably about as easy as any other project you can think of, but at the same time databases are fundamentally complex, and distributed databases are even more so. So we make it very easy, I mean, you can distribute RethinkDB with a click of a button, but if you have enormous amounts of data and many data centers, people will pay for support just for the safety of knowing that if something goes wrong they can pick up the phone and their problem will be solved. It's really important because their businesses depend on it, so yeah, for distributed databases like Rethink that's what makes the whole thing work. There are counterexamples, for example web servers. They are relatively simple and they are really robust now, so you wouldn't necessarily buy support if you use nginx or Apache. So Rethink is definitely in a unique position, and I wouldn't necessarily take this lesson and apply it to every open source project. But if you think it through a little bit, there are a lot of projects where this methodology applies.

42:17 Michael: Absolutely, every project kind of has to find its way. I had Pablo Hoffman from Scrapy, the open source web scraping project, on, and what he ended up doing to build a business around the web scraping library was to create web scraping as a service: one click pushes your code to the cloud and they handle all the infrastructure, management, and scaling. There are all these different ways in which you can do it, and I just find that fascinating. I think giving everybody in the Python community and the open source community examples and a bit of a look inside is really cool. So, you talked about having your database, and basically the whole interaction with Rethink in general, be as simple as possible and easier than everything else. Is there a tension between, "hey, we can make this even easier," and, "oh, but if that is easier we might get fewer support calls about this"? Is that a tension you balance, or do you just always go for improving the product and then go from there?

43:29 Slava: So in practice, this turns out not to be a tension that is actually important. This isn't unique to RethinkDB, this is pretty much any operational product like this: if you look at where most of the revenue comes from, it comes from really big customers. It doesn't matter how easy we make it, the customers will still have enormous challenges that very few people face, and it doesn't matter which product they use. We make their challenges go away, but they still have to pay for support. So the tension comes into play when you talk about smaller customers, but smaller customers don't really pay that much for databases anyway. So if you are trying to maximize revenue, you can pretty much not worry about really small businesses, because they won't pay that much anyway, and you can focus on making the commercial aspects of the project really compelling to big companies. And for that, there is no tension between ease of use and revenue. For smaller businesses, yes, there is a little bit of tension, but if you look at the numbers it turns out not to be that important, so we never really think about it. We make it as easy as possible every time.

44:39 Michael: Cool, that makes a lot of sense. I mean, big companies just want reassurances, right? When you are in a 50,000-person business and the thing that you sell depends on the data continuing to flow, it doesn't matter if you can push a button to scale, they want somebody that will take away the risk around that, right?

44:59 Slava: Yeah, it's taking out the risk, and it's also things like training. For example, for most of our big customers, the teams that interact with RethinkDB can be up to 100, 150 people. It's not that 150 people necessarily work with Rethink itself, but they are somehow related to the application or the operations of it or something that's relevant. So if you think about it from a high level perspective, you have to train 150 people to use this new product. They can learn it on their own, they can go online and read the documentation, but we have structured courses where we can come in and teach people, get them up to speed very, very quickly, and get them to be productive. So big companies have a whole different layer of challenges that they face, and they usually have more money than time, so they are happy to trade one for the other, and that is essentially what we do.

45:50 Michael: Sure, if you have to get 100 people up to speed on something, training them in a week or two is so much more often the right answer than saying, "ok everybody, go figure this out and we'll get back together."

46:03 Slava: Yeah, right.

46:03 Michael: Yeah, so you guys just had a release, was it 2.3 where you added a bunch of new features, do you want to highlight that?

46:10 Slava: Yes, so one of the things about our releases is we try to do them frequently. We try to release a new version of Rethink every two, three, four months, depending on how things go. In 2.3 we have user accounts, we added encryption, we added official Windows support. A lot of these features are a little bit boring in the sense that most of them were built for bigger companies, as we get more and more big customers. They have demands that may not necessarily be important to developers who download Rethink and try to build a simple app. But yeah, RethinkDB 2.3 had a lot of features that were targeted at those customers, targeted at scale, things like encryption, compliance, a lot of stuff like that. In RethinkDB 2.4, which is coming out, we are adding a lot more things to the query language that are going to be really exciting. We are adding real time aggregations, so people can do real time analytics much more easily, or extend the [inaudible 47:06] to support new terms. There is going to be a lot of exciting stuff pretty much for everybody.

47:11 Michael: Ok, that's awesome. Yeah, some of those enterprise features don't make you jump up and down with excitement, but you know, they're critical to being adopted in these big companies, and that's an important part of the business, right?

47:26 Slava: Yeah, I mean it's really important. If you are running a database across multiple data centers over the internet, encryption is extremely important, so we had to add that. It's not necessarily that exciting, but it's kind of a show stopper for a lot of the big companies.

47:38 Michael: Yeah. Is there Rethink as a service, like can I go somewhere and pay $10 a month and have like a small Rethink cluster I can work with or something like this?

47:50 Slava: Yeah, so there are a couple. The biggest one is actually done by IBM; they bought a company called Compose.io. You can go to Compose.io and they host most document oriented databases, so they do RethinkDB, MongoDB, Elastic I think, and a couple of others. It's pretty inexpensive to get started, it's very cheap if you just want a single node, and then they allow you to scale up pretty much as much as you want. So Compose.io is probably the easiest way. There are a couple of others, and of course people can run it on Amazon themselves, there are lots of different options. But most users that want Rethink as a service use Compose.io.

48:25 Michael: Ok, yeah, that's cool. Nice, so while we are talking, I'd like to talk about Horizon.js. So, tell me what Horizon.js is?

48:37 Slava: So Horizon.js is a new project that we just launched a couple of weeks ago. It's built on top of RethinkDB, and it was an experiment that I actually think turned out to be really successful. The motivation behind Horizon was that a lot of users who were new to databases wanted to build apps, mobile apps, web apps, so they would go to our bug tracker and say, "hey, I am trying to access the database from the browser and it doesn't work." And of course it wouldn't work, because databases fundamentally have to be accessed from the back end. People kept asking this question, and we thought, hey, maybe we can make it easier to access the database from the browser. So we built Horizon, which is a prefabricated back end, and what it lets people do is build JavaScript apps without writing any back end code.

49:21 So, you can build a mobile app or a web app, and all of the back end is handled completely by the Horizon project. You just start Horizon, it's basically a server, it connects to RethinkDB, and then all you have to do is write code on the front end and all the back end stuff will be handled by Horizon itself. What that does is make building real time apps dramatically easier, because you don't have to write a single line of back end code. And the goal behind Horizon was that as people build more sophisticated apps, they are of course going to need to write back end code, so when the app gets complex enough you can stop using Horizon as a standalone server, import it into Node.js, and start writing back end code while still using all of the Horizon services. That's Horizon. We didn't know how important it was going to be, we wanted to try it, and it seemed like it would make a lot of people's lives easier. We launched it a couple of weeks ago, and I think right now Horizon already has a quarter of RethinkDB's user base. So it turned out to be pretty successful, and it makes building things easier.
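
[Editor's sketch] The division of labor Slava describes, where the front end only stores documents and subscribes to changes while a prefabricated back end handles the rest, can be modeled in a few lines of Python. This is a toy, in-memory stand-in and not Horizon's or RethinkDB's API: the class `ToyBackend`, its `store` and `watch` methods, and the `messages` collection are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Listener = Callable[[dict], None]


class ToyBackend:
    """In-memory stand-in for a prefabricated back end (hypothetical, not Horizon's API)."""

    def __init__(self) -> None:
        # Documents and subscribers are both grouped by collection name.
        self._collections: Dict[str, List[dict]] = defaultdict(list)
        self._listeners: Dict[str, List[Listener]] = defaultdict(list)

    def store(self, collection: str, document: dict) -> None:
        """Save a document and push it to every subscriber of that collection."""
        self._collections[collection].append(document)
        for listener in self._listeners[collection]:
            listener(document)

    def watch(self, collection: str, listener: Listener) -> None:
        """Replay the existing documents, then keep sending new ones as they arrive."""
        for document in self._collections[collection]:
            listener(document)
        self._listeners[collection].append(listener)


# "Front end" code: no back end logic is written here, only store and watch calls.
backend = ToyBackend()
seen_by_alice: List[dict] = []

backend.store("messages", {"from": "bob", "text": "hi"})
backend.watch("messages", seen_by_alice.append)
backend.store("messages", {"from": "carol", "text": "hello"})

print(seen_by_alice)
```

In this sketch the subscriber receives both the message stored before it subscribed and the one stored afterwards, which is the "real time" behavior the conversation refers to. A real deployment would put a network connection and a database between the two sides.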

50:23 Michael: Yeah, people must have really been waiting for that. So, it's interesting that it's not so much a front end thing as it is a back end with an API, to alleviate the need for front end people to write back end code that they'd probably rather not write anyway.

50:40 Slava: Yeah, that's exactly what it is. Horizon does come with a small front end library, but it's meant to be used with React or Angular or one of the many front end frameworks. We don't actually do very much on the front end, it just helps front end developers build apps without writing back end code. That is kind of all Horizon is, but it's a lot, because it turns out that there is an enormous amount of plumbing that people have to deal with over and over again, and Horizon just takes care of it.

51:08 Michael: Yes, that's cool. And what's it written in for the back end, is that a custom Node.js server or something different?

51:17 Slava: Yeah, Horizon is all Node.js.

51:19 Michael: Yeah, when you said you can take it and plug it into Node, I figured you guys must just be saying, "here is the Node thing you've got to run, and if you want you can put it into your app," right? That's cool.

51:29 Slava: Yeah, that's exactly how it works.

51:31 Michael: Can you talk about the popularity of various back ends for RethinkDB, or middle tiers I guess? How frequently are people working with it from, say, Python versus Node versus Ruby, do you know those numbers?

51:44 Slava: So I actually don't know these numbers off the top of my head. My intuition is that most of the users of RethinkDB use Node.js, I think Python is the second biggest and Ruby is very close, and Java is very popular. I think Node is still the biggest. I honestly don't know the exact breakdown, and it's actually kind of hard to figure out, because the drivers are accessible through the package managers for the respective languages and some of them host [inaudible 52:16], some of them don't, and some of them measure the statistics differently, so it's fairly hard to compare.

52:23 Michael: Yeah, interesting. I noticed something that you guys have on your Horizon project that's in private beta. Well, it's almost all that way because it's just so brand new, right? But it's this thing called Horizon Cloud. What's that?

52:41 Slava: Well, so Horizon Cloud is a way for people to deploy, manage, and scale Horizon applications. The whole stack is open source, right? RethinkDB is open source, Horizon is open source, anyone can download them, anyone can build an app, and you can deploy the app any way you want, on AWS or Azure or pretty much any way that you want. But deploying an app at scale is still pretty challenging, and what we learned from our customers is that very often they will make these huge RethinkDB and Horizon deployments, and they will call us and buy a support contract, and we help them out with a lot of these deployments. In this process we learned a lot about best practices, we learned about the patterns of how to scale big applications and how to scale big RethinkDB clusters. Horizon Cloud is basically software as a service, a platform as a service, that takes all of that knowledge and operationalizes it in a service. So if you want to deploy a massive Horizon or RethinkDB application, you can do it yourself, because everything is open source, or you can use Horizon Cloud. The goal of Horizon Cloud is basically to take away all the headaches of deploying, managing, and scaling these applications.

54:04 Michael: Interesting. So it sounds like it takes some of the consulting work that you might have been doing and turns it into a framework or an offering that is automatic, in some sense.

54:16 Slava: Yes. That is exactly what it does. Right now Horizon Cloud is built on Google Compute Engine, so we'll deploy everything to Google Compute, but eventually Horizon Cloud will run on pretty much any back end cloud service, so people will be able to pick. And for enterprise customers who don't necessarily want to run on the cloud, they will be able to run Horizon Cloud in their own internal infrastructure.

54:40 Michael: Ok. Yeah, that sounds awesome, so congratulations on the launch of Horizon because that really looks like something people were waiting for.

54:50 Slava: Yeah, we are very excited about it. It was a lot of work, and a different kind of work from building a database, but you know, people seem to really like it.

54:55 Michael: Yeah, it's cool. You know, you get really successful with the one thing, like RethinkDB, and you think, how are we going to make another thing that's equally successful, right? It's really challenging but also interesting to create these new products that somewhat level up on each other, but at the same time it is a new thing. So, nice.

55:16 Slava: Yeah, it's very exciting, and you kind of get better at building products [inaudible 55:17] after a while, so I think building Horizon was certainly easier than building RethinkDB, but it was still pretty challenging.

55:25 Michael: Yeah. It feels to me like whenever you build a product or an app that is going to ship somewhere major, you feel like you are almost done and then there are 100 little things that you have to do, and it just takes way longer. So when you finally do ship, it feels great.

55:43 Slava: Yeah, it takes very long. Even if you plan for it, it still takes long.

55:47 Michael: You tell yourself it's going to take twice as long and it still doesn't feel right that it takes long. Awesome, so we are just about out of time, let me ask you a couple of questions before we call it a show. Question I always ask my guests when we get to the end of the conversation is when you write code what editor do you use?

56:07 Slava: Emacs, I use Emacs.

56:11 Michael: Emacs, right. You have a big Lisp background, right? You started out doing a lot of Lisp code, is that correct?

56:17 Slava: Yeah, I was really excited about Lisp for a long time, and to a large degree I still am. Although we don't use it at RethinkDB, we use a lot of ideas that come from Lisp, just not the language itself. But yeah, I was very excited about Lisp, Common Lisp, and I got into Emacs and Emacs Lisp, so that's part of my background. I still use Emacs every day and I don't think I am ever going to switch to anything else. That just sounds impossible.

56:38 Michael: Ok, and then while you have everyone's attention any final calls to action, how do they get started with Rethink?

56:50 Slava: To whoever is listening from the RethinkDB community: you guys are amazing, you make everything worthwhile, and you make the product better. For everyone who hasn't used RethinkDB, I encourage you to go to rethinkdb.com. You can download it in a couple of seconds and get started very quickly, and we are on Twitter at @rethinkdb. If you have any questions, we are always there to help.

57:10 Michael: All right. Fantastic. So, this has been a really interesting look inside your company, building open source projects, a cool fresh new data base. Thanks for taking the time to chat.

57:21 Slava: It was my pleasure, thank you for having me on the show.

57:22 Michael: Yeah, bye bye.

57:22 This has been another episode of Talk Python To Me.

57:22 Today's guest was Slava Akhmechet and this episode has been sponsored by Hired and Snap CI. Thank you both for supporting the show!

57:22 Hired wants to help you find your next big thing. Visit hired.com/talkpythontome to get 5 or more offers with salary and equity right up front and a special listener signing bonus of $2,000 USD.

57:22 Snap CI is modern continuous integration and delivery. Build, test, and deploy your code directly from GitHub, all in your browser with debugging, Docker, and parallelism included. Try them for free at snap.ci/talkpython

57:22 Are you or a colleague trying to learn Python? Have you tried books and videos that left you bored by just covering topics point-by-point? Check out my online course Python Jumpstart by Building 10 Apps at talkpython.fm/course to experience a more engaging way to learn Python. If you're looking for something a little more advanced, try my Write Pythonic Code course at talkpython.fm/pythonic.

57:22 You can find the links from the show at talkpython.fm/episodes/show/65

57:22 Be sure to subscribe to the show. Open your favorite podcatcher and search for Python. We should be right at the top. You can also find the iTunes feed at /itunes, Google Play feed at /play and direct RSS feed at /rss on talkpython.fm.

57:22 Our theme music is Developers Developers Developers by Cory Smith, who goes by Smixx. You can hear the entire song at talkpython.fm/music.

57:22 This is your host, Michael Kennedy. Thanks for listening!

57:22 Smixx, take us out of here.
